Chapter 5 Basketball

5.1 Intro

This chapter explores NBA player performance using team totals data from the 2024–2025 season. I create two composite metrics, one for offense (PRA: points + rebounds + assists) and one for defense (STOCKS: steals + blocks), merge in each team's conference, and compare distributions between East and West. I then use visualizations, point-biserial correlations, a correlation matrix, and a partial correlation to examine how conference membership, age, PRA, and STOCKS relate to one another.

5.2 Loading the Data

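library(readxl)      # read_excel(), excel_sheets()
library(dplyr)       # %>%, mutate(), bind_rows(), left_join()
library(ggplot2)     # ggplot()
library(ggcorrplot)  # ggcorrplot()
library(ppcor)       # pcor.test()

# Read one team's sheet and add the team label, award flag, and composite metrics
load_team_data <- function(sheet_name, file_path = "NBA Team Total Data 2024-2025.xlsx") {
  df <- read_excel(file_path, sheet = sheet_name)

  df <- df %>%
    mutate(
      Team = sheet_name,                        # sheet name doubles as the team label
      Won_award = ifelse(is.na(Awards), 0, 1),  # 1 if the Awards cell is non-missing
      PRA = PTS + TRB + AST,                    # offensive composite
      STOCKS = STL + BLK                        # defensive composite
    )

  return(df)
}
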
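file_path <- "NBA Team Total Data 2024-2025.xlsx"

# The workbook holds one sheet per team
team_sheets <- excel_sheets(file_path)

# Load every sheet, then stack the per-team tables into one data frame
all_teams_list <- lapply(team_sheets, load_team_data, file_path = file_path)
nba_data <- bind_rows(all_teams_list)

head(nba_data)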
## # A tibble: 6 × 35
##      Rk Player   Age     G    GS    MP    FG   FGA `FG%`  `3P` `3PA` `3P%`  `2P`
##   <dbl> <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1     1 Jalen…    24    79    22  2031   246   620 0.397   122   362 0.337   124
## 2     2 Keon …    22    79    56  1925   303   779 0.389   126   401 0.314   177
## 3     3 Nic C…    25    70    62  1882   320   568 0.563     5    21 0.238   315
## 4     4 Camer…    28    57    57  1800   355   747 0.475   159   408 0.39    196
## 5     5 Ziair…    23    63    45  1541   214   520 0.412   103   302 0.341   111
## 6     6 Tyres…    25    60    11  1315   189   465 0.406    99   282 0.351    90
## # ℹ 22 more variables: `2PA` <dbl>, `2P%` <dbl>, `eFG%` <dbl>, FT <dbl>,
## #   FTA <dbl>, `FT%` <dbl>, ORB <dbl>, DRB <dbl>, TRB <dbl>, AST <dbl>,
## #   STL <dbl>, BLK <dbl>, TOV <dbl>, PF <dbl>, PTS <dbl>, `Trp-Dbl` <dbl>,
## #   Awards <chr>, Team <chr>, Won_award <dbl>, PRA <dbl>, STOCKS <dbl>,
## #   Pos <chr>
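
To make the derived columns concrete, here is a minimal base-R sketch of the same three calculations on a toy data frame. The players and numbers are made up for illustration and do not come from the workbook. It also shows one behavior worth knowing: a missing value in any component makes the whole composite missing.

```r
# Toy data (hypothetical values, not from the NBA workbook)
toy <- data.frame(
  Player = c("A", "B", "C"),
  PTS = c(100, 250, 40),
  TRB = c(50, 80, NA),     # rebounds deliberately missing for player C
  AST = c(30, 60, 10),
  STL = c(5, 12, 2),
  BLK = c(3, 7, 1),
  Awards = c(NA, "AS", NA),
  stringsAsFactors = FALSE
)

# Same definitions as in load_team_data(), written in base R
toy$PRA <- toy$PTS + toy$TRB + toy$AST
toy$STOCKS <- toy$STL + toy$BLK
toy$Won_award <- ifelse(is.na(toy$Awards), 0, 1)

toy[, c("Player", "PRA", "STOCKS", "Won_award")]
```

Player C ends up with a missing PRA because `+` propagates `NA`. The correlation matrix later in the chapter uses `use = "pairwise.complete.obs"`, which drops such rows pair by pair rather than failing.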

5.3 Analysis

conference_lookup <- read_excel("Team Conferences.xlsx")

# Attach each team's conference, then code East = 1 and West = 0
nba_data <- nba_data %>%
  left_join(conference_lookup, by = "Team") %>%
  mutate(Conference_binary = ifelse(Conference == "East", 1, 0))

head(nba_data)
## # A tibble: 6 × 37
##      Rk Player   Age     G    GS    MP    FG   FGA `FG%`  `3P` `3PA` `3P%`  `2P`
##   <dbl> <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1     1 Jalen…    24    79    22  2031   246   620 0.397   122   362 0.337   124
## 2     2 Keon …    22    79    56  1925   303   779 0.389   126   401 0.314   177
## 3     3 Nic C…    25    70    62  1882   320   568 0.563     5    21 0.238   315
## 4     4 Camer…    28    57    57  1800   355   747 0.475   159   408 0.39    196
## 5     5 Ziair…    23    63    45  1541   214   520 0.412   103   302 0.341   111
## 6     6 Tyres…    25    60    11  1315   189   465 0.406    99   282 0.351    90
## # ℹ 24 more variables: `2PA` <dbl>, `2P%` <dbl>, `eFG%` <dbl>, FT <dbl>,
## #   FTA <dbl>, `FT%` <dbl>, ORB <dbl>, DRB <dbl>, TRB <dbl>, AST <dbl>,
## #   STL <dbl>, BLK <dbl>, TOV <dbl>, PF <dbl>, PTS <dbl>, `Trp-Dbl` <dbl>,
## #   Awards <chr>, Team <chr>, Won_award <dbl>, PRA <dbl>, STOCKS <dbl>,
## #   Pos <chr>, Conference <chr>, Conference_binary <dbl>
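
A left join keeps every player row even when the team label has no match in the lookup table, in which case `Conference` and `Conference_binary` are silently `NA`. The sketch below shows one way to check for that, using base R and made-up stand-ins for `nba_data` and `conference_lookup`; the team labels are hypothetical and the mismatch is deliberate.

```r
# Hypothetical stand-ins for nba_data and conference_lookup
players <- data.frame(
  Player = c("A", "B", "C"),
  Team = c("BOS", "LAL", "Bos"),   # "Bos" deliberately fails to match
  stringsAsFactors = FALSE
)
lookup <- data.frame(
  Team = c("BOS", "LAL"),
  Conference = c("East", "West"),
  stringsAsFactors = FALSE
)

# Base-R equivalent of a left join on Team
joined <- merge(players, lookup, by = "Team", all.x = TRUE)
joined$Conference_binary <- ifelse(joined$Conference == "East", 1, 0)

# Team labels that picked up no conference in the join
unmatched <- unique(joined$Team[is.na(joined$Conference)])
unmatched
```

An empty result from the last line would mean every sheet name matched a row in the lookup table.
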
ggplot(nba_data, aes(x = PRA, y = STOCKS, color = factor(Conference_binary))) +
  geom_point(size = 3, alpha = 0.7) +
  labs(color = "Conference (1=East, 0=West)",
       x = "PRA (Points + Rebounds + Assists)",
       y = "STOCKS (Steals + Blocks)",
       title = "Offensive vs Defensive Performance by Conference") +
  theme_minimal()

Figure 5.1: Scatterplot of offensive output (PRA) versus defensive output (STOCKS), colored by conference (East vs West). This visual compares overall player performance patterns by conference.

ggplot(nba_data, aes(x = PRA, fill = factor(Conference_binary))) +
  geom_histogram(position = "dodge", bins = 15, alpha = 0.7) +
  labs(fill = "Conference (1=East, 0=West)",
       x = "PRA",
       y = "Number of Players",
       title = "Distribution of PRA by Conference") +
  theme_minimal()

Figure 5.2: Distribution of PRA by Conference

cor_pra <- cor.test(nba_data$Conference_binary, nba_data$PRA)
cor_stocks <- cor.test(nba_data$Conference_binary, nba_data$STOCKS)

cor_pra
## 
##  Pearson's product-moment correlation
## 
## data:  nba_data$Conference_binary and nba_data$PRA
## t = -1.8195, df = 650, p-value = 0.0693
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.147164250  0.005629906
## sample estimates:
##         cor 
## -0.07118475
cor_stocks
## 
##  Pearson's product-moment correlation
## 
## data:  nba_data$Conference_binary and nba_data$STOCKS
## t = -2.094, df = 650, p-value = 0.03665
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.157650363 -0.005105577
## sample estimates:
##         cor 
## -0.08185737
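
The point-biserial correlation is simply Pearson's correlation with one variable coded 0/1, which is why `cor.test()` is used above. Its t statistic is the same as the one from a pooled-variance two-sample t-test comparing the two groups. The sketch below illustrates this on simulated data (not the NBA data); the group sizes and means are arbitrary assumptions.

```r
set.seed(1)  # simulated data, not the NBA data
group <- rep(c(0, 1), times = c(40, 60))             # 0/1 indicator
score <- rnorm(100, mean = 50 + 2 * group, sd = 10)  # outcome

# Pearson correlation with a binary variable (point-biserial)
r_test <- cor.test(group, score)

# Pooled-variance t-test of group 1 versus group 0
t_test <- t.test(score[group == 1], score[group == 0], var.equal = TRUE)

c(r = unname(r_test$estimate),
  t_from_cor = unname(r_test$statistic),
  t_from_ttest = unname(t_test$statistic))
```

The link between the two is t = r * sqrt(df) / sqrt(1 - r^2). Plugging in the PRA result above (r = -0.0712, df = 650) reproduces the reported t of about -1.82.
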
cor_matrix <- nba_data %>%
  dplyr::select(Age, PRA, STOCKS) %>%
  cor(use = "pairwise.complete.obs")

ggcorrplot(cor_matrix, lab = TRUE, title = "Correlation Matrix: Age, PRA, STOCKS")

Figure 5.3: Correlation matrix for Age, PRA, and STOCKS. Values summarize the direction and strength of associations among these variables.

partial_res <- pcor.test(nba_data$PRA, nba_data$STOCKS, nba_data$Age)
partial_res
##    estimate       p.value statistic   n gp  Method
## 1 0.8395996 3.657553e-174  39.37587 652  1 pearson
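
A partial correlation with one control variable can be computed in two equivalent ways: by correlating the residuals left after regressing each variable on the control, or from the three pairwise correlations. The sketch below checks that equivalence in base R on simulated data; `x`, `y`, and `z` are hypothetical stand-ins for PRA, STOCKS, and Age, and the coefficients are arbitrary.

```r
set.seed(42)  # simulated data, not the NBA data
n <- 200
z <- rnorm(n)                       # stand-in for Age
x <- 0.5 * z + rnorm(n)             # stand-in for PRA
y <- 0.8 * x + 0.3 * z + rnorm(n)   # stand-in for STOCKS

# Method 1: correlate the residuals after regressing each variable on z
r_resid <- cor(resid(lm(x ~ z)), resid(lm(y ~ z)))

# Method 2: closed-form first-order partial correlation
r_xy <- cor(x, y)
r_xz <- cor(x, z)
r_yz <- cor(y, z)
r_formula <- (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))

# t statistic with n - 3 degrees of freedom (one control variable)
t_stat <- r_formula * sqrt((n - 3) / (1 - r_formula^2))

c(r_resid = r_resid, r_formula = r_formula, t = t_stat)
```

Applying the same t formula to the output above (estimate 0.8396, n = 652, one control variable) gives a statistic of about 39.4, in line with the value reported by `pcor.test()`.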

5.4 Summary

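Point-biserial correlations (Pearson correlations with the 0/1 conference indicator, East = 1) were used to test whether conference membership relates to PRA and STOCKS. Both associations were very small: r = -0.07 for PRA (p = 0.069, not significant at the 0.05 level) and r = -0.08 for STOCKS (p = 0.037). The negative signs indicate slightly lower totals for East players. A partial correlation tested the association between PRA and STOCKS while controlling for age. It was strong and positive (r = 0.84, p < 0.001, n = 652), so players with high offensive totals also tend to have high defensive totals even after accounting for age.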