Data were collected from Basketball-Reference.com, a website operated by Sports Reference. The data set consists of all games played by LeBron James in the 2020-2021 season: the regular season and the six games he played in the 2021 playoffs, downloaded as two separate Excel files. The files were converted to CSV in Excel and then read into Python. From each file, the subset of games in which LeBron James played was extracted, and the two subsets were concatenated, which excluded the games in which he was inactive. The index of the combined data set was renumbered from 0 to 50 to cover the 51 games he played in the 2020-2021 regular season and playoffs.
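The preparation steps above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code used in the analysis. The column names (G, Date, PTS), the "Inactive" marker, and all values are assumptions made up for the example; the real export layout may differ.

```python
import csv
import io

# Hypothetical stand-ins for the two exported CSV files.
# Column names, the "Inactive" marker, and every value are illustrative only.
REGULAR_CSV = """G,Date,PTS
1,d1,20
,d2,Inactive
2,d3,30
"""
PLAYOFF_CSV = """G,Date,PTS
1,d4,25
2,d5,15
"""


def played_rows(csv_text):
    """Return only the rows for games played, i.e. rows whose PTS is a whole number."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["PTS"].strip().isdigit():
            rows.append(row)
    return rows


# Concatenate the played-game subsets: regular season first, then playoffs.
games = played_rows(REGULAR_CSV) + played_rows(PLAYOFF_CSV)

# Renumber the combined rows from 0 to n - 1.
indexed = [{"index": i, **row} for i, row in enumerate(games)]

print(len(indexed), indexed[0]["index"], indexed[-1]["index"])
```

With the real files, the same steps would yield 51 rows indexed 0 to 50.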
Of the 30 columns in the data file, the 22 metrics recorded for each game were the quantitative features considered in formulating a predictive model. A scatterplot matrix reveals that Points (PTS) are most strongly correlated with Field Goals (FG) and Game Score (GmSc).
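One way to complement the visual reading of a scatterplot matrix is to rank each candidate feature by its Pearson correlation with PTS. The sketch below is an illustration under assumptions: the function name pearson is hypothetical, the MP column is included only as a contrast, and the values are toy numbers rather than the actual game log.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy columns (made-up values, not the real data).
data = {
    "PTS": [10, 20, 30, 25, 15],
    "FG": [4, 8, 12, 9, 6],
    "MP": [30, 31, 29, 35, 33],
}

# Rank every non-target column by the absolute value of its correlation with PTS.
ranked = sorted(
    ((col, pearson(values, data["PTS"])) for col, values in data.items() if col != "PTS"),
    key=lambda item: abs(item[1]),
    reverse=True,
)

for col, r in ranked:
    print(f"{col}: r = {r:.3f}")
```

Applied to the 22 metrics, the top entries of such a ranking would be the natural candidates for a parsimonious model.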
Four parsimonious forecasting models were fit to the quantitative data. I selected the model with the highest coefficient of determination, which explains about 93 percent of the variation in points scored. This model uses two variables, field goals and game score, to predict the points scored by LeBron James in a game. A one-step-ahead forecast was made using the values of field goals and game score from the last playoff game of 2021. The resulting prediction was that LeBron James would score 28 points in the second game of the next preseason, on October 6, 2021, between the L.A. Lakers and the Phoenix Suns in Phoenix, Arizona.
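The selected model can be illustrated with an ordinary least squares fit of points on field goals and game score. The sketch below assumes a linear model with an intercept, which the text does not state explicitly. The helper names (solve, fit_ols, predict, r_squared) are hypothetical, and the toy data are constructed to lie exactly on a plane, so it does not reproduce the reported 93 percent or the 28-point forecast.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(X, y):
    """Return [intercept, b1, b2, ...] from the normal equations (X'X) beta = X'y."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(beta, x):
    """Predicted response for one observation x = (FG, GmSc)."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def r_squared(beta, X, y):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((t - predict(beta, r)) ** 2 for r, t in zip(X, y))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1 - ss_res / ss_tot


# Toy (FG, GmSc) pairs and points built as PTS = 1 + 2*FG + 0.5*GmSc (illustrative only).
X = [(5, 10), (8, 20), (10, 18), (12, 30), (7, 12), (9, 25)]
y = [1 + 2 * fg + 0.5 * gmsc for fg, gmsc in X]

beta = fit_ols(X, y)
r2 = r_squared(beta, X, y)

# One-step-ahead forecast: plug the most recent game's FG and GmSc into the fitted model.
forecast = predict(beta, X[-1])

print([round(b, 3) for b in beta], round(r2, 3), round(forecast, 1))
```

Comparing r_squared across the four candidate models and keeping the largest value mirrors the selection rule described above.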
Statistical computing for data analysis was performed using the Python programming language.
MS in Statistics