For this example, we will be working with data from the 2024 NFL Big Data Bowl, which recently opened!
You can find the submissions from last year’s finalists here
Downloading the Data:
1. To get started with the data, you must first create a Kaggle account and join the competition here. The download button is at the bottom right of the page. If you would rather do everything from the command line, you can use Kaggle's API and download the data with the following command:
kaggle competitions download -c nfl-big-data-bowl-2024
2. This will download the data as a zip file. To unzip it, navigate in your file explorer to wherever you downloaded the zip file, then double-click the file.
- If you are working on Mac, double-clicking will unzip/extract the file. You can also use the unzip command in the Terminal.
- If you are working in Windows, you will have to click "Extract All" toward the top of the file explorer.
3. This will create a new folder in the same directory containing all of the unzipped files.
You are now ready to get started working with the data!
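As a rough sketch, the unzip step can also be done from R, followed by reading the first week of tracking data. The zip and folder names below assume the archive was downloaded with the Kaggle command above into your working directory, and tracking_week_1.csv is assumed to be the name of the week 1 tracking file. read_tracking_week is a hypothetical helper written for this example, not part of any package.

```r
# Assumed locations: adjust both to match where you saved the download.
zip_path <- "nfl-big-data-bowl-2024.zip"
data_dir <- "nfl-big-data-bowl-2024"

# Extract the archive once, if it is present and not yet unzipped.
if (file.exists(zip_path) && !dir.exists(data_dir)) {
  unzip(zip_path, exdir = data_dir)
}

# Hypothetical helper: read one week of tracking data from the extracted folder.
read_tracking_week <- function(data_dir, week) {
  path <- file.path(data_dir, paste0("tracking_week_", week, ".csv"))
  read.csv(path)
}

# Only runs once the data folder exists, so the block is safe to source as-is.
if (dir.exists(data_dir)) {
  week_1 <- read_tracking_week(data_dir, 1)
  print(head(week_1))
}
```

Calling head() on the result should give a preview like the one below.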
## gameId playId nflId displayName frameId time
## 1 2022090800 56 35472 Rodger Saffold 1 2022-09-08 20:24:05.200000
## 2 2022090800 56 35472 Rodger Saffold 2 2022-09-08 20:24:05.299999
## 3 2022090800 56 35472 Rodger Saffold 3 2022-09-08 20:24:05.400000
## 4 2022090800 56 35472 Rodger Saffold 4 2022-09-08 20:24:05.500000
## 5 2022090800 56 35472 Rodger Saffold 5 2022-09-08 20:24:05.599999
## 6 2022090800 56 35472 Rodger Saffold 6 2022-09-08 20:24:05.700000
## jerseyNumber club playDirection x y s a dis o dir
## 1 76 BUF left 88.37 27.27 1.62 1.15 0.16 231.74 147.90
## 2 76 BUF left 88.47 27.13 1.67 0.61 0.17 230.98 148.53
## 3 76 BUF left 88.56 27.01 1.57 0.49 0.15 230.98 147.05
## 4 76 BUF left 88.64 26.90 1.44 0.89 0.14 232.38 145.42
## 5 76 BUF left 88.72 26.80 1.29 1.24 0.13 233.36 141.95
## 6 76 BUF left 88.80 26.70 1.15 1.42 0.12 234.48 139.41
## event
## 1 <NA>
## 2 pass_arrived
## 3 <NA>
## 4 <NA>
## 5 <NA>
## 6 pass_outcome_caught
play <- week_1 %>%
  filter(gameId == 2022091112) %>%
  filter(playId == 1613)
play[103:112,]
## gameId playId nflId displayName frameId time
## 103 2022091112 1613 42381 Preston Smith 1 2022-09-11 17:41:58.099999
## 104 2022091112 1613 42381 Preston Smith 2 2022-09-11 17:41:58.200000
## 105 2022091112 1613 42381 Preston Smith 3 2022-09-11 17:41:58.299999
## 106 2022091112 1613 42381 Preston Smith 4 2022-09-11 17:41:58.400000
## 107 2022091112 1613 42381 Preston Smith 5 2022-09-11 17:41:58.500000
## 108 2022091112 1613 42381 Preston Smith 6 2022-09-11 17:41:58.599999
## 109 2022091112 1613 42381 Preston Smith 7 2022-09-11 17:41:58.700000
## 110 2022091112 1613 42381 Preston Smith 8 2022-09-11 17:41:58.799999
## 111 2022091112 1613 42381 Preston Smith 9 2022-09-11 17:41:58.900000
## 112 2022091112 1613 42381 Preston Smith 10 2022-09-11 17:41:59.000000
## jerseyNumber club playDirection x y s a dis o dir
## 103 91 GB left 53.14 23.87 0.44 0.86 0.04 322.81 166.27
## 104 91 GB left 53.16 23.83 0.46 0.52 0.04 325.72 158.29
## 105 91 GB left 53.18 23.79 0.39 0.46 0.04 340.37 150.22
## 106 91 GB left 53.20 23.76 0.38 0.41 0.04 344.14 151.01
## 107 91 GB left 53.22 23.73 0.33 0.46 0.04 347.02 150.49
## 108 91 GB left 53.23 23.70 0.27 0.51 0.03 349.75 150.68
## 109 91 GB left 53.24 23.68 0.17 0.69 0.02 350.62 152.89
## 110 91 GB left 53.25 23.67 0.04 0.95 0.01 355.95 119.11
## 111 91 GB left 53.25 23.67 0.03 0.77 0.00 358.92 24.92
## 112 91 GB left 53.25 23.67 0.05 0.59 0.00 358.92 310.76
## event
## 103 <NA>
## 104 <NA>
## 105 <NA>
## 106 pass_arrived
## 107 <NA>
## 108 pass_outcome_caught
## 109 <NA>
## 110 <NA>
## 111 <NA>
## 112 <NA>
To start plotting, we will first make a static plot for a single frame of a play. From there, we can turn it into an animation!
To do that, we need the field as a background and a scatter plot of the players on top of it.
I will be using sportyR to plot the field here. More documentation can be found on their GitHub.
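# install.packages("sportyR") # run this line if you don't have sportyR installed
library(sportyR)
library(ggplot2) # provides geom_point() and aes()
# Draw the empty field
geom_football("nfl")
# Overlay every tracked position from the play
geom_football("nfl") +
  geom_point(data = play, aes(x, y))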
Our data places (0, 0) at the bottom-left corner of the left end zone, whereas the field defaults to (0, 0) at the very middle of the field, so the points will not line up with the field. We can fix this with the x_trans and y_trans arguments when we create our field plot.
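Here is a minimal sketch of that fix, restricted to a single frame. The shift values are an assumption: they are half of the 120 by 53 1/3 yard field, which should move the field's origin from the center to the bottom-left corner. The names frame_1, field, and single_frame_plot are just for illustration, and the small fallback data frame (rows copied from the Preston Smith printout above) is only there so the block runs on its own.

```r
library(dplyr)
library(ggplot2)
library(sportyR)

# Fallback so this block runs standalone: three rows copied from the printout above.
# If you already created `play` earlier, that full play is used instead.
if (!exists("play")) {
  play <- data.frame(
    frameId = 1:3,
    club = "GB",
    x = c(53.14, 53.16, 53.18),
    y = c(23.87, 23.83, 23.79)
  )
}

# Keep only the first frame of the play
frame_1 <- play %>%
  filter(frameId == 1)

# Shift the field so that (0, 0) is the bottom-left corner (assumed values)
field <- geom_football("nfl", x_trans = 60, y_trans = 26.6667)

single_frame_plot <- field +
  geom_point(data = frame_1, aes(x, y))
single_frame_plot
```

If the points still look offset, double-check the sign and size of the two shift values against the sportyR documentation.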
For animating the play, we will be using the gganimate package. With this package, we can add the transition_time() function to our pre-existing graph object, and we just need to specify which variable each frame depends on. Here, that variable is frameId.
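A sketch of what that could look like is below. It assumes the same field shift values as before (half of the field's length and width) and uses frameId as the time variable. The fps of 10 is chosen because the timestamps in the data are a tenth of a second apart. The fallback data frame is copied from the printout above and is only there so the block runs on its own, and play_anim is an illustrative name.

```r
library(ggplot2)
library(sportyR)
library(gganimate)

# Fallback so this block runs standalone: three rows copied from the printout above.
if (!exists("play")) {
  play <- data.frame(
    frameId = 1:3,
    club = "GB",
    x = c(53.14, 53.16, 53.18),
    y = c(23.87, 23.83, 23.79)
  )
}

# Field + players, with each animation frame tied to frameId
play_anim <- geom_football("nfl", x_trans = 60, y_trans = 26.6667) +
  geom_point(data = play, aes(x, y)) +
  transition_time(frameId)

# Rendering needs a renderer package (for example gifski), so only do it interactively
if (interactive()) {
  animate(play_anim, nframes = length(unique(play$frameId)), fps = 10)
}
```

Setting nframes to the number of distinct frameId values keeps roughly one animation frame per tracking frame, so the play runs at about real speed.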
How does it look? If you stuck with the play I left you, check that the animation matches the real footage.