Code
library(tidyverse)
library(lubridate)
library(scales)
library(plotly)
library(ggsci)
library(here)
Matthew Harris
January 20, 2020
Animated visualization can be great for communicating changes over time that might be harder to interpret from static plots. My goal is to calculate the per year podium counts for each of the top ten constructors. I then want to create a plot in Plotly that will allow me animate the change in podiums over time.
First we need to the packages need for this analysis.
The data used for this analysis can be found at the following link.
Before I can create the animated plot I need to transform the data into a format that will be easier to work with.
my_pal <- pal_npg()(7)
f1_data <- f1_data %>%
filter(!is.na(dob)) %>%
mutate(drv_name = iconv(paste(forename, surname),
from = "UTF-8",
to = "ASCII", sub = ""),
race_age = time_length(interval(dob, race_date), "years"),
ra_group = as.character(cut(race_age,
breaks = seq(15, 60, 5),
labels = c("16-20", "21-25",
"26-30", "31-35",
"36-40", "41-45",
"46-50", "51-55",
"56-60")))) %>%
group_by(driverId) %>%
mutate(podium = if_else(positionOrder %in% c(1:3), 1, 0),
win = if_else(positionOrder == 1, 1, 0)) %>%
ungroup()
With the top ten constructors identified, I can compute the additional values needed for the animated plot: the cumulative podium wins per year.
top_10_const <- f1_data %>%
#Groups by constructor name
group_by(c_name) %>%
#Calculates the total podiums by constructor
summarize(total_p = sum(podium)) %>%
#Filters for the top 10 constructors
#by total podiums
slice_max(n = 10, order_by = total_p) %>%
#Pulls the constructor name column out as a vector
pull(c_name)
const_top_10 <- f1_data %>%
#Filters by the top constructors identified
#from the previous step
filter(c_name %in% top_10_const) %>%
#Groups by the race year and constructor name
group_by(race_year, c_name) %>%
#Calculates the total podiums per year for
#each constructor
summarize(annual_podium = sum(podium),
.groups = "drop") %>%
ungroup() %>%
#Sort the data by race year to ensure that the
#cumulative calculations work correctly
arrange(race_year) %>%
#Pivots the data into a wide format so that each
#constructor with their annual podium count is in a
#separate column
pivot_wider(names_from = c_name,
values_from = annual_podium) %>%
#Replaces NA values with 0. This will ensure that the
#cumulative podium calculation doesn't skip years where
#there isn't data
map_df(~if_else(is.na(.x), 0, .x)) %>%
#Pivots the data back to the long format
pivot_longer(cols = c(2:ncol(.)),
names_to = "c_name",
values_to = "annual_podium") %>%
#Nest the data by constructor name
group_by(c_name) %>%
nest() %>%
#Calculates the cumulative podium count for each year
#for each consrtuctor
mutate(cum_podiums = map(data,
~cumsum(.x$annual_podium))) %>%
unnest(c(data, cum_podiums)) %>%
#Drops the annual podium column
ungroup() %>%
select(-annual_podium)
const_top_10 %>%
head()
# A tibble: 6 × 3
c_name race_year cum_podiums
<chr> <dbl> <dbl>
1 Ferrari 1950 3
2 Ferrari 1951 16
3 Ferrari 1952 33
4 Ferrari 1953 49
5 Ferrari 1954 63
6 Ferrari 1955 72
For comparison I have utilized our new values to visualize the podium performance with a static plot. In order to communicate the same information I’ve chosen to facet the plots by year. Since there are so many years to facet the plot becomes noisy and difficult to interpret. Now lets see how the animated plot performs.
The animated plot is fairly easy to create. I just need to supply the race_year
column to the frame parameter. This will allow anyone to use the slider to move through time and see how podiums counts have changed for each constructor. It’s also way easier to compare the performance of each constructor and identify trends.