Animating Plots in R

Author

Matthew Harris

Published

January 20, 2020

Abstract
Using {plotly} to animate plots in R.

Goals

Animated visualization can be great for communicating changes over time that might be harder to interpret from static plots. My goal is to calculate the per year podium counts for each of the top ten constructors. I then want to create a plot in Plotly that will allow me animate the change in podiums over time.

Packages

First we need to the packages need for this analysis.

Code
library(tidyverse)
library(lubridate)
library(scales)
library(plotly)
library(ggsci)
library(here)

Data Import

The data used for this analysis can be found at the following link.

Formula One Data

Data Wrangling

Before I can create the animated plot I need to transform the data into a format that will be easier to work with.

Code
my_pal <- pal_npg()(7)

f1_data <- f1_data %>% 
  filter(!is.na(dob)) %>%
  mutate(drv_name = iconv(paste(forename, surname), 
                          from = "UTF-8", 
                          to = "ASCII", sub = ""),
         race_age = time_length(interval(dob, race_date), "years"),
         ra_group = as.character(cut(race_age, 
                                     breaks = seq(15, 60, 5),
                                     labels = c("16-20", "21-25",
                                                "26-30", "31-35",
                                                "36-40", "41-45",
                                                "46-50", "51-55", 
                                                "56-60")))) %>%
  group_by(driverId) %>%
  mutate(podium = if_else(positionOrder %in% c(1:3), 1, 0),
         win = if_else(positionOrder == 1, 1, 0)) %>%
  ungroup()

With the top ten constructors identified, I can compute the additional values needed for the animated plot: the cumulative podium wins per year.

Code
top_10_const <- f1_data %>%
  #Groups by constructor name
  group_by(c_name) %>%
  #Calculates the total podiums by constructor
  summarize(total_p = sum(podium)) %>%
  #Filters for the top 10 constructors 
  #by total podiums
  slice_max(n = 10, order_by = total_p) %>%
  #Pulls the constructor name column out as a vector
  pull(c_name)


const_top_10 <- f1_data %>%
  #Filters by the top constructors identified 
  #from the previous step
  filter(c_name %in% top_10_const) %>%
  #Groups by the race year and constructor name
  group_by(race_year, c_name) %>%
  #Calculates the total podiums per year for 
  #each constructor
  summarize(annual_podium = sum(podium), 
            .groups = "drop") %>%
  ungroup() %>%
  #Sort the data by race year to ensure that the
  #cumulative calculations work correctly
  arrange(race_year) %>%
  #Pivots the data into a wide format so that each 
  #constructor with their annual podium count is in a 
  #separate column
  pivot_wider(names_from = c_name, 
              values_from = annual_podium) %>%
  #Replaces NA values with 0. This will ensure that the
  #cumulative podium calculation doesn't skip years where
  #there isn't data
  map_df(~if_else(is.na(.x), 0, .x)) %>%
  #Pivots the data back to the long format
  pivot_longer(cols = c(2:ncol(.)), 
               names_to = "c_name", 
               values_to = "annual_podium") %>%
  #Nest the data by constructor name
  group_by(c_name) %>%
  nest() %>%
  #Calculates the cumulative podium count for each year
  #for each consrtuctor
  mutate(cum_podiums = map(data, 
                            ~cumsum(.x$annual_podium))) %>%
  unnest(c(data, cum_podiums)) %>%
  #Drops the annual podium column
  ungroup() %>%
  select(-annual_podium)

const_top_10 %>%
  head()
# A tibble: 6 × 3
  c_name  race_year cum_podiums
  <chr>       <dbl>       <dbl>
1 Ferrari      1950           3
2 Ferrari      1951          16
3 Ferrari      1952          33
4 Ferrari      1953          49
5 Ferrari      1954          63
6 Ferrari      1955          72

Static plot

For comparison I have utilized our new values to visualize the podium performance with a static plot. In order to communicate the same information I’ve chosen to facet the plots by year. Since there are so many years to facet the plot becomes noisy and difficult to interpret. Now lets see how the animated plot performs.

Code
const_top_10 %>%
  ggplot(aes(c_name, cum_podiums, fill = c_name)) + 
  geom_col() + facet_wrap(~race_year) + 
  theme(axis.text.x = element_text(angle = 45)) +
  scale_fill_npg() + labs(x = "Top 10 Constructors",
                          y = "Total Podiums",
                          fill = "")

Creating the animated plot!!!

The animated plot is fairly easy to create. I just need to supply the race_year column to the frame parameter. This will allow anyone to use the slider to move through time and see how podiums counts have changed for each constructor. It’s also way easier to compare the performance of each constructor and identify trends.

Code
plot_ly(const_top_10,
        x = ~c_name,
        y = ~cum_podiums,
        color = ~c_name,
        colors = my_pal,
        type = "bar",
        frame = ~race_year) %>% 
  layout(yaxis = list(title = "Total Podiums"),
         xaxis = list(title = "Top 10 Constructors")) %>%
  animation_slider(currentvalue = list(prefix = "Year "))
BenettonBrabhamFerrariMcLarenMercedesRed BullRenaultTeam LotusTyrrellWilliams0100200300400500600700
BenettonBrabhamFerrariMcLarenMercedesRed BullRenaultTeam LotusTyrrellWilliamsYear 195019501955196019651970197519801985199019952000200520102015Top 10 ConstructorsTotal PodiumsPlay