Code
library(dplyr)
library(readr)
library(tidyr)
library(purrr)
library(writexl)
library(lubridate)
Matthew Harris
June 28, 2020
Being able to export and share your work is a key step in the data analysis process. In this post I’ll walk through the process of performing some group level summary statistics and exporting those grouped data frames to separate tabs within an Excel document and separate Excel workbooks.
The key package for this demonstration will be writexl
. It allows you to read/create Excel files without having to install other Java or Perl dependencies.
The data used for this analysis can be found at the following link.
I’ll be using a data set that contains performance statistics on Formula 1 drivers.
f1_summary <- f1_data %>%
# Removes special characters from driver names
mutate(across(.cols = c("forename", "surname"),
.fns = ~iconv(x = .x, from = "UTF-8",
to = "ASCII", ""))) %>%
filter(race_year >= 2000) %>%
mutate(podium = if_else(fl_position <= 3,
TRUE, FALSE)) %>%
group_by(driverId, race_year, forename, surname) %>%
summarize(total_races = n(),
fasest_lap = min(fastestLapTime, na.rm = TRUE),
total_podiums = sum(podium, na.rm = TRUE),
total_points = sum(points, na.rm = TRUE), .groups = "drop") %>%
ungroup()
head(f1_summary)
# A tibble: 6 × 8
driverId race_year forename surname total_races fasest_lap total_p…¹ total…²
<dbl> <dbl> <chr> <chr> <int> <drtn> <int> <dbl>
1 1 2007 Lewis Hamilton 17 72.506 secs 14 109
2 1 2008 Lewis Hamilton 18 74.159 secs 12 98
3 1 2009 Lewis Hamilton 17 74.345 secs 7 49
4 1 2010 Lewis Hamilton 19 73.851 secs 9 240
5 1 2011 Lewis Hamilton 19 77.209 secs 8 227
6 1 2012 Lewis Hamilton 20 77.020 secs 10 190
# … with abbreviated variable names ¹total_podiums, ²total_points
Now that the table is in the desired format I can split it based upon the race_year
variable using the group_split
function. This allows me to separate the data and keep the grouping variable in the split tables. In order for the sheet naming to work correctly I also need to name the list of data frames that I have created. This is done by using the set_names
function. I’m choosing to use the race years to create the names for the list. The final step is to pipe the named list of data frames into the write_xlsx
function where it it used to create the Excel workbook with it’s named tabs.
A similar process is used to export the data frames into separate Excel files. This time I will be nesting the data and mapping over the data frames to create each file. The nest
function allows me to use the grouping variable, in this case race_year
, to name each Excel file.