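library(dplyr)
library(tidyr)     # drop_na(), separate_rows()
library(purrr)
library(stringr)
library(readr)
library(httr2)
library(rvest)
library(lubridate)
library(glue)
library(ggplot2)   # plotting
library(scales)    # breaks_width()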
Matthew Harris
March 19, 2022
Loading the custom palette that I created in my “Creating a Cyberpunk 2077 color palette” post.
It looks like changes were made to how the URLs are broken out for the Nintendo Switch games library. Each URL is saved as a string that will be fed into wiki_scrape(). There’s a good chance that the URLs or naming convention will change again in the future.
games_url_0a <- "https://en.wikipedia.org/wiki/List_of_Nintendo_Switch_games_(0-9_and_A)"
games_url_b <- "https://en.wikipedia.org/wiki/List_of_Nintendo_Switch_games_(B)"
games_url_cg <- "https://en.wikipedia.org/wiki/List_of_Nintendo_Switch_games_(C-G)"
games_url_hp <- "https://en.wikipedia.org/wiki/List_of_Nintendo_Switch_games_(H-P)"
games_url_qz <- "https://en.wikipedia.org/wiki/List_of_Nintendo_Switch_games_(Q-Z)"
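Since the five URLs differ only in their suffix, they could also be built from a vector of suffixes, which leaves a single place to update if the naming convention changes again. This is a minimal sketch, assuming the pattern above holds; `page_suffixes` and `games_urls` are hypothetical names that do not appear elsewhere in this post.

```r
library(glue)

# Suffixes copied from the URLs above (hypothetical variable name)
page_suffixes <- c("0-9_and_A", "B", "C-G", "H-P", "Q-Z")

# glue() is vectorised, so this yields one URL per suffix
games_urls <- as.character(
  glue("https://en.wikipedia.org/wiki/List_of_Nintendo_Switch_games_({page_suffixes})")
)

games_urls
```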
wiki_scrape <- function(wiki_url) {
  # Be polite
  # Sleep between requests for 1-3 seconds
  Sys.sleep(sample(1:3, 1))
  wiki_url %>%
    read_html() %>%
    html_nodes(css = "#softwarelist") %>%
    html_table(fill = TRUE) %>%
    as.data.frame() %>%
    as_tibble()
}
game_data <- list(games_url_0a, games_url_b, games_url_cg,
                  games_url_hp, games_url_qz) %>%
  map_df(.f = ~wiki_scrape(.x))
# A tibble: 6 × 6
Title Genre.s. Devel…¹ Publi…² Relea…³ Ref.
<chr> <chr> <chr> <chr> <chr> <chr>
1 0 Degrees Action, platformer, puzzle EastAs… EastAs… May 19… [1][…
2 #1 Anagrams Board game, edutainment, … Eclips… Eclips… May 14… [5][…
3 #1 Crosswords Board game, edutainment, … Eclips… Eclips… Februa… [8][…
4 1-2-Switch Party Ninten… Ninten… March … <NA>
5 10 Second Ninja X Action platformer, puzzle Four C… Thalam… July 3… <NA>
6 10 Second Run Returns Party, racing Blue P… Blue P… Decemb… [11]…
# … with abbreviated variable names ¹Developer.s., ²Publisher.s., ³Release.date
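Because the page URLs may change again, it could help to stop a single broken page from aborting the whole scrape. The sketch below shows one way to do that with `purrr::possibly()`. It runs offline against a stand-in function: `fake_scrape()`, `safe_scrape()` and the page names are all made up for illustration, and the real `wiki_scrape()` would take the place of `fake_scrape()`.

```r
library(purrr)
library(tibble)

# Stand-in for wiki_scrape() so the sketch runs without network access (hypothetical)
fake_scrape <- function(url) {
  if (grepl("missing", url)) stop("page not found")
  tibble(Title = paste("Game from", url))
}

# possibly() returns `otherwise` instead of raising an error
safe_scrape <- possibly(fake_scrape, otherwise = NULL)

urls <- c("page_a", "page_missing", "page_b")

scraped <- urls %>%
  map(safe_scrape) %>%
  compact() %>%          # drop the NULLs left by failed pages
  dplyr::bind_rows()

scraped
```

Pages that fail are silently skipped here, so in practice it would be worth comparing the number of URLs with the number of tables returned.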
Some additional wrangling is necessary. I’m also choosing to “explode” the data frame by genre, which repeats each game title once for every genre it lists. This “exploded” format allows me to count how often each genre is mentioned. Another approach would be to determine which single genre best describes a game.
switch_library <- game_data %>%
  select(-Ref.) %>%
  janitor::clean_names() %>%
  setNames(str_remove_all(names(.), "_s")) %>%
  mutate(release_date = as.Date(release_date, format = "%B %d, %Y"),
         release_ym = floor_date(release_date, "month"),
         release_year = year(release_date),
         release_day = yday(release_date),
         common_date = as.Date(release_day, origin = glue("{year(Sys.Date()) - 1}-12-31")))
# Separate each genre string into its own row
switch_library <- switch_library %>%
  drop_na() %>%
  filter(release_date <= Sys.Date()) %>%
  mutate(genre = tolower(genre),
         genre = str_remove_all(genre, "-")) %>%
  separate_rows(genre) %>%
  mutate(genre_title_case = stringr::str_to_title(genre))
switch_library %>%
  head()
# A tibble: 6 × 10
title genre devel…¹ publi…² release_…³ release_ym relea…⁴ relea…⁵ common_d…⁶
<chr> <chr> <chr> <chr> <date> <date> <dbl> <dbl> <date>
1 0 Degr… acti… EastAs… EastAs… 2021-05-19 2021-05-01 2021 139 2022-05-19
2 0 Degr… plat… EastAs… EastAs… 2021-05-19 2021-05-01 2021 139 2022-05-19
3 0 Degr… puzz… EastAs… EastAs… 2021-05-19 2021-05-01 2021 139 2022-05-19
4 #1 Ana… board Eclips… Eclips… 2021-05-14 2021-05-01 2021 134 2022-05-14
5 #1 Ana… game Eclips… Eclips… 2021-05-14 2021-05-01 2021 134 2022-05-14
6 #1 Ana… edut… Eclips… Eclips… 2021-05-14 2021-05-01 2021 134 2022-05-14
# … with 1 more variable: genre_title_case <chr>, and abbreviated variable
# names ¹developer, ²publisher, ³release_date, ⁴release_year, ⁵release_day,
# ⁶common_date
# ℹ Use `colnames()` to see all variable names
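In the output above, “board game” ends up as two rows, “board” and “game”. That is because `separate_rows()` by default splits on any run of non-alphanumeric characters, spaces included. The sketch below contrasts the default with a comma-only separator on a made-up data frame. `toy_games` and its rows are invented for illustration, and the comma pattern is an alternative to consider, not what the analysis in this post uses.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Made-up rows mimicking the scraped genre column
toy_games <- tibble(
  title = c("Game One", "Game Two"),
  genre = c("board game, edutainment", "action, puzzle")
)

# Default separator: spaces split too, so "board game" becomes two rows
by_default <- toy_games %>% separate_rows(genre)

# Comma-only separator: multi-word genres stay intact
by_comma <- toy_games %>% separate_rows(genre, sep = ",\\s*")

by_default$genre
by_comma$genre
```

With a comma-only separator, the earlier step that strips hyphens would no longer be needed to keep “role-playing” in one piece, but the genre counts would differ from the ones reported here.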
Next I want to identify the top genres by count.
# A tibble: 4 × 1
genre_title_case
<chr>
1 Action
2 Puzzle
3 Adventure
4 Roleplaying
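The `genre_rank` column used in the next plot is not defined in the code shown in this post. One way such a column could be derived is sketched below on made-up rows. The toy data, the `genre_n` column and the use of `dense_rank()` are assumptions for illustration, not the original code.

```r
library(dplyr)
library(tibble)

# Made-up stand-in for the exploded switch_library
toy_library <- tibble(
  title = c("A", "A", "B", "B", "C", "D"),
  genre_title_case = c("Action", "Puzzle", "Action", "Adventure", "Action", "Puzzle")
)

# Count rows per genre, then rank so the most frequent genre gets rank 1
toy_ranked <- toy_library %>%
  add_count(genre_title_case, name = "genre_n") %>%
  mutate(genre_rank = dense_rank(desc(genre_n)))

# One row per genre, ordered by rank
genre_summary <- toy_ranked %>%
  distinct(genre_title_case, genre_n, genre_rank) %>%
  arrange(genre_rank)

genre_summary
```

Because the rank is attached to every row, a later `filter(genre_rank <= 4)` keeps all rows belonging to the top four genres. With `dense_rank()`, tied genres share a rank, so that filter could return more than four genres.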
switch_library %>%
  filter(genre_rank <= 4) %>%
  group_by(release_ym, genre_title_case) %>%
  count() %>%
  ggplot(aes(release_ym, n, col = genre_title_case)) +
  geom_line(size = 1) +
  geom_point(size = 2.5, col = "white") +
  geom_point(size = 2) +
  facet_grid(rows = vars(genre_title_case)) +
  scale_color_cp_2077_d() +
  scale_x_date(breaks = breaks_width("1 year"), date_labels = "%b %Y") +
  labs(x = "Release Date", y = "Release Count", col = "Genre") +
  theme_minimal(base_size = 14) +
  theme(legend.position = "none", axis.title.x = element_blank())
switch_library %>%
  mutate(common_floor_month = floor_date(common_date, "month")) %>%
  group_by(release_year, common_floor_month) %>%
  count() %>%
  ggplot(aes(common_floor_month, n, col = factor(release_year))) +
  geom_line(size = 1.5) +
  geom_point(size = 4, col = "white") +
  geom_point(size = 3) +
  scale_color_cp_2077_d() +
  scale_x_date(breaks = breaks_width("1 month"), date_labels = "%b") +
  labs(x = "Release Month", y = "Release Count", col = "Release Year") +
  theme_minimal(base_size = 14) +
  theme(legend.position = "bottom", axis.title.x = element_blank()) +
  guides(col = guide_legend(nrow = 1))
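The year-over-year overlay works because `common_date` maps every release onto the same calendar year, so each year's line shares one January to December axis. Here is a minimal sketch of that mapping with made-up dates. The fixed `common_year` of 2022 is an assumption made for reproducibility, whereas the code above derives the year from `Sys.Date()`.

```r
library(lubridate)

# Made-up release dates from three different years
release_dates <- as.Date(c("2019-03-15", "2020-03-15", "2021-03-15"))

# Day of the year: 2020 is a leap year, so its March dates are one day later
release_day <- yday(release_dates)

# Re-anchor each day of the year onto a single common year
common_year <- 2022
common_date <- as.Date(release_day, origin = as.Date(paste0(common_year - 1, "-12-31")))

common_date
floor_date(common_date, "month")
```

Dates after February in a leap year shift by one day in the common year. That does not change the monthly counts, except for a December 31 release in a leap year, which would roll into January of the following year.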
I also decided to create a Tableau dashboard using the same data. Feel free to check that out.