The struggle to find a PS5

Author

Matthew Harris

Published

February 11, 2021

Abstract
The struggle is real.
Note

This analysis was originally done in February of 2021 but I unfortunately didn’t save the data that I scrapped when it was run.

Like many people I have failed to successfully order a PS5 from any of the major electronic retailers in America. I’ve only attempted a couple of times but it appears as though the consoles sell out within minutes of being available. Since I don’t have a PS5 to distract me in my free time I decided to make a quick post on the launch so far. The data that I am using is pulled from the nowinstock.net website. This site is used to track item inventory for various products. The site contains a table of all previous PS5 stock information going all the way back to when the console was open for pre-order. # Packages

Code
library(dplyr)
library(rvest)
library(lubridate)
library(purrr)
library(stringr)
library(data.table)
library(tidyr)
library(ggplot2)
library(scales)
library(ggsci)

Scrapping the PS5 stock data

The first step is to scrape the table from the site. I prefer to use APIs to access data from websites but there isn’t one available for this site. Luckily the data can be scrapped in a single call and doesn’t require me to do any pagination.

Code
ps5_data <- xml2::read_html("https://www.nowinstock.net/videogaming/consoles/sonyps5/full_history.php")
Code
ps5_stock_data <- ps5_data %>% 
  html_node("table") %>% 
  html_table() %>% 
  as.data.frame() %>% 
  as_tibble() %>% 
  set_names(c("date_time", "status"))
Code
ps5_stock_data %>% 
  head()
# A tibble: 6 × 3
   ...1 date_time                     status                                    
  <dbl> <chr>                         <chr>                                     
1     1 Fri Aug 12 2022 - 5:10 PM EST "Amazon - Console Preorder for\n$?"       
2     2 Fri Aug 12 2022 - 4:26 PM EST "Amazon - Console Out of Stock"           
3     3 Fri Aug 12 2022 - 1:55 PM EST "Sam's Club - Bundle: Disc Horizon Forbid…
4     4 Thu Aug 11 2022 - 6:25 AM EST "Amazon - Console Preorder for\n$?"       
5     5 Thu Aug 11 2022 - 6:14 AM EST "Amazon - Console Out of Stock"           
6     6 Tue Aug 09 2022 - 9:25 AM EST "Amazon - Bundle: Digital Horizon Forbidd…

Great. I now have a data frame that contains all of the stock information. Next I need to clean the data to make it a little easier to use. The date_time column should be pretty easy to reformat but the status column will need a little more attention. The goal is to extract the name of the retailer, console details, retail price, and the time that the console was in/out of stock. All of this can be achieved using regular expression and functions from the {stringr} and {lubridate} packages.

Data Wrangling

Formatting

Code
ps5_stock_data_clean <- ps5_stock_data %>% 
  mutate(sale_date = parse_date_time(date_time, "%b %d %Y - %H:%M %p"),
         across(.cols = sale_date, .fns = c(day = day, mon = month, yr = year,
                                            hr = hour, mnt = minute),
                .names = "sale_{.fn}"),
         retailer = str_extract(status, ".*(?=\\s-)"),
         stock_status = if_else(str_detect(status, "In Stock"), TRUE, FALSE),
         product_type = str_extract(status, "(?<=\\s-\\s).*"),
         product_type = str_extract(product_type, ".*(?=\\sOut|In)"),
         product_type = str_remove_all(product_type, "\\s|:"),
         retail_price = as.double(str_extract(status, "(?<=\\$)\\d*\\.\\d*"))) %>% 
  select(-c(date_time, status))

ps5_stock_data %>% 
  glimpse()
Rows: 600
Columns: 3
$ ...1      <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
$ date_time <chr> "Fri Aug 12 2022 - 5:10 PM EST", "Fri Aug 12 2022 - 4:26 PM …
$ status    <chr> "Amazon - Console Preorder for\n$?", "Amazon - Console Out o…

Filter out pre orders

I’ve decided to use the roll feature of the data.table join methods. This allows me to the in-stock and out-of-stock times that are closest to each other based upon the same retailer, product type and date. It’s not the best way to match the sales but it works for now.

Code
ps5_in_stock <- ps5_stock_data_clean %>%
  filter(sale_date >= "2020-11-11", stock_status == TRUE) %>%
  mutate(in_date = sale_date) %>% 
  select(-stock_status) %>%
  as.data.table()

setkey(ps5_in_stock, retailer, product_type, sale_date)

ps5_out_stock <- ps5_stock_data_clean %>%
  filter(sale_date >= "2020-11-11", stock_status == FALSE) %>%
  mutate(out_date = sale_date) %>%
  select(sale_date, retailer, product_type, out_date) %>%
  as.data.table()

setkey(ps5_out_stock, retailer, product_type, sale_date)

ps5_sales <- ps5_in_stock[ps5_out_stock, roll = TRUE] %>% 
  as_tibble() %>% 
  na.omit() %>% 
  mutate(sale_length = as.integer(out_date - in_date),
         sale_date = as.Date(sale_date))

head(ps5_sales)
# A tibble: 6 × 13
   ...1 sale_date  sale_day sale_mon sale_yr sale_hr sale_mnt retailer product…¹
  <dbl> <date>        <int>    <int>   <int>   <int>    <int> <chr>    <chr>    
1   505 2021-09-02        2        9    2021      11        9 Amazon   ConsoleD…
2    73 2022-06-07        7        6    2022       1        8 Best Buy BundleDi…
3    75 2022-06-07        7        6    2022      12        8 Best Buy BundleDi…
4    75 2022-07-12        7        6    2022      12        8 Best Buy BundleDi…
5    49 2022-07-12       12        7    2022       2        6 Best Buy BundleDi…
6    47 2022-07-12       12        7    2022       2       45 Best Buy BundleDi…
# … with 4 more variables: retail_price <dbl>, in_date <dttm>, out_date <dttm>,
#   sale_length <int>, and abbreviated variable name ¹​product_type
# ℹ Use `colnames()` to see all variable names

Visualizations

I’ve got all of my data cleaned and formatted the way that I want, so what can I learn? The plot below details the sales from each retailer. The y axis measures how long the console took to sell out and the price of console is mapped to the size. From the plot I’m able to see that GameStop primarily sells more expensive bundles and most consoles sell out within 2 hours of going on sale.

Code
ps5_sales %>% 
  arrange(desc(sale_date)) %>% 
  filter(sale_length < 86400) %>% 
  mutate(sale_length_minutes = sale_length / 60) %>% 
  ggplot(aes(sale_date, sale_length_minutes, 
             col = retailer, size = retail_price)) + 
  geom_point(alpha = 0.7) + scale_y_continuous(breaks =  breaks_width(5)) +
  scale_x_date(breaks = breaks_width("3 month"), date_labels = "%b %Y") +
  scale_y_continuous(breaks = breaks_width(60)) +
  scale_color_npg() +
  theme(axis.text.x = element_text(angle = 45)) +
  labs(x = "Sale Date", y = "Sale Length (minutes)",
       col = "Retailer", size = "Retail Price ($)", title = "PS5 Sales by Retailer", subtitle = "Sale Length and Price") +
  theme_minimal(base_size = 14) +
  theme(plot.title = element_text(face = "bold", size = 15),
        plot.subtitle = element_text(size = 12))