load_pbp()
As mentioned on the home page, the main purpose of the hockeyR package is to load raw NHL play-by-play data so that you don’t have to scrape and clean it yourself. The load_pbp() function does this for you. Its season argument is flexible: you may use any of the following formats when loading play-by-play data for the 2020-21 NHL season:
To load more than one season, wrap the desired years in c(). For example, to get data for two consecutive seasons, you could enter load_pbp(c(2020, 2021)).
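A flexible season argument has to map several spellings of a season to one canonical value. The sketch below shows one way to do that. It defines a hypothetical normalize_season() helper, which is not part of hockeyR. The helper converts input to the eight-digit form seen in the season_full column of get_game_ids() output (for example, 20172018). Two things here are assumptions and may not match load_pbp() exactly: the set of formats handled, and the convention that a bare year such as 2021 means the season ending in that year.

```r
# Hypothetical helper, NOT part of hockeyR. Assumptions: accepted formats are
# "2020-21", "2020-2021", 20202021 and a bare year such as 2021, where a bare
# year is taken to mean the season that ENDS in that year.
normalize_season <- function(season) {
  vapply(as.character(season), function(s) {
    s <- gsub("[^0-9]", "", s)          # drop separators such as "-"
    if (nchar(s) == 8) {                # "20202021" or "2020-2021"
      return(s)
    }
    if (nchar(s) == 6) {                # "2020-21" became "202021"
      start <- as.integer(substr(s, 1, 4))
    } else if (nchar(s) == 4) {         # 2021 -> season ending in 2021
      start <- as.integer(s) - 1L
    } else {
      stop("Unrecognized season format: ", s)
    }
    paste0(start, start + 1L)           # e.g. "20202021"
  }, character(1), USE.NAMES = FALSE)
}

# Several spellings of the same season, plus a vector of two seasons
normalize_season(c("2020-21", "2020-2021", "2021"))
normalize_season(c(2020, 2021))
```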
get_game_ids()
If you want to load play-by-play data for a game that isn’t in the data repository, or you only want a single game and don’t need to load a full season, you’ll first need to find its numeric game ID. The get_game_ids() function can find it for you as long as you supply the date of the game in YYYY-MM-DD format.
If no date is supplied, the function defaults to the current date as defined by your operating system.
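Because get_game_ids() expects the date as a YYYY-MM-DD string, a small check before the call can catch typos such as "10/17/2017". The helper below, as_game_day(), is hypothetical and not part of hockeyR; it uses base R only. Like get_game_ids(), it falls back to today's system date when no date is given.

```r
# Hypothetical helper, NOT part of hockeyR: return a validated "YYYY-MM-DD"
# string, defaulting to today's date from the system clock.
as_game_day <- function(day = Sys.Date()) {
  if (inherits(day, "Date")) {
    return(format(day, "%Y-%m-%d"))
  }
  # Require exactly a four-digit year, two-digit month, two-digit day
  if (!grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", day)) {
    stop("Dates must use the YYYY-MM-DD format, e.g. \"2017-10-17\"")
  }
  # as.Date() returns NA for impossible dates such as "2017-02-30"
  parsed <- as.Date(day, format = "%Y-%m-%d")
  if (is.na(parsed)) stop("Not a valid calendar date: ", day)
  format(parsed, "%Y-%m-%d")
}

# Usage (requires hockeyR and a network connection):
# hockeyR::get_game_ids(day = as_game_day("2017-10-17"))
```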
# get single day ids
get_game_ids(day = "2017-10-17")
#> # A tibble: 11 x 9
#> game_id season_full date game_~1 home_~2 away_~3 home_~4 away_~5 game_~6
#> <int> <chr> <chr> <chr> <chr> <chr> <int> <int> <chr>
#> 1 2017020082 20172018 2017-~ 07:00 ~ New Yo~ Pittsb~ 4 5 REG
#> 2 2017020083 20172018 2017-~ 07:00 ~ Philad~ Florid~ 5 1 REG
#> 3 2017020084 20172018 2017-~ 07:00 ~ Washin~ Toront~ 0 2 REG
#> 4 2017020081 20172018 2017-~ 07:30 ~ New Je~ Tampa ~ 5 4 REG
#> 5 2017020085 20172018 2017-~ 07:30 ~ Ottawa~ Vancou~ 0 3 REG
#> 6 2017020086 20172018 2017-~ 08:00 ~ Nashvi~ Colora~ 4 1 REG
#> 7 2017020087 20172018 2017-~ 08:00 ~ Winnip~ Columb~ 2 5 REG
#> 8 2017020088 20172018 2017-~ 08:30 ~ Dallas~ Arizon~ 3 1 REG
#> 9 2017020089 20172018 2017-~ 09:00 ~ Edmont~ Caroli~ 3 5 REG
#> 10 2017020090 20172018 2017-~ 10:00 ~ Vegas ~ Buffal~ 5 4 REG
#> 11 2017020091 20172018 2017-~ 10:30 ~ San Jo~ Montré~ 5 2 REG
#> # ... with abbreviated variable names 1: game_time, 2: home_name, 3: away_name,
#> # 4: home_final_score, 5: away_final_score, 6: game_type
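Once you have the IDs for a day, you usually want the one game you care about. The sketch below filters a data frame shaped like the output above, using its game_id, home_name and away_name columns. find_game_id() is a hypothetical helper, not a hockeyR function. It matches the team name as a plain substring, so "Vegas" is enough to find a Vegas game.

```r
# Hypothetical helper, NOT part of hockeyR: return the IDs of games in which
# `team` appears (as a plain substring) in either the home or away team name.
find_game_id <- function(ids, team) {
  involved <- grepl(team, ids$home_name, fixed = TRUE) |
    grepl(team, ids$away_name, fixed = TRUE)
  ids$game_id[involved]
}

# Usage with real data (requires hockeyR and a network connection):
# ids <- hockeyR::get_game_ids(day = "2017-10-17")
# find_game_id(ids, "Vegas")
```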
You can instead supply a season to get_game_ids() to get a full season’s worth of game IDs, along with the date, home and road teams, and final score of each game in that season.
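The IDs above, such as 2017020082, appear to follow the NHL API’s ten-digit convention:

- four digits for the year the season starts,
- two digits for the game type (02 for regular season, 03 for playoffs),
- four digits for the game number.

Assuming that convention holds, the hypothetical parse_game_id() helper below (not part of hockeyR) decodes an ID without any network call. For example, it recovers the same 20172018 season shown in the season_full column.

```r
# Hypothetical helper, NOT part of hockeyR. Assumes the NHL API ID layout
# SSSS TT NNNN: season start year, game type, game number.
parse_game_id <- function(game_id) {
  id <- sprintf("%.0f", as.numeric(game_id))  # avoid scientific notation
  if (any(nchar(id) != 10)) stop("Expected 10-digit game IDs")
  start <- as.integer(substr(id, 1, 4))
  data.frame(
    game_id     = id,
    season_full = paste0(start, start + 1L),  # e.g. "20172018"
    game_type   = substr(id, 5, 6),           # "02" regular, "03" playoffs
    game_number = as.integer(substr(id, 7, 10)),
    stringsAsFactors = FALSE
  )
}

parse_game_id(c(2017020082, 2020030175))
```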
scrape_game()
This function scrapes a single game given its game ID, which can be retrieved with get_game_ids(). Scraping games that are still in progress has not yet been tested.
scrape_game(game_id = 2020030175)
#> # A tibble: 718 x 108
#> xg event_id event~1 event secon~2 event~3 event~4 descr~5 period perio~6
#> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <int> <dbl>
#> 1 NA 2.02e13 GAME_S~ Game~ NA NA NA Game S~ 1 0
#> 2 NA 2.02e13 CHANGE Chan~ NA Montré~ away ON: Sh~ 1 00
#> 3 NA 2.02e13 CHANGE Chan~ Line c~ Toront~ home ON: Wa~ 1 0
#> 4 NA 2.02e13 FACEOFF Face~ NA Toront~ home Auston~ 1 0
#> 5 NA 2.02e13 HIT Hit NA Toront~ home Zach H~ 1 13
#> 6 NA 2.02e13 CHANGE Chan~ On the~ Montré~ away ON: Je~ 1 244
#> 7 NA 2.02e13 CHANGE Chan~ On the~ Toront~ home ON: Al~ 1 27
#> 8 NA 2.02e13 CHANGE Chan~ On the~ Montré~ away ON: Co~ 1 299
#> 9 0.0921 2.02e13 SHOT Shot Wrist ~ Toront~ home Alex G~ 1 32
#> 10 NA 2.02e13 CHANGE Chan~ On the~ Toront~ home ON: Ja~ 1 32
#> # ... with 708 more rows, 98 more variables: period_seconds_remaining <dbl>,
#> # game_seconds <dbl>, game_seconds_remaining <dbl>, home_score <dbl>,
#> # away_score <dbl>, event_player_1_name <chr>, event_player_1_type <chr>,
#> # event_player_2_name <chr>, event_player_2_type <chr>,
#> # event_player_3_name <chr>, event_player_3_type <chr>,
#> # event_goalie_name <chr>, strength_state <glue>, strength_code <chr>,
#> # strength <chr>, game_winning_goal <lgl>, empty_net <lgl>, ...
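The result is an ordinary data frame, so base R is enough for quick summaries. The sketch below uses only the xg and period columns visible in the printed output. It drops rows where xg is missing, such as line changes and faceoffs, before summing. xg_by_period() is a hypothetical helper, not a hockeyR function.

```r
# Hypothetical helper, NOT part of hockeyR: total expected goals (xg) per
# period, ignoring events that have no xg value (NA).
xg_by_period <- function(pbp) {
  scored <- pbp[!is.na(pbp$xg), c("period", "xg")]
  stats::aggregate(xg ~ period, data = scored, FUN = sum)
}

# Usage with real data (requires hockeyR and a network connection):
# pbp <- hockeyR::scrape_game(game_id = 2020030175)
# xg_by_period(pbp)
```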
scrape_day()
This is the backbone function that keeps the hockeyR-data repository up to date during the season. Supply a date (YYYY-MM-DD) and it will scrape play-by-play data for every game played on that day. As with scrape_game(), scraping games that are still in progress has not yet been tested.
scrape_day("2015-01-06")
#> # A tibble: 6,472 x 109
#> xg event_id event_t~1 event secon~2 event~3 event~4 descr~5 period perio~6
#> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <int> <dbl>
#> 1 NA 2.01e13 GAME_SCH~ Game~ NA NA NA Game S~ 1 0
#> 2 NA 2.01e13 CHANGE Chan~ NA Buffal~ away ON: Jo~ 1 0
#> 3 NA 2.01e13 CHANGE Chan~ Line c~ New Je~ home ON: Pa~ 1 0
#> 4 NA 2.01e13 FACEOFF Face~ NA Buffal~ away Zemgus~ 1 0
#> 5 NA 2.01e13 BLOCKED_~ Bloc~ NA Buffal~ away Andy G~ 1 10
#> 6 NA 2.01e13 CHANGE Chan~ On the~ Buffal~ away ON: Ch~ 1 36
#> 7 NA 2.01e13 GIVEAWAY Give~ NA New Je~ home Giveaw~ 1 38
#> 8 NA 2.01e13 TAKEAWAY Take~ NA New Je~ home Takeaw~ 1 41
#> 9 NA 2.01e13 CHANGE Chan~ On the~ New Je~ home ON: Ma~ 1 41
#> 10 NA 2.01e13 CHANGE Chan~ On the~ New Je~ home ON: Ja~ 1 48
#> # ... with 6,462 more rows, 99 more variables: period_seconds_remaining <dbl>,
#> # game_seconds <dbl>, game_seconds_remaining <dbl>, home_score <dbl>,
#> # away_score <dbl>, event_player_1_name <chr>, event_player_1_type <chr>,
#> # event_player_2_name <chr>, event_player_2_type <chr>,
#> # event_player_3_name <chr>, event_player_3_type <chr>,
#> # event_goalie_name <chr>, strength_state <glue>, strength_code <chr>,
#> # strength <chr>, game_winning_goal <lgl>, empty_net <lgl>, ...
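To cover several consecutive days, one approach is to generate the dates and call scrape_day() on each. game_days() below is a hypothetical base R helper, not part of hockeyR. The commented usage assumes hockeyR and dplyr are installed and a network connection is available. It uses dplyr::bind_rows() rather than rbind() because separate scrapes may not return identical columns.

```r
# Hypothetical helper, NOT part of hockeyR: every date from `start` to `end`
# (inclusive) as "YYYY-MM-DD" strings, ready to pass to scrape_day().
game_days <- function(start, end) {
  format(seq(as.Date(start), as.Date(end), by = "day"), "%Y-%m-%d")
}

# Usage (each call scrapes every game on that date, so long ranges are slow):
# days <- game_days("2015-01-06", "2015-01-08")
# pbp  <- dplyr::bind_rows(lapply(days, hockeyR::scrape_day))
```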
If you can wait until the day after a game, load_pbp() is the only function you’ll need. If you’d like to scrape the data yourself immediately after a game, the other functions discussed here will do the job.
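The rule of thumb above can be written as a tiny, hypothetical helper (not part of hockeyR). It assumes the repository has caught up once the game day has passed. That matches the guidance here but is not a guarantee about when the repository is actually updated.

```r
# Hypothetical helper, NOT part of hockeyR: suggest load_pbp() for games
# played before `today`, and the scraping functions for today's games.
pbp_source <- function(game_date, today = Sys.Date()) {
  ifelse(as.Date(game_date) < as.Date(today), "load_pbp", "scrape")
}

pbp_source(c("2021-05-01", "2021-05-02"), today = "2021-05-02")
```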