Getting started

load_pbp()

As mentioned on the home page, the main function of the hockeyR package is to load raw NHL play-by-play data without having to scrape it and clean it yourself. The load_pbp() function will do that for you. The season argument in load_pbp() is flexible: any of the following forms will load play-by-play data for the 2020-21 NHL season:

  • Full season definitions (e.g. ‘2020-2021’)
  • Short season definitions (e.g. ‘2020-21’)
  • Single season definitions (e.g. 2021)

To load more than one season, wrap your desired years in c(). Since a single year names the season that ends in that year, load_pbp(c(2020,2021)) loads two seasons: 2019-20 and 2020-21.
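The three season spellings can be compared with a small base R sketch. The helper season_end_year() is hypothetical and is not part of hockeyR; it only illustrates the convention described above, where a single year such as 2021 stands for the season ending in that year. The load_pbp() calls are left as comments because they need hockeyR installed and a network connection.

```r
# Hypothetical helper (not part of hockeyR): reduce any of the three season
# spellings to the year in which the season ends, e.g. "2020-21" -> 2021.
season_end_year <- function(season) {
  season <- as.character(season)
  has_range <- grepl("-", season, fixed = TRUE)
  start_year <- as.integer(substr(season, 1, 4))
  # A range such as "2020-21" ends one year after it starts;
  # a single year such as 2021 already names the ending year.
  ifelse(has_range, start_year + 1L, start_year)
}

# All three spellings from the list above describe the same season (2021).
season_end_year(c("2020-2021", "2020-21", "2021"))

# The corresponding load_pbp() calls, not run here:
# pbp     <- hockeyR::load_pbp("2020-2021")
# pbp     <- hockeyR::load_pbp("2020-21")
# pbp     <- hockeyR::load_pbp(2021)
# pbp_two <- hockeyR::load_pbp(c(2020, 2021))
```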

get_game_ids()

If you want to load play-by-play data for a game that isn’t in the data repository, or perhaps you just want a single game and don’t need to load a full season, you’ll first need to find the numeric game ID. The get_game_ids() function can find it for you as long as you supply it with the date of the game in YYYY-MM-DD format. The function defaults to the current date as defined by your operating system.

# get single day ids
get_game_ids(day = "2017-10-17")
#> # A tibble: 11 x 9
#>       game_id season_full date   game_~1 home_~2 away_~3 home_~4 away_~5 game_~6
#>         <int> <chr>       <chr>  <chr>   <chr>   <chr>     <int>   <int> <chr>  
#>  1 2017020082 20172018    2017-~ 07:00 ~ New Yo~ Pittsb~       4       5 REG    
#>  2 2017020083 20172018    2017-~ 07:00 ~ Philad~ Florid~       5       1 REG    
#>  3 2017020084 20172018    2017-~ 07:00 ~ Washin~ Toront~       0       2 REG    
#>  4 2017020081 20172018    2017-~ 07:30 ~ New Je~ Tampa ~       5       4 REG    
#>  5 2017020085 20172018    2017-~ 07:30 ~ Ottawa~ Vancou~       0       3 REG    
#>  6 2017020086 20172018    2017-~ 08:00 ~ Nashvi~ Colora~       4       1 REG    
#>  7 2017020087 20172018    2017-~ 08:00 ~ Winnip~ Columb~       2       5 REG    
#>  8 2017020088 20172018    2017-~ 08:30 ~ Dallas~ Arizon~       3       1 REG    
#>  9 2017020089 20172018    2017-~ 09:00 ~ Edmont~ Caroli~       3       5 REG    
#> 10 2017020090 20172018    2017-~ 10:00 ~ Vegas ~ Buffal~       5       4 REG    
#> 11 2017020091 20172018    2017-~ 10:30 ~ San Jo~ Montré~       5       2 REG     
#> # ... with abbreviated variable names 1: game_time, 2: home_name, 3: away_name,
#> #   4: home_final_score, 5: away_final_score, 6: game_type

You can instead supply a season to get_game_ids() to grab a full year’s worth of IDs as well as final scores, home and road teams, and game dates for each game in the given season.
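Once you have a table of IDs, ordinary subsetting is enough to pick out the games you want. The sketch below rebuilds part of the 2017-10-17 table shown above, keeping only columns whose names and values are fully visible in that output, and selects the one-goal games. The argument name season in the commented get_game_ids() call is an assumption; check the function's help page before relying on it.

```r
# Stand-in for the get_game_ids(day = "2017-10-17") result printed above,
# restricted to the columns that are fully visible in that output.
ids <- data.frame(
  game_id = c(2017020082L, 2017020083L, 2017020084L, 2017020081L,
              2017020085L, 2017020086L, 2017020087L, 2017020088L,
              2017020089L, 2017020090L, 2017020091L),
  home_final_score = c(4L, 5L, 0L, 5L, 0L, 4L, 2L, 3L, 3L, 5L, 5L),
  away_final_score = c(5L, 1L, 2L, 4L, 3L, 1L, 5L, 1L, 5L, 4L, 2L)
)

# Keep the games decided by a single goal.
margin <- abs(ids$home_final_score - ids$away_final_score)
one_goal <- ids[margin == 1, ]
one_goal$game_id

# With hockeyR installed and a network connection (not run here):
# season_ids <- hockeyR::get_game_ids(season = 2018)  # argument name assumed
# pbp <- hockeyR::scrape_game(game_id = one_goal$game_id[1])
```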

scrape_game()

This function scrapes a single game with a supplied game ID, which can be retrieved with get_game_ids(). Live game scraping has yet to undergo testing.

scrape_game(game_id = 2020030175)
#> # A tibble: 718 x 108
#>         xg event_id event~1 event secon~2 event~3 event~4 descr~5 period perio~6
#>      <dbl>    <dbl> <chr>   <chr> <chr>   <chr>   <chr>   <chr>    <int>   <dbl>
#>  1 NA       2.02e13 GAME_S~ Game~ NA      NA      NA      Game S~      1       0
#>  2 NA       2.02e13 CHANGE  Chan~ NA      Montré~ away    ON: Sh~      1       0
#>  3 NA       2.02e13 CHANGE  Chan~ Line c~ Toront~ home    ON: Wa~      1       0
#>  4 NA       2.02e13 FACEOFF Face~ NA      Toront~ home    Auston~      1       0
#>  5 NA       2.02e13 HIT     Hit   NA      Toront~ home    Zach H~      1      13
#>  6 NA       2.02e13 CHANGE  Chan~ On the~ Montré~ away    ON: Je~      1      24
#>  7 NA       2.02e13 CHANGE  Chan~ On the~ Toront~ home    ON: Al~      1      27
#>  8 NA       2.02e13 CHANGE  Chan~ On the~ Montré~ away    ON: Co~      1      29
#>  9  0.0921  2.02e13 SHOT    Shot  Wrist ~ Toront~ home    Alex G~      1      32
#> 10 NA       2.02e13 CHANGE  Chan~ On the~ Toront~ home    ON: Ja~      1      32
#> # ... with 708 more rows, 98 more variables: period_seconds_remaining <dbl>,
#> #   game_seconds <dbl>, game_seconds_remaining <dbl>, home_score <dbl>,
#> #   away_score <dbl>, event_player_1_name <chr>, event_player_1_type <chr>,
#> #   event_player_2_name <chr>, event_player_2_type <chr>,
#> #   event_player_3_name <chr>, event_player_3_type <chr>,
#> #   event_goalie_name <chr>, strength_state <glue>, strength_code <chr>,
#> #   strength <chr>, game_winning_goal <lgl>, empty_net <lgl>, ...
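The scraped game is a regular tibble, so it can be summarised with base R. As a rough sketch, the hypothetical helper xg_by_period() below (not part of hockeyR) totals expected goals per period using only the xg and period columns. It is tried on a two-row stand-in copied from rows 5 and 9 of the output above, not on a real scrape.

```r
# Hypothetical helper (not part of hockeyR): total expected goals per period.
# Assumes `pbp` has the `xg` and `period` columns shown in the output above.
xg_by_period <- function(pbp) {
  shots <- pbp[!is.na(pbp$xg), ]          # events without an xG value are NA
  tapply(shots$xg, shots$period, sum)     # one total per period
}

# Two-row stand-in taken from rows 5 and 9 of the printed output.
pbp_sample <- data.frame(
  xg     = c(NA, 0.0921),
  event  = c("Hit", "Shot"),
  period = c(1L, 1L)
)

xg_by_period(pbp_sample)

# With a real scrape (not run here):
# xg_by_period(hockeyR::scrape_game(game_id = 2020030175))
```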

scrape_day()

This is the backbone function that keeps the hockeyR-data repository up to date during the season. Supply a date (YYYY-MM-DD) and it will scrape play-by-play data for all games on that day. Live game scraping is still awaiting testing.

scrape_day("2015-01-06")
#> # A tibble: 6,472 x 109
#>       xg event_id event_t~1 event secon~2 event~3 event~4 descr~5 period perio~6
#>    <dbl>    <dbl> <chr>     <chr> <chr>   <chr>   <chr>   <chr>    <int>   <dbl>
#>  1    NA  2.01e13 GAME_SCH~ Game~ NA      NA      NA      Game S~      1       0
#>  2    NA  2.01e13 CHANGE    Chan~ NA      Buffal~ away    ON: Jo~      1       0
#>  3    NA  2.01e13 CHANGE    Chan~ Line c~ New Je~ home    ON: Pa~      1       0
#>  4    NA  2.01e13 FACEOFF   Face~ NA      Buffal~ away    Zemgus~      1       0
#>  5    NA  2.01e13 BLOCKED_~ Bloc~ NA      Buffal~ away    Andy G~      1      10
#>  6    NA  2.01e13 CHANGE    Chan~ On the~ Buffal~ away    ON: Ch~      1      36
#>  7    NA  2.01e13 GIVEAWAY  Give~ NA      New Je~ home    Giveaw~      1      38
#>  8    NA  2.01e13 TAKEAWAY  Take~ NA      New Je~ home    Takeaw~      1      41
#>  9    NA  2.01e13 CHANGE    Chan~ On the~ New Je~ home    ON: Ma~      1      41
#> 10    NA  2.01e13 CHANGE    Chan~ On the~ New Je~ home    ON: Ja~      1      48
#> # ... with 6,462 more rows, 99 more variables: period_seconds_remaining <dbl>,
#> #   game_seconds <dbl>, game_seconds_remaining <dbl>, home_score <dbl>,
#> #   away_score <dbl>, event_player_1_name <chr>, event_player_1_type <chr>,
#> #   event_player_2_name <chr>, event_player_2_type <chr>,
#> #   event_player_3_name <chr>, event_player_3_type <chr>,
#> #   event_goalie_name <chr>, strength_state <glue>, strength_code <chr>,
#> #   strength <chr>, game_winning_goal <lgl>, empty_net <lgl>, ...
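To cover several days, one option is to generate the date strings and call the scraper once per day. The helper scrape_date_range() below is a hypothetical sketch, not part of hockeyR. It takes the scraping function as an argument, so the date handling can be checked without a network connection. How scrape_day() behaves on a date with no games is not covered here, so a real run may need error handling.

```r
# Hypothetical helper (not part of hockeyR): apply a scraper to every date
# from `from` to `to`, inclusive, and return a list named by date.
scrape_date_range <- function(from, to, scraper) {
  days <- format(seq(as.Date(from), as.Date(to), by = "day"), "%Y-%m-%d")
  results <- lapply(days, scraper)
  names(results) <- days
  results
}

# Check the date handling with a stand-in scraper that echoes its input.
scrape_date_range("2015-01-06", "2015-01-08", scraper = identity)

# With hockeyR installed and a network connection (not run here):
# pbp_list <- scrape_date_range("2015-01-06", "2015-01-08", hockeyR::scrape_day)
# pbp_all  <- dplyr::bind_rows(pbp_list)
```

dplyr::bind_rows() is suggested over rbind() because the two outputs above differ in column count (108 versus 109), and bind_rows() fills missing columns with NA.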

If you can wait until the day after a game, the load_pbp() function is the only one you’ll need. If you’d like to scrape the data yourself immediately following a game, the other functions discussed here will do the job for you.