Getting started

load_pbp()

As mentioned on the home page, the main function of the hockeyR package is to load raw NHL play-by-play data without having to scrape and clean it yourself. The load_pbp() function does exactly that. Its season argument is flexible: any of the following formats will load play-by-play data for the 2020-21 NHL season:

  • Full season definitions (e.g. ‘2020-2021’)
  • Short season definitions (e.g. ‘2020-21’)
  • Single-year definitions, using the year in which the season ends (e.g. 2021)

To load more than one season, wrap the desired years in c(). For example, load_pbp(c(2020, 2021)) loads both the 2019-20 and 2020-21 seasons.
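To make the three accepted formats concrete, here is a small sketch showing how each spelling can be reduced to the same eight-character season key (the style used in the season_full column of the get_game_ids() output below). The normalize_season() function is hypothetical and written in base R purely for illustration. It is not hockeyR's internal code, and it assumes, as above, that a single year refers to the year in which the season ends.

```r
# normalize_season() is a hypothetical helper, NOT part of hockeyR.
# It maps the three season formats listed above onto one key such as
# "20202021" (the style used by the season_full column).
normalize_season <- function(season) {
  s <- as.character(season)
  if (grepl("^[0-9]{4}-([0-9]{2}|[0-9]{4})$", s)) {
    # "2020-2021" or "2020-21": the first four digits are the start year
    start <- as.integer(substr(s, 1, 4))
  } else if (grepl("^[0-9]{4}$", s)) {
    # 2021: a single year is read as the year the season ends
    start <- as.integer(s) - 1L
  } else {
    stop("Unrecognised season format: ", s)
  }
  paste0(start, start + 1L)
}

# All three spellings of the 2020-21 season give the same key
vapply(c("2020-2021", "2020-21", "2021"), normalize_season, character(1),
       USE.NAMES = FALSE)
#> [1] "20202021" "20202021" "20202021"

# Numeric years work too, as in load_pbp(c(2020, 2021))
vapply(c(2020, 2021), normalize_season, character(1))
#> [1] "20192020" "20202021"
```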

get_game_ids()

If you want play-by-play data for a game that isn’t in the data repository, or you only need a single game rather than a full season, you’ll first need its numeric game ID. The get_game_ids() function will find it for you if you supply the date of the game in YYYY-MM-DD format. If no date is given, it defaults to the current date according to your system clock.

# get single day ids
get_game_ids(day = "2017-10-17")
#> # A tibble: 11 x 8
#>       game_id season_full date       home_name            away_name home_final_score
#>         <int> <chr>       <chr>      <chr>                <chr>                <int>
#>  1 2017020082 20172018    2017-10-17 New York Rangers     Pittsbur~                4
#>  2 2017020083 20172018    2017-10-17 Philadelphia Flyers  Florida ~                5
#>  3 2017020084 20172018    2017-10-17 Washington Capitals  Toronto ~                0
#>  4 2017020081 20172018    2017-10-17 New Jersey Devils    Tampa Ba~                5
#>  5 2017020085 20172018    2017-10-17 Ottawa Senators      Vancouve~                0
#>  6 2017020086 20172018    2017-10-17 Nashville Predators  Colorado~                4
#>  7 2017020087 20172018    2017-10-17 Winnipeg Jets        Columbus~                2
#>  8 2017020088 20172018    2017-10-17 Dallas Stars         Arizona ~                3
#>  9 2017020089 20172018    2017-10-17 Edmonton Oilers      Carolina~                3
#> 10 2017020090 20172018    2017-10-17 Vegas Golden Knights Buffalo ~                5
#> 11 2017020091 20172018    2017-10-17 San Jose Sharks      Montréal~                5
#> # ... with 2 more variables: away_final_score <int>, game_type <chr>

You can instead supply a season to get_game_ids() to grab a full year’s worth of IDs as well as final scores, home and road teams, and game dates for each game in the given season.

scrape_game()

This function scrapes a single game given its game ID, which can be retrieved with get_game_ids(). Live game scraping has not yet been tested.

scrape_game(game_id = 2020030175)
#> # A tibble: 718 x 104
#>    event_type event secondary_type event_team event_team_type description period
#>    <chr>      <chr> <chr>          <chr>      <chr>           <chr>        <int>
#>  1 GAME_SCHE~ Game~ <NA>           <NA>       <NA>            Game Sched~      1
#>  2 CHANGE     Chan~ <NA>           Montréal ~ away            ON: Shea W~      1
#>  3 CHANGE     Chan~ Line change    Toronto M~ home            ON: Wayne ~      1
#>  4 FACEOFF    Face~ <NA>           Toronto M~ home            Auston Mat~      1
#>  5 HIT        Hit   <NA>           Toronto M~ home            Zach Hyman~      1
#>  6 CHANGE     Chan~ On the fly     Montréal ~ away            ON: Jeff P~      1
#>  7 CHANGE     Chan~ On the fly     Toronto M~ home            ON: Alex G~      1
#>  8 CHANGE     Chan~ On the fly     Montréal ~ away            ON: Cole C~      1
#>  9 SHOT       Shot  Wrist Shot     Toronto M~ home            Alex Galch~      1
#> 10 CHANGE     Chan~ On the fly     Toronto M~ home            ON: Jake M~      1
#> # ... with 708 more rows, and 97 more variables: period_seconds <dbl>,
#> #   period_seconds_remaining <dbl>, game_seconds <dbl>,
#> #   game_seconds_remaining <dbl>, home_score <dbl>, away_score <dbl>,
#> #   event_player_1_name <chr>, event_player_1_type <chr>,
#> #   event_player_2_name <chr>, event_player_2_type <chr>,
#> #   event_player_3_name <chr>, event_player_3_type <chr>,
#> #   event_goalie_name <chr>, strength_state <glue>, strength_code <chr>, ...
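The numeric game IDs used above are not arbitrary. As commonly understood for the NHL's stats API (an assumption here, not something hockeyR documents), the ten digits break down into a four-digit season start year, a two-digit game type (01 preseason, 02 regular season, 03 playoffs, 04 All-Star), and a four-digit game number. This matches the get_game_ids() output above, where game 2017020082 belongs to season 20172018. The hypothetical parse_game_id() helper below, written in base R and not part of hockeyR, decodes an ID under that assumption. It can be handy for sanity-checking an ID before passing it to scrape_game().

```r
# parse_game_id() is a hypothetical helper, NOT part of hockeyR.
# Assumed layout of a 10-digit NHL game ID:
#   SSSS (season start year) | TT (game type) | NNNN (game number)
parse_game_id <- function(game_id) {
  # sprintf("%.0f") avoids scientific notation for large numbers
  id <- sprintf("%.0f", as.numeric(game_id))
  if (nchar(id) != 10) stop("Expected a 10-digit game ID, got: ", id)
  type_label <- switch(substr(id, 5, 6),
    "01" = "preseason",
    "02" = "regular season",
    "03" = "playoffs",
    "04" = "all-star",
    "unknown"  # any other code
  )
  list(
    season_start = as.integer(substr(id, 1, 4)),
    game_type    = type_label,
    game_number  = substr(id, 7, 10)
  )
}

str(parse_game_id(2020030175))
```

For the game scraped above, this gives a season start of 2020, a game type of "playoffs" and a game number of "0175". The helper handles one ID at a time; wrap it in lapply() to decode a whole game_id column.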

scrape_day()

This is the backbone function that keeps the hockeyR-data repository up to date during the season. Supply a date (YYYY-MM-DD) and it will scrape play-by-play data for every game played on that day. Live game scraping has not yet been tested.

scrape_day("2015-01-06")
#> # A tibble: 6,472 x 105
#>    event_type event secondary_type event_team event_team_type description period
#>    <chr>      <chr> <chr>          <chr>      <chr>           <chr>        <int>
#>  1 GAME_SCHE~ Game~ <NA>           <NA>       <NA>            Game Sched~      1
#>  2 CHANGE     Chan~ <NA>           Buffalo S~ away            ON: Josh G~      1
#>  3 CHANGE     Chan~ Line change    New Jerse~ home            ON: Patrik~      1
#>  4 FACEOFF    Face~ <NA>           Buffalo S~ away            Zemgus Gir~      1
#>  5 BLOCKED_S~ Bloc~ <NA>           Buffalo S~ away            Andy Green~      1
#>  6 CHANGE     Chan~ On the fly     Buffalo S~ away            ON: Chris ~      1
#>  7 GIVEAWAY   Give~ <NA>           New Jerse~ home            Giveaway b~      1
#>  8 TAKEAWAY   Take~ <NA>           New Jerse~ home            Takeaway b~      1
#>  9 CHANGE     Chan~ On the fly     New Jerse~ home            ON: Mark F~      1
#> 10 CHANGE     Chan~ On the fly     New Jerse~ home            ON: Jaromi~      1
#> # ... with 6,462 more rows, and 98 more variables: period_seconds <dbl>,
#> #   period_seconds_remaining <dbl>, game_seconds <dbl>,
#> #   game_seconds_remaining <dbl>, home_score <dbl>, away_score <dbl>,
#> #   event_player_1_name <chr>, event_player_1_type <chr>,
#> #   event_player_2_name <chr>, event_player_2_type <chr>,
#> #   event_player_3_name <chr>, event_player_3_type <chr>,
#> #   event_goalie_name <chr>, strength_state <glue>, strength_code <chr>, ...
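Because scrape_day() works on one date at a time, catching up on several days means calling it once per date. The sketch below assumes you want every calendar day in a range. day_range() is a hypothetical base R helper, not part of hockeyR, that produces the YYYY-MM-DD strings scrape_day() expects.

```r
# day_range() is a hypothetical helper, NOT part of hockeyR.
# It returns one "YYYY-MM-DD" string per calendar day, inclusive of both ends.
day_range <- function(from, to) {
  days <- seq(as.Date(from), as.Date(to), by = "day")
  format(days, "%Y-%m-%d")
}

day_range("2015-01-06", "2015-01-08")
#> [1] "2015-01-06" "2015-01-07" "2015-01-08"
```

Each date could then be scraped and the results stacked, for example with dplyr::bind_rows(lapply(day_range("2015-01-06", "2015-01-08"), scrape_day)). bind_rows() is a safer choice than base rbind() here because it matches columns by name and fills any column missing from one day's result with NA. How scrape_day() behaves on a date with no games has not been checked in this sketch.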

If you can wait until the day after a game, the load_pbp() function is the only one you’ll need. If you’d like to scrape the data yourself immediately following a game, the other functions discussed here will do the job for you.