Using NHL Data

You'll find the programs linked below. Download and save them to a folder. Don't rename the files. You can run them from the command line or terminal, or from a code editor (I like Eclipse and jGRASP).

To run them using command line or terminal, first open the command line or terminal and navigate to the folder where you saved the files using the cd command. For example, if you saved them in the /Users/Me/Desktop/NHL/ folder:

cd
cd Desktop
cd NHL

(At least on my computer, terminal starts me off in the "Me" folder by default. Use "ls" if you're unsure about where you are—it will give you a list of things in the current folder. Use "cd .." to navigate one folder up. Also, at least in terminal, folder names are not case-sensitive.)

If this is your first time running the file or first time running after making edits, you need to compile:

javac [program name].java

And now you need to run:

java [program name]

From start to finish, it may look like

cd
cd Desktop
cd NHL
javac GetPbP.java
java GetPbP

The "javac" line is unnecessary if you have already compiled and have not made any edits since.

In practice, you'll probably need to open the programs in a text editor or your code editor to set a few values, such as the folders where output is saved. I tried to make it as simple as possible to work with the code by either prompting you for these values when you run, or by putting them right at the top of the file. (Remember to save and compile again after making changes to the code.)

Hit me up via email (on right sidebar) if you have any questions or suggestions. Especially if you find bugs—these have both gone through a few iterations, but it's always possible I missed something.

Occasionally I may have to make changes to previous programs in order to better implement new ones. In those cases, I'll update the file saved on Dropbox and make a note at the bottom of the page. You can also leave me your email address here.

NHL Play by Play and shift data

This gets you all the regular-season events and shifts since 07-08 (minus the ones which the NHL didn't publish—some logs are missing, and others are incomplete, although this happens for only a handful of games a season out of the 1230). It also creates a schedule file and roster file (listing the players under the team or teams they played for) for each season, and creates both full season play-by-play and shift logs for each team.

Some notes: I divided events into "actor" and "recipient" since many events have someone doing something, and someone who is on the receiving end (shooting on a goalie, taking the puck away from someone, hitting someone, etc). I also list a block as an event for the shooter, not the person blocking the shot (which should make Corsi calculations a little easier). I list both names and numbers of all players on ice, except in cases where a team has more than six; in those cases (which are rare), I drop any players past the sixth.

On the TOI front, instead of putting shifts in the same format as you'll find them online, it saves a .csv matrix. Each row begins with a number that is the time elapsed in the game so far, in seconds. The next 12 columns are a list of the players on the ice, with a zero if a team is shorthanded. Sometimes there are seven players listed on ice for a given team at once; in this case I omit the final one. (It's a rare scenario.) I also add one second to the listed shift start times, since the shift end times overlap with the shift start times for the next group of players. (So be careful if you're trying to match this with events that result in or from play stoppages.)
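To make the layout concrete, here is a minimal sketch of reading one row of such a matrix. It assumes comma-separated columns with the elapsed seconds first and the twelve on-ice slots next, and it ignores any columns after that. The class name, method name, and sample row are hypothetical and are not taken from GetPbP.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for one row of the time on ice matrix; not part of GetPbP. */
final class ToiRow {
    final int elapsedSeconds;   // first column: seconds elapsed in the game so far
    final List<String> onIce;   // occupied slots among the next 12 columns

    ToiRow(int elapsedSeconds, List<String> onIce) {
        this.elapsedSeconds = elapsedSeconds;
        this.onIce = onIce;
    }

    /** Parses one comma-separated row; a slot holding "0" is treated as empty. */
    static ToiRow parse(String line) {
        String[] cols = line.split(",");
        if (cols.length < 13) {
            throw new IllegalArgumentException("Expected at least 13 columns: " + line);
        }
        int seconds = Integer.parseInt(cols[0].trim());
        List<String> players = new ArrayList<>();
        for (int i = 1; i <= 12; i++) {
            String slot = cols[i].trim();
            if (!slot.equals("0")) {
                players.add(slot);
            }
        }
        return new ToiRow(seconds, players);
    }

    public static void main(String[] args) {
        // Made-up row: one team has an empty slot (shorthanded).
        ToiRow row = parse("754,11,27,8,44,2,30,19,88,7,4,35,0");
        System.out.println(row.elapsedSeconds + "s, " + row.onIce.size() + " players on ice");
    }
}
```

The slots are kept as strings because the sketch makes no assumption about whether a slot holds a name or a number.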

Before running the program, choose an empty folder for all the play-by-play and time on ice information. Set the FOLDERPATH variables equal to the folders in which you want the play-by-play and time on ice information stored, and don't drop the surrounding double quotes by accident. You need not create the folders yourself, since the program will create them if it doesn't find them. Compile and run.
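For readers new to Java, here is a rough sketch of what a quoted folder path and "create it if it's missing" can look like. The constant name and path below are placeholders for illustration only; the actual variable names and defaults are whatever appears at the top of GetPbP.java.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical illustration of a folder setting; not the code used in GetPbP. */
final class FolderSetup {
    // Placeholder value. Note that the double quotes are part of the Java syntax.
    static final String EXAMPLE_FOLDERPATH = "/Users/Me/Desktop/NHL/PbP/";

    /** Creates the folder (and any missing parents) if it does not exist yet. */
    static Path ensureFolder(String folderPath) throws IOException {
        return Files.createDirectories(Paths.get(folderPath));
    }
}
```

Calling FolderSetup.ensureFolder(FolderSetup.EXAMPLE_FOLDERPATH) a second time is harmless, since Files.createDirectories does nothing when the folder already exists.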

It will ask you whether you want time on ice information, play-by-play, or both. Then, it will ask for the start season, end season, start game in the first season, and end game in the last season. If you're doing this for the first time, that would be 2007, 2012, 20001, 20720; during the 2013-14 season, you might do 2013, 2013, 20001, 20090 or something (whatever games you don't have which have already been played).
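One way to read those four inputs is as a range that starts partway through the first season and ends partway through the last. The sketch below lists the (season, game) pairs that range would cover. It is an illustration, not GetPbP's actual loop, and it assumes game numbers run from 20001 to 21230 in a full season and stop at 20720 in 2012; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of expanding the four prompts into a list of games. */
final class GameRange {
    static final int FIRST_GAME = 20001;

    /** Assumed last regular-season game number for a season (labeled by its start year). */
    static int lastGame(int season) {
        return season == 2012 ? 20720 : 21230;
    }

    /** Returns {season, game} pairs from startGame in startSeason through endGame in endSeason. */
    static List<int[]> games(int startSeason, int endSeason, int startGame, int endGame) {
        List<int[]> out = new ArrayList<>();
        for (int season = startSeason; season <= endSeason; season++) {
            int first = (season == startSeason) ? startGame : FIRST_GAME;
            int last = (season == endSeason) ? endGame : lastGame(season);
            for (int game = first; game <= last; game++) {
                out.add(new int[] {season, game});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(games(2007, 2012, 20001, 20720).size() + " games in the full range");
        System.out.println(games(2013, 2013, 20001, 20090).size() + " games in the partial range");
    }
}
```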

Beware: the shift data will take a massive amount of space on your hard drive. We're talking 15GB or more, and even if I didn't create team logs, we'd still be looking at 5GB. (If this is a big problem for you, let me know. The team logs make my life easier.) The play by play will take up a significant amount of space, too, but not as much (500MB on my computer).

A note on updates: when I do fix bugs in GetPbP, you will need to re-run it over all games—that is, starting from 2007 20001.

Calculating competition metrics

This reads the shift data and calculates quality of competition for a team or teams over a specific stretch of games within a season or over multiple seasons. It separates between forward and defenseman quality of competition, and prints the results to either one file or multiple files (which the user can specify; multiple files means one per team).

The file includes player position, player name, team, man-seconds played versus forwards (almost the same as triple the ice time), those forwards' weight (the sum of each forward's value in the user's competition metric of choice; QualComp is simply this divided by man-seconds vs F), man-seconds vs D, and those defensemen's weight. The reason I included these columns is that some players switch positions (and are actually listed as having switched positions), and I separate their competition ratings if they switch between forward and defense. If you want to calculate a total for such a player, add up each of the four columns across his rows, then divide each weight by the matching man-seconds.
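The arithmetic can be sketched as follows, reading "total" as combining a player's forward row and defense row. The class and field names are hypothetical, and the numbers in main are made up purely to show the calculation.

```java
/** Hypothetical holder for the four numeric columns of one output row. */
final class CompetitionRow {
    final double manSecVsF;  // man-seconds played versus forwards
    final double weightVsF;  // summed metric values of those forwards
    final double manSecVsD;  // man-seconds played versus defensemen
    final double weightVsD;  // summed metric values of those defensemen

    CompetitionRow(double manSecVsF, double weightVsF, double manSecVsD, double weightVsD) {
        this.manSecVsF = manSecVsF;
        this.weightVsF = weightVsF;
        this.manSecVsD = manSecVsD;
        this.weightVsD = weightVsD;
    }

    /** Adds each of the four columns, e.g. to merge a player's F row and D row. */
    CompetitionRow plus(CompetitionRow other) {
        return new CompetitionRow(
                manSecVsF + other.manSecVsF, weightVsF + other.weightVsF,
                manSecVsD + other.manSecVsD, weightVsD + other.weightVsD);
    }

    /** QualComp versus forwards: weight divided by man-seconds. */
    double qualCompVsF() {
        return weightVsF / manSecVsF;
    }

    /** QualComp versus defensemen: weight divided by man-seconds. */
    double qualCompVsD() {
        return weightVsD / manSecVsD;
    }

    public static void main(String[] args) {
        CompetitionRow asForward = new CompetitionRow(1800, 3600, 1200, 1200);
        CompetitionRow asDefense = new CompetitionRow(600, 600, 400, 800);
        CompetitionRow total = asForward.plus(asDefense);
        System.out.println("vs F: " + total.qualCompVsF() + ", vs D: " + total.qualCompVsD());
    }
}
```

Summing the columns before dividing weights the result by ice time, which a plain average of the two rows' ratios would not do.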

Right now, it can only calculate quality of competition based on 5-on-5 time on ice (per 60 minutes of team 5v5 ice time over the entire season in question, aka TOIComp), but I plan on adding other metrics as options in the future (as well as commenting the code properly, fixing some errors and inefficiencies, adding an option to base it on TOI/60 for a specific stretch of games rather than a full season, etc).

Who was on ice for events

Note: because shift start times overlap with the end times of the previous shift, I add one second to the start times. This means that if something happens the second of a faceoff (like this, perhaps), you'll have to change this script's results by hand.

If you're tracking, say, scoring chances or zone entries, you're probably familiar with this script from Timeonice, which allows you (after selecting a primary team) to input events that happened on the ice with their times, and outputs who was on the ice for each event, along with player on-ice totals at even strength, on the power play, and on the penalty kill.

Since that script hasn't been updated recently, I wrote a program that, given the input of a list in comma-separated value (.csv) format*, will output the same list, but with the numbers of the players on the ice for each event added to the end of each line.

For example, if one line looks like:

20001, 2, 18:34, [...]

This will output

20001, 2, 18:34, [...], [#player1], [#player2]...[#player12]

*You can easily save single spreadsheets to this format in Excel or OpenOffice using "Save As."

As long as you have the game number in the first column, you can use this to get the players on the ice for events in a single game or in multiple games.
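Since the time on ice matrix is keyed by seconds elapsed in the game, matching an event to it means turning the event's time into seconds. Here is a minimal sketch of that conversion. It assumes the second column is the period, the third is the time elapsed within that period as minutes:seconds, and periods last 20 minutes; none of this is confirmed above, and the class and method names are hypothetical.

```java
/** Hypothetical conversion from an event's period and clock to elapsed game seconds. */
final class EventClock {
    static final int PERIOD_SECONDS = 20 * 60; // assumed regulation period length

    /** Converts e.g. period 2 and "18:34" (assumed time elapsed in the period) to game seconds. */
    static int elapsedSeconds(int period, String clock) {
        String[] parts = clock.trim().split(":");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected mm:ss but got: " + clock);
        }
        int minutes = Integer.parseInt(parts[0].trim());
        int seconds = Integer.parseInt(parts[1].trim());
        return (period - 1) * PERIOD_SECONDS + minutes * 60 + seconds;
    }

    public static void main(String[] args) {
        // First three columns of the example line above: game, period, time.
        String[] cols = "20001, 2, 18:34".split(",");
        int period = Integer.parseInt(cols[1].trim());
        System.out.println(elapsedSeconds(period, cols[2]));
    }
}
```

If your times count down rather than up, subtract the clock value from the period length before adding it.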

Take a look through this code if you want—I took the time to comment through it.

"True Wowy"

See here. You can select the team, season range, defenseman, and TOI cutoffs. Make sure to change the filepath at the top to whatever folder on your computer you want the output to be saved in.

Built off GetPbP, so make sure you run that first.

List of updates
May 25, 2014: Fixed a bug in adding scores to TOI files and changed play-by-play season logs to attribute events to specific teams instead of "home" or "road."

March 6, 2014: Fixed a bug in creating schedule files.

February 25, 2014: Fixed a bug in reading only TOI files.

January 11, 2014: Fixed some bugs related to identifying actor/recipient for shots and faceoffs in GetPbP. Also added an auto-update option. 

November 15, 2013: GetPbP now adds the score to each line of the TOI matrices.

August 16, 2013: Corrected for player name misspellings and changes in TOIComp.

August 12, 2013: Added first version of TOIComp program.

August 7, 2013: Fixed some problems in GetPbP. Added team season TOI logs. Added comments.

August 5, 2013: Added GetTOILogs into GetPbP.

July 21, 2013: Updated TablesSeason.java and GetTOIlogs.java to avoid shift start/end overlap by adding one second to the start time.

July 15, 2013: Edited GetPbP.java and TablesSeason.java. Added GetTOIlogs.java.

July 10, 2013: Edited GetPbP.java to add score to each line, add "@" to opponent's name if it's a road game (so you don't need to check the schedule to tell), and in each team's season play-by-play log, list team's players last, even on the road.

July 5, 2013: Updated GetPbP.java

5 comments:

  1. Great website.

    Wish the NHL would at least release game sheet info from the pre-internet era. They must have these archived somewhere. These could be scraped to gather all sorts of interesting info from years/decades past. E.g. primary & secondary assists, who was on the ice for each goal for/against in various game states, shots-G-A-PTS by game state, whose penalties caused his team to be short-handed the most, how many Phil Esposito goals were the result of assists by Orr :), etc.

  2. You can find all kinds of NHL data free here:
    https://groups.yahoo.com/neo/groups/hockey_summary_project/info

    Yahoo groups kind of suck but this group is full of good files and info.

  3. Is there going to be an update to these scripts so they can be used for the playoffs? When I try to input playoff games, I get an Invalid String error.

  4. Love the site! Where's the link for the dropbox with the program to read shift data from nhl.com?

  5. Playoffs hopefully coming this summer.

    Shift data in GetPbP. Just run autoupdate—it'll take care of all the regular-season games.
