Monday, November 6, 2017

Lessons learned: Python packaging

So in the last few weeks I've slowly been putting all my code into a Python package on PyPI. (Links: GitHub project, PyPI.)

It's still pretty buggy--hence the pre-alpha designation--and it's not as extensively built-out as some of my older code, but it's under active development and hopefully will pass the old code before long.

I took on this project for two reasons:
  • As a public service for people who want to do everything in Python. (Python is a great general-purpose language to learn, though for any given specific task there's probably a superior option, like R, Julia, or C++.)
  • To teach myself some software engineering-related skills.
One of Python's weaknesses is that the online guidance for writing documentation and for building and uploading packages is less than clear.

I can't claim all my tips here are 100% good practice, but they got things to work. Getting things to work was surprisingly frustrating once my project became nontrivial.

Below, I've listed some pain points that I learned how to handle. Reminder: you can see the code on GitHub if you want specifics or to see the fixes in context.

Saturday, June 27, 2015

What does BPA mean?

I don't think anybody wants their team to not pick BPA at the draft. No executive, as far as I've seen, says they're not going BPA. But what does BPA mean?

The easy answer is "best player available." But that's not how teams should pick, necessarily.

Look at the NBA Draft and this is pretty clear. The best player in college basketball this year was Frank Kaminsky. He went 9th overall. Eight players (well, nine, probably, to include Justise Winslow), most of whom were also playing in college, were judged to be better pro basketball prospects, even though they are worse players now.

While the NBA Draft has that extra factor of (year-plus) age difference, in hockey, it's the same general idea--best prospect available. Nic Petan was a top player in the CHL but only went in the second round (and while that's a rare case, it's not unprecedented). Anthony DeAngelo was CHL defenseman of the year this past season--in a redraft, I highly doubt he moves up much, even though he's the best D available. What you want is the best prospect available, not the best player available. Both BPA ideas contrast with drafting for need, but best prospect is the better of the two--that should be self-evident. It also should be obvious that they're not the same. (Keep in mind we're still allowing for teams to make errors in assessing BPA.)

The Caps in Ilya Samsonov may have drafted the best player available--AGM Ross Mahoney said that's their philosophy--but given that skaters are about twice as likely as goalies to pan out in the first couple of rounds (going off memory, something like 70% vs 35% to play 100 games), it seems unlikely that he was also the best prospect available.

(Hell, even the history of late first skaters vs the top goalie in each draft doesn't look so hot for the netminders, if you think about it. Bernier, Varlamov, and Price are the last three top goalie picks who really panned out, as far as I can recall, although Vasilevskiy may be getting there.)

It's also possible they thought he was the best prospect available and equate best player with best prospect, but that strikes me as a rookie mistake--the numbers say favor forwards slightly over defensemen, and both heavily over goalies, early on.

I guess the bright side here is that if he develops rapidly, the Caps could be saving $5m or more by moving Holtby and going with Samsonov and Grubauer in the final years of Backstrom's deal (which has five years remaining). In a vacuum, though, I find it hard to believe any team is so good at drafting goalies that picking one in the first round makes sense, and projecting your situation a few years down the road is always tricky business. I'm not a fan of this pick.

I hope I'm wrong.

Tuesday, October 8, 2013

Good Teams, Bad Games

I took the teams with 54+% Corsi close at the end of each season and looked at their Corsi percentages over each three-game stretch (overlaps included).
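Here's a minimal sketch of how those overlapping three-game stretches can be computed. The shot-attempt counts are made up for illustration, and pooling attempts across the three games (rather than averaging three single-game percentages) is an assumption on my part, not necessarily what was done for the numbers in this post.

```python
def rolling_corsi_pct(games, window=3):
    """Corsi-for % over every overlapping `window`-game stretch.

    `games` is a list of (corsi_for, corsi_against) tuples in schedule order.
    Attempts are pooled across the window before dividing (an assumption).
    """
    pcts = []
    for start in range(len(games) - window + 1):
        stretch = games[start:start + window]
        cf = sum(g[0] for g in stretch)
        ca = sum(g[1] for g in stretch)
        pcts.append(100.0 * cf / (cf + ca))
    return pcts


# Hypothetical score-close shot attempts (for, against) over five games.
sample_games = [(30, 20), (22, 28), (25, 25), (18, 32), (27, 23)]
stretches = rolling_corsi_pct(sample_games)

# Share of stretches strictly below 47%, the cutoff used in this post.
share_below_47 = sum(p < 47 for p in stretches) / len(stretches)
```

Five games give three stretches (games 1-3, 2-4, and 3-5), so each game in the middle of the schedule counts toward three different windows.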


According to Extra Skater, the Capitals are at 46.8% over their first three games of the 2013-14 season. A little over four percent of three-game stretches here fall in that range (less than and not including 47%).

Caps under 46.8%: Fehr, Brouwer, Chimera, Alzner, Hillen, Ward, Green, Carrick, Oleksy, Beagle. Caps over 50%: Erskine, Ovechkin, Erat, Backstrom, Wilson, Latta, Carlson, Johansson.

I'm not sure what to make of this. On the one hand, it's probably a good bet that the Capitals are not an elite possession team. (I thought they'd be around 52-53%...eight percent of three-game segments for 52+% Corsi close teams are under 47%. The figure is four-and-a-half percent for 53% Corsi close teams. Still not good odds, but there's a little more hope on the 52% front.) On the other hand, four percent of the time is still three or four times a season (80 three-game segments in 82 games).
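The "three or four times a season" arithmetic, as a quick sketch. The 4% figure is rounded from the "little over four percent" above, so treat the result as a ballpark.

```python
games_in_season = 82
window = 3

# Overlapping windows: one starts at every game except the last (window - 1).
segments = games_in_season - window + 1

# Rounded share of three-game stretches under 47% for the 54+% teams.
share_under_47 = 0.04
expected_bad_stretches = share_under_47 * segments

print(segments, round(expected_bad_stretches, 1))
```

Keep in mind that overlapping stretches share games, so the bad ones tend to show up in clusters rather than as three or four separate slumps.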

I still think this team is a 52% possession team that'll be on the good side of PDO (assuming Erat moves up the lineup and Laich down). But, obviously, this slow start is less than encouraging, especially given that last season wasn't that great possession-wise for Washington.

The 18 teams above 54% Corsi close: 07-08 Red Wings, 07-08 Rangers, 07-08 Capitals, 08-09 Sharks, 08-09 Red Wings, 08-09 Blackhawks, 08-09 Capitals, 08-09 Flames, 09-10 Red Wings, 09-10 Blackhawks, 11-12 Red Wings, 11-12 Penguins, 11-12 Devils, 11-12 Kings, 12-13 Devils, 12-13 Blackhawks, 12-13 Bruins, 12-13 Kings.

Monday, September 30, 2013

Metropolitan Division Thoughts

1. Pittsburgh*

What conventional wisdom gets right: Crosby, Malkin, and Letang can carry this team.

What conventional wisdom gets wrong: This team isn't that deep anymore. If Crosby and Malkin go down, they're done--no Jordan Staal-level player to pick up the slack. Their goaltending also looks merely average or slightly below--Fleury hasn't been bad during the regular season in a while now. I highly doubt the situation in net can derail this team, unless Fleury and Vokoun are simultaneously out for a long period of time.

Recent developments: Kris Letang got hurt. Out indefinitely. Tomas Vokoun was also hospitalized for a blood clot and although he's been discharged, he's out indefinitely as well.

Saturday, September 28, 2013

Atlantic Division Thoughts

1. Boston*

What conventional wisdom gets right: The Bruins are deep up front and have great goaltending.

What conventional wisdom gets wrong: Their defense isn't proven to be that good—I think their success has been mostly forward- and goalie-driven, Chara aside, of course—although with young players like Dougie Hamilton and Torey Krug it could be. Jarome Iginla and Loui Eriksson may not be an upgrade over Nathan Horton and Tyler Seguin. But given how good Tuukka Rask is, I'll give Boston the edge over the rest of the teams in this division.

Recent developments: none

Friday, September 27, 2013

Central Division Thoughts

1. Chicago*

Losses: F Michal Frolik, F Dave Bolland, F Viktor Stalberg, G Ray Emery
Additions: G Nikolai Khabibulin

Forwards: Still stacked (although perhaps not quite as much as last season). Jonathan Toews, Patrick Kane, Patrick Sharp, and Marian Hossa make for a strong top-six, and younger players like Marcus Kruger, Brandon Saad, Jeremy Morin, Jimmy Hayes, and perhaps Brandon Pirri add talent across the rest of the lines. The only real weakness is having Michal Handzus—if last season is any indication, he'll be a 2C, and he's a bad 2C. (It's just that Sharp and Kane/Hossa are more than good enough to carry him.) Bolland and Stalberg should be easy to replace; Frolik will be a little tougher to replace, but there's so much talent in the system that it's hard not to see a couple of players stepping up in the next couple of seasons.

Defense: Duncan Keith and Brent Seabrook make a terrific top pair, Niklas Hjalmarsson and Johnny Oduya make for a surprisingly solid second pair, and developing Nick Leddy and veteran Michal Rozsival make a good third pair. (If you haven't noticed, there's a common theme here: "good".)

Goaltending: We can be pretty sure Corey Crawford is a competent starter. Probably not much more or much less.

tl;dr: Pretty much the same team that won the Cup. Not way better than anyone else (like 2012 Los Angeles), but one of the two or three best teams around for sure.

Thursday, September 26, 2013

Testing my optimism

I thought this would be a fun exercise. I'll project every player's goal, assist, and point total. At the end, I'll add them up and see whether the totals make sense.

Ovechkin—40-50-90
Backstrom—25-55-80
Johansson—17-35-52
Erat—17-35-52
Grabovski—25-25-50
Brouwer—17-15-32
Laich—15-20-35
Perreault—15-25-40
Fehr—15-10-25
Chimera—10-10-20
Beagle—5-5-10
Ward—10-15-25

And throw in a little extra from Volpatti, Latta, and the other irregulars.

Green—15-40-55
Alzner—2-15-17
Carlson—10-30-40
Erskine—2-3-5
Hillen—2-8-10
Oleksy—1-9-10

And throw in a little extra for Strachan, Orlov, Kundratek, and other irregulars.

Holtby—0-2-2

Total: 243+ G, 407+ A, 650+ P

Let's say the irregulars bring that up to 245 goals. That's 2.99 goals per game—pretty reasonable, actually. In 2011-12, it would have ranked fourth in the league, and in 2010-11, fifth. (The Caps were fifth in the lockout-shortened season.)
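For anyone who wants to check the addition, here's a short sketch that totals the projections listed above. The 245-goal figure is the rounded-up guess for the irregulars from the previous paragraph, not something the code derives.

```python
# (goals, assists) projections copied from the lists above.
projections = {
    "Ovechkin": (40, 50), "Backstrom": (25, 55), "Johansson": (17, 35),
    "Erat": (17, 35), "Grabovski": (25, 25), "Brouwer": (17, 15),
    "Laich": (15, 20), "Perreault": (15, 25), "Fehr": (15, 10),
    "Chimera": (10, 10), "Beagle": (5, 5), "Ward": (10, 15),
    "Green": (15, 40), "Alzner": (2, 15), "Carlson": (10, 30),
    "Erskine": (2, 3), "Hillen": (2, 8), "Oleksy": (1, 9),
    "Holtby": (0, 2),
}

total_goals = sum(g for g, _ in projections.values())
total_assists = sum(a for _, a in projections.values())
total_points = total_goals + total_assists

# Bump for the irregulars (a guess), spread over an 82-game season.
goals_with_irregulars = 245
goals_per_game = goals_with_irregulars / 82

print(total_goals, total_assists, total_points, round(goals_per_game, 2))
```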

Yeah, so I'll stick with these.

Saturday, September 14, 2013

Pacific Division Thoughts

Lots of interesting storylines in this division, from potentially the last hurrah of Thornton, Marleau, and Boyle in San Jose, to seeing if Phoenix can rebound with Mike Ribeiro in the mix, to Edmonton's experiment with saying they'll do things their fanbase loves (puck possession), to seeing if Anaheim can become that decent puck possession team we've seen glimpses of each of the last two seasons, and, of course, to John Tortorella versus the Canadian media.

I vacillated a lot between Anaheim, Phoenix, and Edmonton—I can see a good case for any order there. I felt there was a lot of uncertainty with Vancouver. Other than that, I feel pretty good with my picks here...unless I missed something huge.

Saturday, September 7, 2013

The Magical Metro Effect

A lot of people have discussed how the Capitals will struggle in the new Metropolitan Division, being a borderline playoff team at best. Their arguments basically are derived from this:

Caps vs Southeast, last two seasons: 27-11-4 (a 113pt pace)
Caps vs Metro, last two seasons: 19-20-7 (an 80pt pace)
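The paces above come from scaling each record to 82 games. A small sketch, assuming the usual reading of the records as wins, regulation losses, and overtime/shootout losses (`points_pace` is just a helper name I made up):

```python
def points_pace(wins, losses, ot_losses, season_games=82):
    """Project a W-L-OTL record to standings points over a full season.

    Two points per win, one per overtime/shootout loss.
    """
    games = wins + losses + ot_losses
    points = 2 * wins + ot_losses
    return points / games * season_games


vs_southeast = points_pace(27, 11, 4)   # 58 points in 42 games
vs_metro = points_pace(19, 20, 7)       # 45 points in 46 games

print(round(vs_southeast), round(vs_metro))
```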

I have several issues with that reasoning.

Thursday, September 5, 2013

AO/Backstrom WOWY

In the vein of what MC79 did here:


It's probably worth noting that the decline in 2011-12 was in large part thanks to Backstrom's injury—it led to Ovechkin being centered by players like Brooks Laich and Marcus Johansson instead. I'm guessing Green's injuries had to do with the downturn in 10-11 and 11-12, too.

Regardless, not a pretty picture.

Wednesday, August 21, 2013

TOIComp Effects, Part 1

When I ran the numbers for TOIComp, I was a little disappointed—but not surprised—that the spread between top and bottom competition (for valuable regulars) was so small. After all, we observe the same thing for Corsi Rel QoC, and differences at the season level in that metric don't appear to appreciably alter players' results. I felt that a similar study on TOIComp would be more work for the same result, but I'll do it anyway (after prompting from @garik16 and @pcunneen19).

Hopefully, there's something interesting in the data this time.

Monday, August 12, 2013

2013 TOIComp leaders

Here are the top 30 skaters by TOIComp. I only included the 609 skaters who played at least 36,000 man-seconds versus forwards--with three opposing forwards on the ice each second, this is about 200 5v5 minutes. You can find the complete lists here, and I posted a preliminary version of the code here.
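The cutoff conversion, spelled out. This assumes the 36,000 figure is counted in man-seconds (one per opposing forward per second of ice time), which is the only unit for which the roughly 200 minutes works out.

```python
cutoff_man_seconds = 36_000   # threshold versus opposing forwards
forwards_on_ice = 3           # opposing forwards at 5v5

# Each second of ice time adds one man-second per opposing forward.
seconds_played = cutoff_man_seconds / forwards_on_ice
minutes_played = seconds_played / 60

print(minutes_played)
```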
Top 30 vs F:


Rank  Player         F TOIComp
1     Phaneuf        16.58
2     Chara          16.47
2     Girardi        16.47
4     Z. Michalek    16.41
5     Ladd           16.40
6     Marleau        16.39
7     Zetterberg     16.38
8     B. Gionta      16.37
9     Little         16.36
10    Kulemin        16.35
10    Plekanec       16.35
12    Wheeler        16.33
12    Brouwer        16.33
14    Kopecky        16.30
15    Ekman-Larsson  16.29
15    Dupuis         16.29
15    Fleischmann    16.29
18    Goc            16.28
19    Bogosian       16.27
19    Bouwmeester    16.27
21    Pominville     16.26
22    Stepan         16.25
22    Orpik          16.25
24    Carlson        16.24
24    Giroux         16.24
26    Backstrom      16.23
26    Weiss          16.23
28    McDonagh       16.22
29    Datsyuk        16.21
29    Couture        16.21

The entire spread here is about 20 seconds—not exactly a lot. I'm a little more impressed by the offensive options here, since those guys will get ice time against the low-TOI defensive players when their team is down by a goal, meaning that in their tough minutes, they played an elevated TOIComp.

If you're surprised to see Backstrom here, given my post the other day, note that he still ranks behind Brouwer and Carlson, and he would have ranked behind Brooks Laich, but Laich didn't make the TOI cutoff.

And vs D:


Rank  Player         D TOIComp
1     Giroux         21.10
2     Pominville     21.08
3     St. Louis      21.04
3     Stamkos        21.04
5     Vanek          21.01
6     Hodgson        20.99
7     Crosby         20.94
8     H. Sedin       20.90
8     D. Sedin       20.90
8     Voracek        20.90
11    Bozak          20.89
12    Ovechkin       20.88
12    Moulson        20.88
14    Kessel         20.85
14    Malkin         20.85
16    Tavares        20.84
17    E. Staal       20.83
17    Van Riemsdyk   20.83
19    Hartnell       20.82
19    Benn           20.82
19    Perry          20.82
22    Kunitz         20.80
23    Boyes          20.79
24    Duchene        20.77
25    Conacher       20.76
25    Semin          20.76
25    Toews          20.76
25    Burrows        20.76
29    Timonen        20.75
29    Tlusty         20.75
29    Saad           20.75
29    Hossa          20.75

Lots of linemates here—no surprises. The spread is again only about 20 seconds from 1 to 30. A little interesting to see both Crosby and Malkin (I'm guessing that's a result of Malkin taking top opposition with Crosby out).

Friday, August 9, 2013

Caps TOIComp 07-10

In case you missed it, yesterday I broke down TOIComp for the three recent Caps coaches (Boudreau, Hunter, and Oates) to see how they deployed their players. To follow up, I want to look at Glen Hanlon's team in 07-08, followed by Boudreau's habits during the Ovechkin-led Caps' peak.

Thursday, August 8, 2013

Caps TOIComp

It took a while, but I finally got this done. (I'm still tweaking the code, so it'll be a few days before I publish it here.)

A few notes:

a) I calculated TOI/60 by dividing individual 5v5 TOI by team 5v5 TOI (only counting games in which the player played). This differs from Behind the Net's TOI/60, which is 5v5 (and 6v5) TOI per game.
b) The bubble size is total 5v5 TOI this season (well, almost—it's actually the amount of man-seconds played against forwards. Let's just say that this only differs from time on ice when a team does not put three forwards on the ice).
c) Small sample caveats apply to the smaller bubbles, and remember context and line changes and such will play a role.
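Note (a) can be sketched roughly like this. The game log is hypothetical, `toi_per_60` is a helper name I made up, and scaling the ratio by 60 is my reading of "TOI/60"; it is not the code behind the chart.

```python
def toi_per_60(game_logs):
    """Player 5v5 minutes per 60 minutes of team 5v5 time.

    `game_logs` holds one (player_5v5_min, team_5v5_min) pair per team game.
    player_5v5_min is None when the player did not dress, and those games
    are left out of both totals.
    """
    played = [(p, t) for p, t in game_logs if p is not None]
    player_total = sum(p for p, _ in played)
    team_total = sum(t for _, t in played)
    return 60 * player_total / team_total


# Hypothetical three-game log; the player sat out the second game.
logs = [(15.0, 48.0), (None, 50.0), (17.0, 52.0)]
rate = toi_per_60(logs)
```

Here the player logged 32 of the team's 100 5v5 minutes in the two games he played, which works out to 19.2 minutes per 60. Dividing by team 5v5 time, rather than by games, keeps a player on a heavily penalized team from looking like he gets less ice time than he does.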

Red bubbles are forwards, blue are D. Hover over a bubble for additional information.

Monday, July 29, 2013

Ideas for a touches app

With all the coding I've been doing recently (both for my NHL code and for real-life projects) I got to thinking about maybe writing a puck-tracking app. I've written something similar for tracking shot locations for basketball, and I thought the only changes I'd have to make would be to track the location of the mouse, not just clicks, and somehow incorporate a clock as well as figure out a way to separate carries from passes and dumps as well as changes of possession. One person could chart the movement of the puck, and someone else, watching later, could match that information up to which players handled the puck. Not only do we quickly consolidate the scoring chance, zone entry, and zone exit projects (the latter two of which are looking for help for next season), but we add a ton of data on top of that.

Then, I remembered this article from Arctic Ice Hockey. Tracing the path of a puck on a tablet is much faster and more accurate, I'm guessing, than tracing the path on a computer screen. One person could track the movement, and one person the players, as before. Since so many people have iPads, I thought about writing an app, but then I saw it costs $99 to get a developer's license for a year—more than I want to spend for just one app. But it's free (I think) to put up Android apps, a lot of people have Android tablets or phones, and thanks to this tool I discovered from MIT, it might be surprisingly straightforward to get this thing up and tested relatively quickly.

Before I start, I'd like some feedback.

a) Do you think it's feasible to track touches for entire games? (It would basically need to be done without looking at the tablet/phone at all.)

b) How would you design the app?

Obviously, I think the answer to (a) is "yes." To (b), here's my vision: