How to create “per 90” heat maps

I added a new sheet in to the “heat-maps-template.xlsx” Excel spreadsheet for “per 90” heat maps, which is going to make it way easier to look at how a player performed across different matches without having to scroll through various heat maps. With a few quick fixes, described below, you can also account for how many minutes the players in the database you’re looking at have played.

If you’re not already familiar with heat maps based on WoSo Stats data, read more about how to create your own heat maps here. Following those instructions, so long as you assign “TRUE” to the “per_90” argument in createMultiLocStatsTabs() function, you should have a .csv file in your working directory named “overall-p90-everything.csv” (if you also assigned “everything” to “match_stat”, which is what the following in this blog post will assume).

In the heat-maps-template.xlsx Excel spreadsheet, like with the stats tables in the “match tables” sheet, in the “per 90 table” sheet you’ll need to copy and paste your “overall-p90-everything.csv” table over the large stats table to the right of the heat map. For now, let’s just look at the per 90 stats table that’s already in the template. It’s a stats table for Sky Blue FC’s 2016 matches from Weeks 1 through 5, and Week 7 (Week 6 is incomplete, for now), and it has some differences with the individual match heat maps beyond the number of players it represents.

Look a the cells I’ve highlighted below in orange. You’ll see that the maximum stat open play pass attempts (“opPass.Att”) for an individual is set at 10.09. But over on the right Caroline Casey has 11.4 open play pass attempts in her own 18-yard box (the D18 zone). This is higher than the individual maximum shown below, but that’s by design.

Screen Shot 2016-11-16 at 8.14.35 PM.png

The formula in that “Ind. Max” column only covers all the rows with players that have played more than 270 minutes. Casey only played in two games, so she missed the cut.

Screen Shot 2016-11-16 at 8.19.23 PM.png

This is a quick and inelegant way to account for some way-too-high per 90 stats for players who played very few minutes. This way, a player who only played for 10 minutes and passed the pall three times out of her defensive third’s left wing (the DL zone) doesn’t jack up the individual maximum with her 27 pass attempts per 90 stat.

This is important because the heat map’s color spectrum is determined by the individual maximum. It starts and zero and ends at the maximum, and if outliers weren’t accounted for then the map would look very light for some very good players. Take Sarah Killion, who at 571 minutes is tied with Rampone for the most minutes played out of any player in this set of Sky Blue FC matches. If the individual maximum was being calculated from all players, regardless of minutes, her open play pass attempts per 90 heat map would look very, very light. She has 10.09 open play pass attempts per 90 out of the defensive middle’s center, but it barely stands out because there’s a player with an individual maximum of 33.75 open play pass attempts per 90 that’s throwing everything off!

screen-shot-2016-11-16-at-8-26-11-pm

Go back to setting the individual maximum based on players who played at least 270 minutes (where now the individual maximum for open play pass attempts is 10.09), and Killion’s volume of passing attempts per 90 from the middle of the field stands out way more.

Screen Shot 2016-11-16 at 8.29.30 PM.png

The above example is for a very specific dataset. What if you created a per 90 stats table for every Portland Thorns match with location data, and what if that table had many more rows? And what if, unlike in the example above where there were 11 players who had played at least 270 minutes, there were 13 players and you had the change the number of rows the “Ind. Max” column is looking up?

The solution, for now, until I find a better solution, is a huge pain in the ass but it works. First find, the row number for the last player above the 270 minute threshold. For the example above, it was row 12 – but let’s say it was actually at row 14. Then, highlight the “Ind. Max” column, search for the number adjacent to the “$AW__” value, in this case it’s 12, and replace that with the row number which in our hypothetical scenario would have been 14.

 

Screen Shot 2016-11-16 at 8.55.23 PM.png

That’s the short of the per 90 heat map. I haven’t yet touched the “Team Max” column, but I will in a later post. Coming soon, I will just make one per 90 heat map for the entire season and update it as I get more location data. I will also work on making it easier to copy and paste over stats tables so that you won’t have manually change any formulas ever.

Aerial duels in the 2016 NWSL season (through 54 matches)

In the WoSo Stats Shiny app is a section titled “Aerial Duels” that has data for how many times a player goes up for an aerial duel, and how often she wins them.  In the 2016 NWSL Season Tableau workbook, I originally didn’t include a visualization for aerial duels, but I recently created one to get a better look at how the distribution of players looks when you compare the amount of times they go up for an aerial duel per 90 minutes to the percentage of times they win an aerial duel.

You can view the “Aerial Duels” section of the Tableau visualization for yourself. As of this writing, with 54 matches logged for the season, two players, Dagny Brynjarsdottir and Natasha Kai, stand apart pretty clearly from the rest of the league for how often they are involved in an aerial duel per 90 minutes.

It of course makes sense that they’d have a lot of aerial duels; they’re both tall and are typically thrown into attacking positions high up the field. After Kai (15.6 aerial duels per 90) and Brynjarsdottir (13.6 aerial duels per 90), the rest of the field appears starting with another Portland Thorn, Lindsey Horan (10.1 aerial duels per 90).

screen-shot-2016-11-03-at-6-33-05-pm

The players with the highest aerial duel win percentage with a significant number of aerial duels per 90 (beyond the 25th percentile, the left edge of that light grey rectangle you see running parallel to the y-axis) are further back, with far less aerial duels per 90 but with generally greater defensive duties. The top four – again, with 54 matches logged so far – are Whitney Engen (82% of aerial duels won), Becky Sauerbrunn (78%), Julie King (78%), and Alanna Kennedy (72%).

Sauerbrunn and Kennedy noticeably have a very high win percentage while still being above the 75th percentile of aerial duels per 90. As is evident by looking at the chart, more aerial duels appears to correlate with a winning percentage approaching around 45%.

Finally, I looked at how each team compares. The Western New York Flash stands out for having four players -Erceg, Kennedy, McDonald, and Mewis – clustered in the top-right corner of the chart. No other team has a cluster like that.

Screen Shot 2016-11-03 at 6.54.58 PM.png

Meanwhile, have a look at the Seattle Reign. Their players are generally clustered behind the 75th percentile.

screen-shot-2016-11-03-at-6-58-57-pm

An interesting follow-up to this chart would be to break down the per 90 and win percentage by location. Each match with complete location data has the location of each aerial duel logged, so this is something that should be possible to visualize and analyze once a way of coding through the matches and sorting out the aerial duels by location is resolved.

Another more complex follow-up question is what happens after each of these aerial duels. If it went out of bounds, it was recovered by a teammate of the player who won the aerial duel, if it was cleared away, if the aerial duel resulted in a foul, and so on. This data is deep in the spreadsheet that is logged for each match and I haven’t yet figured out an easy way to do that type of analysis, but it is going to be worth digging into.

In the meanwhile, feel free to dig through the chart and have a look at this for yourself!