A look at Ali Krieger’s passing and defending stats in the NWSL

Yesterday, I did a bit of how-to and analysis in one post for open play passing stats in the NWSL 2016 season – at least for the matches for which we have location data (40 out of 103, as of this writing, to be exact).

Now, I’m going to work with the same dataset in combination with the NWSL 2016 rosters database we’ve got to look at a very statistically interesting type of player – the fullback. We’ll focus on one highly productive example – the Washington Spirit’s Ali Krieger, one of the fullbacks in our dataset with the most open play passes attempted per 90 minutes. We’ll look at how her passing and defensive responsibilities change in each third.

The following visualizations and data are all available to interact with and download from this Tableau visualization. All the data can be calculated from our WoSo Stats GitHub repo by following the instructions in yesterday’s post for how to use the R code I’ve created.

The following data is also only for 40 out of 103 NWSL 2016 matches that we’ve logged with complete location data. To see the list of matches this data represents, see the database in the WoSo Stats Github and look for all the matches with “yes” in the “location.complete” column.

Open Play Pass Attempts by Thirds of the Field

Let’s look again at this chart from yesterday. It shows all players with a minimum of 270 minutes logged with location data by the WoSo Stats project, sorted by open play pass attempts per 90 and broken down by the third of the field in which a player’s passes originated.

screen-shot-2017-02-25-at-11-28-48-am

Takes just a little bit of mental math to notice that out of all the defenders up here, the two with a noticeably higher percentage of pass attempts in the attacking 3rd are O’Hara and Krieger. Those of you who follow the sport will know why – they’re both fullbacks, asked to join the attack much more often than their counterparts at centerback.

Now, using the rosters database, let’s filter out anyone who isn’t a defender.

Sticking with these passing stats one more time before we delve into other stats, I duplicated the chart above for defenders only. The stacked bars now represent the percentage of each defender’s open play pass attempts that came in each third of the field. I sorted the chart by percentage of passes in the attacking 3rd, so the defenders with the highest percentage of passes in the attacking 3rd are at the top.
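For anyone following along in R, here is a minimal sketch of that percentage breakdown. The players and counts are made up for illustration and the column names are my own, not the ones in the WoSo Stats tables.

```r
# Toy open play pass attempt counts by third (made-up numbers, not WoSo Stats data)
passes <- data.frame(
  player = c("Player A", "Player B", "Player C"),
  D3 = c(40, 90, 60),    # defensive 3rd attempts
  M3 = c(120, 100, 110), # middle 3rd attempts
  A3 = c(40, 10, 30)     # attacking 3rd attempts
)

# Each third's share of a player's total attempts, as a percentage
thirds <- c("D3", "M3", "A3")
totals <- rowSums(passes[, thirds])
shares <- round(100 * passes[, thirds] / totals, 1)
names(shares) <- paste0(thirds, "_pct")

# Sort so the highest attacking 3rd share is at the top
result <- cbind(player = passes$player, shares)
result <- result[order(-result$A3_pct), ]
print(result)
```

Each row of the result adds up to 100%, which is what lets us compare a high-volume passer to a low-volume one on the same scale.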

 

screen-shot-2017-02-26-at-12-37-56-pm

The top, for those of you familiar with the NWSL 2016 season, is full of fullbacks. Pickett, Reed, Catley, Gilliland, Hinkle, they’re all there. Reed and Pickett, the two Seattle players visible here, are notably the only defenders with more than 25% of their pass attempts in the attacking 3rd. Further down the chart, we run into more defenders who were shuffled between the fullback and centerback role throughout the matches we’ve logged, and then more and more centerbacks.

Back to Krieger. 18.8% of Krieger’s open play pass attempts were in the attacking 3rd, compared to the other fullback on her team visible here, Kleiner, with 15.3%. Krieger’s middle 3rd open play pass attempts were 60.1% of all her attempts, and her defensive 3rd open play pass attempts were the remaining 21%.

There’s a lot of different things we could look at with regards to passing stats. How is Krieger passing the ball out of her defensive 3rd, relative to other defenders? What’s she doing in the midfield with over 60% of her pass attempts? And what’s going on with those 18.8% of passes in the attacking 3rd compared to everyone who’s above her – and below her?

Open Play Passing in the Attacking 3rd

Let’s just look at those attacking 3rd open play passing attempts. Below is the same group of players from above, now charting the percentage of open play pass attempts in the attacking 3rd vs. their open play passing completion percentage in that attacking 3rd.

Out of all the defenders whose percentage of open play pass attempts in the attacking 3rd is above the 75th percentile, Krieger is one of only three, along with Reed and Klingenberg, whose open play passing completion percentage in the attacking 3rd hovers around the 75th percentile.

Screen Shot 2017-02-26 at 1.42.34 PM.png

Recall from yesterday’s post that the median open play passing completion percentage in this 3rd of the field for all players was 60.6%. Krieger’s is 65.4%.
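If you want to reproduce that kind of percentile filter yourself, a rough sketch in R could look like the following. The data is invented, and the 3-point tolerance for "hovers around" is an arbitrary choice of mine, not something from the Tableau workbook.

```r
# Made-up attacking 3rd numbers for a handful of defenders
defenders <- data.frame(
  player = c("A", "B", "C", "D", "E", "F", "G", "H"),
  a3_share = c(26, 22, 19, 15, 12, 9, 6, 4),       # % of attempts made in the attacking 3rd
  a3_comp_pct = c(66, 52, 65, 70, 58, 61, 45, 55)  # completion % in the attacking 3rd
)

# 75th percentile of each measure (R's default quantile method)
share_p75 <- quantile(defenders$a3_share, 0.75)
comp_p75 <- quantile(defenders$a3_comp_pct, 0.75)

# Defenders above the 75th percentile for share of attempts in the attacking 3rd
high_share <- defenders[defenders$a3_share > share_p75, ]

# Of those, keep the ones whose completion % is within a few points of its own
# 75th percentile (the tolerance is an assumption for this sketch)
tolerance <- 3
near_p75 <- abs(high_share$a3_comp_pct - comp_p75) <= tolerance
standouts <- high_share[near_p75, ]
print(standouts)
```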

Defending in the Middle and Defensive 3rd

Now let’s turn to defending stats. I started off with what in the stats table are called “possession disruptions,” – successful tackles and dispossessions of the opponent. That is, instances where a defender was attempting to go 1 on 1 with an opponent and strip the ball away. Below is a chart for all defenders with at least 270 minutes logged with location data, sorted by opponent possessions disrupted per 90 minutes, and broken down by what third of the field they were in.

Screen Shot 2017-02-26 at 2.25.50 PM.png

Krieger doesn’t even show up in this list. She’s down below in the middle of the pack, not even getting more than two opposing possessions disrupted per 90.

screen-shot-2017-02-26-at-2-26-00-pm

But there’s more than one way to defend, and her contribution to defense is much more apparent when we look at a different type of defending stat – “ball disruptions.” That is, interceptions, blocks, and clearances of the opponent’s ball – usually a pass attempt. Below is a chart for all defenders with at least 270 minutes logged with location data, sorted by ball disruptions per 90 minutes and broken down by what third of the field they were in.
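To make the definition concrete, here is a small R sketch that adds up interceptions, blocks, and clearances and converts the total to a per 90 figure. The numbers and column names are invented for illustration.

```r
# Made-up defending counts (not WoSo Stats data)
defending <- data.frame(
  player = c("A", "B", "C"),
  minutes = c(540, 360, 450),
  interceptions = c(24, 8, 10),
  blocks = c(6, 4, 5),
  clearances = c(12, 20, 5)
)

# Ball disruptions = interceptions + blocks + clearances
defending$ball_disruptions <- with(defending, interceptions + blocks + clearances)

# Scale to a per 90 minute rate and sort from most to fewest
defending$ball_disruptions_p90 <- round(defending$ball_disruptions / defending$minutes * 90, 1)
defending <- defending[order(-defending$ball_disruptions_p90), ]
print(defending[, c("player", "ball_disruptions", "ball_disruptions_p90")])
```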

Screen Shot 2017-02-26 at 2.32.00 PM.png

 

Krieger is not only up there at #3, but she’s also surrounded by mostly centerbacks.

Now let’s look at those disruptions in the defensive 3rd and middle 3rd, broken down by whether they were interceptions, blocks, or clearances.

screen-shot-2017-02-26-at-2-34-55-pm

screen-shot-2017-02-26-at-2-35-03-pm

Krieger is also out of the top 15 when we look at ball disruptions in the defensive 3rd, but in the middle 3rd she’s ridiculously ahead of every other defender. Even when I included all players, not just defenders, in this visualization, she was still far and away the top of the list.

Screen Shot 2017-02-26 at 2.37.36 PM.png

Krieger’s 4.1 interceptions per 90 minutes are, by themselves, higher than the total ball disruptions for some players. I personally think extremely highly of interceptions, as they’re instances when a defending player not only stops the ball but wins clear possession of it – essentially getting credit for a turnover in possession. Getting them that high up the pitch compared to every other player might not just make her a good defender; it can also make her a dangerous attacker.

We need your help!

As was noted above, this is only 40 matches out of a 103-game NWSL 2016 season. The WoSo Stats project desperately needs your help to log more basic stats and location data for the 2016 season. The more data we get, the better we’ll understand the sport.

If you’re interested in logging data for matches (that are all publicly available on YouTube), read more here and email me at wosostats.team@gmail.com or send me a DM at @WoSoStats on Twitter. All the data logged will be publicly available on the WoSo Stats GitHub repo.

 

 

How to break down NWSL passing stats by thirds of the field

In this post, we’re going to look at passing stats by location.

We’ll create two spreadsheets, one with stats for all NWSL 2016 matches that have been logged with location data by the WoSo Stats project, and another with those same stats but broken down by thirds of the field.

I’ll show you the R code used to generate them, and we’ll go over some Tableau visualizations I’ve created to dig into the passing data a little further.

The instructions for how to use the creating-stats.R file are here in the WoSo Stats Github repo. If you’re familiar with R, first things first, source this R file and then run the getStatsInBulk function with the arguments shown below:

your_stats_list <- getStatsInBulk(competition.slug = "nwsl-2016", location_complete = TRUE)

This will take about a minute. Then run the mergeStatsList function with the following arguments to get the stats table as a data frame named “your_stats”:

your_stats <- mergeStatsList(stats_list = your_stats_list, location = "none", add_per90 = TRUE)

In there are columns for open play passes, which in the columns are called “opPass.” Open play passes are defined as all passes that aren’t one of the following – namely, dead ball plays:

  • Throw-ins
  • Corner kicks
  • Goal kicks
  • Free kicks
  • Drop kicks or throws by the goalkeeper

The columns we’re going to be primarily concerned with are those named “opPass Att per 90” and “opPass Comp Pct,” and it might be useful to also look at “opPass Comp per 90.” When we break these down by thirds of the field further below, they’ll be prefaced with their respective location – so, there will be “A3 Pass Att per 90,” “M3 Pass Att per 90,” and “D3 Pass Att per 90.”

If you don’t know anything about R, don’t worry, you can just follow along with the charts below and ignore all these details about the code and spreadsheets.

The data represented in this post will be available to download from this Tableau visualization. There, you can also interact with the charts shown below.

Another fair warning: the following data only represents 40 matches out of the 103-match NWSL 2016 season. They’re all the NWSL 2016 matches in the database with “yes” marked off in the location.complete column. We need more help logging data, and that help could be you!

On to the data, though. What do open play passes look like, without regard for where they came from?

Open Play Passes (without location)

This chart shows open play passing completion percentages, sorted by open play passes attempted per 90. That is, the players at the top attempted the most passes in open play per 90 minutes (take their open play pass attempts, divide it by the number of minutes they played, and multiply that quotient by 90).
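The per 90 arithmetic in that parenthetical can be written as a tiny R helper. This is just a sketch of the calculation as described, with a function name I made up; it is not the code that mergeStatsList runs internally.

```r
# Per 90 rate: divide the stat by minutes played, then multiply by 90.
# Returns NA when a player has no minutes, to avoid dividing by zero.
per_90 <- function(stat, minutes) {
  ifelse(minutes > 0, stat / minutes * 90, NA_real_)
}

# A player with 150 open play pass attempts in 360 minutes
per_90(150, 360)  # 37.5

# Works on whole columns at once
attempts <- c(150, 80, 12)
minutes <- c(360, 450, 0)
per_90(attempts, minutes)
```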

screen-shot-2017-02-25-at-10-51-31-am

Here is a table showing the data behind this chart, with an added column for open play passes that were actually completed. “GP” is games played (really, the games that we’ve logged) and “MP” is minutes played.

Screen Shot 2017-02-25 at 10.53.50 AM.png

The top 15 is full of players with generally very high passing completion percentages – all are above the median of 74.9%, except for Fishlock and Krieger.

This chart is stacked with Seattle Reign players, but it’s also stacked with largely defensive-minded players. Corsie, Fletcher, Barnes, Averbuch, O’Hara, Hickmann Alves, and Krieger – nearly half the players are defenders. Defenders usually have higher passing percentages (or at least they should), and they probably see more of the ball than the rest of their teammates, so it shouldn’t be surprising that, since we sorted by open play pass attempts per 90, we got a lot of defenders, and that most of them have pretty good passing completion percentages.

How do we look at open play passing stats, then, in a way that accounts for a lot of passing going on in the defensive third? What’s going on with Little? Does her passing completion percentage fall off the top 15 if we look only at her passes in the attacking third? And what about O’Hara, a player who is known to run up and down the field? What does her passing look like in the defensive, middle, and attacking thirds of the field?

To get this data, we have to run some R code again.

Open Play Passes (broken down by thirds of the field)

To get a stats table with all stats broken down by thirds of the field (attacking, middle, and defensive thirds), run this code.

your_stats_list <- getStatsInBulk(competition.slug = "nwsl-2016", location_complete = TRUE, location = "thirds")

your_stats <- mergeStatsList(stats_list = your_stats_list, location = "thirds", add_per90 = TRUE)

You might be sitting there for a few minutes, but the “your_stats” data frame, a 900-column table, will have what we’re looking for.
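With that many columns, it helps to pull out just the handful you care about. Below is a hedged sketch using a toy data frame. The column names follow the “A3 Pass Att per 90” pattern described earlier in this post, but the exact names in your own “your_stats” table may differ, so check names(your_stats) first.

```r
# Toy stand-in for the wide stats table (made-up numbers; real column names may differ)
toy_stats <- data.frame(
  Player = c("Player A", "Player B"),
  MP = c(540, 360),
  `A3 Pass Att per 90` = c(8.1, 12.4),
  `M3 Pass Att per 90` = c(30.2, 25.0),
  `D3 Pass Att per 90` = c(9.7, 3.3),
  `A3 Pass Comp Pct` = c(0.62, 0.71),
  check.names = FALSE  # keep the spaces in the column names
)

# Find the per 90 attempt columns for each third by matching on the prefix
third_cols <- grep("^(A3|M3|D3) Pass Att per 90$", names(toy_stats), value = TRUE)

# Keep the player info plus only those columns
passing <- toy_stats[, c("Player", "MP", third_cols)]
print(passing)
```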

Now, when we sort by open play passes attempted per 90 and break down passes by thirds of the field for that top 15, it becomes clearer where everything was going on.

screen-shot-2017-02-25-at-11-28-48-am

Fishlock – who, it should be pointed out, only has 4 matches logged with location data in this dataset – is far ahead of the pack when it comes to open play pass attempts, but very few are from her own defensive third. The bulk of her open play pass attempts, as is the case for almost everyone seen here, is in the middle of the field, but there is a significant portion of attempts going on in the attacking third.

Another player who had a relatively low open play passing completion percentage was Krieger, and the distribution of her passes is more even. Roughly 60% of her passes were in the middle, and roughly 20% each in the defensive and attacking thirds. Her passing completion percentage is probably pretty good in the defensive third, but we’ll soon have a look at what it’s like in the middle and attacking thirds.

And then there’s Little, who had a better open play passing completion percentage by over 20 percentage points than Fishlock, and that’s with a higher percentage of passes in the attacking third (27%, compared to Fishlock’s 24%).

What this chart lacks is passing completion percentages for each third of the field. For that, we can look at a chart, similar to the first one, but for each third of the field.

Open Play Passes in the Defensive 3rd

When looking at open play passing completion percentages in the defensive 3rd, and sorting by how many open play passes were attempted (per 90) out of the defensive 3rd, the chart is exclusively defenders and goalkeepers.

Screen Shot 2017-02-25 at 11.45.50 AM.png

Unsurprisingly, the median open play passing completion percentage in the defensive 3rd, at 81.3%, is higher than the median for all open play passes. There’s quite a range of passing completion percentages, from over 90% for the likes of Kallman and Fletcher to at or below 70% for Pressley and D’Angelo (a goalkeeper). That’s probably more of a reflection of how they’re trying to get the ball out of their own 3rd – D’Angelo and Pressley are probably launching more speculative long balls into the midfield and attacking 3rd, while Fletcher and Kallman might be passing the ball around in the defensive 3rd much more.

That requires a deeper look at the type of passes out of the defensive 3rd, but we’ll save that for another day. Now, let’s look at this chart, but for passes in the middle 3rd.

Open Play Passes in the Middle 3rd

When looking at open play passing completion percentages in the middle 3rd and sorting by open play passes attempted there, it’s a different story.

Screen Shot 2017-02-25 at 11.53.48 AM.png

Defenders are all out of the picture now, except for Barnes, and the top 15 is now stacked with midfielders. For those of you who follow the NWSL pretty closely, you’ll also notice these are mostly defensive-minded midfielders. Killion, Brian, Winters, Zerboni, Kyle, and Colaprico are all midfielders generally known to lie deep in the field and support the defense. And it makes sense they’d appear at the top of this list, and generally with such high passing completion percentages, as they’re likely to get the ball a lot, either from the defense, other midfielders passing back, or by winning it from the opposing team.

Little is no longer the #2 player, but she is #1 when looking at passing completion percentage for this top 15. She has an impressive 90.1% passing completion percentage in the middle 3rd with 33.2 open play passes attempted per 90 minutes in that third of the field. Killion is up there, too, with an 85.7% completion percentage in the middle 3rd with 36.5 open play passes attempted per 90 minutes.

Meanwhile, the rest of this top 15 is generally at or above the median of 76.3% for passing completion percentage. Fishlock sticks out for the wrong reason – with the most open play passes attempted per 90 in the middle 3rd (42.7) but with a passing completion percentage of only 65.5%, well below the 25th percentile.

What else could we look at here? There are a lot of passes here. How good are these numbers when we look at passes going forward? How many are being launched forward, and how many are going back to the defense? That’s another analysis for another day, but it’s worth considering whether simply looking at pass attempts vs. pass completion percentage is going to hide players who maybe don’t pass the ball a lot out of the midfield and don’t have the highest completion percentages. Maybe they’re more likely to complete a through ball at the expense of a higher passing completion percentage from safer passes, or maybe they’re launching the ball forward into aerial duels that their teammates are losing but that still create dangerous loose balls their teams can capitalize on.

The median for passing completion percentage has been dropping the further we go up the field. It was at 81.3% in the defensive 3rd, 76.3% in the middle 3rd, and now we’re going to see how far it drops in the attacking 3rd.
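If you’d like to compute those medians yourself, one way is to group the completion percentages by third and take the median of each group. The sketch below uses invented numbers in a “long” layout that I made up for the example, not the actual WoSo Stats table.

```r
# Made-up completion percentages, five players per third
comp_pct <- data.frame(
  third = rep(c("D3", "M3", "A3"), each = 5),
  pct = c(92, 85, 81, 74, 70,   # defensive 3rd
          88, 80, 76, 71, 65,   # middle 3rd
          77, 66, 60, 52, 48)   # attacking 3rd
)

# Median completion percentage within each third
medians <- tapply(comp_pct$pct, comp_pct$third, median)

# Show them in defensive, middle, attacking order
print(medians[c("D3", "M3", "A3")])
```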

Open Play Passes in the Attacking 3rd

When we look at open play passing percentages in the attacking 3rd, and sort by open play passes attempted in that third per 90, the percentages are all over the place. There’s also a lot of new names – namely forwards and more attacking-minded midfielders.

screen-shot-2017-02-25-at-12-11-07-pm

The median open play passing completion percentage in this 3rd is low, at 60.6%. That makes sense, as you’re likely not going to have an easy time moving the ball around that close to an opponent’s goal. There’s several players who still stand out, though.

Back to Kim Little: her passing completion percentage in this third, at 76.5%, is more than 13 percentage points lower than her 90.1% in the middle 3rd. But compared to the rest of the field, she’s a star, more than six percentage points above the 75th percentile.

Perhaps even more impressive is Washington’s Banini, who we haven’t even seen in the top 15 by open play pass attempts per 90 until now. With 14.4 open play pass attempts in this 3rd per 90, she’s getting off a completion percentage of 83.0%. That would be above the 75th percentile even in the middle 3rd!

Fishlock is here, too, although her passing completion percentage is comparable to the rest of the field, unlike in the middle 3rd where she was relatively very low. Relatively low compared to everyone else, though, are Mathis and Leon, who attempt to pass the ball a lot in this 3rd but struggle to get half of them completed.

If we were to break this down further, we’d want to look at how many of these completed passes are staying in the attacking 3rd or if a significant number of passes out of this 3rd are going back to the midfield. Also, what about crosses, and are those types of high-risk-high-reward passes behind Melis’ and Leon’s low completion percentage? And what about forwards like Alex Morgan and Lynn Williams, who aren’t even in this top 15? Should we even expect them to have high passing attempt numbers, or should a table like this only include fullbacks, midfielders, and inside forwards, and exclude players whose job is to shoot first?

It’s been a year!

It’s been a year since this WoSo Stats project went live! To be precise, it’s been a year and two days since this tweet, when I first went public with this project.

A year later, we have over 100 matches in this project’s database, and are 75% complete with logging the NWSL 2016 season.

None of this would have been possible without the incredible, hard work of the dedicated volunteers behind this project. There have been dozens who have helped out, some for one or two matches, and some for far more, but each of them has helped us better understand this beautiful game.

It’s been a humbling experience seeing how eager fans have been to help do something that hasn’t been done before in women’s soccer. I truly believe that the growth of women’s soccer could be one of the next generation’s most interesting, fascinating stories in sports. In the meanwhile, we’ve got an NWSL season to finish, and even bigger hopes for this year.

-Alfredo

How to create “per 90” heat maps

I added a new sheet to the “heat-maps-template.xlsx” Excel spreadsheet for “per 90” heat maps, which is going to make it way easier to look at how a player performed across different matches without having to scroll through various heat maps. With a few quick fixes, described below, you can also account for how many minutes the players in the database you’re looking at have played.

If you’re not already familiar with heat maps based on WoSo Stats data, read more about how to create your own heat maps here. Following those instructions, so long as you assign “TRUE” to the “per_90” argument in the createMultiLocStatsTabs() function, you should have a .csv file in your working directory named “overall-p90-everything.csv” (if you also assigned “everything” to “match_stat”, which is what the rest of this blog post will assume).

In the heat-maps-template.xlsx Excel spreadsheet, like with the stats tables in the “match tables” sheet, in the “per 90 table” sheet you’ll need to copy and paste your “overall-p90-everything.csv” table over the large stats table to the right of the heat map. For now, let’s just look at the per 90 stats table that’s already in the template. It’s a stats table for Sky Blue FC’s 2016 matches from Weeks 1 through 5, and Week 7 (Week 6 is incomplete, for now), and it has some differences with the individual match heat maps beyond the number of players it represents.

Look at the cells I’ve highlighted below in orange. You’ll see that the individual maximum for open play pass attempts (“opPass.Att”) is set at 10.09. But over on the right, Caroline Casey has 11.4 open play pass attempts in her own 18-yard box (the D18 zone). This is higher than the individual maximum shown below, but that’s by design.

Screen Shot 2016-11-16 at 8.14.35 PM.png

The formula in that “Ind. Max” column only covers all the rows with players that have played more than 270 minutes. Casey only played in two games, so she missed the cut.

Screen Shot 2016-11-16 at 8.19.23 PM.png

This is a quick and inelegant way to account for some way-too-high per 90 stats for players who played very few minutes. This way, a player who only played for 10 minutes and passed the ball three times out of her defensive third’s left wing (the DL zone) doesn’t jack up the individual maximum with her 27 pass attempts per 90 stat.

This is important because the heat map’s color spectrum is determined by the individual maximum. It starts at zero and ends at the maximum, and if outliers weren’t accounted for then the map would look very light for some very good players. Take Sarah Killion, who at 571 minutes is tied with Rampone for the most minutes played by any player in this set of Sky Blue FC matches. If the individual maximum were calculated from all players, regardless of minutes, her open play pass attempts per 90 heat map would look very, very light. She has 10.09 open play pass attempts per 90 out of the defensive middle’s center, but it barely stands out because there’s a player with 33.75 open play pass attempts per 90 setting the individual maximum and throwing everything off!

screen-shot-2016-11-16-at-8-26-11-pm

Go back to setting the individual maximum based on players who played at least 270 minutes (where now the individual maximum for open play pass attempts is 10.09), and Killion’s volume of passing attempts per 90 from the middle of the field stands out way more.
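The spreadsheet does this with Excel formulas, but the idea is easy to sketch in R. Everything below is a toy illustration: the players, the minutes for the low-minute player, and the simple value-divided-by-maximum shading are my assumptions, not a copy of what the template does.

```r
# Made-up per 90 values for one zone (the 10.09 and 33.75 echo the example above)
zone_stats <- data.frame(
  player = c("Starter A", "Starter B", "Late sub"),
  minutes = c(571, 450, 8),
  opPass_Att_p90 = c(10.09, 7.5, 33.75)
)

# Individual maximum across everyone, regardless of minutes
max_all <- max(zone_stats$opPass_Att_p90)

# Individual maximum only among players with at least 270 minutes
qualified <- zone_stats$minutes >= 270
max_qualified <- max(zone_stats$opPass_Att_p90[qualified])

# Shade intensity from 0 (lightest) to 1 (darkest), capped at the maximum
intensity <- function(value, ind_max) pmin(value / ind_max, 1)

intensity(10.09, max_all)        # about 0.30, so the cell looks light
intensity(10.09, max_qualified)  # 1, so the cell is as dark as it gets
```

Capping at 1 means a low-minute player above the threshold-based maximum, like Casey in the earlier example, simply gets the darkest shade instead of stretching the scale.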

Screen Shot 2016-11-16 at 8.29.30 PM.png

The above example is for a very specific dataset. What if you created a per 90 stats table for every Portland Thorns match with location data, and what if that table had many more rows? And what if, unlike in the example above where there were 11 players who had played at least 270 minutes, there were 13 players and you had to change the number of rows the “Ind. Max” column is looking up?

The solution, for now, until I find a better one, is a huge pain in the ass, but it works. First, find the row number for the last player above the 270-minute threshold. For the example above, it was row 12 – but let’s say it was actually row 14. Then, highlight the “Ind. Max” column, search for the number adjacent to the “$AW__” value (in this case 12), and replace it with the new row number, which in our hypothetical scenario would be 14.

 

Screen Shot 2016-11-16 at 8.55.23 PM.png

That’s the short of the per 90 heat map. I haven’t yet touched the “Team Max” column, but I will in a later post. Coming soon, I will just make one per 90 heat map for the entire season and update it as I get more location data. I will also work on making it easier to copy and paste over stats tables so that you won’t have to manually change any formulas ever.

Aerial duels in the 2016 NWSL season (through 54 matches)

In the WoSo Stats Shiny app is a section titled “Aerial Duels” that has data for how many times a player goes up for an aerial duel, and how often she wins them. In the 2016 NWSL Season Tableau workbook, I originally didn’t include a visualization for aerial duels, but I recently created one to get a better look at how the distribution of players looks when you compare the number of times they go up for an aerial duel per 90 minutes to the percentage of times they win an aerial duel.

You can view the “Aerial Duels” section of the Tableau visualization for yourself. As of this writing, with 54 matches logged for the season, two players, Dagny Brynjarsdottir and Natasha Kai, stand apart pretty clearly from the rest of the league for how often they are involved in an aerial duel per 90 minutes.

It of course makes sense that they’d have a lot of aerial duels; they’re both tall and are typically thrown into attacking positions high up the field. After Kai (15.6 aerial duels per 90) and Brynjarsdottir (13.6 aerial duels per 90), the rest of the field begins with another Portland Thorn, Lindsey Horan (10.1 aerial duels per 90).

screen-shot-2016-11-03-at-6-33-05-pm

The players with the highest aerial duel win percentage with a significant number of aerial duels per 90 (beyond the 25th percentile, the left edge of that light grey rectangle you see running parallel to the y-axis) are further back, with far fewer aerial duels per 90 but with generally greater defensive duties. The top four – again, with 54 matches logged so far – are Whitney Engen (82% of aerial duels won), Becky Sauerbrunn (78%), Julie King (78%), and Alanna Kennedy (72%).
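Here is a rough R sketch of that filter: compute aerial duels per 90 and win percentage, drop anyone at or below the 25th percentile for duels per 90, and sort what’s left by win percentage. The players and numbers are made up for illustration.

```r
# Made-up aerial duel counts (not WoSo Stats data)
duels <- data.frame(
  player = c("A", "B", "C", "D", "E"),
  minutes = c(900, 900, 900, 900, 900),
  duels = c(150, 100, 50, 30, 10),
  won = c(60, 45, 40, 24, 9)
)

duels$per90 <- duels$duels / duels$minutes * 90
duels$win_pct <- 100 * duels$won / duels$duels

# Only keep players beyond the 25th percentile of aerial duels per 90
p25 <- quantile(duels$per90, 0.25)
qualified <- duels[duels$per90 > p25, ]

# Highest win percentage first
qualified <- qualified[order(-qualified$win_pct), ]
print(qualified[, c("player", "per90", "win_pct")])
```

Player E wins 90% of her duels in this toy data but only goes up for one per 90, so the threshold keeps her from topping the list.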

Sauerbrunn and Kennedy notably have a very high win percentage while still being above the 75th percentile of aerial duels per 90. As is evident from the chart, the more aerial duels a player contests per 90, the closer her win percentage tends to get to around 45%.

Finally, I looked at how each team compares. The Western New York Flash stands out for having four players – Erceg, Kennedy, McDonald, and Mewis – clustered in the top-right corner of the chart. No other team has a cluster like that.

Screen Shot 2016-11-03 at 6.54.58 PM.png

Meanwhile, have a look at the Seattle Reign. Their players are generally clustered behind the 75th percentile.

screen-shot-2016-11-03-at-6-58-57-pm

An interesting follow-up to this chart would be to break down the per 90 and win percentage by location. Each match with complete location data has the location of each aerial duel logged, so this is something that should be possible to visualize and analyze once a way of coding through the matches and sorting out the aerial duels by location is resolved.

Another, more complex follow-up question is what happens after each of these aerial duels: whether the ball went out of bounds, was recovered by a teammate of the player who won the aerial duel, or was cleared away, whether the aerial duel resulted in a foul, and so on. This data is deep in the spreadsheet that is logged for each match and I haven’t yet figured out an easy way to do that type of analysis, but it is going to be worth digging into.

In the meanwhile, feel free to dig through the chart and have a look at this for yourself!

 

How to create your own heat maps for NWSL advanced stats

Earlier last week, I published a post exploring heat maps for the Portland Thorns’ April 2016 matches, with a focus on Tobin Heath’s performance. Those heat maps are in an Excel spreadsheet, which you can download here.

In this post, I’ll summarize how they work, and how you can create your own for the matches for which we currently have location data. You’ll need a basic understanding of how to use R and how to modify Excel spreadsheets.

How to create your own heat maps

You will only be able to create heat maps for the matches in the WoSo Stats database for which we have location data. You can see which ones they are by going to the database.csv file in the WoSo Stats GitHub repository and seeing which matches have a “yes” in the “location.data” column, which means they have complete location data for virtually every action that was logged.

If you’d like to help us get more location data logged for more of these matches and you’ve got a couple of hours to spare, you can help!

Getting the data

Anyways, first things first, open up R or R Studio or whatever you use to work in R, and run this code to source the “getting-data.R” and “create-location-stats-table.R” code. The first file will create a data frame in your working directory for the aforementioned database.csv file and a getMatchCsvFiles() function. The second file will create various functions, but the two we’ll be working with will be createMultiLocStatsTabs() and writeFiles().

source("https://raw.githubusercontent.com/amj2012/wosostats/master/code/version-2/getting-data.R")
source("https://raw.githubusercontent.com/amj2012/wosostats/master/code/version-2/create-location-stats-table.R")

Now it’s time to pick the matches you want with the getMatchCsvFiles function. This function has the following arguments:
1. competition.string: The name of the competition you want to analyze as it is written in the database’s “competition.string” column. This MUST match exactly what is written in the column, and this argument MUST be supplied. For the NWSL 2016, you’d write competition.string = "nwsl-2016". If you’d like to pick from every single match in the database, then just write competition.string = "database".
2. The date range you’d like to pick, written as one of several possible arguments. You can pick a specific “round” (such as a week in NWSL play), a set of various “rounds” (such as multiple weeks in NWSL play), or a specific month. These arguments are the following:

  • round: The “round” of the competition, written as round = "nameOfRound". For the NWSL 2016 season, “rounds” are weeks of the season; week 1, for example, would be written as round = "week-1".
  • multi_round: A vector of different “rounds” of a competition (for the NWSL this would be weeks), written as multi_round = c("X", "Y", "Z"). If you wanted weeks 1 through 4, you’d write this as multi_round = c("week-1", "week-2", "week-3", "week-4").
  • month_year: The month and year of the matches you’d like, written as MM_YYYY. For example, matches from May 2016 would be written as month_year = "05_2016".
  • For now, you can only pick one of these at a time. For example, you can only pick April 2016 matches or Week 1 through Week 3 matches, not all matches from Week 1 through Week 3 that happened in April 2016.
  • You can also just leave this argument blank, in which case you’ll pull everything in the database, according to any further filters you set based on the next few arguments.
3. team: This is optional. This is the abbreviation for the team whose matches are the only ones you want, written as team = "TeamAbbreviation". The abbreviation is based on our list of abbreviations for club teams and on FIFA’s country codes. Double-check the database to make sure the team you want is actually in our database – beyond the NWSL 2016 teams, we only have a bunch of international teams and one random PSG-Lyon match (as of this writing).
4. location_complete: This is also optional, and defaults to location_complete = FALSE. What that means is that, by default, you will get all matches, regardless of whether they have complete location data. For the purposes of this blog post, we will want to set this as location_complete = TRUE.

Feel free to play around with this (and let me know if you run into any bugs), but here are some examples of how this function works:

To get all Sky Blue 2016 matches for which we have any data:

getMatchCsvFiles("nwsl-2016", team = "SBFC")

To get all Washington Spirit 2016 matches from the month of June, for which we have complete location data:

getMatchCsvFiles("nwsl-2016", month_year = "06_2016", team = "WAS", location_complete = TRUE)

To get all USWNT matches from 2016 SheBelieves cup, for which we have complete location data:

getMatchCsvFiles("shebelieves-cup-2016", team = "USA", location_complete = TRUE)

For this blog post, we’re going to focus on the code I ran to get all Portland Thorns matches from the first 3 weeks of the season. We already know we have location data for these matches, so specifying location_complete isn’t necessary; however, let’s specify it anyways just in case you weren’t sure.

getMatchCsvFiles("nwsl-2016", multi_round = c("week-1", "week-2", "week-3"), team = "PTFC", location_complete = TRUE)

You should now have a match_list list (a very large one, too) with 3 elements, one for each match spreadsheet, and a match_names vector with 3 elements, one for each matchup name.

Getting the location-based data

The next few steps are pretty simple. Call the createMultiLocStatsTabs() function; set the match_list argument as “match_list” and the match_stat argument as the stat you’re looking for (more on this in the next paragraph); and assign it to variable stats_list. This will create for each match a table with each player in one row and their location-based stats in the columns.

When calling this function, one of the arguments is the match_stat, which is the type of location-based stat you want. As of this writing, you can get 16 different location-based stats, listed below with the string you need to write in as the argument shown in parentheses. If you wanted to get the largest table possible with columns for each stat (this creates a table with 181 columns), just write match_stat = "everything".

Or, assign one of these to the match_stat argument:
1. Attempted passes (attempted-passes)
2. Completed passes (completed-passes)
3. Passing completion percentage (pass-comp-pct)
4. Take ons won (take-ons-won)
5. Take ons lost (take-ons-lost)
6. Aerial duels won (aerial-duels-won)
7. Aerial duels lost (aerial-duels-lost)
8. Tackles (tackles)
9. Dispossessions of Opp (opp-dispossess)
10. Opp Poss Disrupted (opp-poss-disrupted)
11. Pressure/Challenges (pressure)
12. Recoveries (recoveries)
13. Interceptions (interceptions)
14. Blocks (blocks)
15. Clearances (clearances)
16. Opp Ball Disrupted (opp-ball-disrupted)

For the set of Portland Thorns matches we are working with, this is the code we would run:

stats_list <- createMultiLocStatsTabs(match_list, match_stat = "everything")

Once you run this, you’ll have a list assigned to the variable stats_list that will have a stats table for each of the three Portland Thorns matches.

Then, write these stats tables as .csv files in your working directory by running the following. Each stats table’s file name will be determined by the match_stat, which you have to specify again (the data won’t be affected, so you could really name this whatever you want), and by the string values in the match_names vector that was created when we ran the getMatchCsvFiles() function.

writeFiles(stats_list, match_names = match_names, match_stat = "everything")

Run this and, staying with our Portland Thorns April 2016 example, your working directory will now have three .csv files.
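To confirm the files landed where you expect, you could list them and read them back in. This is only a sketch: it writes two made-up files into a temporary folder so it can run anywhere, and the toy table and its column names are assumptions. The only thing taken from this post is the “-everything.csv” ending of the file name.

```r
# Work in a temporary folder so this sketch doesn't touch your real files.
demo_dir <- tempfile("heatmap-demo")
dir.create(demo_dir)

# Made-up stand-ins: one stats table file and one unrelated file.
toy_table <- data.frame(Player = c("Player A", "Player B"), stat.zone = c(3, 0))
write.csv(toy_table, file.path(demo_dir, "BOS-PTFC-everything.csv"), row.names = FALSE)
write.csv(toy_table, file.path(demo_dir, "notes.csv"), row.names = FALSE)

# Pick out only the stats table files and read each one back in.
csv_files <- list.files(demo_dir, pattern = "-everything\\.csv$", full.names = TRUE)
tables <- lapply(csv_files, read.csv, stringsAsFactors = FALSE)
names(tables) <- basename(csv_files)

# Rows and columns per table; a real "everything" table should show 181 columns.
print(sapply(tables, dim))
```

In your own session you would point list.files() at your working directory instead of demo_dir.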

To review, here is all of the code that was run since the beginning of this blog post to create the three .csv files that are now in your working directory (the code can also be found here):

source("https://raw.githubusercontent.com/amj2012/wosostats/master/code/version-2/getting-data.R")
source("https://raw.githubusercontent.com/amj2012/wosostats/master/code/version-2/create-location-stats-table.R")
getMatchCsvFiles("nwsl-2016", multi_round = c("week-1", "week-2", "week-3"), team = "PTFC", location_complete = TRUE)
stats_list <- createMultiLocStatsTabs(match_list, match_stat = "everything")
writeFiles(stats_list, match_names = match_names, match_stat = "everything")

Create the heat maps

Now the tricky part: creating the heat maps with the data in the .csv files. First, download the Excel template for the heat maps (click on “View Raw” to download), which is just the Portland Thorns April 2016 heat maps, and open it.

Let’s pretend we had this Excel spreadsheet but without the data that’s shown to the right of the heat map, starting with the PTFC-ORL match. Highlight everything in columns “AC” through “HA” from row 1 down to row 29 (as shown in the images below) and clear the contents (DO NOT delete the columns, though).


The heat map will be blank, regardless of what you write into the “Enter name here:” and “Enter stat here:” cells, and the stat info to the right of the heat map and below the cells where you enter the Player and Stat you want will be either zeroes or NAs. That’s because the formulas in all those different cells, including the ones that make up the heat map, are looking for data in the columns we just cleared, and for now there’s nothing there but blanks and errors.

Let’s say we didn’t want to re-create the PTFC-ORL match, but instead wanted to use that space we cleared for a BOS-PTFC heat map. Open the “BOS-PTFC-everything.csv” file that you created in your working directory (the following will only work with the “everything” versions of the stats tables) and highlight only the cells that aren’t blank (for this match it’s 27 player rows plus the header row for a total of 28 rows, times the 181 columns, for a total of 5,068 cells you have to highlight). This will look like this in Excel.


Copy those highlighted cells and paste them into the cell at row 1 and column AC, which will fill in the space that was previously taken up by the PTFC-ORL stats. But wait, you’re not done yet! One thing is left to correct, and that’s the team totals.
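If you want to double-check the spans mentioned above before pasting, the arithmetic is easy to sketch in R. The helper excel_col_to_num() is something I made up for this example, not part of the WoSo Stats code; it just converts an Excel column label into its position.

```r
# Hypothetical helper: convert an Excel column label ("A", "AC", "HA") to its number.
excel_col_to_num <- function(col) {
  letters_in_col <- strsplit(toupper(col), "")[[1]]
  values <- match(letters_in_col, LETTERS)
  # Treat the letters as digits of a base-26 number (A = 1, ..., Z = 26).
  Reduce(function(acc, v) acc * 26 + v, values, 0)
}

# Columns AC through HA, inclusive.
n_cols <- excel_col_to_num("HA") - excel_col_to_num("AC") + 1

# 27 player rows plus the header row in the BOS-PTFC table.
n_rows <- 27 + 1

print(n_cols)           # width of the pasted block
print(n_rows * n_cols)  # total cells to highlight
```

The column count should come out to 181, matching the width of an “everything” stats table, and the total should match the 5,068 cells mentioned above.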

See that “PTFC” and “ORL” row of numbers in the lower right below the player stats? Those are the totals for each column, which are referred to when creating heat maps for an overall team view. I like to keep the home team on top, so in this example, change “PTFC” in cell AC31 to “BOS” and change “ORL” in cell AC32 to “PTFC”. Then, highlight cells AD31 (where the totals start) through HA31 and search and replace “PTFC” with “BOS”; this changes the formulas in each cell so that they’re now looking for stats from the right team. Do the same for the row below, searching and replacing “ORL” with “PTFC”. The totals should now be correct.

Finally, in cell Y17 under “Name entered is a team?” is a formula that reads what’s being written into the “Enter name here:” cell and determines, based on the team abbreviations you’ve given it in an OR() formula, if it’s a team that’s been input. Right now this formula is still looking for “ORL” as one of the two teams. Change the cell contents from =OR(B5="PTFC",B5="ORL") to =OR(B5="PTFC",B5="BOS").

And you’re done! The heat map should work now.

Warnings

  • It’s easiest to create the heat maps with the template I’ve provided if you had created stats tables with match_stat set as “everything.” I added in the option to create smaller stats tables for the future when the “everything” version of a stats table is far, far bigger than 181 columns. For now, though, it makes more sense to work with the “everything” stats tables as 181-column spreadsheets shouldn’t slow down your computer.
  • You can ignore the passing percentage heat maps for the overall team views, as those are the sum of the percentages for each player. I haven’t yet figured out a way to get the average for the percentages that can account for whether a 0.0 passing pct is there because there were no attempts at all.
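One way around the passing percentage problem, sketched here in R rather than as an Excel formula, is to skip the per-player percentages entirely and divide the team’s completed passes by the team’s attempts. The players and numbers below are made up, and the column names att, comp, and pct are assumptions for illustration.

```r
# Made-up numbers for three players in one zone; Player C attempted no passes.
players <- data.frame(
  Player = c("Player A", "Player B", "Player C"),
  att = c(10, 4, 0),
  comp = c(8, 1, 0)
)

# Per-player percentage, with NA (not 0) when there were no attempts at all.
players$pct <- ifelse(players$att > 0, players$comp / players$att, NA)

# What the template does now: add up the percentages, which isn't a percentage.
sum_of_pcts <- sum(players$pct, na.rm = TRUE)

# A team figure that weights each player by attempts: total completed / total attempted.
team_pct <- sum(players$comp) / sum(players$att)

print(sum_of_pcts)
print(team_pct)
```

Marking “no attempts” as NA instead of 0.0 is also what lets you tell those cells apart from a genuine 0% passing day.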

Next steps

Help us!

Made it this far? Maybe you can help us out a little more. We need help logging this data. This data only happens because of fans like you who have put hours of their free time into logging data onto Excel spreadsheets. But we need more people helping out, as right now we are very low on volunteers and will be lucky to finish the 2016 season by the time the 2017 season even starts! If you’re interested, read more here about how to help and either send a DM on Twitter to @WoSoStats or email me at wosostats.team@gmail.com to get started. All it takes is a couple of hours of your free time, a willingness to learn, and knowing a thing or two about Excel.

Exploring heat maps for Tobin Heath’s NWSL April 2016 Player of the Month performance

Thanks to the hard work of many volunteers and WoSo fans who have contributed their time to the WoSo Stats Project, we have several matches with location data. The way this project tracks matches means that virtually every action (passes, tackles, pressuring an opponent, lost touches, etc.) has its location on the field logged according to this field breakdown.

The possibilities this creates for analyzing play on the field are numerous. However, logging location data on top of logging the match actions themselves is very time-consuming, so every match has its actions logged first and then location data is added after the fact. This means that only a portion of matches in our database have location data logged, but at this point we have enough matches completely logged that we can start to do some interesting things.

In this post I’ll show how you can use Excel spreadsheets to create heat maps for some of the data we’ve logged. This is just the beginning, but some of this is neat enough to show off already. To keep things focused, I also will show heat maps just for Tobin Heath for her April 2016 Portland Thorns matches.

How the heat maps work

The heat maps featured in this post are in an Excel spreadsheet that can be downloaded from the WoSo Stats GitHub repository (click on “View Raw” to download it). A more in-depth post will follow on how these heat maps function and how you can create your own for any match in our database with location data, but for now it’s important to keep in mind what stats these heat maps depict, how the color gradients are scaled, and just in general how to use them without accidentally breaking the whole thing.

In the Excel spreadsheet, all the data that can be represented on the heat map is in the large table to the right. Don’t mess with this! Each column in that large table is for a given stat in a given zone of the field.

The actual stats I’ve chosen for these heat maps are the following, with a description for what each acronym means listed below:

  • opPass.Att = Open play passes (not free kicks, throw-ins, corner kicks, goal kicks) attempted
  • opPass.Comp = Open play passes completed
  • opPass.Pct = Open play pass completion percentage
  • Int = Interceptions
  • TO.Won = Take ons won
  • TO.Lost = Take ons lost
  • AD.Won = Aerial duels Won
  • AD.Lost = Aerial duels lost
  • Tackles = Tackles that dispossessed an opposing player with the ball
  • Pressure = Instances where a player applied pressure onto an opposing player’s action, without actually attempting a tackle or dispossession
  • Recoveries = Winning possession of a loose ball

More detail on some of the above stats and how they’re logged can be found here.

To actually create a heat map for a specific player from that heat map’s match, type in her name in the cell below “Enter name here:”. The heat map will be for the stat in the cell below “Enter stat here:”, which you can also change so long as it matches one of the stats listed above. The stat for that player for each zone will be listed in the grey-shaded cells.

A note on how the color gradient works. For the sake of simplicity, for now, in this Excel spreadsheet the minimum (white) is always 0 and the maximum for each stat (darkest green) is the largest number out of all the zones for all the players. For example, for the Portland Thorns vs. Orlando Pride match above for the opPass.Att stat, Emily Sonnett had the most open play passes attempted from any one zone, with 20 open play passes attempted from the center of the defensive 3rd (DC). So, that’s the maximum for the opPass.Att stat. The same logic holds true for all the other stats.
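The same scaling logic, sketched in R instead of Excel conditional formatting, looks something like this. The numbers are made up, apart from the zone codes DC and DML, and the 101-step white-to-green palette is my own choice for the example, not what the spreadsheet uses.

```r
# Made-up opPass.Att counts: rows are players, columns are zones.
att <- matrix(c(20, 5, 10, 0), nrow = 2, byrow = TRUE,
              dimnames = list(c("Player A", "Player B"), c("DC", "DML")))

# The minimum is fixed at 0; the maximum is the largest count in any one zone
# for any one player.
scale_max <- max(att)
intensity <- att / scale_max  # 0 = white, 1 = darkest green

# Map each intensity onto a white-to-green gradient with 101 steps.
greens <- colorRampPalette(c("white", "darkgreen"))(101)
shades <- matrix(greens[as.vector(round(intensity * 100)) + 1],
                 nrow = nrow(att), dimnames = dimnames(att))

print(intensity)
print(shades)
```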

I’ve got formulas in the Excel spreadsheet that also find the maximum for a team overall, as you can also just type in the team acronym in the cell under “Enter name here:”.

Tobin Heath’s April 2016 NWSL Player of the Month performance

If you’ve already downloaded the Excel spreadsheet, you’ve hopefully noticed you can create heat maps for the three NWSL matches in there for any of the players who played. However, I’m just going to spend the rest of this post looking through the stats for Tobin Heath, as she was voted the Player of the Month by NWSL Media for that month of April. That Tobin Heath is a great player who had a great start to the season, well, that’s not something for which you really need a heat map. But it will be fun to see how Heath’s passing and take ons were distributed across the field, especially compared to other players.

Tobin Heath in Portland Thorns vs Orlando Pride (Week 1)

The first thing that stood out to me for this match was the distribution of Heath’s passing across a large swath of the field. Her heat map for her open play passing attempts (opPass.Att) is light green, but there’s a lot of light green especially in the opponent’s half.

There weren’t a lot of players who had that much green in the opponent’s half. The only other player who comes close to that many attempted passes was Alex Morgan, shown below.

And Heath’s passing completion percentage was also fairly high in the attacking third compared to other players (here’s the tweet with screenshots of heat maps for Alex Morgan, Kaylyn Kyle, and Mana Shim, who had similarly high pass completion percentages in the attacking 3rd).

Heath’s heat map for her take ons won is noteworthy because, after looking at the heat map for her entire team, it looks like every take on won by Portland in the attacking third (5) belonged to her.

Finally, aside from her passing and playmaking ability, Heath also appears to have applied a good amount of pressure on opponents. Below, her heat map for pressure applied is fairly green up and down her left side of the field.

By comparison, the two players who look like they applied the most pressure in the attacking third were Portland’s Nadia Nadim and Orlando’s Alex Morgan.

Tobin Heath in FC Kansas City vs Portland Thorns (Week 2)

Similar to her previous week, Heath had a very active passing day against FCKC in Week 2 along her left flank, with a high number of passing attempts centered around that attacking left corner.

The only other player with that many passing attempts in the opponent’s left or right corner was her teammate Klingenberg.

Heath’s passing completion percentage was also high across two-thirds of the field, again, especially in the attacking third.

The four other players from this game who appeared to have similarly high passing completion percentage numbers in the attacking third were Erika Tymrak, Allie Long, Mandy Laddish, & Lindsey Horan.

As for take ons won, Heath won a similar amount of take ons in the opponent’s half compared to the previous week.

Take ons lost are worth looking at for Heath here, as it appears she lost a few more take on attempts against FCKC (5) compared to her previous week against ORL (2).

Other players that game with a similar number of take ons won in the attacking third were Meghan Klingenberg, Mandy Laddish, and Allie Long.

As far as recovering loose balls goes, Heath did a good amount of it in the attacking third, especially in that dangerous center of the attacking third in front of the 18-yard box.

Two other players whose recoveries in the attacking third looked similar to Heath were Jen Buczkowski and Desiree Scott.

Heath’s pressure applied to opponents in their defensive corners also looked impressive, comparable to just a few other players from this game.

Tobin Heath in Boston Breakers vs. Portland Thorns (Week 3)

Finally for this (short) NWSL month, there’s Heath’s performance at Boston where she had another day of passing attempts bunched into the attacking third’s left, similar to the previous week against FCKC.

Kristie Mewis had a heat map similar to this one, with almost as many passing attempts from the attacking third left, and a few more than Heath further down the center of the field.

Heath’s passing completion percentage in the attacking third looks somewhat inverted compared to Mewis. She “only” had a passing percentage of 50% in the attacking third’s left – I put “only” in quotation marks because with more data from more matches it’ll be interesting to see if that low of a passing percentage is actually relatively good compared to what the average player gets. Either way, Mewis had a much better passing percentage, 80%, in that attacking third left.

Compared to the rest of the month, Heath had what looks like significantly fewer take ons won across the field: 3, compared to 7 against Orlando and 10 against FCKC.

Heath also had the fewest recoveries of the month against Boston: 6, compared to 10 against Orlando and 14 against FCKC.

Final Thoughts & Next Steps

The stats shown here appear to back up the eye test; Heath was a presence, attempting passes across a wide span of the field. She was a threat in the attacking third left, getting off a good number of passes and take ons from that corner.

There’s a lot more that could be done with heat maps, though. I included only a few stats in these heat maps because some stats I didn’t include, such as assists, happen very rarely; I found that these heat maps and the use of a color gradient are most useful and interesting for actions that happen many times such as passes or pressuring an opponent. One idea that I will pursue is creating a heat map that incorporates several matches and, rather than using totals, will use a “per 90 minutes” stat. That is, to use interceptions as an example, instead of showing how many interceptions a player got in one match (which is rarely more than one in any one zone for one match), show how often she gets an interception in that zone per 90 minutes of play.
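The “per 90 minutes” idea is simple enough to sketch in R. Everything here is made up for illustration: the per_90() helper, the three matches, the minutes, and the interception counts.

```r
# Hypothetical helper: scale a raw count to a per-90-minutes rate.
per_90 <- function(count, minutes) {
  ifelse(minutes > 0, count / minutes * 90, NA_real_)
}

# Made-up interceptions for one player in one zone across three matches.
zone_ints <- data.frame(
  match = c("match-1", "match-2", "match-3"),
  minutes = c(90, 90, 60),
  ints = c(1, 0, 2)
)

# Pool the matches first, then scale: 3 interceptions in 240 minutes.
ints_per_90 <- per_90(sum(zone_ints$ints), sum(zone_ints$minutes))
print(ints_per_90)
```

Pooling counts and minutes before dividing keeps a short substitute appearance from swinging the rate as much as it would if you averaged per-match rates.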

There’s also the issue of how to account for a scale that is heavily skewed to one extreme, such as the passing completion percentage stat. Most players, even those with an average passing day, will have a passing completion percentage well above 50 percent, so having the minimum at 0 isn’t very useful.
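One possible fix is to move the bottom of the color scale up. Here’s a rough sketch in R; the rescale() helper and the 50 percent floor are assumptions I picked for the example, not something the spreadsheet does today.

```r
# Hypothetical helper: squeeze values into [lo, hi], then stretch that range to 0..1.
rescale <- function(x, lo, hi) {
  clamped <- pmin(pmax(x, lo), hi)
  (clamped - lo) / (hi - lo)
}

# Made-up passing completion percentages for four zones.
pct <- c(0.40, 0.50, 0.75, 1.00)

current_scale <- rescale(pct, lo = 0, hi = 1)    # minimum at 0, as it works now
raised_scale  <- rescale(pct, lo = 0.5, hi = 1)  # anything at or below 50% is white

print(current_scale)
print(raised_scale)
```

With the raised floor, the difference between a 75% zone and a 100% zone takes up half of the gradient instead of a quarter of it.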

There’s also the matter of how to account for how hard it is to rack up certain stats in some zones compared to others. For example, in the Orlando game, Heath had a fairly evenly distributed number of passes across the field. However, it may not be giving Heath enough credit by having her 5 passing attempts in the defensive middle’s left (DML) be the same shade of green as her 5 passing attempts in the opponent’s 18-yard box. I’m assuming that any one player getting that many passing attempts in the 18-yard box in any one game is rare, so that zone could reasonably be even greener if it were put up against some sort of scale that took into account what the upper and lower quartiles are for that specific zone; I’d bet if we looked at every player from that month and looked at the number of passing attempts each one managed to make in the 18-yard box, Heath’s 5 would be somewhere in the top 25%. On the other side of the field, by that standard Heath’s 5 passing attempts in the defensive middle’s left would probably be an even lighter green, since that’s almost always a very active part of the field, especially for left backs, center backs, and defensive midfielders.
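Here’s a sketch in R of what a zone-specific scale could look like. The attempt counts are entirely made up, and “opp-18-yard-box” is a label I invented for the example rather than the project’s actual zone code; the point is only that the same raw number can rank very differently depending on the zone.

```r
# Made-up passing attempts by eight players in each of two zones.
zone_att <- data.frame(
  zone = rep(c("DML", "opp-18-yard-box"), each = 8),
  att = c(2, 4, 5, 6, 7, 8, 9, 12,
          0, 0, 0, 1, 1, 2, 3, 5),
  stringsAsFactors = FALSE
)
by_zone <- split(zone_att$att, zone_att$zone)

# Lower and upper quartiles for each zone.
quartiles <- sapply(by_zone, quantile, probs = c(0.25, 0.75))

# Share of players in each zone with 5 attempts or fewer, a simple percentile rank.
rank_of_5 <- sapply(by_zone, function(v) mean(v <= 5))

print(quartiles)
print(rank_of_5)
```

Shading by that percentile rank instead of the raw count would make 5 attempts in the box much greener than 5 attempts in the defensive middle’s left.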

Sources

A more in-depth post on how these heat maps work is coming, but for now you can browse the Excel spreadsheets with the raw match actions data for each match here:
* PTFC-ORL (Week 1)
* FCKC-PTFC (Week 2)
* BOS-PTFC (Week 3)

The stats tables next to the heat maps, all by themselves in their own csv files, can be found here:
* PTFC-ORL (Week 1) location-based stats
* FCKC-PTFC (Week 2) location-based stats
* BOS-PTFC (Week 3) location-based stats

And, finally, the R code that created the above csv tables can be found here. That code sources this R code, which has functions for fine-tuning what matches from our database you’re looking for, and this R code, which creates the stats tables.

Help us!

You made it this far down the post! Maybe you can help us out a little more. We need help logging this data. This data only happens because of fans like you who have put hours of their free time into logging data onto Excel spreadsheets. But we need more people helping out. If you’re interested, read more here about how to help, and then send me a DM on Twitter at @WoSoStats or email me at wosostats.team@gmail.com to get started. All it takes is a couple of hours of your free time, a willingness to learn, and knowing a thing or two about Excel.

How to explore USWNT passing stats with heat maps

Over the past several months, in addition to tracking actions such as passes and interceptions, we have also been adding location data to as many USWNT and NWSL 2016 matches as we can. The process for how that works is explained here, but here’s what it ends up looking like on the match’s actions spreadsheet (note the “poss.location” and “def.location” columns) for the USA-Germany SheBelieves Cup match:

Screen Shot 2016-08-02 at 1.00.50 PM

This series of events can be seen at https://streamable.com/xskp

The values in the “poss.location” and “def.location” columns (as well as the “poss.play.destination” column, which is blank here) represent the location of the player from the “possessing” team, based on splitting up the field into different zones as shown here. In the series of events shown above, play is shown moving from Babett Peter in the defensive middle third’s right wing, back to Almuth Schult in the defensive third’s center, and then all the way to the attacking right third where Anja Mittag attempts a side pass that is recovered by Morgan Brian in her own 18-yard box. Also logged is the location of defenders doing certain defensive actions, such as applying pressure onto a pass (as Alex Morgan did) or engaging in an aerial duel with the possessing team (as Crystal Dunn did).

As you can imagine, analyzing something like this, especially over the course of an entire match, is best done in a two-dimensional format. There’s only so many different stats tables you can make before you eventually need to put this on a heat map, like this!

Screen Shot 2016-08-02 at 4.49.25 PM

I created heat maps like the one above for these eight 2016 USWNT matches for which we currently have location data:

  • USA-Ireland (1/23/16 – International Friendly)
  • USA-Costa Rica (2/10/16 – 2016 Olympic CONCACAF Qualifiers)
  • USA-Mexico (2/13/16 – 2016 Olympic CONCACAF Qualifiers)
  • USA-Canada (2/21/16 – 2016 Olympic CONCACAF Qualifiers)
  • USA-England (3/3/16 – 2016 SheBelieves Cup)
  • USA-France (3/6/16 – 2016 SheBelieves Cup)
  • USA-Germany (3/9/16 – 2016 SheBelieves Cup)
  • USA-Colombia (4/6/16 – International Friendly)

The heat maps were created with Excel and can be downloaded here (click on “View Raw” to download).

There’s a heat map for each match in the second sheet of the Excel workbook. Currently, the heat maps only depict completed passes that were made from within each zone. To change the player the heat map is depicting, just change the name of the player in the cell below where it says “Enter name here”.


Next to each heat map is a big table of stats and player info, which is where the heat map is getting its data. Don’t change any of this! Unless you really, really know what you’re doing. Make sure the player name you type in for a heat map matches the name of the player in that heat map’s adjacent stats table.

Worst comes to worst and you mess something up, just re-download the Excel workbook from the GitHub repo.

This is still a work in progress that I figured out over the course of a night. Way more than just passes can be put on this heat map, and we also have location data for more than just USWNT matches (we also have NWSL 2016 matches!). For now, though, this works.

If you run into any issues, send me a tweet at @WoSoStats or email me at wosostats.team@gmail.com.

We need volunteers!

If you made it this far, maybe you’re willing to help us log even more data! We are always in need of more volunteers to help us log match actions and location data for women’s soccer matches. Without the help of fans volunteering their time for this project, none of this data is possible. No experience is necessary, just a willingness to learn. Read more about how to help here: https://wosostats.wordpress.com/how-to-help/

Exploring passing stats – USWNT vs. GER (SheBelieves Cup 2016)

As part of our project to track stats for women’s soccer matches (please join and help us get more data!), we’ve been working on adding location data to virtually every action we track. Until now, if you’ve been following some of the stuff I’ve posted on Twitter or the WoSo Stats Shiny app, it’s largely been summary data devoid of location data. That is to say, it adds up aggregates of certain stats (such as total passes attempted by a player or team) or in some cases calculates additional stats based on those basic stats (such as a player’s passing completion percentage), none of which take into account where a player was on the field.

This time, I’m going to look at location-based data. In this post, to make things simple, I’m going to focus on one match, the USA-Germany SheBelieves 2016 match. To make things even simpler, I’m also just going to look at passing and possession. This is an early dive into the location data we’re getting from this project, and how it can complement what we already know about a match based on its summary stats and, well, actually watching the game.

Passing Stats

One of the most interesting things I found while exploring the stats this project is generating was the impact of pressure on a player’s passing completion percentage. I expected, based on intuition, to see a player’s passing completion percentage go down with pressure, but what I saw was that, on average, it barely had an impact.

Impact of Pressure on opPassing

 

What you’re looking at is the impact that pressure had on a player’s open play passing completion percentage. Open play passes are all passes that aren’t throw ins, free kicks, corner kicks, goal kicks, or goalkeeper throws or dropkicks. I excluded those because those, by definition, can never be “under pressure” by a defender. In the chart above, the further to the right the bar is, the better the player’s open play passing completion percentage got under pressure. To account for differences in open play passing attempts, the darker the green, the more open play passes that player attempted under pressure.
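For anyone who wants to reproduce that kind of comparison, here’s a minimal sketch in R. The passes are made up and the column names player, pressured, and completed are assumptions for illustration, not the columns in the actual match spreadsheets.

```r
# Made-up open play passes for two players.
passes <- data.frame(
  player    = c("A", "A", "A", "A", "B", "B", "B", "B"),
  pressured = c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE),
  completed = c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE)
)

# Completion percentage for each player, split by whether the pass was pressured.
pct <- tapply(passes$completed, list(passes$player, passes$pressured), mean)

# Positive values mean the player completed a higher share of passes under pressure.
pressure_diff <- pct[, "TRUE"] - pct[, "FALSE"]

# Number of pressured attempts, which is what the shade of green encodes in the chart.
pressured_att <- tapply(passes$pressured, passes$player, sum)

print(pressure_diff)
print(median(pressure_diff))
```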

For me, this was a bit of a head-scratcher at first, as I noticed similar numbers across different matches. The median difference is +15%, so it looks like more players’ passing completion percentage actually got better under pressure. I initially chalked this up to, well, these are the two best teams in the world and great players should continue to make good passes under pressure.

However, upon further thought, this does make some sense, which merits further analysis later on. A player under pressure is probably going to be more likely to revert to a “safer” pass, such as a backwards pass, or be forced into a riskier play, such as a take on, due to not having enough space or time to get a pass off. Inversely, a player who isn’t under pressure, with more time and space with the ball, might be more likely to attempt a riskier pass, such as a launched ball, or not even a pass altogether and instead opt for a shot.

It seems pressure might be a better predictor of a player’s passing completion percentage once we are able to break down those decisions a little better, but I’ll save that for another day. What I do want to get at is what happens to these passing stats when we break them down by location.

Adding Location Data

For each pass attempt, we tracked its origin (i.e. where the player was passing from) according to which one of the following “zones” on the field she was in.


For this analysis, I grouped together passes in the defensive middle third and attacking middle third as passes that generally happened in the middle third. Now, what happens to a player’s open play passing completion percentage when she’s passing from within that all-important attacking third?

impact-of-pressure-on-oppassing-by-location

It drops for pretty much everyone in the match who attempted an open play pass in the attacking third. Again, darker colors indicate more attacking third passing attempts, and the further to the right the bar is, the better that player’s passing completion percentage got in the attacking third, compared to her passes in the middle and defensive thirds.
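The grouping behind that comparison can be sketched in R like this. The passes are made up, and the labels in the third column are placeholders I chose for the example rather than the zone codes used in the match spreadsheets.

```r
# Made-up open play passes for one player, tagged by the third they started in.
passes <- data.frame(
  third = c("defensive", "defensive-middle", "attacking-middle",
            "attacking", "attacking", "attacking-middle"),
  completed = c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Collapse the zones into two groups: the attacking third vs. everywhere else.
passes$area <- ifelse(passes$third == "attacking", "attacking third", "everywhere else")

# Completion percentage in each group, and the change in the attacking third.
pct_by_area <- tapply(passes$completed, passes$area, mean)
change <- pct_by_area[["attacking third"]] - pct_by_area[["everywhere else"]]

print(pct_by_area)
print(change)  # negative means the percentage dropped in the attacking third
```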

There are some outliers here. Lloyd, Horan, and Pugh had some very stark differences in completion percentage, but that’s partly because they barely attempted any passes from within the attacking third. In general, though, it appears that most players in this match had their passing completion percentage negatively affected.

Something interesting worth pointing out is that most of the players in the top half of the chart were German. This stands out even more when we take these two different passing completion percentages (in the attacking 3rd vs. everywhere else) and put them on a dot plot, with a color for each team, as shown below.

opPassing by Location - Dot Plot.png

The further to the right, the higher the player’s open play passing completion percentage in the defensive and middle third. The higher up, the higher the player’s open play passing completion percentage in the attacking third. The size of the dot indicates the number of open play pass attempts in the attacking third, so players who attempted more passes in that part of the field stand out more.

Almost every German player was above the median for open play passing completion percentage in the attacking third. Notably, Marozsan was the only player at or above the 75th percentile (better than 75% of all players in the match) for both categories. Meanwhile, it looks like Brian’s passing in this match was negatively affected the most when attempting a pass from within the attacking third.
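Here’s a rough Python sketch of how a median and 75th percentile check like this could be done with the standard library. The percentages below are invented, the player labels are placeholders, and the choice of the “inclusive” quantile method is my assumption; Tableau or R may calculate percentiles slightly differently.

```python
from statistics import median, quantiles

# Made-up (attacking-third %, defensive-and-middle-third %) per player
pcts = {
    "Player A": (80.0, 90.0),
    "Player B": (60.0, 85.0),
    "Player C": (50.0, 70.0),
    "Player D": (70.0, 60.0),
    "Player E": (40.0, 80.0),
}


def above_attacking_median(pcts):
    """Players whose attacking-third percentage is above the median."""
    att_median = median(a for a, _ in pcts.values())
    return sorted(p for p, (a, _) in pcts.items() if a > att_median)


def top_quartile_in_both(pcts):
    """Players at or above the 75th percentile in both categories."""
    att = [a for a, _ in pcts.values()]
    rest = [r for _, r in pcts.values()]
    # quantiles(..., n=4) returns the three quartile cut points;
    # index 2 is the 75th percentile. "inclusive" treats the data
    # as the whole population (here, everyone in the match).
    att_q3 = quantiles(att, n=4, method="inclusive")[2]
    rest_q3 = quantiles(rest, n=4, method="inclusive")[2]
    return sorted(
        p for p, (a, r) in pcts.items() if a >= att_q3 and r >= rest_q3
    )


above_median = above_attacking_median(pcts)
elite = top_quartile_in_both(pcts)
```

With only a couple dozen players in a single match, these cut points move around a lot from one player to the next, so treat them as a way to describe this match rather than a ranking.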

Unfortunately for Germany, better passing completion percentages in the attacking third and what appears to have been great pressure on the U.S. defense weren’t enough. They still lost, thanks to an incredible take-on by Alex Morgan in the penalty box that led to an equalizer and an equally incredible error from Almuth Schult, the German goalkeeper, that gave Sam Mewis the game-winner.

Better passing in the attacking third, then, wasn’t enough to get Germany the win, which is really all that ultimately matters in soccer. It’ll be interesting to see, though, as we get more data for more matches, if that’s out of the ordinary. All that pressure on the U.S. defense did get the Germans a goal and credit as the only team in 2016 to date to score a goal on the United States. It may not be a guarantee of victory, but I suspect it points most teams in the right direction.

Either way, the way the U.S. goals came about is a nice segue into an analysis of take-ons (and what a player does afterwards) and changes in possessions (and where they happen), which I hope to do in the coming week with the USA-Colombia matches.

You can view the stats and visualizations used in this blog post on Tableau and the WoSo Stats Shiny app. All the source data is freely available on the GitHub repository.

Help!

Okay, if you’ve scrolled this far down then hopefully you’ll be interested enough to help us contribute to our small but growing database of women’s soccer stats. As almost everyone who’s tried to search for something as simple as passing stats for their favorite player knows, there’s a dearth of even the most basic stats for women’s soccer and really women’s sports in general.

Please help us change that, one match at a time! We need people who are willing to volunteer some time and effort (any and all would be appreciated) to log data for women’s soccer matches. To see which matches immediately need help, check out this month’s goals. To learn how to help and get started, read here. The hope is, for starters, to track every NWSL 2016 match, but we still need more people!