Wednesday 3 September 2014

Winning an ODI from the front

Is Cook the captain to correct England's ODI ship?
There have been a lot of debates recently about the role of the top order in an ODI. Is it more effective to come out swinging or is a more cautious approach more appropriate?

Until 1992 the expectation was that the top order's job in an ODI was to see off the new ball, and scoring at 3 an over was fine. If you look at the great opening bowlers of the 1980s you will see that most of them had economy rates near 3.5 rpo. Then in the 1992 World Cup something wonderful happened. In the 10th match, New Zealand were playing South Africa, and one of New Zealand's premier batsmen, John Wright, got injured. In came Mark Greatbatch. Rather than playing the traditional opener's role, he took advantage of the fielding restrictions and scored 68 off 60, including hitting Allan Donald back over his head for a 6 that landed on the roof of the stand. From this point onwards, New Zealand's approach changed, and the first 15 overs were seen as the best time to score quick runs.

In the 1995/6 Australian tri-series, Sri Lanka took that tactic to the next level. Kaluwitharana and Jayasuriya batted like whirlwinds, and not long afterwards they helped take their team to victory in the World Cup. At this point the world really stood up and took notice. The dashing opener was now the in thing. New Zealand had Astle, Australia had Gilchrist, India used Sehwag, and Pakistan opened for a while with Afridi. An attacking opener had become as much a part of the game as using a spin bowler in the "boring middle overs."

But recent changes to the game, such as the different fielding restrictions at the end of the game and the use of 2 new balls, have led some people to question whether going for it at the outset is still such a good tactic. Is it a better idea to keep wickets in hand and "go harder, later"?

The majority of this discussion has come out of England, where the roles of captain Cook, Bell and (before he dropped out of the game) Trott have been coming under increasing scrutiny. Are they batting too slowly? Are they putting too much pressure on the players coming in after them?

This led me to look at the role of the top 3 over the past 2 years. What I wanted to know was which had the biggest impact on a team winning: the openers batting for a long time, scoring a lot of runs, or scoring at a quick run rate.

First, I got a list of the scores at the fall of the second wicket in each match over the past 2 years, along with the outcomes of those matches.

The next step was to graph it, and see what came out.

First I looked to see if there was a different relationship between the overs taken and runs scored for the first 2 wickets for teams that won and teams that lost.

We can see that there is a fairly strong relationship in both graphs.

I set the intercept of the trend lines to 0, so that the gradients are effectively the run rates.
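Fixing the intercept at 0 means fitting a line of the form runs = m × overs, and least squares then gives m = Σ(overs × runs) / Σ(overs²), which reads directly as runs per over. The following is a minimal Python sketch of that fit. It assumes the fall-of-second-wicket data is held as two plain lists; the function name `slope_through_origin` and the sample points are hypothetical, and this is not necessarily how the trend lines in the graphs were produced.

```python
def slope_through_origin(overs, runs):
    """Least-squares gradient of runs against overs with the intercept fixed at 0.

    Minimising sum((r - m * o) ** 2) over m gives m = sum(o * r) / sum(o ** 2),
    which is a run rate in runs per over.
    """
    sum_xy = sum(o * r for o, r in zip(overs, runs))
    sum_xx = sum(o * o for o in overs)
    if sum_xx == 0:
        raise ValueError("need at least one non-zero overs value")
    return sum_xy / sum_xx


# Hypothetical fall-of-second-wicket points (not the post's data)
overs = [8.0, 15.0, 22.0]
runs = [35, 70, 110]
print(round(slope_through_origin(overs, runs), 2))  # 4.85
```

Fitting the winning and losing innings separately and comparing the two gradients gives the run-rate difference discussed next.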

There are a few noticeable differences. Firstly, the teams that win tend to score at a higher run rate for the first 2 wickets than the teams that lose. However, when I overlaid the two graphs, this difference was not as striking visually as it is numerically: after 40 overs the trend lines are only 28 runs apart, a gap of just 0.7 rpo.

More significantly, there's a lot more data above 15 overs on the winning graph than on the losing graph. There are also many more instances on the winning graph where the first two wickets have contributed 150 or more runs.

I ran a quick bootstrap analysis and found a statistically significant difference between the winning and losing teams in both the median number of runs scored at the fall of the second wicket and the median number of overs batted.
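The post doesn't say which bootstrap variant was used. One common choice is a percentile bootstrap interval for the difference in medians: resample each group with replacement many times, record the difference each time, and call the result significant if the middle 95% of those differences excludes 0. The following is a minimal sketch under that assumption. The function name and the sample overs are hypothetical.

```python
import random
import statistics


def bootstrap_median_diff_ci(won, lost, n_resamples=10_000, alpha=0.05, seed=1):
    """Percentile bootstrap interval for median(won) - median(lost).

    Each resample draws both groups with replacement, keeping their sizes.
    If the interval excludes 0, the difference in medians is treated as
    significant at roughly the alpha level.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        w_sample = rng.choices(won, k=len(won))
        l_sample = rng.choices(lost, k=len(lost))
        diffs.append(statistics.median(w_sample) - statistics.median(l_sample))
    diffs.sort()
    lo_idx = int(round(alpha / 2 * n_resamples))
    hi_idx = min(n_resamples - 1, int(round((1 - alpha / 2) * n_resamples)) - 1)
    return diffs[lo_idx], diffs[hi_idx]


# Hypothetical overs at the fall of the second wicket
won_overs = [18, 25, 12, 30, 22, 16, 27, 20]
lost_overs = [9, 14, 6, 11, 17, 8, 12, 10]
lo, hi = bootstrap_median_diff_ci(won_overs, lost_overs)
print(lo, hi, "significant" if lo > 0 or hi < 0 else "not significant")
```

The same function works for runs at the fall of the second wicket; only the input lists change.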

To highlight that difference, I drew a cumulative frequency graph. The difference in the distribution of the number of overs faced by the winning and losing teams is quite striking.
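A cumulative frequency graph plots, for each number of overs, the share of innings in which the second wicket had already fallen by then. The following is a minimal sketch of that calculation (an empirical CDF). The function name and the sample overs are hypothetical.

```python
from bisect import bisect_right


def cumulative_proportions(values, thresholds):
    """Empirical CDF: for each threshold t, the fraction of values that are <= t."""
    ordered = sorted(values)
    n = len(ordered)
    return [bisect_right(ordered, t) / n for t in thresholds]


# Hypothetical overs at the fall of the second wicket
won_overs = [18, 25, 12, 30, 22, 16]
lost_overs = [9, 14, 6, 11, 17, 8]
checkpoints = [5, 10, 15, 20, 25, 30]
for t, w, l in zip(checkpoints,
                   cumulative_proportions(won_overs, checkpoints),
                   cumulative_proportions(lost_overs, checkpoints)):
    print(f"<= {t:>2} overs: won {w:.2f}, lost {l:.2f}")
```

When the losing curve sits well above the winning one, losing teams have lost their second wicket earlier.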

That gave me some reason to search further. It seems that there is statistical evidence of a difference between the performances of the top 3 batsmen of teams that win and teams that lose.

The next question was to see which made more of an impact: the number of runs, the number of overs or the rate that the runs were scored at.

To do this I ordered the innings by each of these three variables, and then looked at how many of the 9 innings surrounding each one were won and lost. This is not particularly intuitive, but it did give me some idea about the impact an increase in the variables would have on the likelihood of winning.
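The ordering-and-window step can be sketched as follows. The description is ambiguous about whether the 9 innings include the innings itself, and about what happens at the ends of the list. This sketch assumes a centred window of 9 that includes the innings and is simply truncated at the two ends. The function name is hypothetical.

```python
def rolling_win_rate(innings, window=9):
    """Share of wins in a centred window of innings, after sorting by value.

    innings: list of (value, won) pairs, where value is the run rate, runs or
    overs at the fall of the second wicket and won is True or False.
    Assumption: the window includes the innings itself and is truncated
    at the two ends of the sorted list.
    """
    ordered = sorted(innings, key=lambda pair: pair[0])
    half = window // 2
    result = []
    for i, (value, _) in enumerate(ordered):
        block = ordered[max(0, i - half): i + half + 1]
        wins = sum(1 for _, won in block if won)
        result.append((value, wins / len(block)))
    return result


# Hypothetical (run rate, won) pairs
sample = [(4.1, False), (5.6, True), (3.8, False), (4.9, True), (5.2, False)]
for value, rate in rolling_win_rate(sample, window=3):
    print(value, round(rate, 2))
```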

First I looked at run rate:

There is a trend here, but it's certainly not a strong relationship.

Still, it does suggest that increasing the run rate at which the first two partnerships score can contribute to increasing the likelihood of winning.

The important number here is the R² value of 0.35914. The closer this value is to 1, the more of the variation in the winning percentage is explained by the straight-line trend. It is not a perfect measure of how strong a relationship is, but it is a good indication.
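R² is 1 minus the ratio of the squared residuals around the trend line to the squared deviations around the mean. The following is a minimal sketch of an ordinary least-squares fit and the usual R² formula. The function names are hypothetical, and the small data set is made up to show the calculation. Some tools compute R² differently when the intercept is forced to 0, so figures from different software may not match exactly.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """1 - SS_res / SS_tot: the share of the variation in ys explained by the line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


xs, ys = [1, 2, 3, 4], [1, 3, 2, 4]
slope, intercept = fit_line(xs, ys)             # slope 0.8, intercept 0.5
print(round(r_squared(xs, ys, slope, intercept), 2))  # 0.64
```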

The next graph that I looked at was the total runs scored.

This is a similar relationship, but it is clearly stronger.

The points are generally closer to the trend line and the R² value is higher (0.44074).

As expected, the data thins out as the number of runs increases, because it's quite rare for teams to get to 200 for the loss of only one wicket.

The R² value is less than 0.5, so there's still more of the variation that is unexplained than explained by this relationship, but again that is to be expected, as there are a lot more factors in a game than the first 3 batsmen, and it is always possible for a game to change suddenly.

The total runs scored for the first two partnerships seems so far to be a better predictor of success than the rate that they scored at.

The third factor is the one that I found the most interesting.

The relationship between the overs batted and wins looks (unsurprisingly) quite similar to the relationship between runs and wins.

The R² value for overs is lower than the corresponding value for runs, but both are higher than the relationship with run rates.

The points at which teams start winning more often than they lose are roughly 14 overs, 75 runs and 4.7 rpo respectively. These numbers start to give us an idea about what we should be looking for in an opener.
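Given a straight trend line for win rate, the break-even point is where the line reaches 0.5, at x = (0.5 − intercept) / slope. The following is a minimal sketch. The coefficients in the example are hypothetical, not the values fitted in the post.

```python
def break_even_point(slope, intercept, target=0.5):
    """x value where the trend line slope * x + intercept reaches target.

    With target=0.5 this is where the fitted win rate crosses 50%, i.e. the
    point beyond which teams win more often than they lose.
    """
    if slope == 0:
        raise ValueError("a flat trend line never crosses the target")
    return (target - intercept) / slope


# Hypothetical trend line: win rate = 0.02 * overs + 0.22
print(round(break_even_point(0.02, 0.22), 1))  # 14.0
```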

However, I wasn't totally convinced by these graphs. I wondered if they would have turned out the same if I had used a window of 15 innings or 5 innings, or some other slightly different approach. So I decided to look at the winning probability for individual points instead. To do this I rounded the run rates to the nearest 0.1 rpo, the overs faced to the nearest over and the runs to the nearest 5 runs.
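The rounding-and-counting step can be sketched as follows. It assumes each innings is a (value, won) pair. Overs written in cricket notation (14.3 meaning 14 overs and 3 balls) would need converting to balls or true fractions of an over first. The function name and sample data are hypothetical.

```python
from collections import defaultdict


def win_rate_by_bin(innings, bin_width):
    """Group (value, won) pairs into bins and return {bin: (wins, games, win_rate)}.

    Each value is rounded to the nearest multiple of bin_width (e.g. 0.1 rpo,
    1 over or 5 runs). Python's round() sends exact halves to the nearest
    even number, which only matters for values exactly between two bins.
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [wins, games]
    for value, won in innings:
        # The outer round() strips float noise such as 4.7000000000000002
        b = round(round(value / bin_width) * bin_width, 6)
        counts[b][0] += int(won)
        counts[b][1] += 1
    return {b: (w, n, w / n) for b, (w, n) in sorted(counts.items())}


# Hypothetical runs at the fall of the second wicket, binned to the nearest 5
print(win_rate_by_bin([(72, True), (74, False), (78, True), (80, True)], 5))
```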

Again these graphs were quite interesting.

Visually, the strength of the relationship is indicated by how closely the tops of the coloured bars follow the trend line.

The runs and overs graphs look like they are a better fit than the run rates graph. But we can get extra evidence for the strength of fit from the R² value again.

This time the R² value for overs was noticeably higher than it was for either the run rate or the total runs.

The lower (green) graph suggests that for every extra over before the second wicket falls, the probability of winning increases by about 1.7 percentage points.

Likewise, for every extra 5 runs scored, the probability of winning increases by about 1.7 percentage points.

At this point I started to feel like there was some fairly significant evidence that having a top 3 that can see off the new ball is definitely the way to go.

The rate that the top order score at is important, but it doesn't seem to be as important as the number of overs that they bat for.

This makes sense for a couple of reasons. Firstly, a cricket ball is at its most hittable when it is about 15 overs old. At this point it's still hard enough to come quickly off the bat, it has normally stopped swinging conventionally and it hasn't yet started to reverse swing. The ball is hardest to play when it is less than 10 overs old, because it will swing and seam, and edges will carry to hand rather than dying as quickly as they do later on.

With two new balls in use, one from each end, each ball is bowled in only every other over, so when the balls are 15 overs old the innings is 30 overs in. At this point it's sensible for teams to have their best hitters at the crease. If the hitters come in much before the 20th over, they are exposed to a swinging ball that makes it difficult to time their shots.

The next bit of analysis I did was to take those 3 marks from above (14 overs, 75 runs and 4.7 rpo) and compare the results of teams that reached these milestones with those of teams that did not.

The outcomes were again quite interesting:

Here fast was any innings where the first two partnerships scored at more than 4.7 runs per over, big was where the second wicket fell when the score was over 75 and long was where the wicket fell after the 14th over.

The column titled "Relative" is the relative probability: the win rate of the teams that reached a milestone divided by the win rate of those that didn't. This means that teams whose first two wickets score at more than 4.7 rpo are 37% more likely to win than teams who score slower than that.
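The relative probability is taken here as the ratio of the two groups' win rates. This reading reproduces the 27% figure given later in the comments for long, slow versus fast, short starts, and those counts are used in the example. The function name is hypothetical.

```python
def relative_probability(wins_a, losses_a, wins_b, losses_b):
    """Win rate of group A divided by win rate of group B."""
    rate_a = wins_a / (wins_a + losses_a)
    rate_b = wins_b / (wins_b + losses_b)
    return rate_a / rate_b


# Counts reported in the comments: long, slow 41-31; fast, short 35-43
ratio = relative_probability(41, 31, 35, 43)
print(f"{ratio:.3f} -> {100 * (ratio - 1):.0f}% more likely to win")  # 1.269 -> 27% more likely to win
```

A value of 1 means the milestone makes no difference. Values above 1 favour the group that reached it.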

Again we see that the biggest advantage is when the first two wickets last more than 14 overs.  It isn't a panacea, but it is important.

The final thing I did was to look at two options of similar overall quality but with different approaches, and see which was best. To do this I looked at every innings where the second wicket fell between the 7th and 14th over, and the run rate was between 4.7 and 5.3 (this puts them roughly in the 2nd quartile for scoring rates and the 3rd quartile for length). I then also looked at times where the second wicket fell between the 14th and 19th over, with a run rate between 3.8 and 4.7 (which puts them roughly in the 3rd quartile for scoring rates and the 2nd quartile for length). Again the more cautious approach paid dividends, although this time with a much smaller sample size.

The more attacking start yielded 9 wins and 13 losses (a 40.9% winning record). The more cautious start also yielded 9 wins, but only 8 losses (a 52.9% winning record). However, given the small sample size, the difference is not statistically significant.
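The post doesn't say which significance test was applied. For a small 2×2 table of wins and losses, Fisher's exact test is one standard choice, and the following sketch implements it from first principles under that assumption. SciPy's `scipy.stats.fisher_exact` offers the same test. The function name here is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two approaches, columns are wins and losses. With all
    margins fixed, the number of wins in row 1 follows a hypergeometric
    distribution; the p-value sums every table no more likely than the
    one observed.
    """
    row1, row2 = a + b, c + d
    wins = a + c
    total = row1 + row2

    def prob(x):
        # Probability that x of the wins fall in row 1
        return comb(row1, x) * comb(row2, wins - x) / comb(total, wins)

    p_observed = prob(a)
    lowest, highest = max(0, wins - row2), min(row1, wins)
    # Tiny tolerance so equally likely tables are not dropped by float error
    p = sum(prob(x) for x in range(lowest, highest + 1)
            if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p)


# Attacking start: 9 won, 13 lost; cautious start: 9 won, 8 lost
print(round(fisher_exact_two_sided(9, 13, 9, 8), 3))
```

A p-value well above 0.05 is consistent with the statement that this difference isn't significant.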

Putting this all together, an ideal opener is a player who averages at least 37.5 (half of 75), lasts an average of at least 42 deliveries (half of 14 overs) and has a strike rate of at least 78.3 (equivalent to 4.7 runs per over). If a compromise has to be made on one of these, it should be the strike rate, as that's the least important for helping the team win.
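The conversion from team milestones to per-batsman targets is simple arithmetic. It reads "half" as splitting the runs and balls across the two dismissals. Strike rate is runs per 100 balls, so 4.7 rpo becomes 4.7 / 6 × 100 ≈ 78.3. The following is a minimal sketch with hypothetical function names.

```python
def opener_targets(runs_at_second_wicket=75, overs_at_second_wicket=14, run_rate=4.7):
    """Turn the team milestones for the first two wickets into per-batsman targets.

    Reading: two wickets means two dismissals, so runs and balls are halved;
    strike rate is runs per 100 balls, i.e. run_rate / 6 * 100.
    """
    average = runs_at_second_wicket / 2
    balls_per_dismissal = overs_at_second_wicket * 6 / 2
    strike_rate = run_rate / 6 * 100
    return average, balls_per_dismissal, strike_rate


def meets_targets(average, balls_per_dismissal, strike_rate):
    """Which of the three criteria (average, balls, strike rate) a batsman meets."""
    t_avg, t_balls, t_sr = opener_targets()
    return (average >= t_avg, balls_per_dismissal >= t_balls, strike_rate >= t_sr)


print(opener_targets())
print(meets_targets(37.51, 48, 77.66))  # (True, True, False)
```

The last line uses Cook's figures as quoted in the post.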

So, finally, how does Alastair Cook stack up to these criteria?

At the time of writing Alastair Cook averages 37.51 runs from 48 balls at a strike rate of 77.66.

He (just) meets two of the three criteria, and he is very close on the one he misses, which is also the least important.

England have not been going well in ODI cricket recently, but Cook's batting is not the right thing to blame. The issues are clearly elsewhere.


  1. Well done, great stats. I would definitely agree that the issues lie elsewhere.

    That doesn't stop me from wondering if he should be playing. And since you've given me something to play with now, I had a quick look at his stats just for this year:

    In 2014 he averages 29.83 off 41 balls, with a strike rate of 72.62. That looks a lot less rosy.

    Conversely, his historical averages for ODIs in Australia/New Zealand may edge him closer to selection again, just reaching the most important criterion at 48 balls. His average of 35.08 and strike rate of 73.08 fall short, though.

  2. (Correction: there's a stray "just" in there that would imply 48 is just over 42. It got left behind in editing.)

  3. I am utterly positive that this exercise has used the statistics in the wrong way, confusing correlation with causation. This is incredibly detailed work though; so I want to go over this and provide the counter-point, something I have been working on in detail.

    All my analysis so far has led me to believe that in limited overs cricket, teams are far too cautious and that slow batting from the top order contributes to losses very strongly. I want to respond to your post in depth, but before I do anything I'd really appreciate it if you could answer a quick question:

    Have you corrected for the fact that 'long partnerships' inevitably tend to be quick ones too, because they make up for the initially slow rate by picking up after a certain period. What I mean to say is when you look at long partnerships, a very large number of them will also be fast. If you look at 'big' partnerships, most of them will be long. However 'fast' is the only property that comes alone, so there are an abundance of 'fast, small' partnerships, whereas the issue becomes that 'fast+long' is automatically 'big'. Essentially these things aren't independent variables, which I think you may not have accounted for.

    I'd be a lot more convinced if you ran the stats for 'fast, short' vs 'long, slow' and 'fast, big' vs 'long, big'.

    1. I'd already got the stats for Fast short vs long slow, I just decided that they weren't particularly interesting. Here they are though:

      Fast, short won 35 lost 43.
      Long, slow won 41 lost 31.
      Long, slow is 27% more likely to be on the winning side.

      The fast big vs long big comparison will take a bit longer, as there is quite a crossover between the two sets, and dealing with that appropriately is not statistically straightforward.

      In terms of your first criticism that I'm confusing correlation with causation, you have a point. Having a higher winning percentage when x happens, does not necessarily mean that x causes a team to win more often. However, there had been a lot of criticism that Cook was losing games for England by batting too slowly. A positive correlation shows that there is not a negative causation. What I mean is that if the longer an innings goes without losing a second wicket, the more they win, we can say that losing the second wicket later does not cause the team to lose more often.

      Long partnerships don't actually tend to be that much faster. If you look at the first two graphs, they have the overs vs runs. Both of the graphs are quite linear. They certainly are not exponential, and while a power curve also provides a reasonable fit, the best model for it has a power below 1.2, so the uncertainty in the data is greater than any apparent curve.

      You are correct that the categories fast, big and long are not independent. However, that doesn't matter at all when doing a relative probability analysis, which does not require them to be independent.

  4. To supplement my comment, my point is that when a team bats it can't really control anything other than the speed (by increasing aggression). Batting slowly only slightly increases the probability that a long partnership will develop and it has a very sketchy (if any) correlation to a big partnership developing.

    Additionally, nowadays most teams do play with a strategy of playing cautiously in the first few overs. Taken together, this means your analysis has essentially captured 'long innings', which tend to be a subset of the times that the team's strategy of keeping wickets in hand worked out. Unless a strategy is hideously wrong, it's pretty obvious that teams will win more when it was successfully executed; that's not an answer to the question of whether some other strategy was better.

    Elsewhere, I filtered for exactly the sort of 'solid innings' Cook is known for: all innings in the past year with a SR of less than 75 and more than 30 runs scored. Teams with a player who played that kind of innings had a 42% win rate over a sample of roughly 125 games, suggesting that this sort of innings tends to contribute to losses. (Additionally, the vast majority of winning innings within that sample were played by batsmen who comfortably chased very low targets in a sedate manner. If I had filtered for 'batting 1st', the statistics would have been overwhelmingly losing.)

  5. Hi, thanks for an excellent response. I am actually very surprised by those results. My initial thought is that there are two possibilities. 1) I am wrong and you are right about this issue; 2) There may be an issue with the way 'fast', 'long' and 'big' are defined in your search filters.

    After seeing this post, I decided to run the numbers on 'long, slow innings' from the top 3 batsmen, to see if this was true. I settled on the somewhat arbitrary but not unreasonable figure of 40 balls and over as a 'long' innings and 80 or lower SR as a 'slow' batting.

    I filtered for all innings matching this description (while batting first) in the last two years. The filter turned up 101 such innings in which there was a result (excluded 1 tie and several no results). The batsmen involved in these innings were on the losing side 58 times and on the winning side 43 times, which is a fairly strong indication that this isn't a very helpful kind of innings. In these innings, the batsmen involved averaged 42.73 at a SR of 66.77 (pretty much stereotype Alastair Cook numbers).

    Also, amusingly enough, the most commonly represented batsman in this sample was indeed, Alastair Nathan Cook, with 7 innings and one win amongst them.

    Do you find these results surprising or did I miss something, am I asking the wrong questions?

    1. The issue is that sometimes batsmen go particularly defensive because there have been a number of wickets fall at the other end. That was why I used partnerships. Otherwise you get an issue like in the NZ vs West Indies match recently where Guptill scored 80 odd off about 130, while at the other end Williamson came in during the 14th over and scored 56 off 60.

      Guptill made it possible for Williamson to do what he did.

      I looked at the fall of wickets because it paints a more accurate picture of what's actually happening in the middle.

      There is the difficulty where it makes it hard to analyse individual players, but that's an inherent problem with the fact that runs are really scored by a partnership.

      I once heard a stat that while Hamish Marshall was at the crease, his partners scored about an extra 30 runs per 100 balls due to Marshall's running between wickets. People look at Marshall's numbers and say that he was only just above average as an international cricketer, but his partnership numbers may tell a different story. (I've never actually done the analysis myself, so I don't know.)

      Likewise we need to find a way to look at those individual innings in context. Was Cook batting slowly in those 7 innings because he was just batting slowly, or was it due to wickets falling at the other end? That's actually a very big question.

  6. A very interesting post;

    My first thought was about this: "The issue is that sometimes batsmen go particularly defensive because there have been a number of wickets fall at the other end". My perception is that this is an inefficient and not very logical adjustment to make, and the players and teams that tend to make it tend to be the ones that do the worst. From my T20 analysis, I also think this tendency belongs to the players themselves and isn't confined to a particular team strategy. For instance, in this year's IPL I had Delhi Daredevils flagged for the bottom because they were packed with batsmen with this sort of thinking (De Kock, Taylor, Pietersen, Vijay) who scored lots of runs slowly that could otherwise have been scored by other batsmen, and unsurprisingly that is where the team finished.

    Typically the idea that wickets slow the run rate, as well as the impact wickets are having on match results, seem to be part of a self-fulfilling prophecy rather than necessary consequence of a wicket falling. The overly exaggerated slowdowns often tend to precipitate even more wickets as the bowlers get on top and the fields come in.

    "There is the difficulty where it makes it hard to analyse individual players, but that's an inherent problem with the fact that runs are really scored by a partnership."

    This is another very interesting statement, because I have always analyzed the individual players and certainly thought of runs as being scored by players rather than the partnership, though it's certainly true that players often take on very different roles in a partnership. Without doing basketball-style +/- numbers, though, it's very difficult to assess the impact a player has on team-mates, as in the Marshall example.

    The context of an innings is very valuable, but it often creates chicken-and-egg questions. It's just as easy to assert that wickets fell at the other end because other batsmen had to compensate for Cook's stodginess. To that extent, if you credit Guptill for 'allowing' Williamson to score fast, then by the same token, when it doesn't come off, you have to blame him for 'forcing' Williamson to do so.