Wednesday 3 September 2014

Winning an ODI from the front

Is Cook the captain to correct England's ODI ship?
There has been a lot of debate recently about the role of the top order in an ODI. Is it more effective to come out swinging, or is a more cautious approach more appropriate?

Until 1992 the expectation was that the top order's job in an ODI was to see off the new ball, and scoring at 3 an over was fine. If you look at the great opening bowlers of the 1980s, you will see that most of them had economy rates near 3.5 rpo. Then, in the 1992 World Cup, something wonderful happened. In the 10th match, New Zealand were playing South Africa, and one of New Zealand's premier batsmen, John Wright, got injured. In came Mark Greatbatch. Rather than playing the traditional opener's role, he took advantage of the fielding restrictions and scored 68 off 60 balls, including hitting Allan Donald back over his head for a 6 that landed on the roof of the stand. From this point onwards, New Zealand's approach changed, and the first 15 overs were seen as the best time to score quick runs.

In the 1995/6 Australian tri-series, Sri Lanka took that tactic to the next level. Kaluwitharana and Jayasuriya batted like whirlwinds, and not long afterwards they helped take their team to victory in the World Cup. At this point the world really stood up and took notice. The dashing opener was now the in thing. New Zealand had Astle, Australia had Gilchrist, India used Sehwag, and Pakistan opened for a while with Afridi. An attacking opener had become as much a part of the game as using a spin bowler in the "boring middle overs."

But recent changes to the game, such as the different fielding restrictions at the end of the innings and the use of 2 new balls, have led some people to question whether going for it at the outset is still such a good tactic. Is it a better idea to keep wickets in hand and "go harder, later"?

The majority of this discussion has come out of England, where the roles of captain Cook, Bell and (before he dropped out of the game) Trott have been coming under increasing scrutiny. Are they batting too slowly? Are they putting too much pressure on the players coming in after them?

This led me to look at the role of the top 3 over the past 2 years. What I wanted to know was which had the biggest impact on a team winning: the openers batting for a long time, scoring a lot of runs, or scoring at a quick run rate.

First, I compiled a list of the score and overs at the fall of the second wicket in each match over the past 2 years, along with the outcome of each match.

The next step was to graph it, and see what came out.

First I looked to see if there was a different relationship between the overs taken and runs scored for the first 2 wickets for teams that won and teams that lost.




We can see that there is a fairly strong relationship in both graphs.

I set the intercept of the trend lines to 0, so that the gradients are effectively the run rates.
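Fixing the intercept at 0 has a neat closed form: least squares through the origin minimises the sum of (runs - m·overs)², which gives m = Σ(overs·runs) / Σ(overs²), a run rate in runs per over. Here is a minimal sketch, assuming the data is a list of (overs, runs) pairs at the fall of the second wicket; the function name and the toy numbers are hypothetical and only illustrate the calculation.

```python
def slope_through_origin(overs, runs):
    """Least-squares gradient of runs against overs with the intercept fixed at 0.

    Minimising sum((r - m*o)**2) over m gives m = sum(o*r) / sum(o*o),
    so the gradient reads directly as a run rate (runs per over).
    """
    sxy = sum(o * r for o, r in zip(overs, runs))
    sxx = sum(o * o for o in overs)
    return sxy / sxx

# Hypothetical (overs, runs) at the fall of the second wicket; not the post's data.
won_overs, won_runs = [10, 20, 30], [50, 100, 150]
lost_overs, lost_runs = [10, 20, 30], [40, 80, 120]

m_won = slope_through_origin(won_overs, won_runs)     # 5.0 rpo
m_lost = slope_through_origin(lost_overs, lost_runs)  # 4.0 rpo

# Vertical gap between the two trend lines after 40 overs.
print(40 * (m_won - m_lost))  # → 40.0
```

Read the other way, the post's 28-run gap after 40 overs corresponds to a difference in gradient of 28 / 40 = 0.7 rpo between winning and losing teams.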

There are a few noticeable differences. Firstly, the teams that win tend to score at a higher run rate for the first 2 wickets than the teams that lose. However, when I overlaid the two graphs, this difference was not as striking visually as it is numerically (after 40 overs the trend lines are actually only 28 runs apart).

More significantly, there are far more data points beyond 15 overs on the winning graph than on the losing graph. There are also many more instances on the winning graph where the first two wickets have contributed 150 or more runs.

I ran a quick bootstrap analysis and found statistically significant differences between winning and losing teams in both the median score at the fall of the second wicket and the median number of overs batted by that point.
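The exact bootstrap procedure isn't spelled out above, so what follows is a sketch of one common version, a percentile bootstrap on the difference in medians, rather than the method actually used. The function name, the number of resamples and the toy data are all assumptions for illustration.

```python
import random
import statistics

def bootstrap_median_diff(a, b, n_resamples=10_000, alpha=0.05, seed=1):
    """Percentile bootstrap interval for median(a) - median(b).

    Each resample draws len(a) values from a and len(b) values from b,
    with replacement. If the interval excludes 0, the difference in
    medians is treated as significant at level alpha.
    """
    rng = random.Random(seed)
    diffs = sorted(
        statistics.median(rng.choices(a, k=len(a)))
        - statistics.median(rng.choices(b, k=len(b)))
        for _ in range(n_resamples)
    )
    lo = diffs[int(alpha / 2 * n_resamples)]
    hi = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Hypothetical overs batted before the second wicket fell (not real data).
won = [18, 22, 9, 25, 30, 14, 16, 27, 12, 20]
lost = [6, 8, 11, 5, 13, 7, 9, 10, 4, 12]

lo, hi = bootstrap_median_diff(won, lost)
print(f"95% interval for the difference in medians: ({lo}, {hi})")
print("significant" if lo > 0 or hi < 0 else "not significant")
```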

To highlight that difference, I drew a cumulative frequency graph. The difference in the distribution of the number of overs faced by the winning and losing teams is quite striking.
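A cumulative frequency graph shows, for each number of overs, the share of innings in which the second wicket had already fallen by then. A minimal sketch with a hypothetical helper name and made-up innings; the further apart the two columns are at a given over, the bigger the visible gap between the curves.

```python
def cumulative_share(values, thresholds):
    """Empirical CDF: for each threshold, the share of values <= that threshold."""
    n = len(values)
    return [sum(v <= t for v in values) / n for t in thresholds]

# Hypothetical overs batted before the second wicket fell (not real data).
won = [18, 22, 9, 25, 30, 14, 16, 27, 12, 20]
lost = [6, 8, 11, 5, 13, 7, 9, 10, 4, 12]
marks = [5, 10, 15, 20, 25, 30]

for t, w, l in zip(marks, cumulative_share(won, marks), cumulative_share(lost, marks)):
    print(f"second wicket down by over {t:2d}: won {w:.0%}, lost {l:.0%}")
```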

That gave me some reason to dig further. There seems to be statistical evidence of a difference between the performances of the top 3 batsmen in teams that win and teams that lose.

The next question was which of these made more of an impact: the number of runs, the number of overs, or the rate at which the runs were scored.

To do this I ordered the innings by each of these three variables, and then looked at how many of the 9 innings surrounding each one were won and lost. This is not particularly intuitive, but it did give me some idea about the impact an increase in the variables would have on the likelihood of winning.
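The moving-window idea can be sketched as follows, assuming each innings is stored as a (value, won) pair and that the window of 9 is centred on, and includes, the innings itself. Near the ends of the sorted list this sketch simply truncates the window; both of these choices are assumptions, not necessarily how the original figures were produced, and the function name is hypothetical.

```python
def rolling_win_share(innings, window=9):
    """Sort (value, won) pairs by value and, for each innings, return
    (value, share of wins among the `window` innings centred on it).

    The window is truncated at either end of the sorted list.
    """
    ordered = sorted(innings, key=lambda pair: pair[0])
    half = window // 2
    out = []
    for i, (value, _) in enumerate(ordered):
        lo, hi = max(0, i - half), min(len(ordered), i + half + 1)
        neighbours = ordered[lo:hi]
        wins = sum(1 for _, won in neighbours if won)
        out.append((value, wins / len(neighbours)))
    return out

# Hypothetical (run rate, won) pairs, not the post's data; a small window for brevity.
sample = [(3.9, False), (4.1, False), (4.5, True), (4.8, False), (5.2, True), (5.6, True)]
for rate, share in rolling_win_share(sample, window=3):
    print(f"{rate:.1f} rpo: {share:.0%} of neighbouring innings won")
```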

First I looked at run rate:

There is a trend here, but it's certainly not a strong relationship.

Still, it suggests that increasing the run rate at which the first two partnerships score can contribute to increasing the likelihood of winning.

The important number here is the R² value of 0.35914. The closer this value is to 1, the more of the variation in winning percentage the straight trend line explains. While it is not a perfect measure of how strong a relationship is, it is a good indication.
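For a straight trend line fitted by ordinary least squares, R² = 1 - SS_res / SS_tot: one minus the ratio of the squared distances from the points to the line to the squared distances from the points to their mean. A small sketch, assuming the trend line has an intercept; the function name and inputs are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = a + b*x. Returns (a, b, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
    ss_tot = sum((y - my) ** 2 for y in ys)                       # total variation
    return a, b, 1 - ss_res / ss_tot

# A perfect straight line explains all of the variation (R² = 1);
# scattered points explain less.
print(linear_fit([1, 2, 3], [2, 4, 6]))
print(linear_fit([1, 2, 3, 4], [1, 3, 2, 4]))
```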

The next graph that I looked at was the total runs scored.

This is a similar relationship, but it is clearly stronger.

The points are generally closer to the trend line and the R² value is higher (0.44074).

As expected, the data thins out as the number of runs increases, as it's quite rare for teams to reach 200 for the loss of only one wicket.

The R² value is less than 0.5, so more of the variation is unexplained than explained by this relationship. Again, that is to be expected: there are a lot more factors in a game than the first 3 batsmen, and a game can always change suddenly.

The total runs scored for the first two partnerships seems so far to be a better predictor of success than the rate that they scored at.

The third factor is the one that I found the most interesting.


The relationship between the overs batted and wins looks (unsurprisingly) quite similar to the relationship between runs and wins.

The R² value for overs is lower than the corresponding value for runs, but both are higher than the relationship with run rates.

The points at which teams start winning more often than they lose are roughly 14 overs, 75 runs and 4.7 rpo respectively. These numbers start to give us an idea about what we should be looking for in an opener.
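Crossing points like these can be read off a fitted trend line: if the line is p = a + b·x, the win share reaches 50% at x = (0.5 - a) / b. A sketch, assuming win shares are stored as fractions between 0 and 1; the helper name and the made-up points are hypothetical.

```python
def break_even(xs, win_share):
    """Value of x at which a least-squares line through (x, win share) hits 0.5."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(win_share) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, win_share)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    if b == 0:
        raise ValueError("flat trend line never crosses 50%")
    return (0.5 - a) / b

# Hypothetical (overs, win share) points, not the post's data.
print(break_even([0, 10, 20], [0.2, 0.4, 0.6]))  # about 15 overs
```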

However, I wasn't totally convinced by these graphs. I wondered whether they would have turned out the same if I had used windows of 15 innings or 5 innings, or some other slightly different approach. So I decided to look at the winning probability for individual values instead. To do this I rounded the run rates to the nearest 0.1 rpo, the overs faced to the nearest over and the runs to the nearest 5 runs.
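The rounding-and-counting step might look like this. It is a sketch with a hypothetical helper name. Innings are grouped by the nearest multiple of the step using an integer bin index, which avoids floating-point keys. Note that Python's round() sends exact halves to the nearest even number, which may differ slightly from how a spreadsheet rounds.

```python
from collections import defaultdict

def win_share_by_bin(innings, step):
    """Group (value, won) pairs by the nearest multiple of `step`
    and return {bin value: (wins, games, win share)}."""
    bins = defaultdict(lambda: [0, 0])  # bin index -> [wins, games]
    for value, won in innings:
        idx = round(value / step)       # index of the nearest multiple of step
        bins[idx][0] += int(won)
        bins[idx][1] += 1
    return {round(idx * step, 6): (w, n, w / n) for idx, (w, n) in sorted(bins.items())}

# Hypothetical runs at the fall of the second wicket, rounded to the nearest 5.
print(win_share_by_bin([(73, True), (76, False), (77, True), (52, False)], 5))
# Run rates rounded to the nearest 0.1 rpo.
print(win_share_by_bin([(4.66, True), (4.74, False)], 0.1))
```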

Again these graphs were quite interesting.


Visually, the strength of the relationship is indicated by how closely the tops of the coloured bars follow the trend line.

The runs and overs graphs look like better fits than the run rate graph, but we can again get extra evidence for the strength of fit from the R² values.

This time the R² value for overs was considerably higher than it was for either the run rate or the total runs.

The lower (green) graph suggests that for every extra over before the second wicket falls, the probability of winning increases by about 1.7%.

Likewise, for every extra 5 runs scored, the probability of winning increases by about 1.7%.

At this point I started to feel that there was fairly strong evidence that having a top 3 who can see off the new ball is the way to go.

The rate that the top order score at is important, but it doesn't seem to be as important as the number of overs that they bat for.

This makes sense for a couple of reasons. Firstly, a cricket ball is at its most hittable when it is about 15 overs old. At this point it's still hard enough to come quickly off the bat, it has normally stopped swinging conventionally, and it hasn't yet started to reverse swing. The ball is hardest to play when it is less than 10 overs old, because it will swing and seam, and edges will carry to hand rather than dying as quickly as they do later on.

With two new balls in use, one from each end, each ball is only 15 overs old when the innings is 30 overs in. At this point it's sensible for teams to have their best hitters at the crease. If the hitters come in much before the 20th over, they are exposed to a swinging ball that makes it difficult to time their shots.

Next, I took those 3 marks from above (14 overs, 75 runs and 4.7 rpo) and compared the results of teams that reached these milestones with those of teams that did not.

The outcomes were again quite interesting:


Here, "fast" was any innings where the first two partnerships scored at more than 4.7 runs per over, "big" was where the second wicket fell with the score above 75, and "long" was where the second wicket fell after the 14th over.

The column titled "Relative" is the relative probability. For example, teams whose first two wickets score at more than 4.7 rpo are 37% more likely to win than teams who score more slowly than that.
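Relative probability here is the win rate of innings that reached a milestone divided by the win rate of those that didn't, so a value of 1.37 means "37% more likely to win". A sketch using the three milestone definitions above; the dict layout, the function names and the toy innings are assumptions, and overs are taken as decimal overs rather than scorecard notation (so 14.3 on a scorecard would be 14.5 here).

```python
def relative_win_probability(innings, condition):
    """P(win | milestone reached) / P(win | milestone missed)."""
    met = [inn["won"] for inn in innings if condition(inn)]
    missed = [inn["won"] for inn in innings if not condition(inn)]
    if not met or not missed or sum(missed) == 0:
        raise ValueError("need innings on both sides and a win among those that missed")
    return (sum(met) / len(met)) / (sum(missed) / len(missed))

def is_fast(inn):  # first two partnerships scored at more than 4.7 rpo
    return inn["runs"] / inn["overs"] > 4.7

def is_big(inn):   # second wicket fell with the score above 75
    return inn["runs"] > 75

def is_long(inn):  # second wicket fell after the 14th over
    return inn["overs"] > 14

# Hypothetical innings, invented for illustration only.
innings = [
    {"runs": 90, "overs": 18, "won": True},
    {"runs": 80, "overs": 15, "won": True},
    {"runs": 60, "overs": 10, "won": True},
    {"runs": 40, "overs": 12, "won": False},
    {"runs": 35, "overs": 10, "won": True},
]
for name, cond in [("fast", is_fast), ("big", is_big), ("long", is_long)]:
    print(f"{name}: {relative_win_probability(innings, cond):.2f}")
```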

Again, we see that the biggest advantage comes when the first two wickets last more than 14 overs. It isn't a panacea, but it is important.

The final thing I did was to compare two options of similar overall quality but with different approaches, and see which worked better. To do this I looked at every innings where the second wicket fell between the 7th and 14th over with a run rate between 4.7 and 5.3 (which puts them roughly in the 2nd quartile for scoring rate and the 3rd quartile for length). I then looked at innings where the second wicket fell between the 14th and 19th over with a run rate between 3.8 and 4.7 (which puts them roughly in the 3rd quartile for scoring rate and the 2nd quartile for length). Again, the more cautious approach paid dividends, although this time with a much smaller sample size.
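The two bands can be written as a small classifier. The ranges above share their endpoints (14 overs and 4.7 rpo), so this sketch assumes half-open intervals to keep each innings in at most one group; that boundary choice and the function name are assumptions.

```python
def start_type(overs, run_rate):
    """Label a top-order start as 'attacking', 'cautious', or None.

    attacking: second wicket fell from the 7th up to (not including) the 14th over,
               at 4.7 to 5.3 rpo
    cautious:  second wicket fell from the 14th to the 19th over,
               at 3.8 up to (not including) 4.7 rpo
    """
    if 7 <= overs < 14 and 4.7 <= run_rate <= 5.3:
        return "attacking"
    if 14 <= overs <= 19 and 3.8 <= run_rate < 4.7:
        return "cautious"
    return None

print(start_type(10, 5.0), start_type(16, 4.2), start_type(25, 4.0))
```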

The more attacking start yielded 9 wins and 13 losses (a 40.9% winning record). The more cautious start also yielded 9 wins, but only 8 losses (a 52.9% winning record). Given the small sample size, however, the difference is not statistically significant.
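One standard way to check a 2×2 win/loss table this small is Fisher's exact test; it is chosen here for illustration and may not be the test used for the figures above. The sketch computes the two-sided p-value directly from hypergeometric probabilities with math.comb; the function name is hypothetical, and if SciPy is available, scipy.stats.fisher_exact performs the same test.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups, columns are wins and losses. Holding the row
    and column totals fixed, the p-value sums the probabilities of every
    table that is no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):  # probability that group 1 has x of the col1 wins
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    tail = sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, tail)

# Attacking start: 9 wins, 13 losses. Cautious start: 9 wins, 8 losses.
print(f"p = {fisher_exact_two_sided(9, 13, 9, 8):.2f}")
```

For this table the p-value comes out well above 0.05, which is consistent with the difference not being statistically significant.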

Putting this all together, an ideal opener is a player who averages at least 37.5 (half of 75), faces at least 42 deliveries per dismissal on average (half of 14 overs), and has a strike rate of at least 78.3 (equivalent to 4.7 runs per over). If a compromise has to be made on one of these, it should be the strike rate, as that's the least important in helping the team win.

So, finally, how does Alastair Cook stack up to these criteria?

At the time of writing, Alastair Cook averages 37.51 runs from 48 balls per dismissal, at a strike rate of 77.66.
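The three thresholds come from halving the partnership marks for each opener: 75 / 2 = 37.5 runs, 14 × 6 / 2 = 42 balls, and 4.7 × 100 / 6 ≈ 78.33 for strike rate. A sketch of the check against Cook's figures, with a hypothetical function name:

```python
def opener_criteria(average, balls_per_dismissal, strike_rate,
                    runs_mark=75, overs_mark=14, rpo_mark=4.7):
    """Check a batsman against the ideal-opener thresholds.

    Each opener is expected to supply half of the first-two-wicket marks
    (half the runs and half the balls) at the same scoring rate.
    """
    return {
        "average": average >= runs_mark / 2,                 # 37.5 runs
        "balls": balls_per_dismissal >= overs_mark * 6 / 2,  # 42 balls
        "strike_rate": strike_rate >= rpo_mark * 100 / 6,    # about 78.33
    }

print(opener_criteria(37.51, 48, 77.66))
# → {'average': True, 'balls': True, 'strike_rate': False}
```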

He (just) meets two of the 3 criteria, and he is very close on the one he misses, strike rate, which is also the least important.

England have not been going well in ODI cricket recently, but Cook's batting is not the right thing to blame. The issues lie elsewhere.