Tuesday 13 October 2020

IPL at halfway

The match between Kolkata Knight Riders and Royal Challengers Bangalore was the 28th match, and it marked the halfway point of this year's edition of the IPL.

Every team has played every other team once, and some interesting patterns have emerged. I'm going to look at a few of them in this article.

The most obvious pattern is that teams batting first have won more often than teams chasing. This is somewhat unusual: in most T20 competitions the toss doesn't make much difference, and if anything it's generally better to field first.

This pattern hasn't really been seen in other competitions, where the results have all been within the expected margin of error.

The most recent BBL was slightly biased towards batting first, while the CPL and PSL were slightly biased towards bowling first, but the difference in the IPL has been much more dramatic. If we were tossing a coin the same number of times, there would be a 12% chance of getting a result as extreme as the BBL's and a 20% chance of getting one as extreme as the CPL's.

The probability of getting a difference as extreme as this year's IPL by chance alone is less than 3%. In technical terms, batting first has made a statistically significant difference (at the conventional 5% level) to a team's probability of winning.
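One standard way to get a figure like this is an exact two-sided binomial test against a fair coin. The sketch below assumes the count implied by the ground-by-ground figures later in this article (the chasing side won 7 of the 28 matches). The post doesn't say exactly which test was used, so treat this as an illustration rather than the calculation behind the 3% figure.

```python
from math import comb

def two_sided_binomial_p(k: int, n: int) -> float:
    """Exact two-sided p-value for k successes in n fair-coin trials.

    With p = 0.5 the distribution is symmetric, so the two-sided value is
    twice the smaller tail, capped at 1.
    """
    lower = sum(comb(n, i) for i in range(0, k + 1)) / 2 ** n
    upper = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * min(lower, upper))

# Chasing wins in the first 28 matches (4 + 1 + 2 across the three grounds).
chasing_wins, matches = 7, 28
p_value = two_sided_binomial_p(chasing_wins, matches)
print(f"p = {p_value:.4f}")  # roughly 0.0125, comfortably under 3%
```

The same function gives the BBL- and CPL-style percentages once the relevant win counts are plugged in.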

At the start of the tournament, pretty much every captain chose to field. Only one brave soul (David Warner) chose to bat first in the first 13 matches. Since then, however, teams have chosen to bat first in all but 3 matches, including 8 of the last 9.

When teams have chosen to field first, they've won only twice in 15 attempts, a winning record of just 13.3%. When teams have chosen to bat first, they've done much better, winning 7 out of 13. The best outcome at the toss seems to be to lose it and have the opposition captain decide to field.

Breaking it down by location is interesting too. In Abu Dhabi there is no clear advantage to batting first: the chasing team has won 4 of the 10 matches there. At the other two grounds, however, it is a different story. At Sharjah the chasing team has won only once in 6 matches, and at Dubai only twice in 12. It's hard to know why this is, and it will be interesting to see whether it continues through the rest of the tournament.
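The same coin-toss check can be run ground by ground. This sketch uses the counts quoted above and an exact two-sided binomial test (my choice of test, not necessarily the one behind the headline figure):

```python
from math import comb

def two_sided_binomial_p(k: int, n: int) -> float:
    """Exact two-sided p-value for k successes in n fair-coin trials."""
    lower = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    upper = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * min(lower, upper))

# (chasing wins, matches played) at each ground, as quoted in the text
grounds = {
    "Abu Dhabi": (4, 10),
    "Sharjah": (1, 6),
    "Dubai": (2, 12),
}

for ground, (wins, played) in grounds.items():
    rate = wins / played
    p = two_sided_binomial_p(wins, played)
    print(f"{ground:<10} chasing won {wins}/{played} ({rate:.0%}), p = {p:.3f}")
```

On these counts Dubai's split comes out at roughly p ≈ 0.04 on its own, Sharjah's six matches are too few to say much, and Abu Dhabi looks like a fair coin. With three grounds checked separately, the Dubai figure deserves some caution.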

After about a quarter of the tournament, I noticed a pattern: batsmen who turned the strike over quickly seemed to have a bigger impact on their team's chance of winning than batsmen who scored extra boundaries. I wondered, though, whether that was just a statistical anomaly from a small dataset.

It seems that it was. Now that there's more data, neither activity rate nor boundary rate on its own makes a significant difference for an individual batsman. Both seem to help (teams win more often when their batsmen hit more boundaries and score more runs by running), but neither explains enough to discount the other any more.

There is a noticeable difference in batsmen's scoring rates across the three grounds. They tend to score much more quickly at Sharjah (probably because of the short boundaries) than at the other two grounds.

An interesting development is that recently teams have found it more difficult to turn over the strike at Dubai than at the other two grounds. 

The median strike rates for innings over 30 at each ground are:

This gives a suggested good team score at each ground of 174, 176 and 196 respectively.

These are the team scores equivalent to an average set batsman batting through the whole innings. I find them a good guide to reliable winning targets at a ground. They're possibly slightly high at the moment, given the awful record of chasing sides, but that may change as the tournament progresses.
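Reading "an average set batsman batting the whole innings" as a median strike rate (runs per 100 balls) sustained over all 20 overs, the conversion is simply strike rate × 120 / 100. This is my reading of the definition rather than a formula stated in the post, and the strike rate of 145 below is purely illustrative.

```python
BALLS_PER_INNINGS = 20 * 6  # a full T20 innings is 120 balls

def good_team_score(median_strike_rate: float) -> float:
    """Runs a batsman at this strike rate would make over a full innings.

    Strike rate is runs per 100 balls, so divide by 100 and scale to 120 balls.
    """
    return median_strike_rate * BALLS_PER_INNINGS / 100

print(good_team_score(145))  # 174.0
```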

Looking at the bowling stats, I find it useful to group the attacks based on their styles.

Doing this produces the following graph. It takes a while to understand, but the squares are all the pace bowlers combined and the triangles are all the spin bowlers combined.

These groupings can be used to predict how successful the teams will be. The two key statistics to look at are the economy rate of the spin bowlers and the strike rate of the pace bowlers.
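Pooling a group of bowlers means adding up their balls, runs and wickets before dividing, rather than averaging each bowler's individual rates. A minimal sketch with made-up figures (the `Spell` record and all the numbers are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class Spell:
    balls: int
    runs: int
    wickets: int

def pooled_rates(spells):
    """Economy rate (runs per over) and strike rate (balls per wicket) for a group."""
    balls = sum(s.balls for s in spells)
    runs = sum(s.runs for s in spells)
    wickets = sum(s.wickets for s in spells)
    economy = runs / (balls / 6)
    strike_rate = balls / wickets if wickets else float("inf")
    return economy, strike_rate

# Hypothetical figures for one team's spinners and pace bowlers
spinners = [Spell(24, 30, 1), Spell(18, 20, 2)]
pacers = [Spell(24, 40, 2), Spell(24, 35, 1), Spell(12, 25, 0)]

spin_economy, _ = pooled_rates(spinners)
_, pace_strike_rate = pooled_rates(pacers)
print(f"spin economy {spin_economy:.2f}, pace strike rate {pace_strike_rate:.1f}")
```

Pooling weights each bowler by how much they actually bowled, so a part-timer's single expensive over doesn't count as much as a front-line bowler's four.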

These two statistics suggest that CSK are probably the team that has underperformed most with the bat, as their bowlers have been doing enough to keep them in matches.

Finally, I used the data to build a predictive model, using logistic regression, to assess how good the teams are. As every team has played every other team exactly once, the basic model is fairly uninteresting, but the version that controls for batting first is much more interesting.
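The post doesn't spell out the regression, so here is one way a model with a batting-first term could be set up, consistent with how the coefficients are used in the following paragraphs. Each team gets a rating, the side batting first gets +h and the chasing side -h, and the log odds that the side batting first wins is the difference (essentially a Bradley-Terry model with a "home advantage" term). The function name, the gradient-ascent fit, the small L2 penalty (added to keep estimates finite on a handful of matches) and the results are all hypothetical.

```python
import math

def fit_ratings(matches, lam=0.05, lr=0.5, steps=5000):
    """Fit team ratings and a batting-first advantage h by penalised logistic regression.

    matches: list of (batting_first, batting_second, batting_first_won) tuples.
    Model: log odds(batting first wins) = (r[bat1] + h) - (r[bat2] - h).
    The small L2 penalty `lam` keeps the estimates finite on tiny samples.
    """
    teams = sorted({team for m in matches for team in m[:2]})
    r = {team: 0.0 for team in teams}
    h = 0.0
    n = len(matches)
    for _ in range(steps):
        # Start from the gradient of the penalty term, -lam * parameter.
        grad_r = {team: -lam * r[team] for team in teams}
        grad_h = -lam * h
        for bat1, bat2, won in matches:
            z = (r[bat1] + h) - (r[bat2] - h)
            p = 1 / (1 + math.exp(-z))
            resid = (1.0 if won else 0.0) - p  # d(log-likelihood)/dz
            grad_r[bat1] += resid / n
            grad_r[bat2] -= resid / n
            grad_h += 2 * resid / n  # h enters z twice
        # Plain gradient ascent step on the penalised mean log-likelihood.
        for team in teams:
            r[team] += lr * grad_r[team]
        h += lr * grad_h
    return r, h

# Hypothetical results: (batting first, batting second, did batting first win?)
results = [
    ("A", "B", True), ("B", "A", False),
    ("A", "C", True), ("C", "A", False),
    ("B", "C", True), ("C", "B", True),
]
ratings, h = fit_ratings(results)
print({team: round(value, 2) for team, value in ratings.items()}, round(h, 2))
```

Only differences between ratings carry meaning, so this sketch's ratings (which sum to zero) won't be on the same absolute scale as the coefficients quoted in the post.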

The difference between any two teams' coefficients gives the log odds of the result of a match between them (and hence the win probability can be calculated from it).

The batting advantage is added to the team batting first and subtracted from the team batting second. So, for example, if Mumbai bat first against RCB, they would have an adjusted coefficient of 2.43 + 1.27, while RCB would have 2.04 - 1.27. This means that for almost every match-up the team batting first would be favoured to win. The only exceptions are when Mumbai, Bangalore or Delhi play Kings XI Punjab: there they would still be the favourites, even when batting second.
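Turning the Mumbai v RCB example into a probability takes one logistic step. This sketch uses the coefficients quoted above and assumes the add-and-subtract convention just described, so the total swing from batting first is 2 × 1.27 = 2.54; the function name is hypothetical.

```python
import math

def win_probability(r_bat_first, r_bat_second, batting_advantage):
    """Probability that the side batting first wins.

    The advantage is added to the side batting first and subtracted from the
    chasing side, so the log odds are the difference of the adjusted ratings.
    """
    log_odds = (r_bat_first + batting_advantage) - (r_bat_second - batting_advantage)
    return 1 / (1 + math.exp(-log_odds))

MI, RCB, BAT = 2.43, 2.04, 1.27  # coefficients quoted in the post

print(f"Mumbai batting first: {win_probability(MI, RCB, BAT):.1%}")
print(f"RCB batting first:    {win_probability(RCB, MI, BAT):.1%}")
```

That works out at roughly 95% for Mumbai batting first and roughly 90% for RCB batting first. It also shows why a side is only favoured when chasing if its coefficient exceeds its opponent's by more than 2.54.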

This model is based on only 28 matches, so it is clearly not perfect. But it is an interesting guide to how well the teams have been playing, and I think it is somewhat informative.



  3. Interesting article. I'm a stats person too by trade. A couple of questions:

    1. How did you come up with the batting advantage of '1.27'?
    2. While creating the predictive model, which variables did you use? I'm curious how you structured the regression equation to get the coefficients above.