Tuesday, 9 June 2020

A brief look at test openers

I saw an interesting discussion online about who was the better test opener, Chris Gayle or Virender Sehwag.

I really rate Gayle as a test player, but there's no doubt that Sehwag's raw numbers are better. But I wondered how much of that was down to him playing most of his cricket in India, where conditions are custom-made for his style of opening batting.

I thought an interesting way to compare them would be to look at how much better each of them was than the other players they played with and against, both at getting a start and at going on once they had one.

Comparing them with others in the same match is not without its flaws, but it does allow pitch conditions etc. to be taken into account.

I decided to use 25 runs as the cut-off for having seen off the new ball. This is not always going to work, but it seemed as good a cut-off as any.

I looked at the proportion of innings in which each of them got to 25, and the average number of extra runs they scored after getting there, then compared both figures with those of the other openers playing in the same matches.

Gayle scored 25+ 19% more often than the others in the same matches, and averaged 9% more from that point on. Sehwag scored 25+ 13% more often than the others in the same matches, but went on to score 25% more runs once he was set.
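For anyone wanting to try this themselves, here is a minimal Python sketch of the two measures. It assumes each opener's innings are available as simple records (the `Innings` class and both function names are hypothetical), treats "extra runs" as a plain mean of runs beyond 25 that ignores not-outs (which may differ from how the figures above were worked out), and counts as peers every other opener in the matches the player appeared in. The innings at the bottom are made up purely to show usage.

```python
from dataclasses import dataclass
from statistics import mean

CUTOFF = 25  # runs treated as "having seen off the new ball"


@dataclass
class Innings:
    # Hypothetical record: one opener's innings in one match.
    player: str
    match_id: int
    runs: int


def start_and_conversion(innings: list[Innings]) -> tuple[float, float]:
    """Share of innings reaching CUTOFF, and mean runs added after reaching it.

    Assumption: a plain mean of (runs - CUTOFF), ignoring not-outs.
    """
    starts = [i.runs for i in innings if i.runs >= CUTOFF]
    share = len(starts) / len(innings)
    extra = mean(r - CUTOFF for r in starts) if starts else 0.0
    return share, extra


def compare_with_peers(player: str, openers: list[Innings]) -> tuple[float, float]:
    """Percentage by which `player` beats the other openers in the same matches.

    Assumes the peers reached CUTOFF at least once, so neither ratio divides by zero.
    """
    mine = [i for i in openers if i.player == player]
    matches = {i.match_id for i in mine}
    peers = [i for i in openers if i.match_id in matches and i.player != player]
    my_share, my_extra = start_and_conversion(mine)
    peer_share, peer_extra = start_and_conversion(peers)
    return 100 * (my_share / peer_share - 1), 100 * (my_extra / peer_extra - 1)


# Made-up innings, purely to show usage. Player "C" never shares a match with "A".
toy = [
    Innings("A", 1, 40), Innings("A", 1, 10), Innings("A", 2, 80), Innings("A", 2, 5),
    Innings("B", 1, 30), Innings("B", 1, 0), Innings("B", 2, 12), Innings("B", 2, 45),
    Innings("C", 3, 100),
]
start_pct, extra_pct = compare_with_peers("A", toy)
print(f"25+ {start_pct:.0f}% more often, {extra_pct:.0f}% more runs once set")
```

With this toy data, "A" gets to 25 exactly as often as "B" but adds 35 runs on average after getting there against 12.5 for "B", so it prints "25+ 0% more often, 180% more runs once set".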

After looking at those two, I wondered what to make of the numbers. Which mattered more, and how did they compare to others?

So I decided to look at every player with 4000 test runs as an opener.
Graph showing comparison between openers and their peers

Then highlighting the two that I started looking at:
Openers with 4000 runs, highlighting Sehwag and Gayle

This led to a few more questions. What if I added in the averages - a more traditional statistic?

Well, as you head towards the top right, the average tends to get better, while it gets worse towards the bottom left - just as you would expect.

This made me wonder how well the Euclidean distance from the bottom left would predict batting average. To calculate it, each variable is first scaled from 0 (the smallest value) to 1 (the largest), and then Pythagoras' theorem is used to find the straight-line distance from (0,0).
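As a rough illustration, the scaling and distance steps might look like this in Python. The function names are hypothetical, and the three openers' values are invented for the example; in the real calculation the scaling would run over every opener with 4000 runs.

```python
import math


def min_max_scale(values: list[float]) -> list[float]:
    """Map the smallest value to 0 and the largest to 1 (values must not all be equal)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def distance_from_bottom_left(xs: list[float], ys: list[float]) -> list[float]:
    """Scale each variable to [0, 1] separately, then use Pythagoras to measure from (0, 0)."""
    sx, sy = min_max_scale(xs), min_max_scale(ys)
    return [math.hypot(x, y) for x, y in zip(sx, sy)]


# Hypothetical values for three openers:
# xs = % more often than peers reaching 25, ys = % more runs than peers once set.
xs = [0.0, 10.0, 20.0]
ys = [0.0, 30.0, 15.0]
for d in distance_from_bottom_left(xs, ys):
    print(round(d, 3))
```

This prints 0.0, 1.118 and 1.118. The distance can range from 0 to about 1.414 (the square root of 2), and two players can end up the same distance out with quite different profiles, as the second and third toy openers do here: one is better at going on once set, the other at getting a start.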

Plotting that against the batting average gave this graph:
The two red points are Gayle and Sehwag. Sehwag is the point above the line.

Basically, the distance gives a reasonable prediction of the average, with most players within about 5 runs of the average the distance predicts. Given that the players often played in quite varied conditions, this is quite a pleasing result.
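The post doesn't say how the line was fitted, so here is a minimal Python sketch assuming an ordinary least-squares straight line, with a check of how many players fall within 5 runs of it. The function names and the four openers' distances and averages are hypothetical, chosen only to show the calculation.

```python
from statistics import mean


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept.

    Assumption: a straight least-squares line; the original fit may differ.
    """
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def residuals(xs: list[float], ys: list[float]) -> list[float]:
    """Actual average minus predicted average; positive means above the line."""
    slope, intercept = fit_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


def share_within(xs: list[float], ys: list[float], tolerance: float = 5.0) -> float:
    """Fraction of players whose average is within `tolerance` runs of the line."""
    res = residuals(xs, ys)
    return sum(abs(r) <= tolerance for r in res) / len(res)


# Made-up distances and batting averages for four openers.
distances = [0.0, 0.5, 1.0, 1.5]
averages = [30.0, 40.0, 50.0, 80.0]
print(fit_line(distances, averages))      # (32.0, 26.0)
print(residuals(distances, averages))     # [4.0, -2.0, -8.0, 6.0]
print(share_within(distances, averages))  # 0.5
```

A positive residual, like the fourth toy opener's 6 runs, corresponds to a point above the line: an average higher than the distance alone would predict, which is where Sehwag sits on the graph.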

This graph suggests that Sehwag's average probably overstates how well he actually did, while Gayle's is a fair reflection. However, after accounting for the difference in the conditions they played in, Sehwag can probably still be regarded as the more successful test batsman - at least by this metric.
