CricketGeek: 2015

Friday, 27 November 2015

Heads or Tails?

The start of a cricket game is actually 30 minutes before the first ball is bowled. The ritual of it has changed a few times over the years, but basically what happens is the captains walk out to the pitch wearing their blazers over their whites with an umpire and often a cameraman and commentator. The umpire gives the coin to the home captain who tosses the coin up in the air, and then the visiting captain calls heads or tails before the coin lands. The coin is left to land on the pitch, and then the captain who has won the toss is asked if he wants to bat or bowl.

It's an old tradition. Using a coin to make a decision has been done for at least 2000 years. Using it in cricket dates back to at least the 1850s (the toss result was recorded in the match between Oxford and Cambridge Universities in 1858). And yet there are now some calls for it to be done away with. The ECB are going to experiment with removing the mandatory toss for the 2016 County Championship season.

There is a thought that the toss is too influential. There is a perception that there are too many matches where "if you win the toss, you win the match." Just over 50% of respondents in an ABC poll wanted the toss to be done away with.

I remember having a conversation with a friend who is a fan of most sports, but he was sick of cricket because "the toss of the coin has too much impact." As a cricket statistician, this is wonderful, as it's something that I can test. What is the impact of the toss on cricket matches? How much more often does the team that wins the toss win the match?

In all test matches, the team that has won the toss has won the match 749 times, and the team that lost the toss has won 671 times. These, combined with 734 draws and 2 ties, mean that the team that wins the toss has won the match 34.7% of the time, and the team that lost the toss has won 31.1% of the time. This is a relative probability of 1.116. This means that the team that has won the toss has had an 11.6% higher chance of winning the test.
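The arithmetic behind those figures is easy to check. This is only a sketch of the calculation described above; the function name `relative_probability` is my own label, not something from a library.

```python
def relative_probability(wins_a, wins_b, matches):
    """Ratio of two win rates measured over the same set of matches."""
    return (wins_a / matches) / (wins_b / matches)


# Figures quoted above for all test matches.
toss_winner_wins = 749
toss_loser_wins = 671
draws = 734
ties = 2
matches = toss_winner_wins + toss_loser_wins + draws + ties

win_rate_toss_winner = toss_winner_wins / matches
win_rate_toss_loser = toss_loser_wins / matches
ratio = relative_probability(toss_winner_wins, toss_loser_wins, matches)

print(f"matches: {matches}")
print(f"toss winner wins: {win_rate_toss_winner:.1%}")
print(f"toss loser wins:  {win_rate_toss_loser:.1%}")
print(f"relative probability: {ratio:.3f}")
```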

This is slightly lower than I would have expected, but it's still a statistically significant difference. (Statistically significant is a technical term that basically means that we can say that we have enough evidence that there is a difference, and that it's unlikely to be just because of randomness).
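There are several ways to run that check. One simple option, and it is an assumption on my part that it suits this data, is a sign test on the decisive matches: if the toss made no difference, each match with a winner would be equally likely to go to either side. The sketch below uses the normal approximation to the binomial and ignores draws and ties.

```python
import math


def sign_test_p_value(wins_a, wins_b):
    """Two-sided p-value for H0: each decisive match is a 50/50 split.

    Uses the normal approximation to the binomial, which is reasonable
    for samples this large. Draws and ties are ignored.
    """
    n = wins_a + wins_b
    expected = n / 2
    std_dev = math.sqrt(n * 0.25)
    z = (wins_a - expected) / std_dev
    # erfc gives the two-sided tail area of a standard normal directly.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p_value = sign_test_p_value(749, 671)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A p-value under 0.05 is the conventional threshold for calling a difference significant.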

But if we delve deeper into these numbers, then some interesting things turn up. First let's break it up by home and away. This is an important distinction, as the home teams will generally be better at reading the conditions. I would have expected that winning the toss at home would provide a bigger difference than winning it away. It turns out to be so, but not by nearly as much as I expected.

When the home team has won the toss, they've won 41.8% of the time (467/1117 matches), when the home team has lost the toss, they've won 37.9% of the time (394/1039). This is a relative probability of 1.103, or a 10.3% increase in probability of winning. There are different methods of testing for significance, and this is right on the edge of being significant or not. In other words, while there's a 10% difference, if we randomly selected the results, it would be reasonably likely that we would get this sort of difference.

When the away team wins the toss, they've won 27.1% of matches (282/1039) vs 24.8% when losing the toss (277/1117). This is a relative probability of 1.094: away teams have won 9.4% more often when they've won the toss than when they've lost the toss. Again, there is a difference, but it's not statistically significant.
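For anyone wanting to test the home and away splits themselves, one common method is a two-proportion z-test. That choice is my assumption for this sketch, and other tests will give slightly different answers. The matches where a side won the toss and the matches where it lost the toss are separate groups, which is what this test expects.

```python
import math


def two_proportion_z_test(wins_1, n_1, wins_2, n_2):
    """Two-sided z-test for a difference between two independent proportions."""
    p_1 = wins_1 / n_1
    p_2 = wins_2 / n_2
    pooled = (wins_1 + wins_2) / (n_1 + n_2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p_1 - p_2) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_1 / p_2, z, p_value


# (wins, matches) after winning the toss, then after losing it.
home = two_proportion_z_test(467, 1117, 394, 1039)
away = two_proportion_z_test(282, 1039, 277, 1117)

for label, (ratio, z, p_value) in (("home", home), ("away", away)):
    print(f"{label}: relative probability {ratio:.3f}, z = {z:.2f}, p = {p_value:.3f}")
```

With this test the home split lands a little above the usual 0.05 cut-off and the away split well above it.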

The reason why it can be significant with the full group, but not with any sub group is due to the smaller sample sizes.

The similarity can be represented fairly well graphically.

The result proportions for home and away are very similar, regardless of who won the toss.

Now at this point, I wondered if this was simply due to older tests, where perhaps there was less doctoring of pitches. It is an interesting idea that pitches are doctored more often now. I would have thought that if anything, the nature of pitches round the world is now more similar.

To look at this, I selected a completely arbitrary cut off point of the 1st of January 2000. Looking at matches that started after that date, I found these numbers:

Overall the team that has won the toss has won 260 out of 680, while the team that lost the toss has won 252. That's a relative probability of 1.032, i.e. in the past 15 years, the team that won the toss has won 3.2% more often than the team that lost the toss.

That's an almost negligible difference.

Breaking it down to home/away it gets even closer.

When the home team won the toss they've won 46.9%, when they lost the toss, they've won 46.9%. The difference is so small that you have to go to the 4th significant figure to be able to measure it.

When the away team won the toss they've won 29.4%, when they lost the toss, they've won 27.4%. The relative probability is 1.072. Surprisingly this is much larger than the advantage for the home team, which suggests that the difference is just down to the effect of randomness rather than the effect of the toss.

Here it is visually:

Again the similarity is remarkable.

So, to assess our original question: does the toss have a major impact on the result of the match? No. Absolutely not. Winning the toss has historically only given teams an 11% higher chance of winning, and recently that's reduced to only 3.2% since the year 2000.

*picture of Bradman and Allen at the coin toss thanks to Wikimedia.

Sunday, 13 September 2015

Anderson or Neesham

An email dropped in my inbox on Friday from New Zealand Cricket naming the squads for the test series with Australia and the NZA squads to play Sri Lanka A. There was a lot to talk about, with new players being named in the A squad and the prospect of a day-night test coming up. But over 1/4 of the press release focused on two players: Jimmy Neesham and Corey Anderson. When I got in the car to come home from work, I listened to a debate between Darcy Waldegrave and Goran Paladin about who should be picked, Anderson or Neesham.

A quick scroll through the different websites I check for cricket news found that this was the story that most writers picked up on:
Cricbuzz put down their India-centric blinkers to lead (briefly) with "James Neesham, Corey Anderson named in Test squad for Australia Tour".
Andrew Alderson's piece in the New Zealand Herald was Anderson v Neesham: Let the contest begin.
David Leggatt wrote a piece for the Otago Daily Times titled Cricket: Black Cap selectors face all-rounder quandry.
Mark Geenty went for the slightly more negative Concerns remain over Anderson, Neesham as recovery race heats up for Brisbane on stuff.co.nz.
Wisden India went for Anderson, Neesham return for Australia Tests.

It was clear that this was the key talking point for most people.

And that's fair enough too. It's not often that a team has a genuine all-rounder. To have two players who have the potential to develop into such players is remarkable. Given that neither is quite there as both a first-choice batsman and bowler, picking them both is unlikely, so a showdown is likely.

Neesham is probably less aggressive than Anderson with the bat, which has led to him having more success in the few tests that he's played in. Anderson has really made his name in ODI cricket. Neesham, on the other hand, didn't make New Zealand's 15 man squad for the World Cup. With the ball, Anderson has been very effective in ODI cricket, but has not really performed as well in tests. Neesham has not taken a lot of test wickets, but has been quite effective at holding down an end. Anderson tends to rely on bounce and his left-arm angle, while Neesham has a good cutter, and tends to attack the batsman's body more.

There is quite a bit of debate about who is the better player, and so the prospect of them both being fit, and us seeing who Hesson opts for is tantalizing.

In the conversation on the radio, Waldegrave and Paladin both said that it was clear that Neesham had better statistics, and so he should be picked. My ears immediately pricked up.

The basic statistics do bear that out.

Batting	Anderson	Neesham
Innings	18	15
Runs	533	606
Average	31.35	43.28
100s	1	2

Bowling	Anderson	Neesham
Overs	174	109.5
Runs	500	361
Wickets	13	11
Average	38.46	32.81

Neesham has the better batting and bowling average. He's scored more runs in fewer innings, and has taken roughly the same number of wickets in roughly half the innings.

Here's a graphical representation of their batting scores so far:

A quarter of Anderson's innings have been scores of 2 or less, which is certainly not ideal. Neesham, on the other hand, has a quarter of his at 78 or higher.

Bowling innings are not so easy to show in a graph, but I felt that it was useful to see the difference. I've graphed their average vs the number of overs bowled.

We can see the trends in the numbers - Anderson's average is increasing, while Neesham's is decreasing. When Anderson had bowled the same number of overs as Neesham has now, their averages were similar, but Anderson's averages have risen fairly steadily since then.

However, straight summary stats can be misleading. I was interested to see if Neesham truly did have better statistics to the point where we could be confident that he would perform better.

I'm finding that more and more I distrust basic cricket statistics to tell me about players. That is an odd thing for a stats blog to say, but please hear me out.

Firstly, a batsman's previous innings are not actually the full list of what he was capable of doing. They are effectively a sample. Of all the times that he could have played, he only actually played a few of them. (They have both batted on about 40 days, over the space of 3 years.) Treating their previous results as population data, where we can compare summary statistics directly, is dangerous, because these are effectively a sample of what their careers will eventually be (assuming here that they will play more). They are also only a sample of the scores that they were capable of throughout their careers. Perhaps they would have scored more if the last series they played in had been longer, or if there was an extra test added into the last tour that they were on.

When we compare samples, we need to use statistical techniques in order to be able to account for sampling variation. Sampling variation is basically caused by not having enough information. There are a range of techniques to do this. If we have reason to believe that our population is normally distributed, we can create confidence intervals using descriptive statistics. However, we know that cricket scores are not normally distributed. Scores tend to be skewed to the right, i.e. the majority of scores are below the average. (Some examples: Graham Dowling scored less than his average in 68% of his innings, Graeme Smith scored below his average in 73% of innings, Gordon Greenidge scored less than his average in 68% of his innings and Don Bradman scored less than his average in 64% of his innings. If scores were normally distributed, then most players would score below their average in roughly half their innings.)
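To get a feel for why a right-skewed distribution puts most values below the average, here is a small illustration. It assumes, purely for the sake of the example, that scores behave like an exponential distribution; real scores only loosely resemble that, and the mean of 40 is arbitrary. For an exponential distribution the share of values below the mean is 1 - 1/e, whatever the mean is.

```python
import math
import random


def share_below_mean(values):
    """Fraction of values strictly below the sample mean."""
    mean = sum(values) / len(values)
    return sum(v < mean for v in values) / len(values)


# Theoretical share below the mean for an exponential distribution.
theoretical = 1 - math.exp(-1)

# Simulated "scores": purely synthetic, drawn with an arbitrary mean of 40.
rng = random.Random(2015)
synthetic_scores = [rng.expovariate(1 / 40) for _ in range(10_000)]

print(f"theoretical: {theoretical:.1%}")
print(f"simulated:   {share_below_mean(synthetic_scores):.1%}")
```

That theoretical share is in the same region as the percentages quoted for the four batsmen, which is all the comparison is meant to show.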

Another technique that can be used is bootstrapping. This is where a confidence interval is created by resampling with replacement. This is almost black magic, in that it uses just the variation in the sample to describe the variation in the population, and, despite it seeming illogical at first, it actually tends to work remarkably well. (For example, the bootstrap confidence intervals for the first 25 innings for Graeme Smith, Sir Don Bradman and Gordon Greenidge all include their final career average. It even worked for Sir Frank Worrell, who had an amazing start to his career followed by a poor end.)

The easy interval to construct was the batting scores. Here I randomly resampled each player's batting innings, with replacement, and calculated the average for each batsman. Then I subtracted Anderson's resampled average from Neesham's resampled average. If the number came out positive, it meant that Neesham's average was higher; if it was negative, Anderson's was higher. After taking 1000 resamples, I then looked at the central 95% of the differences. If that range is all positive or all negative, it implies that there is a true statistical difference.
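The procedure can be sketched in a few lines. This is a minimal version under some assumptions: it uses a plain mean of the scores rather than a proper batting average (so not-outs are ignored), the function names are mine, and the two lists of scores are invented purely to exercise the code. They are not Anderson's or Neesham's innings.

```python
import random


def mean(values):
    return sum(values) / len(values)


def bootstrap_difference_interval(scores_a, scores_b, resamples=1000, seed=None):
    """Central 95% of resampled (mean of a - mean of b) differences."""
    rng = random.Random(seed)
    differences = []
    for _ in range(resamples):
        # Resample each player's innings with replacement, same size as original.
        sample_a = rng.choices(scores_a, k=len(scores_a))
        sample_b = rng.choices(scores_b, k=len(scores_b))
        differences.append(mean(sample_a) - mean(sample_b))
    differences.sort()
    lower = differences[int(0.025 * resamples)]
    upper = differences[int(0.975 * resamples) - 1]
    return lower, upper


# Hypothetical innings scores, invented purely to exercise the function.
player_a = [0, 4, 12, 17, 23, 31, 45, 58, 78, 137]
player_b = [1, 2, 2, 9, 13, 26, 30, 47, 50, 116]

lower, upper = bootstrap_difference_interval(player_a, player_b, seed=1)
print(f"95% interval for the difference in averages: {lower:.1f} to {upper:.1f}")
if lower > 0 or upper < 0:
    print("Interval excludes zero: evidence of a real difference.")
else:
    print("Interval includes zero: too close to call.")
```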

Here's the graph of the results

The red line in this graph indicates the confidence interval. Here we can see that the interval includes both positive and negative numbers. This means that we cannot make a call based on the start of their careers as to who is statistically the best. They are too close to call.

Bowling is harder to compare. There are so many things to compare that it can be really difficult. The way that I chose to compare the bowling was to think about what the job is that they are going to be asked to do. In the media conference Mike Hesson was actually quite clear about what role he expected Neesham or Anderson to do. They were to be an additional support bowler, in the same way that they have been used throughout the recent games. I looked at all the matches since McCullum has been captain, and the median overs bowled by the 4th seamer (or 3rd seamer when 2 spinners were picked) was 11. (For this I ignored a few innings where McCullum bowled himself for an over or 2, but I included the innings where he actually bowled 2 full spells.)

As a result I normalised each bowling innings by Neesham or Anderson to 11 overs. To do this I added on a percentage to the run rates for situations where a bowler had only bowled a few overs. This meant that 0/25 off 5 became 0/65 off 11 and 2/12 off 6.1 became 3/25 off 11. There are obviously issues with this, but I felt that it was fairer than any other method that I could think of.
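As a rough illustration of that normalisation, here is one way it could be coded. The exact percentage added to the run rate is not stated above, so the 18% premium below is an assumption, as are applying it to every spell shorter than 11 overs and rounding wickets down. They were chosen only because together they reproduce the two worked examples.

```python
import math

TARGET_OVERS = 11
# Assumed premium on the run rate; the post does not state the figure used.
SHORT_SPELL_PREMIUM = 0.18


def overs_to_float(overs):
    """Convert cricket notation such as '6.1' (6 overs, 1 ball) to overs."""
    whole, _, balls = str(overs).partition(".")
    return int(whole) + int(balls or 0) / 6


def normalise_spell(overs, runs, wickets):
    """Scale one bowling innings to TARGET_OVERS, returning (wickets, runs)."""
    bowled = overs_to_float(overs)
    scale = TARGET_OVERS / bowled
    # Only spells shorter than the target get the premium (an assumption).
    premium = SHORT_SPELL_PREMIUM if bowled < TARGET_OVERS else 0.0
    scaled_runs = round(runs * scale * (1 + premium))
    # Rounding wickets down is an assumption that matches the 2/12 example.
    scaled_wickets = math.floor(wickets * scale)
    return scaled_wickets, scaled_runs


print(normalise_spell("5", 25, 0))
print(normalise_spell("6.1", 12, 2))
```

Passing the overs as a string keeps "6.1" meaning six overs and one ball rather than 6.1 decimal overs.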

After normalizing we can see that the runs distribution is similar, but Neesham took more wickets more often.

The bootstrap results looked like this:

Again there is not enough evidence to actually say who is statistically better.

A third way to look at it is to compare the contribution in individual matches. For this I selected batting innings and bowling innings randomly from each player. I added Neesham's batting to Anderson's bowling, then subtracted Neesham's bowling and Anderson's batting. If the result was positive then Neesham had made the bigger impact, if it was negative, then it was Anderson.

The result of this was as follows:

Again, there is not enough evidence to make a call.

However, all this data is from some very small samples. This is why the confidence intervals are so wide. To try and make any sort of call from such a small sample is really sketchy. To be able to make a valid comparison, I needed more data. Accordingly, I decided to look at their first class records. This time they both had more than 50 innings, and so the data was a little more useful.

However, the results were similar, despite the intervals being smaller. In every case the outcome included both positive and negative numbers, meaning that we could not make a call statistically who was the better player.

What does this mean in the context of selection?

Quite simply it means that the selectors need to rely on what they notice, rather than on the statistics. Who do they think will be successful, given their experience in the game and their intuition for knowing which players are likely to do well?

Selection is not an exact science. In this case there is not a compelling statistical argument for either Neesham or Anderson, and so it really should come down to who the selectors feel would be most effective on the pitches that they are playing on.

Statistics can tell you a lot of things. But it cannot tell you everything. It is a tool for finding patterns, rather than a crystal ball for divining the future perfectly.

Sunday, 29 March 2015

New Zealand vs Australia: Head-to-Head

I've heard a number of commentators say that man-for-man, Australia have better players, but New Zealand is a better team. This strikes me as a peculiar thing to say, given that there's often no analysis included of individual head-to-head.

So I've decided to do it myself, in order to see if there actually is a clear difference, man-for-man.

I've tried to line up the players by role. Both teams have players that do similar roles generally, with only a couple of exceptions.

I'm looking at their world cup so far, as well as their numbers since 1 Jan 2013 in New Zealand and Australia.

Role 1 - Slower opener

Player	Guptill	Finch
WC Average	76.00	40.00
WC S/R	108.79	93.64
2 year Average	43.00	37.00
2 year S/R	85.08	87.13

Guptill is in better form, but the 2 year numbers are very close. These are two players of similar ability who are both playing good cricket. Both tend to be slow to start, but are capable of increasing their scoring rate once established.

Role 2 - Fast opener

Player	McCullum	Warner
WC Average	41.00	50.00
WC S/R	191.81	124.48
2 year Average	38.00	40.75
2 year S/R	135.02	104.21

Again the numbers are very close. Warner has the higher average, but McCullum scores faster. Both are remarkably good at both scoring boundaries and finding singles, but both are prone to hitting bad balls straight to fielders. McCullum has shown a weakness against left-arm spin, so there's a chance that Clarke might bring himself on to bowl early on.

Role 3 - First drop

Player	Williamson	Smith
WC Average	37.00	57.66
WC S/R	83.14	94.02
2 year Average	51.08	60.92
2 year S/R	85.64	94.77

These two players are the best batsmen for their teams in recent times. Smith is ahead on these numbers, but it's wrong to say that Williamson is a weakness in the New Zealand side. Both manage to score at a good rate without looking like they're trying. Both also have a big impact on their team's chances of succeeding. New Zealand win 45% of the time when Williamson scores under 40 and 62% when he scores 40+. Australia have won 53% of the time when Smith's scored under 40 and 86% when he's scored 40+.

Role 4 - Innings builder

Player	Taylor	Clarke
WC Average	30.16	29.00
WC S/R	63.06	92.94
2 year Average	46.82	23.66
2 year S/R	79.30	80.49

Clarke's had a slightly better world cup, but Taylor has produced more quality innings over the past 2 years, averaging almost twice what Clarke has. These two are both great players, who often play roles that allow others to shine. As a result, their numbers don't truly tell the story of their contributions. Both players' numbers are also a reflection of their battle with injuries.

Role 5 - rebuild or launch

Player	Elliott	Watson
WC Average	37.83	41.20
WC S/R	107.07	107.85
2 year Average	44.76	36.82
2 year S/R	94.63	94.70

A really interesting role in modern cricket is the number 5 batsman. Their role is sometimes to steady a rocking ship, and other times it's their role to attack, and build on the foundation of the players above them. It is difficult to separate the ability of Watson and Elliott to do this role.

They also both have a role to play with the ball as the extra bowler:

Player	Elliott	Watson
WC Average	34.00	74.00
WC E/R	8.50	6.72
2 year Average	25.88	101.50
2 year E/R	6.56	6.37

Elliott has been more expensive, but has also broken partnerships quite regularly.

Overall, it's really difficult to separate these two with bat and ball. I'd probably back Elliott as a batsman, but Watson with the ball, despite his numbers not being as good.

Role 6 - Aggressive batsman

Player	Anderson	Maxwell
WC Average	38.50	64.80
WC S/R	109.47	182.02
2 year Average	41.77	33.42
2 year S/R	125.96	125.13

One of the dangers of comparing players like Anderson and Maxwell based on statistics is that they are often asked to do different jobs. Against South Africa, Anderson's job was not to come in and score at a massive strike rate. His job was to play sensibly and carry the innings through. Over a longer term it's difficult to separate Anderson and Maxwell. Both are capable of being absolutely breathtaking with the bat.

Both play quite different roles with the ball, so I'll look at them later.

Role 7 - Wicket-keeper batsman.

Player	Ronchi	Haddin
WC Average	14.60	42.00
WC S/R	125.86	157.50
2 year Average	38.88	41.11
2 year S/R	128.84	111.11

Haddin has had a much better world cup, but it would not be difficult to argue that Ronchi has been the most effective death batsman in the world in the past couple of years.

They're also difficult to separate with the gloves. Both are solid keepers who have made a couple of key mistakes, but in general, they've done the job required of them sufficiently.

Role 8 - Bowler who bats

Player	Vettori	Faulkner
WC Average	41.00	14.66
WC S/R	164.00	176.00
2 year Average	15.66	44.25
2 year S/R	123.68	114.56

Faulkner and Vettori have very different styles, but can both be very effective. In this world cup Vettori has rediscovered his batting form of 2008-2012, a period during which he was one of New Zealand's best batsmen as well as being an outstanding bowler. In contrast, Faulkner hasn't found his rhythm since returning from injury.

Role 9 - Right arm opening bowler

Player	Southee	Hazlewood
WC Average	27.13	20.85
WC E/R	5.57	4.19
2 year Average	28.97	20.44
2 year E/R	5.53	4.37

Hazlewood has the advantage here numerically, but some of that is due to Southee having a role bowling at the death. I don't think many selectors would pick Hazlewood over Southee, regardless of the difference in their stats.

Role 10 - left arm opening bowlers

Player	Boult	Starc
WC Average	15.76	10.20
WC E/R	4.41	3.65
2 year Average	22.09	14.37
2 year E/R	4.58	4.34

Starc and Boult have probably been the two best bowlers in the tournament. They both offer different things. Starc bowls into the pitch with a high arm action that causes the ball to bounce higher, but it also gets less movement and arrives to the batsman later, despite the quicker speed through the air. Boult bowls over his front foot and tends to bowl more deliveries along the wicket than into the wicket. As a result, the ball swings more and arrives at the batsman faster. (Ed Cowan, after facing both, commented that Starc might bowl 5-10km/h faster but you have a lot more time to face the ball. Boult certainly feels faster.)

It's difficult to separate them, but not impossible. Starc has been the premier white ball bowler in the world recently.

Role 11 - 3rd seamer

Player	Henry	Johnson
WC Average	-	24.66
WC E/R	5.00	5.43
2 year Average	19.00	25.15
2 year E/R	4.26	5.04

Henry has only bowled 8 overs this world cup, as he wasn't even in New Zealand's original squad. His first 5 overs included 2 maidens and conceded only 9 runs against South Africa. He went the distance in his next 3 overs, but even then, a large proportion of his runs came through edges and mis-hits. Johnson is a master with the red ball, but hasn't had the success recently with the white ball that he had earlier in his career. It's still difficult to complain about an average of 25 and taking a wicket every 5 overs.

Role 12 - 4th seamer

Player	Anderson	Faulkner
WC Average	16.21	23.00
WC E/R	6.45	4.90
2 year Average	22.69	26.57
2 year E/R	6.42	5.45

This is a slightly more difficult comparison, as Anderson generally bowls at the death. His economy rate here is outstanding, and he's taken a lot of wickets. However, taking wickets with bad balls isn't necessarily a trait that is repeatable. Faulkner looks like a better bowler, despite his numbers not being quite as dramatic as Anderson's.

Role 13 - Spinner

Player	Vettori	Maxwell
WC Average	18.80	36.20
WC E/R	3.98	5.83
2 year Average	35.12	32.91
2 year E/R	4.10	5.24

Vettori is in a different class here. It's probably the only place where there's a clear difference in quality between players doing similar roles in the two teams.

Overall it's really difficult to separate the two teams. Both have a team full of good players in good form. They have players doing similar jobs, often in similar ways.

I don't think I can honestly say at this point which team has better players. The more you look at this match, the more mouth-watering it becomes.

Tuesday, 10 March 2015

Updated QF prediction chart

In my previous post I ran a simulation to find out potential quarter-final places. I received some criticism for having England so low, and Bangladesh so high, but events over the past 48 hours have shown that the respective probabilities of the two teams qualifying may not have been so far off.

The program that I wrote to do the simulation was corrupted when my computer crashed and I foolishly hadn't saved it, so I've written a different one to re-calculate. This time I made a couple of modifications. I moved from an additive model for run rates to a multiplicative one, as that seemed to be more sensible (teams are realistically a % better than other teams, rather than a fixed number of runs better. We would expect the margins to blow out more in terms of runs on better batting pitches than on difficult tracks).

I also slightly reduced the standard deviation of the simulation by moving it to one quarter of the mean rather than one third. This again made the results seem more sensible. There were too many teams scoring over 400 or under 100 previously.
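To show what the switch from additive to multiplicative changes, here is a small comparison. The exact way the factors were combined in the rewritten program isn't spelled out above, so the functions and the example strengths (a side 0.5 rpo or 10% better than average, against an average attack) are assumptions for illustration only.

```python
def additive_rate(base_rate, batting_modifier, bowling_modifier):
    """Expected run rate when team strengths are fixed numbers of runs per over."""
    return base_rate + batting_modifier + bowling_modifier


def multiplicative_rate(base_rate, batting_factor, bowling_factor):
    """Expected run rate when team strengths are percentages of the base rate."""
    return base_rate * batting_factor * bowling_factor


# Hypothetical strengths on a slow pitch and on a flat one.
for pitch, base_rate in (("slow", 4.0), ("flat", 6.0)):
    add = additive_rate(base_rate, 0.5, 0.0)
    mult = multiplicative_rate(base_rate, 1.10, 1.0)
    # Standard deviation: one third of the mean before, one quarter now.
    print(
        f"{pitch}: additive {add:.2f} rpo (sd {add / 3:.2f}), "
        f"multiplicative {mult:.2f} rpo (sd {mult / 4:.2f})"
    )
```

Under the multiplicative version the better side's edge grows with the base rate, which is the "margins blow out more on better batting pitches" behaviour described above.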

Here are the new results. This table shows the probability of each team qualifying in position 1, 2, 3 or 4 in their group, and then the total probability of qualifying. Again I have not factored rain into this, and with Cyclone Pam heading towards New Zealand that may be a little optimistic.

Team	1st	2nd	3rd	4th	Quarters
New Zealand	1	0	0	0	1
Australia	0	0.976	0.024	0	1
Sri Lanka	0	0.024	0.9725	0.0035	1
Bangladesh	0	0	0.0035	0.9965	1
-	-	-	-	-	-
India	1	0	0	0	1
South Africa	0	0.976	0.024	0	1
Pakistan	0	0.017	0.664	0.1165	0.7975
Ireland	0	0.007	0.312	0.1405	0.4595
West Indies	0	0	0	0.743	0.743

The potential group results look like this:

Group A

NZ Aus SL Ban	0.9725
NZ SL Aus Ban	0.024
NZ Aus Ban SL	0.0035

Group B

Ind SA Pak WI	0.5295
Ind SA Ire WI	0.1985
Ind SA Pak Ire	0.1345
Ind SA Ire Pak	0.1135
Ind Pak SA WI	0.011
Ind Pak SA Ire	0.006
Ind Ire SA WI	0.004
Ind Ire SA Pak	0.003
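The Group B rows of the qualification table can be rebuilt from this list of finishing orders by adding up the orders in which each team lands in each position. The sketch below does that; the function name is mine.

```python
from collections import defaultdict

# Finishing orders (1st to 4th) and their simulated probabilities, as listed above.
group_b_orders = {
    ("Ind", "SA", "Pak", "WI"): 0.5295,
    ("Ind", "SA", "Ire", "WI"): 0.1985,
    ("Ind", "SA", "Pak", "Ire"): 0.1345,
    ("Ind", "SA", "Ire", "Pak"): 0.1135,
    ("Ind", "Pak", "SA", "WI"): 0.011,
    ("Ind", "Pak", "SA", "Ire"): 0.006,
    ("Ind", "Ire", "SA", "WI"): 0.004,
    ("Ind", "Ire", "SA", "Pak"): 0.003,
}


def position_probabilities(orders):
    """Sum ordering probabilities into per-team, per-position probabilities."""
    table = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    for order, probability in orders.items():
        for position, team in enumerate(order):
            table[team][position] += probability
    return dict(table)


for team, row in position_probabilities(group_b_orders).items():
    cells = "  ".join(f"{p:.4f}" for p in row)
    print(f"{team:<4}{cells}  total {sum(row):.4f}")
```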

The three interesting potential quarter final match-ups to watch for here are

SA vs Aus	4.7%
Ind vs SL	0.35%
Ire vs Ban	0.02%
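These match-up chances can be approximated from the position table by treating the two groups as independent and pairing 1st with 4th, 2nd with 3rd, 3rd with 2nd and 4th with 1st across the groups. Independence is my assumption here. Under it the first two figures match the list, while the Ireland vs Bangladesh product comes out smaller than the listed value, so treat that one as approximate.

```python
# Position probabilities (1st..4th) taken from the table above.
position = {
    "Aus": [0, 0.976, 0.024, 0],
    "SL": [0, 0.024, 0.9725, 0.0035],
    "Ban": [0, 0, 0.0035, 0.9965],
    "Ind": [1, 0, 0, 0],
    "SA": [0, 0.976, 0.024, 0],
    "Ire": [0, 0.007, 0.312, 0.1405],
}


def matchup_probability(team_a, team_b):
    """Chance two teams from different groups meet in a quarter final.

    Assumes the groups are independent and that 1st plays 4th, 2nd plays 3rd,
    3rd plays 2nd and 4th plays 1st across the groups.
    """
    return sum(
        position[team_a][i] * position[team_b][3 - i] for i in range(4)
    )


for a, b in (("Aus", "SA"), ("SL", "Ind"), ("Ban", "Ire")):
    print(f"{b} vs {a}: {matchup_probability(a, b):.4%}")
```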

In reality the probabilities of Ireland vs Bangladesh and Australia vs South Africa are higher, as they are both much more likely if rain starts to fall.

Sunday, 8 March 2015

World-cup quarter finals simulation

After Pakistan's tremendous win over South Africa, and Ireland's remarkable victory over Zimbabwe, the make up of the quarter finals is not really much clearer.

The question as to who is likely to be going through, and who will play whom, has been the subject of many, many Twitter conversations.

I thought it might be helpful to run a simulation to look at some of the possibilities.

I used Microsoft Excel as it's quite convenient. I used the scores already made in this tournament to decide the probable scores. For each team I got their average rpo scored in relation to the overall group run rate, and their average conceded in relation to the overall. Hence if a team in group A averaged scoring 5.5 rpo and conceded 5.3 rpo, they got values of +0.4 for batting and +0.2 for bowling (as the average rpo in group A has been 5.1 so far). From that point I then used an inverse normal, with a random number between 0 and 1 for the area, and with the group run rate plus the batting team's batting modifier plus the other team's bowling modifier as the mean. For the standard deviation, I used the smaller of one third of the mean and 1.6. This allowed me to make sure there was (almost) no chance of a team getting a negative score, but that the scores weren't going to blow out too much. I used 1.6 as that's the standard deviation of all innings run rates this tournament. This gave me a 50-over score for each team, and whichever was ahead got the points for the win.
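For readers who would rather see that as code than as spreadsheet formulas, here is a rough Python equivalent of a single match. It is a sketch, not the spreadsheet itself: the second team's modifiers are invented, the function names are mine, and counting a tied score as a non-win is an assumption because ties aren't covered above.

```python
import random
from statistics import NormalDist

GROUP_RUN_RATE = 5.1  # average rpo in group A so far, as quoted above
MAX_STD_DEV = 1.6  # standard deviation of all innings run rates this tournament
OVERS = 50


def simulate_score(batting_mod, opposition_bowling_mod, rng):
    """Draw one 50-over score using the inverse normal method described above."""
    mean = GROUP_RUN_RATE + batting_mod + opposition_bowling_mod
    std_dev = min(mean / 3, MAX_STD_DEV)
    # Keep the area strictly inside (0, 1); inv_cdf rejects the endpoints.
    area = min(max(rng.random(), 1e-12), 1 - 1e-12)
    run_rate = NormalDist(mean, std_dev).inv_cdf(area)
    return round(run_rate * OVERS)


def win_probability(team, opponent, trials=2000, seed=None):
    """Share of simulated matches in which `team` outscores `opponent`."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        team_score = simulate_score(team["bat"], opponent["bowl"], rng)
        opponent_score = simulate_score(opponent["bat"], team["bowl"], rng)
        # A tie counts as a non-win here (assumption).
        wins += team_score > opponent_score
    return wins / trials


# Hypothetical modifiers: the first matches the worked example above,
# the second is an invented weaker side.
team_x = {"bat": 0.4, "bowl": 0.2}
team_y = {"bat": -0.3, "bowl": 0.3}
print(f"Team X win probability: {win_probability(team_x, team_y, seed=8):.1%}")
```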

There are a few limitations with this method. I didn't take into account the quality of the teams that each side had faced. England has played Australia, New Zealand and Sri Lanka, but has yet to play Bangladesh or Afghanistan. Their numbers are not going to necessarily show how well they will do against less fancied opponents. Likewise no adjustments were made for the pitch that the match is being played on. We know that South Africa have tended to favour playing on bouncier tracks, so an innings at the 'Gaba won't necessarily tell us much about how they would go in Dunedin. I also haven't taken into account player strengths. Bangladesh's batsmen tend to struggle against tall bowlers, such as Finn and Woakes. England can expect that those two bowlers will perform better than average against Bangladesh, and hence their team is likely to do better than the numbers would suggest.

Another major limitation is that I haven't made provision for rain. That would obviously throw off all calculations. However, given the limited information I felt that a simpler model was best.

I decided to do 2000 trials, so that I could feel that the major source of uncertainty was the assumptions rather than the natural sampling variability.
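As a rough guide to how much sampling variability is left with 2000 trials, the standard error of a simulated probability can be worked out directly. This assumes the trials are independent, and the function name is mine.

```python
import math


def simulation_standard_error(probability, trials):
    """Standard error of a probability estimated from independent trials."""
    return math.sqrt(probability * (1 - probability) / trials)


trials = 2000
# The error is largest when the true probability is 0.5.
worst_case = simulation_standard_error(0.5, trials)
print(f"worst-case standard error: {worst_case:.2%}")
print(f"approximate 95% margin:    {1.96 * worst_case:.2%}")
```

So even in the worst case the simulated probabilities should sit within a couple of percentage points of what the model implies, which is small next to the modelling assumptions.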

First I found the probability of the different teams making the quarter finals with my simulation:

Team	Probability
New Zealand	100%
Australia	100%
Sri Lanka	99.95%
Bangladesh	82.51%
England	17.54%
-	-
India	100%
South Africa	100%
Pakistan	74.71%
Ireland	61.82%
West Indies	63.47%

We can see that Pool A has one crucial match (England vs Bangladesh).
Pool B, however, is still wide open. Ireland vs Pakistan is the last game of the round robin, and it's shaping up to potentially be one that has 3 teams' fortunes riding on the result.

If West Indies make the final 8, they will almost definitely face New Zealand. It's very unlikely that New Zealand will not end up on top of Pool A, and impossible that West Indies will end up 3rd or higher in pool B.

Here are the full results for all possible matchups:

Pool A	Pool B	Probability
New Zealand	Pakistan	14.99%
New Zealand	South Africa	0.35%
New Zealand	Ireland	21.23%
New Zealand	West Indies	63.44%
Australia	India	2.30%
Australia	Pakistan	43.11%
Australia	South Africa	27.57%
Australia	Ireland	27.02%
Sri Lanka	India	18.18%
Sri Lanka	Pakistan	15.83%
Sri Lanka	South Africa	53.75%
Sri Lanka	Ireland	12.19%
Bangladesh	India	64.74%
Bangladesh	Pakistan	0.75%
Bangladesh	South Africa	15.88%
Bangladesh	Ireland	1.15%
England	India	14.79%
England	Pakistan	0.05%
England	South Africa	2.45%
England	Ireland	0.25%

I'll redo this after tomorrow's results, and then again on Monday.

The most likely scenario at the moment is India to play Bangladesh, Australia to play Pakistan, South Africa to play Sri Lanka and New Zealand to play West Indies.

I've updated this here

Sunday, 22 February 2015

A quick look at the DRS rule with hawkeye and lbw

There is a significant issue with the way that hawkeye is used for DRS.

There is some doubt as to the exact position of the ball when captured on camera. It's only accurate to the nearest 2mm or so. While that's very accurate, once it's used to create a model, it can be dangerous. As a result there is a margin for error. Then there can be difficulty determining exactly where the ball hits the pad, especially where it brushes the front pad on the way to the second. This means that there is some doubt as to what the actual position of the ball is.

To overcome this, the ICC have ruled that more than half of the ball needs to hit the centre of the wicket. This is a user-friendly option at first glance. The boundary is really clear, and the batsman needs to be clearly out in order to be given out. But near the boundaries there are occasionally situations where the ball is clearly going to hit, but instead the hawkeye system calls the ball "umpire's call."

This is particularly ridiculous when the ball has hit the batsman on the back foot. In a situation where the ball has only an extra 40cm to travel, if the middle of the ball is just outside the middle of the stump, then for the ball to miss the stumps the model would have to be out by 5.5cm. Over a distance of travel of 40cm that's allowing far too much margin for error (realistically there would be significantly less than a 1% chance of the ball missing the stumps).
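To put that margin in perspective, here is a rough sketch in Python that turns the two figures above (5.5cm of sideways error over 40cm of travel) into an angle. The function name is my own, and treating the last 40cm of the ball's path as a straight line is a simplifying assumption, not a description of how hawkeye actually models the ball.

```python
import math


def required_angular_error_deg(lateral_error_cm, travel_cm):
    """Angle (in degrees) by which a straight-line projection would have to be
    wrong for the ball to finish lateral_error_cm off line after travel_cm."""
    return math.degrees(math.atan2(lateral_error_cm, travel_cm))


# Figures taken from the paragraph above: 5.5cm of error over 40cm of travel.
angle = required_angular_error_deg(5.5, 40.0)
print(f"The projection would need to be off by about {angle:.1f} degrees")
```

That is several degrees of error over less than half a metre of travel, which is why treating it as a coin-flip "umpire's call" looks so generous.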

A solution would be to look at a cone that was using a realistic model for the uncertainty. That would be more sensible for the commentators, fans and players to understand, and would actually provide a more sensible answer to the question "would the ball hit the stumps?"

I've put together a short video to demonstrate what I mean as well.

Sunday, 15 February 2015

South Africa vs Zimbabwe - things to watch for

Here are 5 things I'm going to be watching for in this match.

1. I really enjoy watching Elton Chigumbura. He's the sort of player who plays to win the match, rather than playing to have a good average. He gives himself the difficult jobs, and then puts everything into them.

2. Will Zimbabwe get Amla early? The Zimbabwean attack is quite suited to most New Zealand grounds, but if they don't get Amla early, then they will struggle to get him at all.

3. Quinton de Kock - can he rein his game in against the slower paced (but subtle and tricky) opening attack of Zimbabwe?

4. Brendan Taylor - has he regained the touch that made him one of the best batsmen in the world in 2011?

5. South Africa's movement off the ball in the field. De Villiers has made it clear that he wants to see his team moving around more off the ball, like the New Zealand players do. This will be a chance to see if his talk has worked.

Tuesday, 3 February 2015

Martin Guptill and the form myth

Every season there seems to be a cause célèbre among NZ cricket fans. In 2013 the call was that Brendon McCullum wasn't scoring enough runs, and needed to be dropped. In 2013-14 it was that Peter Fulton wasn't scoring enough runs and needed to be dropped. This season the overwhelming majority of cricket talk in New Zealand has been about one man: Martin Guptill. Apparently he isn't scoring enough runs and needs to be dropped.

In either calls to Radio Sport or comments on the Vietchy On Sport facebook page there have been at least 21 players suggested as being a better option as an opener than Martin Guptill. People have suggested different ways that he might get injured in order to get him replaced in the squad.

But the opinion that Guptill's significantly out of form is not just confined to the uninformed public (I consider anyone that suggests Michael Pollard, Peter Ingram or Kyle Mills as replacements for Guptill uninformed). A number of the country's sports journalists have joined in. In a quite well written and balanced piece, Andrew Alderson noted that Guptill "struggled for form." Charlie Bristow talked of Mike Hesson needing "to handle Martin Guptill's stuttering form." Mark Geenty commented that the top order was carrying "significance and concern." Guy Heveldt said that Guptill is "under immense pressure to find some form before the World Cup begins." Daniel Richardson said that Guptill is "out of touch", "has done little to inspire confidence" and that his "form is a concern."

David Warner vs Rohit Sharma

Over the past couple of days I've been called a troll, a Jonathan Agnew fan and even an Australian on twitter, because I have a position that is somewhat different from others on the David Warner vs Rohit Sharma incident. The problem is that a nuanced view doesn't fit neatly inside a 140 character window, and so my views have been misinterpreted. Part of that is because people seem to have very absolute views on the matter, when I don't think what happened is really very black and white.

First of all I'll talk about my system of ethics with sledging and other play, and what I consider acceptable, then I'll look at the Warner-Sharma confrontation specifically.

Sledging is an attempt to get a psychological advantage over another player. For me this is part of the game. However, there are limits to what is acceptable. Some examples of forms that are acceptable (in my opinion) are fielders encouraging the bowler in a way that the batsman can hear and that might get into a batsman's head. For example "That's 4 dot balls in a row now" "He's got no idea about the short one" "Look at how he's holding the bat with his bottom hand, I reckon his coach will have words with him about that afterwards. It's causing him to push the bottom of the bat in. I reckon a half volley outside off will see him nick out here." These comments make the batsman doubt either their technique or their form, and can cause them to play false shots.

Likewise batting advice to the batsman is acceptable, even if it's not always genuine. The below example (about 1:20 in) where Hadlee gives Botham some advice on how to play his bowling is a classic. Botham may well have been late on the shot because he was thinking about what Hadlee had said and had anticipated a different delivery.

If the fielding side feel that the batsmen are doing something underhanded, such as taking a run when the ball was dead, they are entitled to express their displeasure to them.

The more interesting question is what is unacceptable. Here is my list:

Threats of violence that don't involve the playing of the game. For example "I'm going to break your ribs with the next ball" is acceptable. Likewise "If those close fielders stay there, I'm going to still play my shots and they will get hurt." Both of these, however, need to be in context. A bowler/batsman shouldn't be randomly threatening violence willy-nilly, but in the heat of an exchange they are fine. "I'll see you in the car park afterwards and smash your face in," however is not acceptable.

Racial slurs are not acceptable. They are not acceptable directed at a player or spoken about a player. There's a story about some things that Shane Thompson said to Wasim Akram and Waqar Younis to try and goad them into bowling short at him (rather than yorkers) that are totally unacceptable things to have been said on a cricket pitch.

Abuse for the sake of it is unacceptable. This includes most (but not all) send-offs. There can be time for a witty send off, provided it is brief and concludes an ongoing conversation. Prolonged send-offs, especially abusive ones, are completely unacceptable.

Likewise abusing someone to get under their skin, without there being any relation to the game or without it being in the context of an ongoing conversation is not on. The way that Fleming subjected Smith to a torrent of nastiness when he arrived at the crease may have helped New Zealand tie the series, but it was not something that New Zealand fans should be proud of.

There are other difficult situations, but generally it is fine to sledge, provided it is done in a way that has a purpose, and doesn't cross the line into pure abuse.

Now let's look at the Warner-Sharma situation. Here's my summary of what happened, as far as I understood it.

1. Rohit Sharma was slightly outside his ground, as he's entitled to be.
2. David Warner threw the ball towards the stumps.
3. The ball was very wide of the mark, and (only just) missed Sharma, and then evaded Haddin.
4. Sharma and Raina then proceeded to run an overthrow.
5. Warner thought that the ball had deflected off Sharma and got angry that they ran an overthrow contrary to established protocol.
6. Warner told Sharma that he was unimpressed.
7. Sharma said something to Warner in Hindi. Warner speaks a few words of Hindi and didn't understand the full message but was upset by what he did understand.
8. Warner shouted at Sharma to speak English.
9. Sharma repeated his message in English as the umpires separated the players.

The one key point here is number 2. David Warner is a fantastic fielder. He has produced a few blinding run outs from direct hits. One of the impressive things about his fielding is just how often he hits the stumps. Given his ability, the fact that he missed the stumps by about 3m from close range is peculiar. The fact that he almost hit Sharma was concerning. How off target it was can be seen by the fact that Haddin stepped twice, then dived full length, and still didn't get to the ball.

He thought that he had hit Sharma with the throw, and that, therefore, Sharma shouldn't take a run. He didn't apologise for hitting Sharma, which would normally happen. It makes me wonder if he was aiming to hit Sharma with the throw. For me that is the key thing that was wrong with that incident.

What Warner said after that was in keeping with his understanding that Sharma had taken a run he would not normally be entitled to take. Sharma speaking Hindi successfully got in the head of Warner, and I don't have a particular problem with that. Warner's reaction, likewise, was totally understandable in context. The only issue, and it's a big one, was if Warner deliberately tried to hit Sharma with the ball.

If (in the opinion of the match referee) he did, then it would be a level 2 offence and he should be banned for a couple of games. Instead Warner was charged with a level 1 offence for "using language or a gesture that is obscene, offensive or insulting." As he had been found guilty of a similar offence within the past 12 months it was automatically raised to a level 2 offence, but he received the minimum fine for that offence, of 50% of his match fee.

The thing that I don't understand is how he was found guilty of that at all. As far as I can see he didn't abuse Sharma, and he didn't use any offensive gestures that I could see. If Sharma had spoken to him in English, then asking him to "speak English" would have been offensive, but given that Sharma didn't actually speak English, it was a perfectly reasonable request (despite not being delivered in a particularly reasonable manner). In the verbal altercation, Warner and Sharma acted equally badly, but not nearly badly enough for a charge.

If the ICC Code of Conduct was applied correctly here, either Warner would have been charged with deliberately throwing the ball at Rohit Sharma or he wouldn't have been charged at all.

Friday, 23 January 2015

Comparing between eras part 2. The survey results

In the previous post I looked at some New Zealand batsmen throughout the years and compared them, by trying to take into account some of the factors that might have made batting either easier or harder for them.

I did this by looking at the runs that each player scored at a particular ground, and then looking at how easy/difficult that ground was to score at during that player's career. After that I allocated each ground a modifier value, and multiplied the runs scored at each ground by that ground's modifier. As a result (for example) the 188 runs that Martin Crowe scored at the Bourda in Georgetown were worth 164.5, because (during Crowe's era) it was a batting friendly pitch. However, his 120 runs that he scored at Karachi were worth 135.1 because that ground favoured bowlers.

I wanted to try the technique across a wider range of batsmen, so I put a simple request on twitter, for people to send me their top 5 batsmen. The tweets started pouring in.

I need some crowd-sourcing help. I want a lot of top of head lists of the 5 best batsmen ever. Please reply with your 5 and then RT. Thanks!
— Michael Wagener (@Mykuhl) January 7, 2015

I received a few humorous replies such as 5 votes for Rohit Sharma, 5 votes for Graham Thorpe and my personal favourite:

@Mykuhl @TheCricketGeek Adam West Michael Keaton Del Boy Val Kilmer George Clooney
— Adrian Toomey (@adriantoomey) January 7, 2015

But eventually I had 159 serious lists of 5.

From the top 20 (plus ties) I then worked out their Normalised Averages. I left out two players, Barry Richards and WG Grace, as neither of their test careers were really the reason that people put them in the list. For both, test matches made up less than 5% of their first class career. I'll deal with them (and Charles Bannerman) in a future post.

Here's the list:

Rank	Name	Votes	Average	Norm Average
1	Don Bradman	119	99.94	101.03
2	Sachin Tendulkar	112	53.79	54.10
3	Brian Lara	108	52.89	54.41
4	Viv Richards	84	50.24	54.96
5	Ricky Ponting	55	51.85	52.50
6	Kumar Sangakkara	52	58.45	58.27
7	Gary Sobers	31	57.78	57.71
8	Rahul Dravid	28	52.31	52.73
9	Jacques Kallis	27	55.37	59.55
10	Jack Hobbs	24	56.95	63.01
11	Barry Richards	12	72.57	*
11	Wally Hammond	12	58.46	58.44
13	AB de Villiers	11	52.10	52.99
13	Steve Waugh	11	51.06	53.56
15	WG Grace	10	32.29	*
16	Graeme Pollock	9	60.97	59.91
16	Sunil Gavaskar	9	51.12	54.76
18	Herbert Sutcliffe	4	60.73	62.00
18	Denis Compton	4	50.06	53.44
18	Martin Crowe	4	45.37	47.91
18	Adam Gilchrist	4	47.61	49.24
18	Allan Border	4	49.54	54.30

There are a couple of interesting things here. Less than 3/4 of people picked Bradman. Often they said that it was because they had never watched him bat, and that's understandable, but I would have thought his extraordinary average alone was sufficient to put him in the mix. You don't need to know much about batting averages to know that Bradman's numbers are almost unbelievable.

The tendency to only vote for batsmen that people had seen meant that players who had played since 2000 had to score at a lower average than players who had played before that. Here's a graph comparing the number of votes that a batsman got with their normalised average:

There was also a tendency for people to nominate players who had done well against their sides. Most votes out of England included Brian Lara, who hit both of his triple centuries against England, while votes from India often included Ricky Ponting, who averaged in the mid fifties against the Indians.

Here's the list ordered by their Normalised Average. I've added in two other older players who only got one vote each, Ken Barrington and Everton Weekes but who both had exceptional records.

Name	Average	Norm Average
Don Bradman	99.94	101.03
Ken Barrington	58.67	64.00
Jack Hobbs	56.95	63.01
Herbert Sutcliffe	60.73	62.00
Graeme Pollock	60.97	59.91
Jacques Kallis	55.37	59.55
Everton Weekes	59.46	59.39
Wally Hammond	58.46	58.44
Kumar Sangakkara	58.45	58.27
Gary Sobers	57.78	57.71
Viv Richards	50.24	54.96
Sunil Gavaskar	51.12	54.76
Brian Lara	52.89	54.41
Allan Border	49.54	54.30
Sachin Tendulkar	53.79	54.10
Steve Waugh	51.06	53.56
Denis Compton	50.06	53.44
AB de Villiers	52.10	52.99
Rahul Dravid	52.31	52.73
Ricky Ponting	51.85	52.50
Adam Gilchrist	47.61	49.24
Martin Crowe	45.37	47.91

One interesting thing here is the way that players are rewarded for scoring on the harder pitches. Sutcliffe and Hobbs played together through a large part of their careers. But Hobbs was the one that scored the most runs when the conditions were the hardest for batting. As a result Hobbs' average increased by 6.06 while Sutcliffe's only increased by 1.27.

Jacques Kallis likewise scored a lot of runs at Newlands, which has been a graveyard for batsmen, and he has been rewarded for that. Kumar Sangakkara, however, has scored a lot of his runs at the SSC, which is a place where batsmen have prospered, and so his normalised average ended up lower than his actual average.

I still have a number of players that I'd like to look at such as Victor Trumper, Bruce Mitchell, Zaheer Abbas and Andy Flower. But there's plenty of time for that in the next installment.

Sunday, 11 January 2015

Comparing between Eras, Part 1, Crowe, Sutcliffe, Turner and Williamson.

When you write a stats piece, it's not uncommon to get criticism. It's also not unwelcome. I think my statistics have become better, and my writing has improved through interaction that I've had with readers.

Some criticism is not so welcome or useful. The person that said that he was going to "come to England and burn my house down for being a bias English" for example. This gentleman was upset that I said that Matt Prior had a better batting average than MS Dhoni.

Then there's the type that makes me think. Just after the declaration in the second test in Wellington, I tweeted a comparison of a bunch of New Zealand batsmen after 71 test innings. I received an interesting reply from someone that knows a bit about batting.

Barry Richards, the great South African batsman tweeted me back saying that Williamson was a nice kid, but couldn't be compared with Crowe.

Over the next couple of days we sent tweets back and forth discussing the concept of comparing different players from different eras.

One of the points that he made is that statistics don't take into account the context of an innings. There are times that an innings of 30 can be a match winner, while other times an innings of 70 can be meaningless. This is true, and in part there is not a lot that can be done about this. The hope is that these sort of innings end up cancelling themselves out.

Another point that he made was that the pitches, outfields and bowling were all different. How do you compare a player who has played against Bangladesh and Zimbabwe to one that hasn't? Sabina Park in the 80's was a minefield. Sabina Park now is a featherbed. How can you compare batsmen who played on those different pitches? To hit a six used to require the ball to be hit over the fence. Now it just needs to hit the rope, that's placed 2 m inside the speakers that are 1 m inside the advertising hoardings that are 1 m inside the boundary.

Likewise it's easier to bat against some bowlers than others. You would expect a batsman playing against Garner, Holding, Marshall and Roberts to find things more difficult than one playing against Powell, Rampaul, Sammy and Bravo.

It's impossible to account for these factors perfectly. However, there are some ways that we can deal with these differences. In order to be able to answer some of Richards' issues I decided to look at the results for some of the New Zealand batsmen at each ground. Then I looked at the results of every top order batsman at that ground throughout each player's career. I decided to normalise to an average of 40. So for a ground where the batsmen averaged 32, I would multiply any run scored at that ground by 1.25 (40 divided by 32).

For example during Martin Crowe's career batsmen averaged 53.86 at Bulawayo Athletic Club. This means that any runs scored at that ground would only count for 0.743 of a run. Accordingly, Crowe's 48 runs there would count as 35.6 runs. There is an advantage here for any player who played for a team with a really good bowling attack, but it's hard to account for that completely.
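For anyone who wants to play with the numbers, the adjustment can be sketched in a few lines of Python. The function and constant names here are just for illustration (I'm not claiming this is the exact spreadsheet logic), but the arithmetic follows the description above: the factor for a ground is 40 divided by the top order average at that ground during the player's career, and the modified runs are the actual runs multiplied by that factor.

```python
# Target average that every ground is normalised to (from the text above).
TARGET_AVERAGE = 40.0


def ground_factor(ground_average, target=TARGET_AVERAGE):
    """Multiplier for runs scored at a ground where top order batsmen
    averaged ground_average during the player's career."""
    return target / ground_average


def modified_runs(runs, factor):
    """Runs scored at a ground, scaled by that ground's factor."""
    return runs * factor


# The two examples used in the text.
print(round(ground_factor(32), 2))        # easy-to-check case
bulawayo = ground_factor(53.86)           # Bulawayo Athletic Club, Crowe's era
print(round(bulawayo, 3), round(modified_runs(48, bulawayo), 1))
```

A factor above 1 means the ground was harder than the 40 baseline, so runs there are worth more. A factor below 1 means it was a batting friendly ground, so runs there are discounted.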

I decided to look at Crowe, Turner, Sutcliffe and Williamson to see how Williamson compares to some of the greats. In the last post I showed that Williamson now has a higher test average than any of the greats of NZ cricket had, and had also scored more runs, at a higher average than they had after the same number of innings.

You can click on a player's name to see their stats, or skip the individual grounds by clicking on table:
Crowe
Sutcliffe
Turner
Williamson
Table

Martin Crowe:

Ground	Actual Runs	Factor	Mod. Runs
Adelaide Oval - Australia	145	0.951	137.9
AMI Stadium, Christchurch - New Zealand	389	1.352	526
Asgiriya Stadium, Kandy - Sri Lanka	34	1.433	48.7
Barabati Stadium, Cuttack - India	15	1.506	22.6
Basin Reserve, Wellington - New Zealand	1123	0.908	1019.3
Bourda, Georgetown, Guyana - West Indies	188	0.875	164.5
Brisbane Cricket Ground, Brisbane - Australia	278	1.088	302.4
Bulawayo Athletic Club - Zimbabwe	48	0.743	35.6
Carisbrook, Dunedin - New Zealand	141	1.114	157
Colombo Cricket Club Ground - Sri Lanka	72	1.202	86.5
Eden Park, Auckland - New Zealand	712	1.1	783.5
Edgbaston, Birmingham - England	36	1.126	40.5
Gaddafi Stadium, Lahore - Pakistan	216	1.065	230.1
Harare Sports Club - Zimbabwe	201	1.076	216.2
Headingley, Leeds - England	38	1.14	43.3
Iqbal Stadium, Faisalabad - Pakistan	41	1.061	43.5
Kennington Oval, London - England	46	0.948	43.6
Kensington Oval, Bridgetown, Barbados - West Indies	16	1.113	17.8
Kingsmead, Durban - South Africa	28	1.139	31.9
Lord's, London - England	327	1.006	329.1
M.Chinnaswamy Stadium, Bangalore - India	35	1.311	45.9
Melbourne Cricket Ground - Australia	161	1.139	183.4
National Stadium, Karachi - Pakistan	120	1.126	135.1
Newlands, Cape Town - South Africa	23	1.236	28.4
Niaz Stadium, Hyderabad - Pakistan	40	0.873	34.9
Old Trafford, Manchester - England	185	0.996	184.2
Queen's Park Oval, Trinidad - West Indies	5	1.411	7.1
Sabina Park, Jamaica - West Indies	7	1.138	8
Seddon Park, Hamilton - New Zealand	36	1.164	41.9
Sinhalese Sports Club Ground, Colombo - Sri Lanka	126	1.094	137.8
Sydney Cricket Ground - Australia	8	0.95	7.6
The Wanderers Stadium, Johannesburg - South Africa	83	1.319	109.5
Trent Bridge, Nottingham - England	213	1.003	213.6
Tyronne Fernando Stadium, Moratuwa - Sri Lanka	30	1.211	36.3
W.A.C.A. Ground, Perth - Australia	278	1.064	295.7

Crowe's actual average was 45.37. His modified average is 47.91

Bert Sutcliffe:

Ground	Runs	Factor	Mod. Runs
AMI Stadium, Christchurch - New Zealand	303	1.25	378.6
Bagh-e-Jinnah, Lahore - Pakistan	29	1.194	34.6
Bangabandhu National Stadium, Dhaka - Pakistan	20	1.532	30.6
Basin Reserve, Wellington - New Zealand	126	1.508	190
Brabourne Stadium, Mumbai - India	115	0.972	111.8
Carisbrook, Dunedin - New Zealand	166	2.014	334.3
Eden Gardens, Kolkata - India	187	1.119	209.2
Eden Park, Auckland - New Zealand	198	1.329	263.1
Edgbaston, Birmingham - England	57	1.085	61.9
Ellis Park, Johannesburg - South Africa	113	1.074	121.3
Feroz Shah Kotla, Delhi - India	286	0.805	230.1
Gaddafi Stadium, Lahore - Pakistan	23	0.965	22.2
Headingley, Leeds - England	120	1.09	130.8
Kennington Oval, London - England	171	1.177	201.3
Kingsmead, Durban - South Africa	36	1.279	46
Lal Bahadur Shastri Stadium, Hyderabad - India	154	0.533	82
Lord's, London - England	75	1.217	91.2
National Stadium, Karachi - Pakistan	63	1.186	74.7
Nehru Stadium, Madras - India	143	1.058	151.3
Newlands, Cape Town - South Africa	66	0.931	61.5
Old Trafford, Manchester - England	179	1.097	196.4
Pindi Club Ground, Rawalpindi - Pakistan	7	2.519	17.6
St George's Park, Port Elizabeth - South Africa	90	1.316	118.5

Sutcliffe's actual average was 40.10. His modified average is 46.46

Glenn Turner:

Ground	Runs	Factor	Mod. Runs
Adelaide Oval - Australia	54	1.03	55.6
AMI Stadium, Christchurch - New Zealand	664	1.094	726.3
Bangabandhu National Stadium, Dhaka - Pakistan	136	1.197	162.7
Basin Reserve, Wellington - New Zealand	349	1.241	433.1
Bourda, Georgetown, Guyana - West Indies	259	0.768	198.9
Brabourne Stadium, Mumbai - India	29	1.106	32.1
Carisbrook, Dunedin - New Zealand	61	1.414	86.2
Eden Park, Auckland - New Zealand	381	1.137	433.1
Gaddafi Stadium, Lahore - Pakistan	9	0.939	8.5
Green Park, Kanpur - India	148	0.91	134.7
Headingley, Leeds - England	92	1.403	129.1
Kennington Oval, London - England	78	0.996	77.7
Kensington Oval, Barbados - West Indies	21	1.03	21.6
Lal Bahadur Shastri Stadium, Hyderabad - India	17	2.928	49.8
Lord's, London - England	52	1.11	57.7
MA Chidambaram Stadium, Chennai - India	42	1.309	55
Melbourne Cricket Ground - Australia	6	1.083	6.5
Niaz Stadium, Hyderabad - Pakistan	51	0.832	42.4
Queen's Park Oval, Trinidad - West Indies	148	1.116	165.2
Sabina Park, Jamaica - West Indies	244	0.864	210.8
Trent Bridge, Nottingham - England	20	1.115	22.3
Vidarbha C.A. Ground, Nagpur - India	59	1.579	93.2
Wankhede Stadium, Mumbai - India	71	1.161	82.5

Turner's actual average was 44.64. His modified average is 49.03

Kane Williamson:

Ground	Runs	Factor	Mod. Runs
Basin Reserve, Wellington - New Zealand	684	0.973	665.5
Bellerive Oval, Hobart - Australia	53	1.179	62.5
Brisbane Cricket Ground, Brisbane - Australia	19	0.907	17.2
Dubai International Cricket Stadium - U.A.E.	43	0.999	42.9
Eden Park, Auckland - New Zealand	208	1.085	225.7
Galle International Stadium - Sri Lanka	10	0.968	9.7
Hagley Oval, Christchurch - New Zealand	85	0.941	80
Headingley, Leeds - England	16	1.064	17
Kensington Oval, Barbados - West Indies	204	1.353	276
Lord's, London - England	66	1.092	72.1
M.Chinnaswamy Stadium, Bangalore - India	30	1.149	34.5
McLean Park, Napier - New Zealand	4	1.859	7.4
Newlands, Cape Town - South Africa	28	0.984	27.6
P Sara Oval, Colombo - Sri Lanka	153	1.024	156.7
Queen's Park Oval, Trinidad - West Indies	94	0.997	93.7
Queens Sports Club, Bulawayo - Zimbabwe	117	0.885	103.6
Rajiv Gandhi International Stadium, Uppal, Hyderabad - India	157	0.892	140.1
Sabina Park, Jamaica - West Indies	145	1.503	217.9
Sardar Patel (Gujarat) Stadium, Motera, Ahmedabad - India	131	0.852	111.6
Seddon Park, Hamilton - New Zealand	242	1.362	329.6
Sharjah Cricket Stadium - U.A.E.	192	0.859	165
Sheikh Zayed Stadium, Abu Dhabi - U.A.E.	26	0.818	21.3
Shere Bangla National Stadium, Mirpur, Dhaka - Bangladesh	62	0.933	57.8
Sir Vivian Richards Stadium, Antigua - West Indies	19	0.77	14.6
St George's Park, Port Elizabeth - South Africa	15	1.051	15.8
University Oval, Dunedin - New Zealand	35	0.821	28.7
Vidarbha Cricket Association Stadium, Jamtha, Nagpur - India	8	0.974	7.8
Zahur Ahmed Chowdhury Stadium, Chittagong - Bangladesh	188	0.788	148.2

Williamson's average is currently 45.97. His modified average is 47.73.

Final table

Name	Average	Modified average
Sutcliffe	40.10	46.46
Turner	44.64	49.03
Crowe	45.37	47.91
Williamson	45.97	47.73

After doing this analysis it seems that Crowe and Turner are still ahead of Williamson. However, he's catching them quickly.

Once I started doing this for these batsmen, I started to wonder what would happen if we did this for all the greats. To help with that I did a quick crowd-source on twitter by asking people who they thought the 5 greatest batsmen ever were. In my next post on this topic, I'll look at how the likes of Sachin Tendulkar, Sir Donald Bradman, WG Grace, Bruce Mitchell and Sir Vivian Richards stack up, once this modification is made. Check back here to see the results of the survey and also the batsmen's numbers.

Wednesday, 7 January 2015

Mini-session Analysis, 2nd test, New Zealand vs Sri Lanka, Basin Reserve 2014/15

Here is the mini-session analysis for the second test between New Zealand and Sri Lanka at Basin Reserve, Wellington

A mini-session is (normally) half a session, either between the start of the session and the drinks break or the drinks break and the end of the session. Occasionally a long session will have 3 mini-sessions where it will be broken up with 2 drinks breaks.

The first time I met Brendon McCullum was when he and I were both on the same flight from Miami to St Kitts in 2012, and we both had to wait about 2 hours for our flight. I introduced myself to him, and he graciously put down his golf magazine and chatted to me for about 30 minutes.

After the flight, while we were both waiting for our bags (which took forever), he found out that we were staying at the same hotel, and so he offered me a lift there, provided I didn't mind going via the training ground. On the way to the ground, he got a phone call from John Wright asking if he was likely to be able to play the next day. At this point he had been travelling for 37 hours, without much sleep, and certainly did not look like he was going to be good to play. He said as much to Wright and then found out that Taylor was injured, and Kane Williamson was going to be standing in as captain.

After he hung up, McCullum couldn't contain his disappointment with that decision. He said something along the lines of "it's not that he's not capable of leading the side, it's more that he's not established in the side yet, and he's still learning his game. He doesn't need the pressure of being the captain-in-waiting to go with it. The kid is so good. He's either going to be the best batsman we've produced since Martin Crowe, or the best batsman we've ever produced, and it's our job to manage him properly to allow him to be great."

Williamson scored 9 off 17, but New Zealand went on to win the match, due largely to a good spell from Southee and Oram, and some suicidal running between wickets from the West Indies.

However, the claim that Williamson could be our best ever batsman has stuck with me. Yesterday he completed an epic (but certainly not chanceless) innings of 242* to take New Zealand from a position of some trouble to being in the box seat.

Since my conversation with McCullum, Williamson has gone on to be quite impressive with the bat. In the past 2 and a half years he has averaged 50.28 in tests and 49.25 in ODI's. In the matches before McCullum's pronouncement he had averaged 36.05 in tests and 35.75 at a strike rate of 83.85 in ODI's.

He's only 24, and there's still plenty of time left, but if his career was to end today, he would still be considered one of New Zealand's best ever batsmen. Here are his numbers, first in tests:

Name	Innings	Runs	Average	Completed innings per hundred
KS Williamson	71	3034	45.96	7.33
LRPL Taylor	113	4631	45.40	8.50
MD Crowe	131	5444	45.36	7.06
MH Richardson	65	2776	44.77	15.50
GM Turner	73	2991	44.64	9.57
AH Jones	74	2922	44.27	9.43
B Sutcliffe	76	2727	40.10	13.60
SP Fleming	189	7172	40.06	19.89
BB McCullum	159	5870	38.87	13.73
CD McMillan	91	3116	38.46	13.50
JG Wright	148	5334	37.82	11.75
BJ Watling	49	1578	37.57	10.50
JV Coney	85	2668	37.57	23.67
NJ Astle	137	4702	37.02	11.55
JDP Oram	59	1780	36.32	9.80

There is an important omission here: JF Reid only just missed out on this table, as he only played 19 tests, for an average of 46 and a rate of one hundred every 4.67 completed innings. However, I decided to make the cut-off 20 tests to allow for diversity in conditions and a sufficient sample size, and only noticed afterwards that Reid had missed out.

Ranked by completed innings per hundred, Crowe is still slightly ahead of Williamson, but if we look at the time since my conversation with McCullum his conversion rate has come down closer to 6.5.

In ODI's batting average isn't always the best way to measure success. Someone that scores 60 off 100 balls every match will have a fantastic average, but they will not be helping their team as much as someone who scores 45 off 40 every match. To counter that some statisticians use a statistic called batting index to measure success. This is basically the average multiplied by the strike rate then divided by 100. If we sort by batting average, Glenn Turner is ahead, and Williamson second, but by batting index, Williamson pulls clear. Here are the top 15:

Name	Runs	Average	Strike Rate	Batting Index
KS Williamson	2045	43.51	80.67	35.10
LRPL Taylor	4580	41.26	82.55	34.06
L Ronchi	557	29.31	109.43	32.07
GM Turner	1598	47.00	68.05	31.98
JD Ryder	1362	33.21	95.31	31.65
MJ Guptill	2953	37.85	80.20	30.36
RG Twose	2717	38.81	75.40	29.26
MD Crowe	4704	38.55	72.63	28.00
BB McCullum	5200	30.05	90.56	27.21
SB Styris	4483	32.48	79.41	25.79
NJ Astle	7090	34.92	72.64	25.37
CL Cairns	4881	29.22	83.76	24.47
PG Fulton	1334	32.53	72.77	23.67
RJ Nicol	586	30.84	75.51	23.29
SP Fleming	8007	32.41	71.40	23.14
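The batting index is easy to reproduce. Here's a small Python sketch using four rows from the table above. The function name and the data layout are my own choices for illustration, and the only formula used is the one described earlier: average multiplied by strike rate, divided by 100.

```python
def batting_index(average, strike_rate):
    """Batting index: average multiplied by strike rate, divided by 100."""
    return average * strike_rate / 100


# (name, average, strike rate) for a few rows copied from the table above.
players = [
    ("KS Williamson", 43.51, 80.67),
    ("LRPL Taylor", 41.26, 82.55),
    ("GM Turner", 47.00, 68.05),
    ("MD Crowe", 38.55, 72.63),
]

# Sorting by plain average puts Turner first...
by_average = sorted(players, key=lambda p: p[1], reverse=True)

# ...but sorting by batting index puts Williamson first.
by_index = sorted(players, key=lambda p: batting_index(p[1], p[2]), reverse=True)

for name, average, strike_rate in by_index:
    print(f"{name}: {batting_index(average, strike_rate):.2f}")
```

Swapping the sort key is all it takes to see how much the ranking depends on whether scoring rate is taken into account.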

Sometimes it can be dangerous to look at a player's career based on their final average. One of the all time greats in my mind was Sir Frank Worrell who averaged 61 before his 30th birthday, but then he was convinced to come back as a specialist captain, and he averaged 41 after that. As a result it can be better to look at batsmen at the same stage of their careers.

Here's a list of the NZ players who went on to average over 40 after 71 innings:

Name	Runs	Average
Williamson	3034	45.97
Jones	2910	45.47
Turner	2952	45.42
Crowe	2948	45.35
Taylor	2868	42.81
Sutcliffe	2616	40.25
Fleming	2476	36.41

Again we see Williamson at the top. However, this time it is not so clear. If he had been dismissed for 30 instead of 242* in his last innings, he would have been on 2822 runs and his average would have been 42.12 which would have put him 5th in the list on both counts. A better picture might be to look at a graph of the career progression around that mark.
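The "what if" numbers in that paragraph can be checked with a short Python sketch. The helper names are hypothetical. The number of dismissals isn't listed in the table, so the sketch infers it from the runs and the average, which is an assumption on my part.

```python
def dismissals_from(runs, average):
    """Infer the number of dismissals from career runs and batting average.
    This is an inference for illustration, not a figure from the table."""
    return round(runs / average)


def what_if_last_innings(runs, dismissals, actual_score, hypothetical_score):
    """Career runs and average if a not out innings of actual_score had
    instead ended in a dismissal for hypothetical_score."""
    new_runs = runs - actual_score + hypothetical_score
    new_dismissals = dismissals + 1  # the not out becomes a dismissal
    return new_runs, new_runs / new_dismissals


outs = dismissals_from(3034, 45.97)
new_runs, new_average = what_if_last_innings(3034, outs, 242, 30)
print(new_runs, round(new_average, 2))
```

The big drop comes from two things at once: 212 fewer runs, and one more dismissal in the denominator.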

Here's the runs scored and averages of the 8 top averaging batsmen in their 60th to 80th innings:

We can see the impact of the last innings, but we can also see that there is a trend here that this innings was just a part of.

It is still too early to say that Williamson is our best batsman since Crowe, let alone that he is our best ever, but he is certainly in the conversation.
