An email from New Zealand Cricket dropped into my inbox on Friday, naming the squads for the test series with Australia and the NZA squads to play Sri Lanka A. There was a lot to talk about, with new players being named in the A squad and the prospect of a day-night test coming up. But over a quarter of the press release focused on two players: Jimmy Neesham and Corey Anderson. When I got in the car to come home from work, I listened to a debate between Darcy Waldegrave and Goran Paladin about who should be picked, Anderson or Neesham.

A quick scroll through the different websites I check for cricket news found that this was the story that most writers picked up on:

Cricbuzz put down their India-centric blinkers to lead (briefly) with

"James Neesham, Corey Anderson named in Test squad for Australia Tour".

Andrew Alderson's piece in the New Zealand Herald was

Anderson v Neesham: Let the contest begin.

David Leggat wrote a piece for the Otago Daily Times titled

Cricket: Black Cap selectors face all-rounder quandary.

Mark Geenty, on stuff.co.nz, went for the slightly more negative

Concerns remain over Anderson, Neesham as recovery race heats up for Brisbane.

Wisden India went for

Anderson, Neesham return for Australia Tests.

It was clear that this was the key talking point for most people.

And that's fair enough too. It's not often that a team has a genuine all-rounder. To have two players who have the potential to develop into such players is remarkable. Given that neither is quite there yet as both a first-choice batsman and a first-choice bowler, picking both is unlikely, so a showdown is likely.

Neesham is probably less aggressive than Anderson with the bat, which has led to him having more success in the few tests that he's played in. Anderson has really made his name in ODI cricket. Neesham, on the other hand, didn't make New Zealand's 15-man squad for the World Cup. With the ball, Anderson has been very effective in ODI cricket, but has not performed as well in tests. Neesham has not taken a lot of test wickets, but has been quite effective at holding down an end. Anderson tends to rely on bounce and his left-arm angle, while Neesham has a good cutter, and tends to attack the batsman's body more.

There is quite a bit of debate about who is the better player, so the prospect of them both being fit, and of us seeing who Hesson opts for, is tantalizing.

In the conversation on the radio, Waldegrave and Paladin both said that it was clear that Neesham had better statistics, and so he should be picked. My ears immediately pricked up.

The basic statistics do bear that out.

| Batting | Anderson | Neesham |
|---|---|---|
| Innings | 18 | 15 |
| Runs | 533 | 606 |
| Average | 31.35 | 43.28 |
| 100s | 1 | 2 |

| Bowling | Anderson | Neesham |
|---|---|---|
| Overs | 174 | 109.5 |
| Runs | 500 | 361 |
| Wickets | 13 | 11 |
| Average | 38.46 | 32.81 |

Neesham has the better batting and bowling average. He's scored more runs in fewer innings, and has taken roughly the same number of wickets in roughly half the innings.

Here's a graphical representation of their batting scores so far:

A quarter of Anderson's innings have been scores of 2 or less, which is certainly not ideal. Neesham, on the other hand, has a quarter of his at 78 or higher.

Bowling innings are not so easy to show in a graph, but I felt that it was useful to see the difference. I've graphed their average vs the number of overs bowled.

We can see the trends in the numbers: Anderson's average is increasing, while Neesham's is decreasing. When Anderson had bowled the same number of overs as Neesham has now, their averages were similar, but Anderson's average has risen fairly steadily since then.

However, straight summary stats can be misleading. I was interested to see if Neesham truly did have better statistics to the point where we could be confident that he would perform better.

I'm finding that more and more I distrust basic cricket statistics to tell me about players. That is an odd thing for a stats blog to say, but please hear me out.

Firstly, a batsman's previous innings are not the full list of what he was capable of doing. They are effectively a sample. Of all the times that he could have played, he only actually played a few of them. (They have both batted on about 40 days, over the space of 3 years.) Treating their previous results as population data, where we can compare summary statistics directly, is dangerous, because these results are really only a sample of what their careers will eventually be (assuming that they will play more). They are also only a sample of the scores that they were capable of throughout their careers. Perhaps they would have scored more if the last series they played in had been longer, or if an extra test had been added to the last tour that they were on.

When we compare samples, we need to use statistical techniques to account for sampling variation. Sampling variation arises because a sample only gives us part of the information about the population, so different samples give different answers. There is a range of techniques to deal with this. If we have reason to believe that our population is normally distributed, we can create confidence intervals using descriptive statistics. However, we know that cricket scores are not normally distributed. Scores tend to be skewed to the right, i.e. the majority of scores are below the average. (Some examples: Graham Dowling scored less than his average in 68% of his innings, Graeme Smith scored below his average in 73% of innings, Gordon Greenidge scored less than his average in 68% of his innings and Don Bradman scored less than his average in 64% of his innings. If scores were normally distributed, players would score below their average in roughly half their innings.)
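To make the skew concrete, here is a minimal sketch that counts how many innings fall below the average. The scores are invented purely for illustration (they are not any real player's record), and for simplicity it uses a plain mean rather than a true batting average, which would divide by dismissals.

```python
from statistics import mean

# Hypothetical innings scores, made up for illustration only.
scores = [0, 4, 7, 12, 15, 21, 23, 30, 41, 58, 103, 142]

# Plain mean of the scores (assumes every innings ended in a dismissal).
average = mean(scores)

# Count the innings that fell short of that average.
below = sum(1 for s in scores if s < average)
share_below = below / len(scores)

print(f"average = {average:.1f}")
print(f"share of innings below the average = {share_below:.0%}")
```

A couple of big scores drag the average up, so most innings end up below it, which is the same pattern as in the real examples above.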

Another technique that can be used is bootstrapping. This is where a confidence interval is created by resampling with replacement from the sample itself. This is almost black magic, in that it uses just the variation in the sample to describe the variation in the population, and, despite it seeming illogical at first, it actually tends to work remarkably well. (For example, the bootstrap confidence intervals for the first 25 innings for Graeme Smith, Sir Don Bradman and Gordon Greenidge all include their final career average. It even worked for Sir Frank Worrell, who had an amazing start to his career followed by a poor end.)

The easy interval to construct was the one for batting scores. Here I resampled each player's batting innings at random, with replacement, and calculated each batsman's average for the resample. Then I subtracted Anderson's resampled average from Neesham's resampled average. If the number came out positive, it meant that Neesham's average was higher; if it was negative, Anderson's was higher. After taking 1000 resamples, I looked at the central 95% of those differences. If that interval is all positive or all negative, it implies that there is a statistically significant difference.
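For anyone who wants to try this at home, the resampling above could be sketched roughly as follows. This is not the exact code behind the graphs: the function name `bootstrap_difference`, the two score lists and the seed are all my own placeholders, the scores are invented, and the sketch uses a plain mean (treating every innings as a dismissal) rather than runs per dismissal.

```python
import random
from statistics import mean

def bootstrap_difference(a, b, n_resamples=1000, seed=1):
    """Percentile bootstrap interval for mean(b) - mean(a).

    Each resample draws len(a) scores from a and len(b) scores from b,
    with replacement, and records the difference of the two means.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        resample_a = rng.choices(a, k=len(a))
        resample_b = rng.choices(b, k=len(b))
        diffs.append(mean(resample_b) - mean(resample_a))
    diffs.sort()
    # Central 95%: drop the lowest 2.5% and the highest 2.5%.
    lower = diffs[int(0.025 * n_resamples)]
    upper = diffs[int(0.975 * n_resamples) - 1]
    return lower, upper

# Hypothetical scores for two players, made up for illustration only.
player_a = [0, 2, 1, 13, 18, 26, 31, 39, 47, 116]
player_b = [7, 11, 20, 33, 35, 51, 78, 85, 107, 137]

low, high = bootstrap_difference(player_a, player_b)
print(f"95% interval for the difference: {low:.1f} to {high:.1f}")
if low > 0 or high < 0:
    print("Interval excludes zero: evidence of a real difference.")
else:
    print("Interval includes zero: too close to call.")
```

Fixing the seed just makes the run repeatable; a different seed will move the ends of the interval slightly, which is a useful reminder of how approximate these limits are.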

Here's the graph of the results

The red line in this graph indicates the confidence interval. Here we can see that the interval includes both positive and negative numbers. This means that we cannot make a call, based on the start of their careers, as to who is statistically the better batsman. They are too close to call.

Bowling is harder to compare. There are so many things to compare that it can be really difficult. The way that I chose to compare the bowling was to think about the job that they are going to be asked to do. In the media conference Mike Hesson was actually quite clear about what role he expected Neesham or Anderson to play. They were to be an additional support bowler, in the same way that they have been used throughout the recent games. I looked at all the matches since McCullum has been captain, and the median overs bowled by the 4th seamer (or 3rd seamer when 2 spinners were picked) was 11. (For this I ignored a few innings where McCullum bowled himself for an over or 2, but I included the innings where he actually bowled 2 full spells.)

As a result I normalised each bowling innings by Neesham or Anderson to 11 overs. To do this I added on a percentage to the run rates for situations where a bowler had only bowled a few overs. This meant that 0/25 off 5 became 0/65 off 11 and 2/12 off 6.1 became 3/25 off 11. There are obviously issues with this, but I felt that it was fairer than any other method that I could think of.
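To show just the scaling part of that step, here is a rough sketch that stretches a set of figures to 11 overs in plain proportion. It deliberately leaves out the extra percentage added for short spells, because that loading isn't spelled out above, so its numbers come out lower than the ones quoted (about 55 runs rather than 65 for the first example). The function names are placeholders of mine.

```python
def overs_to_balls(overs_text):
    """Convert cricket notation such as '6.1' (6 overs, 1 ball) to balls."""
    whole, _, part = overs_text.partition(".")
    return int(whole) * 6 + (int(part) if part else 0)

def scale_to_overs(wickets, runs, overs_text, target_overs=11):
    """Scale wickets and runs in plain proportion to target_overs.

    No loading for short spells is applied here (an assumption of this
    sketch), so the results will not match the adjusted figures in the post.
    """
    balls = overs_to_balls(overs_text)
    if balls == 0:
        raise ValueError("cannot scale a spell of zero balls")
    target_balls = target_overs * 6
    return wickets * target_balls / balls, runs * target_balls / balls

# The two examples from the text: 0/25 off 5 overs and 2/12 off 6.1 overs.
for wickets, runs, overs in [(0, 25, "5"), (2, 12, "6.1")]:
    w, r = scale_to_overs(wickets, runs, overs)
    print(f"{wickets}/{runs} off {overs} -> {w:.1f}/{r:.1f} off 11")
```

Note that 6.1 overs means 6 overs and 1 ball (37 balls), not 6.1 as a decimal, which is why the conversion to balls comes first.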

After normalizing we can see that the runs distribution is similar, but Neesham took more wickets more often.

The bootstrap results looked like this:

Again there is not enough evidence to actually say who is statistically better.

A third way to look at it is to compare the contribution in individual matches. For this I selected batting innings and bowling innings randomly from each player. I added Neesham's batting to Anderson's bowling, then subtracted Neesham's bowling and Anderson's batting. If the result was positive then Neesham had made the bigger impact; if it was negative, then it was Anderson.

The result of this was as follows:

Again, there is not enough evidence to make a call.

However, all this data is from some very small samples. This is why the confidence intervals are so wide. To try and make any sort of call from such a small sample is really sketchy. To be able to make a valid comparison, I needed more data. Accordingly, I decided to look at their first class records. This time they both had more than 50 innings, and so the data was a little more useful.

However, the results were similar, despite the intervals being smaller. In every case the outcome included both positive and negative numbers, meaning that we could not make a call statistically who was the better player.

What does this mean in the context of selection?

Quite simply it means that the selectors need to rely on what they notice, rather than on the statistics. Who do they think will be successful, given their experience in the game and their intuition for knowing which players are likely to do well?

Selection is not an exact science. In this case there is not a compelling statistical argument for either Neesham or Anderson, and so it really should come down to who the selectors feel would be most effective on the pitches that they are playing on.

Statistics can tell you a lot of things. But it cannot tell you everything. It is a tool for finding patterns, rather than a crystal ball for divining the future perfectly.