Wednesday, 26 June 2019

World Cup simulation update - 26 June

Are the wheels falling off?

England now have a 4-win, 3-loss record and, with 2 difficult matches coming up, have a genuine chance of missing out on the semi-finals. They are still not relying on other results, but they're getting close to the point where they will be.



There's been a significant change, with Australia going up, and England going down. England are now expected to get to 10 points. That might still be enough. But it also might not be.
England's ranking has now dropped well below India's, to the point where the expected probability of England winning against India has dropped by almost 10%. They're still ahead due to home advantage, but the difference is decreasing.
There's about a 15% chance that a tie-breaker (total wins or net run rate) will be required. This may count out Sri Lanka, who have had two rain-affected matches, and so will probably be on fewer wins than anyone else with the same number of points.
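Here is a minimal sketch of why rain hurts on the tie-break. It assumes the table is ordered on points, then total wins, then net run rate, and it ignores any later tie-breakers. The teams and numbers are made up, purely to show how a side with two no-results can finish level on points but lose out on wins despite a better net run rate.

```python
from typing import NamedTuple


class Standing(NamedTuple):
    team: str
    points: int
    wins: int
    nrr: float  # net run rate


def rank(table: list[Standing]) -> list[Standing]:
    # Order by points, then total wins, then net run rate, all descending.
    return sorted(table, key=lambda s: (s.points, s.wins, s.nrr), reverse=True)


# Hypothetical table: Team B has two rain-affected no-results (1 point each),
# so it reaches 10 points with one fewer win than Team A.
table = [
    Standing("Team A", points=10, wins=5, nrr=-0.20),
    Standing("Team B", points=10, wins=4, nrr=0.35),
    Standing("Team C", points=12, wins=6, nrr=0.10),
]

for position, s in enumerate(rank(table), start=1):
    print(position, s.team, s.points, s.wins, s.nrr)
```

Team B finishes below Team A even though its net run rate is better, because the wins column is checked first.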

We see a huge drop in England's semi-final probability, and a resulting increase for Bangladesh, Pakistan and Sri Lanka. Australia have now qualified, and there are fewer ways left for New Zealand to be knocked out (only 35 out of 50000 trials saw New Zealand miss the semi-finals).


The decrease in England's chances, and the increase in the probability of lower-ranked teams making the semi-finals, has meant that there are a lot more semi-final combinations with more than a 0.5% chance of happening. West Indies vs New Zealand was an epic match in the pool play, and that's now a reasonable possibility for a semi-final. The ICC and Star Sports will be licking their lips at the prospect of the 8th most likely outcome - an India vs Pakistan semi-final would be absolute ratings gold.
This is the first time that England has dipped below India on the winning probability graph, but it's hard to win the final if you don't get out of the group stage.

Monday, 24 June 2019

World Cup Simulation update 24th June


Here's the update after the South Africa vs Pakistan match.

Firstly, this pushed Pakistan's ranking back above Bangladesh's, although the two are so close that the match between them is now predicted at 50.2% to 49.8%.
Looking at the expected points, Pakistan have now jumped ahead of Sri Lanka and Bangladesh.

It's looking fairly likely that 5th place will be on 9 or 10 points, while 4th will be on 10, 11 or 12 points.

My simulation only uses net run rate as the tie-breaker. Accordingly, there's actually a slightly higher probability of Sri Lanka and Pakistan getting through than this shows, and a slightly lower chance of England and Bangladesh.

It takes a lot of processor time to improve the simulation, and the difference is likely to be less than 1%, but I might have a go at improving it once we get to the last 5 matches.


England are still the overwhelming favourites to be the 4th team to go through. There were still 41 out of the 50000 trials where New Zealand didn't make it, so nobody is guaranteed a place just yet.


If you have semi-final tickets - this is who you're likely to see.

The low probabilities for Bangladesh and Pakistan here are understandable. They both have about a 5% chance of making the semi-finals but, given that they both have about a 1/3 chance of winning each match against the top teams, that gives them roughly a 0.5% chance of winning the tournament from here. However, if Bangladesh, Australia and Pakistan win the next 3 matches, that number will rise.
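The arithmetic behind the 0.5% figure can be checked in a couple of lines. This is a back-of-envelope sketch. It treats the semi-final and the final as independent matches, each with the same 1/3 win chance, which is a simplification.

```python
# About a 5% chance of reaching the semi-finals, and about a 1-in-3 chance
# of winning each knockout match against a top team (figures from the text).
p_semi = 0.05
p_knockout_win = 1 / 3

# Two knockout wins are needed: the semi-final and then the final.
p_title = p_semi * p_knockout_win ** 2
print(f"{p_title:.2%}")
```

This comes out at just over half a percent.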

It's starting to look like the style that makes England so effective in series may not be as effective in one-off matches. It will be interesting to see if that trend continues.

Sunday, 23 June 2019

World Cup Simulation Update, 23 June

Here are the latest outputs from the simulation.

England's loss to Sri Lanka opened the door somewhat, but we can still be fairly confident in who the semi-finalists are.
 England's ranking has gone down, after two losses to fairly ordinary sides.
It's looking like 10 points will be the magic number, with roughly a 10% chance that we'll have to rely on a tie-breaker.

The expected average points certainly favour England to finish fourth.


Accordingly, they have a much higher chance of making it through.

The likely match-ups (teams in alphabetical order, rather than placings):

England are still firm favourites by my model. Home advantage is massive.

Monday, 17 June 2019

World Cup simulation update

The group stage of the World Cup is now roughly halfway through, and there are 4 clear favourites to be the semi-finalists.

Afghanistan is the first team to be eliminated (they may have a mathematical possibility, but they don't have a statistical one). At this point, Sri Lanka are not far behind.

The rankings of the teams have remained fairly consistent, suggesting that the extra weighting for World Cup matches is about right.
Almost all the teams seem to have gone up because every rating is relative to Afghanistan. Afghanistan do not seem to be quite as good as they first appeared, so their rating has dropped. But because Afghanistan are fixed at 0, that drop shows up as everyone else moving up slightly.
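Pinning one team to 0 has a knock-on effect on everyone else, and a small sketch makes this concrete. It assumes the ratings sit on a log-odds scale, so only the difference between two teams matters for a head-to-head probability. "Team A" stands in for Afghanistan, and all the numbers are hypothetical.

```python
import math


def win_prob(rating_a: float, rating_b: float) -> float:
    """Logistic win probability from the difference in ratings (log-odds scale)."""
    return 1 / (1 + math.exp(-(rating_a - rating_b)))


def rebase(ratings: dict[str, float], reference: str) -> dict[str, float]:
    """Re-express every rating relative to `reference`, which becomes 0."""
    offset = ratings[reference]
    return {team: r - offset for team, r in ratings.items()}


# Hypothetical raw estimates in which the reference team has come out 0.3
# lower than before, while the other two teams' estimates are unchanged.
raw = {"Team A": -0.3, "Team X": 1.2, "Team Y": 0.8}
rebased = rebase(raw, "Team A")
print(rebased)

# Head-to-head probabilities are unaffected by the shift.
print(win_prob(raw["Team X"], raw["Team Y"]), win_prob(rebased["Team X"], rebased["Team Y"]))
```

Once Team A is reset to 0, Team X and Team Y each appear to rise by 0.3. Their chances against each other stay the same.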

The semi-final probability is the most interesting. 

I personally feel that this is underestimating the chances of South Africa, but we will see as the tournament progresses.

The key point on this graph is match 5, where Bangladesh overcame South Africa. If South Africa had won that match, they would be on about 40% and New Zealand and Australia would both be a lot lower.

The simulation also outputs the points for 4th and 5th place and the difference between them. At the moment this suggests there's only a fairly low chance that net run rate will come into play. However, one more rained-out match, or a Bangladesh upset of Australia, could change this dramatically. The expected lines are currently 9 points for 5th place and 11 points for 4th place.

So far, of the teams that I've had as favourites to win, 14 out of 17 have won. Given the probabilities that the model assigned them, that's slightly better than I would have expected (I would have expected 4 upsets rather than 3), but it still tells me that the model is working quite well. The difference may be because teams don't always play their best combinations in matches between World Cups, which adds extra uncertainty to those results that doesn't exist inside a World Cup.
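The "4 upsets rather than 3" comparison comes from adding up the underdogs' chances across the matches played. Here is a minimal sketch, assuming each match is independent. The 17 probabilities below are invented stand-ins, not the model's actual outputs.

```python
# Win probability the model gave the favourite in each of 17 matches.
# These values are hypothetical placeholders used only to show the calculation.
favourite_probs = [0.85, 0.80, 0.78, 0.75, 0.72, 0.70, 0.68, 0.82, 0.79,
                   0.76, 0.74, 0.71, 0.69, 0.66, 0.81, 0.77, 0.73]

# The expected number of upsets is the sum of each underdog's win chance.
expected_upsets = sum(1 - p for p in favourite_probs)
actual_upsets = 3

print(f"expected {expected_upsets:.1f} upsets, observed {actual_upsets}")
```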

It will be interesting to see if it continues to have the same success rate after the cup is finished.

Finally, applying the same system to find the probable winner gets the following results:
England are still favourites, but India are not far behind them.

Sunday, 16 June 2019

India vs Pakistan statistical preview.

Here are a couple of charts for today's match-up.



This suggests that 250 would be quite a defendable total. The par score here is much, much lower than at most grounds in England.

The ground is actually fairly well balanced between pace and bat, and between spin and bat, while still slightly favouring both types of bowler.


Old Trafford is the black spot; the grey points are other grounds around the world. Spin and pace friendliness are calculated from the success of different types of bowlers at those grounds, taking into account runs conceded, balls bowled and wickets taken.

Adding all the matches where India batted first and all the matches where Pakistan batted second to the ground data gives this graph:


This suggests that, once the teams are taken into account, a more normal curve applies. If India score under 200 they're unlikely to win, 250 is the 50/50 point, and 300 gives more like a 75% chance of defending.
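The reference points above fit neatly on a single logistic curve. As a hedged illustration, assume a logistic shape and pin it to 50% at 250 and 75% at 300. The chart itself may come from a different fit.

```python
import math

# Anchor points taken from the text: 250 is the 50/50 score, and 300 gives
# roughly a 75% chance of defending. Fitting a logistic curve through just
# two points is an illustrative simplification.
midpoint = 250
slope = math.log(0.75 / 0.25) / (300 - midpoint)  # log-odds gained per run


def defend_prob(score: float) -> float:
    """Chance of defending a first-innings score under the two-point logistic fit."""
    return 1 / (1 + math.exp(-slope * (score - midpoint)))


for score in (200, 250, 300):
    print(score, round(defend_prob(score), 2))
```

A logistic curve is symmetric about its midpoint, so 200 comes out at about 25%. That is consistent with India being unlikely to win from under 200.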


Thursday, 6 June 2019

World cup simulation update

Just before the World Cup started I wrote a post about a simulation I had written to find the teams' chances of making the semi-finals and of winning.

I've spent quite a bit of time improving it over the past week or so, learning some new machine learning techniques to improve my rankings etc.

Below are 3 graphs that show the change in rankings, semi-final probability and win likelihood.



These rankings are all relative to Afghanistan (who are first in the alphabet), so Afghanistan will always be on 0. Every other team will move around them, and any team ranked lower than them will get a negative rating.
The new model gave New Zealand and South Africa lower chances of making the semi-finals, and Bangladesh and Australia higher chances. South Africa have dropped lower still, while West Indies and Bangladesh have made up ground.

New Zealand has moved ahead of South Africa to become the 4th most likely winner, but both teams are still at fairly long odds.

Monday, 3 June 2019

Preview - Match 6 - England vs Pakistan - Trent Bridge

England and Pakistan return to the scene of a recent run-fest, but this time there is more on the line.

To say that Trent Bridge tends to be batting-friendly is like saying that Elton John tends to play the piano. However, the pitches so far have not exactly been typical of the grounds, and this may prove to be another instance of that.

England start as heavy favourites - Bet365 have them at 82%, Google gives them a 79% chance of winning, and my model gives them 84%. But the favourites don't always win ODI matches.

Here's the historical first innings score chart:

Pakistan have a reputation as an unpredictable team, but the reality is that they are one of the more predictable sides. They very rarely beat teams that are better than them, and very rarely lose to teams that they are better than. England should win this one reasonably comfortably.

Sunday, 2 June 2019

Preview - World Cup group match 5 - South Africa vs Bangladesh - The Oval

Today Bangladesh get their campaign underway, and South Africa get a chance to bounce back from their early loss.

This is predicted to be a win for South Africa, but I think the betting market is overstating the difference between the teams. Most of the bookies have an implied chance of winning of about 76% for South Africa, but my early model had them at 67%. After their loss to England and Bangladesh's recent series win over West Indies (which looks more impressive now that West Indies have turned out to be quite good), the gap has narrowed to 66.6% for South Africa vs 33.4% for Bangladesh.
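An "implied chance" is usually worked out from the quoted odds. Here is a minimal sketch of that conversion from decimal odds. It assumes the bookmaker's margin is removed proportionally, which is one of several methods. The odds are hypothetical, chosen to land near the 76% quoted above.

```python
def implied_probabilities(decimal_odds: dict[str, float]) -> dict[str, float]:
    """Convert decimal odds to probabilities, removing the bookmaker's margin.

    Raw implied probabilities (1 / odds) add up to more than 1; the excess is
    the bookmaker's margin (overround). Dividing by the total removes it on
    the assumption that the margin is spread proportionally.
    """
    raw = {team: 1 / odds for team, odds in decimal_odds.items()}
    total = sum(raw.values())
    return {team: p / total for team, p in raw.items()}


# Hypothetical odds for a two-way market (the no-result outcome is ignored).
odds = {"South Africa": 1.27, "Bangladesh": 4.00}
probs = implied_probabilities(odds)
print({team: round(p, 3) for team, p in probs.items()})
```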

The Oval is a ground where there has been a variety of conditions recently, so it's hard to know what a good score is until both teams have batted. Here's the historical graph:


The numbers have all dropped down by 2 or 3 runs as a result of the last match. 

One thing that does not play to Bangladesh's advantage here is the pitch. This is probably the bounciest pitch in England, and is more like a South African pitch than a typical English pitch. Bangladesh, however, play on probably the lowest, slowest pitches in the world.

If this match was at Taunton or Old Trafford, then Bangladesh may well be favourites. But not at the Oval. I'd expect South Africa to do well here. If they don't, then the semi-finals suddenly look a very long way away indeed.

Saturday, 1 June 2019

Preview - World Cup group match 3 - New Zealand vs Sri Lanka

This match is at Sophia Gardens in Cardiff. It's likely to be cool and damp, but with no rain. That's likely to play into New Zealand's hands.

New Zealand are distinct favourites - Bet365 have them at 78%, Google has them at 79%, and my model has New Zealand at 81%. However, none of those are at 100%, and the match isn't played on paper - Sri Lanka are still capable of pulling out a big performance.

Sophia Gardens is an odd shape, similar to Eden Park in Auckland, so it's a shape that New Zealand should be comfortable with. However, New Zealand has a mixed record at the ground - it was host to the match where New Zealand famously lost to Bangladesh in the Champions Trophy. In the one previous match between the two sides there, New Zealand won by 1 wicket, only just managing to win despite bowling Sri Lanka out for 138.

Teams batting first have generally not done well at Sophia Gardens unless they get a very big score. It's likely that both teams will want to chase here.


Again, a score of 290 would be below par based on historical data. However, pitches at ICC events are sometimes prepared differently from those for normal matches, so there's a chance that a lower score might still be very competitive.

As with some of the other matches, one of the more interesting things here will be the selections. What combination of players will each team go for?

Whichever way it goes, matches at Cardiff have tended to be interesting, even when the teams have seemed mismatched on paper beforehand. This could be the first match that's actually interesting on the field as well as in the lead-up.

Friday, 31 May 2019

Preview - World Cup group match 2 - West Indies vs Pakistan

Today's match is at Trent Bridge, Nottingham.

If any ground in the world has taken over the mantle of "most batting-friendly ground in the world" from the Antigua Recreation Ground in St John's, it's Trent Bridge. The groundsman seems to have taken WG Grace's famous statement "they came to see me bat, not you bowl" to the next level. The pitch seems to have been designed to make batting as easy as possible.

As a result the par score here is quite high.

Score 290 here, and you're on the wrong side of recent history. In order to have a 75% chance of winning after batting first, your team needs to score 349.

If anywhere is going to see 500 achieved, it is likely to be either Nottingham or Southampton (which also has a bowler-hating groundsman).

The regression model that I used in the previous article gives Pakistan a 73% chance of coming out on top in this match. However, the West Indies have been looking better recently than they were a couple of years ago, and Pakistan (conversely) have been looking like they're at a low ebb. As a result, this match feels more like it could go either way.

Pakistan have a habit of lifting their game significantly when they get momentum, and, as a result, have had a very good record recently against all the teams who are not currently ranked in the top five. The West Indies will need to start well to avoid Pakistan getting on a roll. 

This match is an important fixture for both sides, as a loss here will mean that the losing team will need to beat at least two of the top five ranked teams if they are to progress to the semi-finals.

Thursday, 30 May 2019

A simulation to see who will win the World Cup


One of the main purposes of statistics is to help inform decisions. Cricket statistics are often used when deciding on the selection of players, or (more often) in arguments about who is the best at a particular aspect. They can help decide which strategies are best, what an equivalent score is in a reduced match (Duckworth-Lewis-Stern being a particular case) or which teams should automatically qualify for the World Cup (David Kendix). Bookmakers also often use them to set odds on who is going to win, both the reputable, legal variety and the more dubious underworld version.

I decided to attempt to build a model to calculate the probability of each team winning, based on their previous form. This was going to allow me (hopefully) to predict the probabilities of each outcome of the world cup, by using a simulation. It didn’t prove to be as easy as I had hoped.

My first thought was to look at each team’s net run rate in each match, adjust for home advantage, and then average it out. That seemed sensible, and the first attempt at doing that looked like it would be perfect. Most teams (all except Zimbabwe) had roughly symmetrical net run rates, and they fitted a normal curve really well. The only problem was that Afghanistan was miles ahead of everyone else. The fact that they had mostly played lower quality opponents in the past 4 years meant that they had recorded a lot more convincing wins than anyone else.

This was clearly a problem. India and England both had negative net run rates, while Afghanistan, Bangladesh and West Indies were all expected to win most of their matches.

I then tried a different approach, based on David Kendix's method of using each result to adjust a ranking. But rather than having a ranking based on wins, I based it on net run rate. So if one team had an expected net run rate of 0.5 and another had an expected net run rate of 0.6, the first team would have an expected net run rate of -0.1 for their match against each other. If they did better than that, their rating went up; if they did worse, it went down.
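The update rule described above works much like an Elo rating, but on net run rate instead of wins. Here is a minimal sketch of that idea. The sensitivity parameter `k` is hypothetical, standing in for whatever weighting was actually used.

```python
def update(ratings: dict[str, float], team: str, opponent: str,
           actual_nrr: float, k: float = 0.1) -> None:
    """Nudge both ratings towards an observed match net run rate.

    The expected NRR for `team` is the gap between the two ratings. The
    sensitivity k is a hypothetical value: a larger k reacts faster, but lets
    a single big win or loss move a rating a long way.
    """
    expected = ratings[team] - ratings[opponent]
    error = actual_nrr - expected
    ratings[team] += k * error
    ratings[opponent] -= k * error


ratings = {"Team A": 0.5, "Team B": 0.6}
# Team A was expected to finish at -0.1, but won with a match NRR of +1.0.
update(ratings, "Team A", "Team B", actual_nrr=1.0)
print(ratings)
```

With k = 0.1, a surprise of 1.1 runs per over shifts each rating by 0.11. Raising k makes the ratings more responsive, but then a handful of lopsided results can drag a team a long way.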

However, I found that some results ended up having too much bearing. If I made the ranking sensitive to changes in results, it moved far too much on the basis of one big loss or win. England dropped almost a whole run per over of net run rate based on the series in the West Indies. So this was clearly not a good option.

Next, I decided to try using logistic regression, and seeing how that turned out. Logistic regression is a way of estimating the probability of an event when there are only two possible outcomes. To use it, I removed every tie or match with no result, and set to work building the models.
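Here is a minimal, dependency-free sketch of fitting team ratings with logistic regression. This is a simplified Bradley-Terry-style form, not the exact specification used here. It assumes that the log-odds of a win are one team's rating minus the other's, plus a shared home-advantage term. The reference team is held at 0, and a small L2 penalty is added. The results are hypothetical, and "Team A" is the reference because it comes first alphabetically.

```python
import math


def fit_ratings(matches, reference, steps=3000, lr=0.05, l2=0.01):
    """Fit team ratings and a home-advantage term by gradient ascent.

    Assumed model for each match:
        log-odds(team beats opponent) = r[team] - r[opponent] + home * venue
    where venue is +1 if `team` was at home, -1 if `opponent` was, 0 if neutral.
    The reference team is held at 0, so every other rating is relative to it.
    A small L2 penalty keeps the estimates finite on tiny datasets.
    """
    teams = sorted({name for match in matches for name in match[:2]})
    ratings = {t: 0.0 for t in teams}
    home = 0.0
    for _ in range(steps):
        grad = {t: 0.0 for t in teams}
        grad_home = 0.0
        for team, opponent, venue, won in matches:
            z = ratings[team] - ratings[opponent] + home * venue
            p = 1 / (1 + math.exp(-z))
            residual = (1.0 if won else 0.0) - p  # observed minus predicted
            grad[team] += residual
            grad[opponent] -= residual
            grad_home += residual * venue
        for t in teams:
            if t != reference:  # the reference rating never moves from 0
                ratings[t] += lr * (grad[t] - l2 * ratings[t])
        home += lr * (grad_home - l2 * home)
    return ratings, home


# Hypothetical results with ties and no-results already removed:
# (team, opponent, venue, team_won)
matches = [
    ("Team A", "Team B", 0, False),
    ("Team A", "Team B", 0, False),
    ("Team A", "Team B", 1, True),
    ("Team B", "Team C", 0, False),
    ("Team B", "Team C", 0, False),
    ("Team B", "Team C", 1, True),
    ("Team A", "Team C", 0, False),
    ("Team A", "Team C", 1, False),
]
ratings, home = fit_ratings(matches, reference="Team A")
print({t: round(r, 2) for t, r in ratings.items()}, round(home, 2))
```

A library such as statsmodels or scikit-learn would normally be used for this. The hand-rolled loop just makes the moving parts visible.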

My initial results were exciting. Using just the team, opposition and home/away status, I was able to predict the results of the previous three World Cups quite accurately using the data from the preceding 4 years. (I could not go back further than that, as those tournaments included teams making their ODI debut, and there was accordingly no data to build the model from.)

The results were really pleasing. I graphed them here, grouped to the nearest 0.2 (i.e. the point at 0.6 represents all matches where the model gave a team between 0.5 and 0.7 as its chance of winning), against the actual result of each match. The predictions seem to slightly overstate the chance of an upset (possibly because upsets are more common outside World Cups, where players tend to be rested against smaller nations). Overall, though, they were fairly reliable, and (most importantly) the team that the model predicted would win generally won.
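The grouping behind that graph is a standard calibration check. Here is a minimal sketch that uses bins of width 0.2, centred on 0, 0.2, ..., 1.0, to match the description above. The prediction/result pairs are hypothetical.

```python
import math
from collections import defaultdict


def calibration_table(predictions):
    """Group (predicted_prob, won) pairs into bins of width 0.2.

    Bins are centred on 0, 0.2, ..., 1.0, so the 0.6 bin holds predictions
    from 0.5 up to (but not including) 0.7.
    Returns {bin_centre: (number_of_matches, observed_win_rate)}.
    """
    bins = defaultdict(list)
    for prob, won in predictions:
        centre = math.floor(prob * 5 + 0.5) / 5
        bins[centre].append(1 if won else 0)
    return {c: (len(v), sum(v) / len(v)) for c, v in sorted(bins.items())}


# Hypothetical (predicted chance, did that team win?) pairs.
sample = [(0.55, True), (0.62, False), (0.68, True), (0.81, True),
          (0.77, True), (0.85, False), (0.35, False), (0.42, True)]

for centre, (n, rate) in calibration_table(sample).items():
    print(f"predicted ~{centre:.1f}: {n} matches, won {rate:.0%}")
```

A well-calibrated model has each bin's observed win rate close to its centre. Rates that sit above the centre for favourites mean the model is overstating the chance of an upset.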

I could then use this to give a ranking of each team that directly related to their likelihood of winning against each other. The model gave everything in relation to Afghanistan, with Afghanistan on 0, and any number higher than 0 showing how much more likely a team was to win than Afghanistan against the same opponent. (Afghanistan was the reference simply because they were first in the alphabet.)







This turns out to be fairly close to the ICC rankings. So that was encouraging.

I tried adding a number of things to the model (ground types, continents, interactions, weighting the more recent matches more highly) but the added complexity did not result in better predictions when I tested them, so I stuck to a fairly simple model, only really controlling for home advantage.
Next I applied the probabilities to every match and found the probabilities of each team making the semi-finals.
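A minimal Monte Carlo sketch of that step follows. It assumes logistic win probabilities from rating differences (ignoring home advantage and no-results), a single round robin worth 2 points per win, and a top-four cut. Ties on points are broken at random rather than by the real tie-breakers. The six teams and their ratings are hypothetical; the actual tournament has ten.

```python
import itertools
import math
import random


def win_prob(r_a: float, r_b: float) -> float:
    """Logistic win probability from a rating difference (home advantage ignored)."""
    return 1 / (1 + math.exp(-(r_a - r_b)))


def semi_final_chances(ratings: dict[str, float], trials: int = 10_000,
                       seed: int = 1) -> dict[str, float]:
    """Estimate each team's chance of finishing in the top four of a round robin."""
    rng = random.Random(seed)
    teams = list(ratings)
    fixtures = list(itertools.combinations(teams, 2))  # each pair meets once
    top_four = {t: 0 for t in teams}
    for _ in range(trials):
        points = {t: 0 for t in teams}
        for a, b in fixtures:
            winner = a if rng.random() < win_prob(ratings[a], ratings[b]) else b
            points[winner] += 2
        # Shuffle first: the sort is stable, so ties on points end up random.
        order = teams[:]
        rng.shuffle(order)
        order.sort(key=lambda t: points[t], reverse=True)
        for t in order[:4]:
            top_four[t] += 1
    return {t: n / trials for t, n in top_four.items()}


# Hypothetical ratings on a log-odds scale, relative to "Team A" at 0.
ratings = {"Team A": 0.0, "Team B": 0.4, "Team C": 0.9,
           "Team D": 1.3, "Team E": 1.6, "Team F": 2.0}
chances = semi_final_chances(ratings)
for team, p in chances.items():
    print(f"{team}: {p:.1%}")
```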


The next step was to extend the simulation past the group stage and find the winner.
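The knockout stage can be added in the same way. Here is a minimal sketch, assuming the four semi-finalists are passed in finishing order, with 1st playing 4th and 2nd playing 3rd. Every match uses the same logistic win probability. In a full simulation this would run inside each group-stage trial; here the semi-finalists and ratings are fixed and hypothetical for brevity.

```python
import math
import random


def win_prob(r_a: float, r_b: float) -> float:
    """Logistic win probability from a rating difference."""
    return 1 / (1 + math.exp(-(r_a - r_b)))


def title_chances(semi_finalists, ratings, trials=20_000, seed=2):
    """Estimate title chances for four semi-finalists listed in finishing order."""
    rng = random.Random(seed)

    def play(a, b):
        return a if rng.random() < win_prob(ratings[a], ratings[b]) else b

    first, second, third, fourth = semi_finalists
    wins = {t: 0 for t in semi_finalists}
    for _ in range(trials):
        finalist_1 = play(first, fourth)   # semi-final: 1st v 4th
        finalist_2 = play(second, third)   # semi-final: 2nd v 3rd
        wins[play(finalist_1, finalist_2)] += 1
    return {t: n / trials for t, n in wins.items()}


ratings = {"Team C": 0.9, "Team D": 1.3, "Team E": 1.6, "Team F": 2.0}
chances = title_chances(["Team F", "Team E", "Team D", "Team C"], ratings)
print(chances)
```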

After running through the simulation a few more times, I came out with this:


A couple of points to remember here: every simulation is an estimate. The model is almost certainly going to estimate the probabilities incorrectly, but it should get them close enough to give a good estimate of the actual final probabilities. It is also likely to:

- overstate Bangladesh’s ability, due to their incredible home record;
- overstate Pakistan’s ability, as they have had a degree of home advantage in the UAE in a lot of their nominally neutral matches;
- understate West Indies, because they have not played their best players in a lot of matches over the past 4 years.

But these are not likely to make a massive difference to the semi-finalist predictions.



Given this, I’d suggest that if you want to bet on the winner of the World Cup, these are the odds I would consider fair for each team:


I will try to update these probabilities periodically throughout the World Cup, and report on their accuracy.
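Turning a simulated win probability into a "fair" price is a one-line conversion. Here is a minimal sketch, assuming decimal odds with no bookmaker margin. The probabilities are hypothetical, not the simulation's actual output.

```python
def fair_decimal_odds(prob: float) -> float:
    """Decimal odds with no bookmaker margin (the stake is included in the return)."""
    return 1 / prob


# Hypothetical title probabilities, not the simulation's actual output.
title_probs = {"Team X": 0.30, "Team Y": 0.20, "Team Z": 0.05}
for team, p in title_probs.items():
    odds = fair_decimal_odds(p)
    print(f"{team}: {odds:.2f} (about {odds - 1:.0f} to 1)")
```

If the model is right, the expected profit per unit staked is p × odds − 1. Any price longer than the fair one therefore has positive expected value, and any shorter price has negative expected value.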

Saturday, 23 March 2019

A new way to look at bowling economy rates for the IPL

Sunrisers Hyderabad had made a great start, but their innings had started to plateau. At 161/7 off 18 overs they had the opportunity to get a score of 190+ or, if things went really poorly, 175. Andre Russell was running in to bowl...

He bowled a very good over, removing Braithwaite and then conceding only 7 in his final 5 balls. All thoughts of a big finish were gone.

A week later, the Sunrisers were in the qualification final, and things were not going well. After 8 overs they were on 54/4, going at less than 7 an over, and at serious risk of scoring fewer than 100.

Dwayne Bravo was the bowler this time. He bowled a wide, then a couple of deliveries that Yusuf Pathan managed to hit for 2 each, and ended with a couple of easy singles. It was an over where almost no pressure was put onto the batsmen. And yet, it only went for 7 runs, the same as Andre Russell's excellent over a week earlier.

There's something wrong with any statistic that rates those overs as being of the same value to the team, and yet that's exactly what the traditional Economy Rate does. 7 runs is 7 runs.

Wednesday, 2 January 2019

Paine vs Pant

The Instagram photo. 
I wanted to quickly share my thoughts about the Paine - Pant sledge.

I've been an outspoken critic of "mental disintegration" -- the tactic of using personal abuse and insults to get under a player's skin and put them off their game, but I really liked what I heard from Paine, and think it's the sort of sledging that is totally appropriate.