Showing posts with label simulation. Show all posts

Friday, 5 July 2019

World Cup Simulation update - 5 July

Here's the latest update for the World Cup simulation. I have New Zealand at 100%, but that's simply because the probability of Pakistan achieving the required run rate is so low that it never happened in the 50000 trials that I used. The probability of Pakistan going through is slightly lower than the probability of someone being shot accidentally by a dog running along a beach while holding a handgun in its mouth during the next week.
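Zero occurrences in 50000 trials bounds the probability rather than proving it is zero. A minimal sketch of that bound, assuming the trials are independent (the function name and the 95% confidence level are illustrative choices, not part of the original simulation):

```python
def upper_bound_zero_events(trials: int, confidence: float = 0.95) -> float:
    """Largest per-trial probability still plausible after seeing zero events.

    Solves (1 - p) ** trials = 1 - confidence for p, assuming independent trials.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / trials)


bound = upper_bound_zero_events(50_000)
print(f"95% upper bound: {bound:.4%}")  # roughly 0.006%
```

So a 100% reading here is consistent with a true probability for the missing outcome of anything up to about 1 in 17000.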
The next graph is the expected points. The simulation has had the correct top 4 from the second match on. However, the expected points and the order of the teams have changed considerably.

The top 4 was looking fairly likely from about match number 6 on. There was some excitement from the two upset losses by England, but Pakistan never got beyond 40% on the simulation.

The complete make up of the semi-finalists has not yet been decided, nor has the team in 5th place. Pakistan, Bangladesh and Sri Lanka could all end up 5th. 

Next I looked at the winning probability. This is getting close to the point where it can be calculated analytically without much trouble.
 The next thing to look at is the rankings. A thing to remember here is that it is all relative to Afghanistan, so everybody going up is more an indication that Afghanistan has gone down.

The order that the teams are in here is the same as David Kendix's official rankings order, with one exception: I have India ahead of England, rather than the other way round.

Finally, a little graph to show what Pakistan needs to do to make the semi-finals. They need to keep Bangladesh below the green line.



Monday, 1 July 2019

World Cup simulation update - 1 July

Here's the latest update to the simulation. The first two graphs disagree slightly, and that's because I have two different methods to calculate the expected net run rate. The first one seemed to be slightly more accurate than the second, but there was not a big difference when I tested them. (The margin of victory in cricket matches is actually really difficult to estimate, as teams batting second tend to cruise to victory rather than try to win by as big a margin as possible.) I decided to use both when doing the calculations. With the first method, New Zealand and India both have a higher than 99.98% probability of going through, while with the second method it's 99% for India and 97.7% for New Zealand. The second set of numbers seems more realistic.


The big thing to notice is the change to England's probability, and how England beating India damaged the chances of both Pakistan and Bangladesh. Pakistan's probability went down by slightly more than Bangladesh's probability because the ranking of India dropped slightly, and Bangladesh need to beat India to get through.

This graph shows the expected value, not the most likely value. Those are actually different things. The expected value is the probability-weighted mean of all the possible outcomes. As a result, none of the teams will actually end up with exactly the points that this shows, but they should mostly get close to it.
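A small sketch of the difference, using a made-up distribution of one team's final points (the numbers are hypothetical and are not taken from the simulation):

```python
# Hypothetical distribution of one team's final points (made-up numbers).
points_distribution = {9: 0.10, 10: 0.35, 11: 0.15, 12: 0.40}

# Expected value: each possible total weighted by its probability.
expected_points = sum(points * prob for points, prob in points_distribution.items())

# Most likely value: the single total with the highest probability.
most_likely_points = max(points_distribution, key=points_distribution.get)

print(f"expected: {expected_points:.2f}, most likely: {most_likely_points}")
```

Here the expected value is 10.85, a total that no team can actually finish on, while the most likely total is 12.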

 It's now looking like there's a roughly 45% chance that net run rate will be a deciding factor in who goes through to the semi-finals.

If Bangladesh beat India (which is admittedly a fairly unlikely outcome), we could then see a situation where Pakistan and Bangladesh are playing for the opportunity to be level on points with New Zealand and India on 11 points. If that is the case, then (in all likelihood) the rained out match between New Zealand and India will have allowed both to progress at the expense of the winner of Pakistan vs Bangladesh.

The most likely semi-finals at this point are Australia vs New Zealand and England vs India, but these are by no means confirmed yet.

In individual matches, England effectively has a higher ranking than that, because teams playing at home get a ranking boost of 0.86 over their opponent. That's why I have England back on top in the next graph:
This one is quite different to what the bookmakers have. I have England as favourites, while they have India and Australia tied for favourite on roughly 30% each. They also have Pakistan and Bangladesh at about double the probability that I do.

I used the first net run rate model for the winning probability, but the difference in numbers suggests that the bookies are possibly using a model that is more similar to the second one.

Wednesday, 26 June 2019

World Cup simulation update - 26 June

Are the wheels falling off?

England have now got a 4 win, 3 loss record, and, with 2 difficult matches coming up, have a genuine chance of not going through to the semi-finals. They are still not relying on other results, but they're getting close to the point where they are.



There's been a significant change, with Australia going up, and England going down. England are now expected to get to 10 points. That might still be enough. But it also might not be.
England's ranking has now dropped well below India's, to the point where the expected probability of England winning against India has dropped by almost 10%. They're still ahead due to home advantage, but the difference is decreasing.
There's about a 15% chance that a tie-breaker (total wins or net run rate) will be required. This may count out Sri Lanka, who have had two rain affected matches, and so will probably be on fewer wins than anyone else with the same number of points.

We see a huge drop in the semi-final probability of England, and a resultant increase for Bangladesh, Pakistan and Sri Lanka. Australia have now qualified, and there are also fewer ways left for New Zealand to be knocked out (only 35 out of 50000 trials saw New Zealand miss the semi-finals).


The decrease in England, and increase in probability of lower ranked teams making the semi-finals has meant that there are a lot more semi-final combinations with more than a 0.5% chance of happening. West Indies vs New Zealand was an epic match in the pool play, and that's now a reasonable possibility for a semi-final. The ICC and Star Sports will be licking their lips at the prospect of the 8th most likely outcome - an India Pakistan semi-final would be absolute ratings gold.
This is the first time that England has dipped below India on the winning probability graph, but it's hard to win the final if you don't get out of the group stage.

Monday, 24 June 2019

World Cup Simulation update 24th June


Here's the update after the South Africa vs Pakistan match.

Firstly, this pushed Pakistan's ranking back above Bangladesh's ranking, although they are both so close that the match between them is now predicted as 50.2% to 49.8%.
 Looking at the expected points, Pakistan have now jumped ahead of Sri Lanka and Bangladesh.

It's looking fairly likely that 5th place will be on 9 or 10 points, while 4th will be on 10, 11 or 12 points.

My simulation only uses net run rate as the tie breaker. Accordingly, there's actually a slightly higher probability of Sri Lanka and Pakistan getting through than this shows, and a slightly lower chance of England and Bangladesh.

It takes a lot of processor time to improve the simulation, and it's likely to make less than 1% difference, but I might have a go at improving it once we get to the last 5 matches.


England are still the overwhelming favourite to be the 4th team to go through. There were still 41 out of the 50000 trials where New Zealand hadn't made it. So nobody is guaranteed through just yet.


If you have semi-final tickets - this is who you're likely to see.

The probabilities for Bangladesh and Pakistan being so low here are understandable. They both have about a 5% chance of making the semi-final, but, given that they both have about a 1/3 chance of winning each match against the top teams, it gives them a roughly 0.5% chance of winning the tournament from here. However, if Bangladesh, Australia and Pakistan win the next 3 matches, that number will rise.
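The arithmetic behind that figure, as a quick sketch. It assumes the semi-final and final are independent and uses the rounded 5% and 1/3 values quoted above:

```python
p_reach_semi = 0.05      # chance of making the semi-finals
p_win_knockout = 1 / 3   # chance of beating a top team in one knockout match

# Two knockout wins (semi-final and final) are needed after qualifying.
p_win_tournament = p_reach_semi * p_win_knockout ** 2

print(f"{p_win_tournament:.2%}")  # 0.56%
```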

It's starting to look like England's style that is so effective in series may not be so effective in one off matches. It will be interesting to see if that trend continues.

Sunday, 23 June 2019

World Cup Simulation Update, 23 June

Here are the latest outputs from the simulation.

England's loss to Sri Lanka opened the door somewhat, but we can still be fairly confident in who the semi-finalists are.
 England's ranking has gone down, after two losses to fairly ordinary sides.
It's looking like 10 points will be the magic number. There's roughly a 10% chance that we'll rely on a tie-breaker.

The average expected points certainly favour England to finish fourth.


Accordingly, they have a much higher chance of making it through.

These are the likely match-ups. (Teams are in alphabetical order, rather than placings.)

England are still firm favourites by my model. Home advantage is massive.

Monday, 17 June 2019

World Cup simulation update

The group stage of the World Cup is now roughly half way through, and there are 4 clear favourites to be the semi-finalists.

Afghanistan is the first team to be eliminated (they may have a mathematical possibility, but they don't have a statistical one). At this point, Sri Lanka are not far behind.

The rankings of the teams have remained fairly consistent, suggesting that the extra weighting for world cup matches is about right.
The fact that almost all the teams seem to have gone up is due to them all being relative to Afghanistan. Afghanistan do not seem to be quite as good as they had appeared, so they have dropped, but as they are set to 0, it has pushed everyone else up slightly.

The semi-final probability is the most interesting. 

I personally feel that this is underestimating the chances of South Africa, but we will see as the tournament progresses.

The key point on this graph is match 5, where Bangladesh overcame South Africa. If South Africa had won that match, they would be on about 40% and New Zealand and Australia would both be a lot lower.

The simulation also puts out the points for 4th, 5th and the difference between them. At the moment the expected marks are 9 points for 5th place and 11 points for 4th place, which suggests that there's only a fairly low chance that net run rate will come into play. However, one more rained-out match, or a Bangladesh upset of Australia, and this could change dramatically.

So far, 14 of the 17 teams that I've had as favourite have won. Given the probabilities that the model assigned them, that's slightly higher than I would have expected (I would have expected 4 upsets rather than 3), but it's still telling me that my model is working quite well. The slight overstatement of upsets may be due to teams not always playing their best combinations in matches between world cups, which adds more uncertainty to those results than exists inside a world cup.
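The expected number of upsets is just the sum of the underdogs' win probabilities. A sketch of that check follows. The 17 favourite probabilities are hypothetical placeholders, chosen only so that they sum to 4 expected upsets, since the model's actual per-match values are not listed here:

```python
# Hypothetical win probabilities for the favourite in each of 17 matches.
favourite_probs = [0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60,
                   0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60,
                   0.80, 0.85, 0.85]

expected_upsets = sum(1 - p for p in favourite_probs)


def upset_distribution(probs):
    """Probability of exactly k upsets for k = 0..len(probs)."""
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * p            # favourite wins, upset count unchanged
            nxt[k + 1] += mass * (1 - p)  # underdog wins, one more upset
        dist = nxt
    return dist


dist = upset_distribution(favourite_probs)
p_three_or_fewer = sum(dist[:4])
print(f"expected upsets: {expected_upsets:.1f}")
print(f"P(3 or fewer upsets): {p_three_or_fewer:.2f}")
```

If the chance of seeing 3 or fewer upsets is not small, then 14 from 17 is well within what a correctly calibrated model would produce.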

It will be interesting to see if it continues to have the same success rate after the cup is finished.

Finally, applying the same system to find the probable winner gets the following results:
England are still favourites, but India are not far behind them.

Thursday, 6 June 2019

World cup simulation update

Just before the World Cup started I wrote a post about a simulation that I had written to find the teams' chances of making the semi-finals and of winning.

I've spent quite a bit of time improving it over the past week or so, learning some new machine learning techniques to improve my rankings etc.

Below are 3 graphs that show the change in rankings, semi-final probability and win likelihood.



These rankings are all relative to Afghanistan (who are first in the alphabet). Afghanistan will always be on 0, and every other team will change around them. Any team ranked lower than them will get a negative rating.
The new model gave New Zealand and South Africa lower chances of making it, and Bangladesh and Australia higher chances of making it. South Africa have dropped lower still, while West Indies and Bangladesh have made up ground.

New Zealand has moved ahead of South Africa into the 4th most likely to win, but both teams are still at fairly long odds.

Thursday, 30 May 2019

A simulation to see who will win the World Cup


One of the main purposes of statistics is to help inform decisions. Cricket statistics are often used when deciding on selection of players, or (more often) arguments about who is the best at a particular aspect. They can help decide which strategies are best, what an equivalent score is in a reduced match (with a particular case of Duckworth Lewis Stern) or which teams should automatically qualify for the World Cup (David Kendix). They are often also used by bookmakers (both the reputable, legal variety and the more dubious underworld version) to set odds about who is going to win.

I decided to attempt to build a model to calculate the probability of each team winning, based on their previous form. This was going to allow me (hopefully) to predict the probabilities of each outcome of the world cup, by using a simulation. It didn’t prove to be as easy as I had hoped.

My first thought was to look at each team’s net run rate in each match, adjust for home advantage, and then average it out. That seemed sensible, and the first attempt at doing that looked like it would be perfect. Most teams (all except Zimbabwe) had roughly symmetrical net run rates, and they fitted a normal curve really well. The only problem was that Afghanistan was miles ahead of everyone else. The fact that they had mostly played lower quality opponents in the past 4 years meant that they had recorded a lot more convincing wins than anyone else.

This was clearly a problem. India and England both had negative net run rates, while Afghanistan, Bangladesh and West Indies were all expected to win most of their matches.

I then tried a different approach, based off David Kendix’s approach of using each result to adjust a ranking. But rather than having a ranking that was based off wins, I based it off net run rate. So if a team had an expected net run rate of 0.5, and another had an expected net run rate of 0.6, the first team would have an expected net run rate of -0.1 for their match. If they did better than that, they went up, and if they did worse than that, they went down.

However, I found that some results ended up having too much bearing. If I made it sensitive to a change in the results, it ended up changing way too much based off one big loss/win. England dropped almost a whole net run per over based on the series in the West Indies. So this was clearly not a good option.
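A minimal sketch of that kind of rating adjustment. The `step` parameter, which controls how sensitive the ratings are to a single result, is an assumed name and value, and the exact update rule used at the time is not given in the post:

```python
def update_ratings(rating_a, rating_b, observed_margin, step=0.1):
    """Nudge two net-run-rate ratings towards an observed result.

    observed_margin is team A's net run rate in the match (runs per over).
    """
    expected_margin = rating_a - rating_b
    surprise = observed_margin - expected_margin
    # The team that beat expectations moves up, the other moves down.
    return rating_a + step * surprise, rating_b - step * surprise


# The example from the text: ratings of 0.5 and 0.6 give an expected margin of -0.1.
new_a, new_b = update_ratings(0.5, 0.6, observed_margin=0.4)
print(round(new_a, 3), round(new_b, 3))
```

With a larger `step`, one heavy defeat moves a rating a long way, which is the over-sensitivity described above.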

Next, I decided to try using logistic regression, and seeing how that turned out. Logistic regression is a way of determining probabilities of events happening if there are only two outcomes. To do that, I removed every tie or match with no result, and set to work building the models.

My initial results were exciting. By just using the team, opposition and home/away status, I was able to predict the results of the previous three world cups quite accurately using the data from the preceding 4 years. (I could not go back further than that, as earlier world cups included teams making their ODI debut, and there was accordingly no data to use to build the model.)

The results were really pleasing. I graphed them here, grouped to the nearest 0.2 (ie the point at 0.6 represents all matches that the model gave between 0.5 and 0.7 as the chance for a team to win), compared to the actual result for that match. It seems that they slightly overstate the chance of an upset (possibly due to upsets being more common outside world cups, where players tend to be rested against smaller nations), but overall they were fairly reliable, and (most importantly) the team that the model predicted would win, generally won.
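The grouping described above can be sketched as follows. The predictions and outcomes are made-up placeholders rather than the real match data, and the function name is an illustrative choice:

```python
from collections import defaultdict


def calibration_table(predictions, outcomes, width=0.2):
    """Group predicted win probabilities to the nearest `width` and
    return the observed win rate for each group."""
    bins = defaultdict(list)
    for prob, won in zip(predictions, outcomes):
        centre = round(round(prob / width) * width, 10)
        bins[centre].append(won)
    return {centre: sum(results) / len(results)
            for centre, results in sorted(bins.items())}


# Made-up predicted probabilities and results (1 = the team won).
predictions = [0.55, 0.62, 0.68, 0.81, 0.88, 0.33]
outcomes = [1, 0, 1, 1, 1, 0]

table = calibration_table(predictions, outcomes)
for centre, win_rate in table.items():
    print(f"predicted about {centre:.1f}: won {win_rate:.0%}")
```

For a well-calibrated model, the observed win rate in each group sits close to the group's centre.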

I could then use this to give a ranking of each team that directly related to their likelihood of winning against each other. The model gave everything in relation to Afghanistan, with them being 0, and any number higher than 0 being how much more likely a team was to win against the same opponent as Afghanistan. (Afghanistan was the reference simply because they were first in the alphabet.)
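A sketch of how two such ratings turn into a win probability, assuming the ratings are logistic regression coefficients on the log-odds scale and that home advantage is added on the same scale. The 0.86 home boost is the figure quoted in the 1 July update and is used here purely for illustration:

```python
import math


def win_probability(rating_a, rating_b, home_boost=0.0):
    """Probability that team A beats team B.

    Ratings are assumed to be log-odds relative to the reference team;
    home_boost is added when team A is playing at home.
    """
    log_odds = rating_a - rating_b + home_boost
    return 1.0 / (1.0 + math.exp(-log_odds))


print(f"equal teams, neutral venue: {win_probability(1.0, 1.0):.1%}")
print(f"equal teams, A at home:     {win_probability(1.0, 1.0, home_boost=0.86):.1%}")
```

Because only the difference between ratings matters, fixing the reference team at 0 does not change any of the probabilities.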







The resulting order turns out to be fairly close to the ICC rankings, which was encouraging.

I tried adding a number of things to the model (ground types, continents, interactions, weighting the more recent matches more highly) but the added complexity did not result in better predictions when I tested them, so I stuck to a fairly simple model, only really controlling for home advantage.
Next I applied the probabilities to every match and found the probabilities of each team making the semi-finals.
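A compact sketch of that step for a ten-team round robin. The ratings are made-up values, not the model's, home advantage is left out, and ties on points are broken at random as a stand-in for net run rate:

```python
import itertools
import math
import random
from collections import Counter


def win_probability(rating_a, rating_b):
    return 1.0 / (1.0 + math.exp(-(rating_a - rating_b)))


def semi_final_probabilities(ratings, trials=2000, seed=1):
    rng = random.Random(seed)
    teams = list(ratings)
    reached = Counter()
    for _ in range(trials):
        points = dict.fromkeys(teams, 0)
        # Every team plays every other team once; 2 points for a win.
        for a, b in itertools.combinations(teams, 2):
            a_wins = rng.random() < win_probability(ratings[a], ratings[b])
            points[a if a_wins else b] += 2
        # Random tie-break (assumption), standing in for net run rate.
        ranked = sorted(teams, key=lambda t: (points[t], rng.random()), reverse=True)
        reached.update(ranked[:4])
    return {team: reached[team] / trials for team in teams}


# Hypothetical ratings relative to Afghanistan = 0.
ratings = {"Afghanistan": 0.0, "Australia": 1.6, "Bangladesh": 1.0,
           "England": 2.0, "India": 1.9, "New Zealand": 1.5,
           "Pakistan": 1.1, "South Africa": 1.4, "Sri Lanka": 0.7,
           "West Indies": 0.9}

probs = semi_final_probabilities(ratings)
for team, prob in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{team:<13}{prob:.1%}")
```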


The next step was to then extend the simulation past the group stage, and find the winner.

After running through the simulation a few more times, I came out with this:


A couple of points to remember here: every simulation is an estimate. The model is almost certainly going to estimate the probabilities incorrectly, but it will get them close, and they will be close enough to give a good estimate of the actual final probabilities. It is also likely to overstate Bangladesh’s ability due to their incredible home record; overstate Pakistan’s ability, as they have had a degree of home advantage in a lot of their neutral matches in the UAE; and understate West Indies, due to them having not played their best players in a lot of matches in the past 4 years. But these are not likely to make a massive difference to the semi-finalist predictions.



Given this, I’d suggest that if you are wanting to bet on the winner of the world cup, these are the odds that I would consider fair for each team:


I will try to update these probabilities periodically throughout the world cup, and report on their accuracy.
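For reference, a fair price is simply the reciprocal of the probability. A small sketch using decimal odds (the function name is illustrative, and no bookmaker margin is included):

```python
def fair_decimal_odds(probability: float) -> float:
    """Decimal odds at which a bet has zero expected profit."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return 1 / probability


print(fair_decimal_odds(0.25))  # 4.0
```

Any price longer than the fair one would be a bet with positive expected value, if the model's probability is right.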

Tuesday, 10 March 2015

Updated QF prediction chart

In my previous post I ran a simulation to find out potential quarter-final places. I received some criticism for having England so low, and Bangladesh so high, but events over the past 48 hours have shown that the respective probabilities of the two teams qualifying may not have been so far off.

The program that I wrote to do the simulation was corrupted when my computer crashed and I foolishly hadn't saved it, so I've written a different one to re-calculate. This time I made a couple of modifications. I moved from an additive model for run rates to a multiplicative one, as that seemed more sensible: teams are realistically a percentage better than other teams, rather than a fixed number of runs better, and we would expect the margins to blow out more in terms of runs on better batting pitches than on difficult tracks.
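A sketch of the difference between the two approaches. The 0.5 runs-per-over edge, the 10% factor and the base run rates are hypothetical numbers, and the exact form of the multiplicative model is an assumption:

```python
def additive_rate(base_rate, batting_edge):
    """Better team scores a fixed number of runs per over above the base."""
    return base_rate + batting_edge


def multiplicative_rate(base_rate, batting_factor):
    """Better team scores a fixed percentage above the base."""
    return base_rate * batting_factor


for pitch, base in [("difficult track", 4.0), ("batting pitch", 6.0)]:
    add_gap = (additive_rate(base, 0.5) - base) * 50
    mult_gap = (multiplicative_rate(base, 1.10) - base) * 50
    print(f"{pitch}: additive gap {add_gap:.0f} runs, multiplicative gap {mult_gap:.0f} runs")
```

The additive gap is the same on both pitches, while the multiplicative gap grows with the scoring rate.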

I also slightly reduced the standard deviation of the simulation by moving it to one quarter of the mean rather than one third. This again made the results seem more sensible. There were too many teams scoring over 400 or under 100 previously.

Here are the new results. This table shows the probability of each team qualifying in position 1, 2, 3 or 4 in their group, and then the total probability of qualifying. Again I have not factored rain into this, and with Cyclone Pam heading towards New Zealand that may be a little optimistic.

Team           1st   2nd     3rd      4th      Quarters
New Zealand    1     0       0        0        1
Australia      0     0.976   0.024    0        1
Sri Lanka      0     0.024   0.9725   0.0035   1
Bangladesh     0     0       0.0035   0.9965   1
------
India          1     0       0        0        1
South Africa   0     0.976   0.024    0        1
Pakistan       0     0.017   0.664    0.1165   0.7975
Ireland        0     0.007   0.312    0.1405   0.4595
West Indies    0     0       0        0.743    0.743

The potential group results look like this:

Group A
NZ Aus SL Ban     0.9725
NZ SL Aus Ban     0.024
NZ Aus Ban SL     0.0035

Group B
Ind SA Pak WI     0.5295
Ind SA Ire WI     0.1985
Ind SA Pak Ire    0.1345
Ind SA Ire Pak    0.1135
Ind Pak SA WI     0.011
Ind Pak SA Ire    0.006
Ind Ire SA WI     0.004
Ind Ire SA Pak    0.003

The three interesting potential quarter final match-ups to watch for here are

SA vs Aus     4.7%
Ind vs SL     0.35%
Ire vs Ban    0.02%

In reality the probabilities of Ireland vs Bangladesh and Australia vs South Africa are higher, as they are both much more likely if rain starts to fall.

Sunday, 8 March 2015

World-cup quarter finals simulation

After Pakistan's tremendous win over South Africa, and Ireland's remarkable victory over Zimbabwe, the make up of the quarter finals is not really much clearer.

The question as to who is likely to be going through, and who will play whom, has been the subject of many, many Twitter conversations.

I thought it might be helpful to run a simulation to look at some of the possibilities.

I used Microsoft Excel as it's quite convenient. I used the scores already made in this tournament to decide the probable scores. For each team I got their average rpo scored in relation to the overall group run rate, and their average conceded in relation to the overall. Hence if a team in group A averaged scoring 5.5 rpo and conceded 5.3 rpo, they got values of +0.4 for batting and +0.2 for bowling (as the average rpo in group A has been 5.1 so far). From that point I then used an inverse normal, with a random number between 0 and 1 for the area, and the group run rate plus the team's batting run rate modifier plus the other team's bowling run rate modifier as the mean. For the standard deviation, I used the smaller of one third of the mean and 1.6. This allowed me to make sure there was (almost) no chance of a team getting a negative score, but that the scores weren't going to blow out too much. I used 1.6 as that's the standard deviation of all innings run rates this tournament. This gave me a 50 over score for each team, and whichever was ahead got the points for the win.
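The same procedure can be sketched outside Excel. This is a rough Python equivalent under the assumptions described above, not the original spreadsheet, and the function names and the opposition's modifiers are illustrative:

```python
import random
from statistics import NormalDist


def simulate_run_rate(group_rate, batting_mod, opp_bowling_mod, rng):
    """Draw one innings run rate using an inverse normal."""
    mean = group_rate + batting_mod + opp_bowling_mod
    sd = min(mean / 3, 1.6)
    area = rng.random()
    while area == 0.0:  # inv_cdf needs an area strictly between 0 and 1
        area = rng.random()
    return NormalDist(mean, sd).inv_cdf(area)


def simulate_score(group_rate, batting_mod, opp_bowling_mod, rng):
    """Run rate converted to a 50 over score."""
    return 50 * simulate_run_rate(group_rate, batting_mod, opp_bowling_mod, rng)


rng = random.Random(2015)
# Team from the example: +0.4 batting, +0.2 bowling, group rate 5.1.
# The opposition's modifiers (batting -0.1, bowling 0.0) are hypothetical.
team_score = simulate_score(5.1, 0.4, 0.0, rng)
opposition_score = simulate_score(5.1, -0.1, 0.2, rng)
winner = "team" if team_score > opposition_score else "opposition"
print(round(team_score), round(opposition_score), winner)
```

Repeating this for every remaining fixture and tallying the points gives one trial of the simulation.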

There are a few limitations with this method. I didn't take into account the quality of the teams that each side had faced. England has played Australia, New Zealand and Sri Lanka, but has yet to play Bangladesh or Afghanistan. Their numbers are not going to necessarily show how well they will do against less fancied opponents. Likewise no adjustments were made for the pitch that the match is being played on. We know that South Africa have tended to favour playing on bouncier tracks, so an innings at the 'Gaba won't necessarily tell us much about how they would go in Dunedin. I also haven't taken into account player strengths. Bangladesh's batsmen tend to struggle against tall bowlers, such as Finn and Woakes. England can expect that those two bowlers will perform better than average against Bangladesh, and hence their team is likely to do better than the numbers would suggest.

Another major limitation is that I haven't made provision for rain. That would obviously throw off all calculations. However, given the limited information I felt that a simpler model was best.

I decided to do 2000 trials, so that I could feel that the major source of uncertainty was the assumptions rather than the natural sampling variability.
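A quick sketch of how much sampling variability is left with 2000 trials, using the usual standard error of a proportion and assuming independent trials (the function name is illustrative):

```python
import math


def standard_error(probability, trials):
    """Standard error of an estimated probability from independent trials."""
    return math.sqrt(probability * (1 - probability) / trials)


# The worst case is a true probability of 50%.
worst_case = standard_error(0.5, 2000)
print(f"{worst_case:.2%}")  # about 1.1 percentage points
```

So an estimate from 2000 trials is typically within a couple of percentage points of what an unlimited number of trials would give.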

First I found the probability of the different teams making the quarter finals with my simulation:

Team           Probability
New Zealand    100%
Australia      100%
Sri Lanka      99.95%
Bangladesh     82.51%
England        17.54%
--
India          100%
South Africa   100%
Pakistan       74.71%
Ireland        61.82%
West Indies    63.47%

We can see that Pool A has one crucial match (England vs Bangladesh).
Pool B, however, is still wide open. Ireland vs Pakistan is the last game of the round robin, and it's shaping up to potentially be one that has 3 teams' fortunes riding on the result.

If West Indies make the final 8, they will almost definitely face New Zealand. It's very unlikely that New Zealand will not end up on top of Pool A, and impossible that West Indies will end up 3rd or higher in pool B.

Here are the full results for all possible match-ups:
Pool A         Pool B          Probability
New Zealand    Pakistan        14.99%
New Zealand    South Africa    0.35%
New Zealand    Ireland         21.23%
New Zealand    West Indies     63.44%
Australia      India           2.30%
Australia      Pakistan        43.11%
Australia      South Africa    27.57%
Australia      Ireland         27.02%
Sri Lanka      India           18.18%
Sri Lanka      Pakistan        15.83%
Sri Lanka      South Africa    53.75%
Sri Lanka      Ireland         12.19%
Bangladesh     India           64.74%
Bangladesh     Pakistan        0.75%
Bangladesh     South Africa    15.88%
Bangladesh     Ireland         1.15%
England        India           14.79%
England        Pakistan        0.05%
England        South Africa    2.45%
England        Ireland         0.25%

I'll redo this after tomorrow's results, and then again on Monday.

The most likely scenario at the moment is India to play Bangladesh, Australia to play Pakistan, South Africa to play Sri Lanka and New Zealand to play West Indies.

I've updated this here