A single World Cup is one noisy sample, so its winner reflects luck as much as the strongest squad. This forecast simulates the tournament 100,000 times, anchored to the betting market and calibrated on 25 years of international results, to estimate each team's chance of lifting the trophy. The current favorite is 🏆 Spain.
The slider reweights the forecast between the independent Elo model and the betting market. Each stop is a precomputed run, not an interpolation; teams on which the two sources disagree (Argentina, for instance) shift the most.
The default is 85% market / 15% model. The 15% is a judgment rather than a fitted value. Combining an independent model with the market tends to help at the margin, so the weight should be positive, but the market carries more information than the ratings alone, so it should stay small. The exact figure cannot be tuned without historical market prices, which do not exist for past tournaments; a deliberately conservative value is used, and the slider exposes the sensitivity directly.
| # | Team | Champion | Final | Semi | Quarter | R16 | Market |
|---|---|---|---|---|---|---|---|
| 1 | Spain | 19.3% | 29.5% | 43.6% | 56.3% | 75.4% | 15.6% |
| 2 | France | 15.5% | 24.4% | 38.2% | 53.2% | 73.3% | 15.4% |
| 3 | Argentina | 10.5% | 19.1% | 31.8% | 47.9% | 64.3% | 8.3% |
| 4 | England | 9.3% | 17.0% | 28.4% | 45.0% | 66.4% | 9.6% |
| 5 | Portugal | 9.2% | 16.9% | 28.8% | 45.4% | 66.7% | 9.9% |
| 6 | Brazil | 7.0% | 13.2% | 23.6% | 39.3% | 60.5% | 7.5% |
| 7 | Germany | 5.0% | 10.3% | 19.8% | 33.2% | 60.7% | 5.6% |
| 8 | Netherlands | 3.9% | 8.6% | 17.3% | 33.1% | 51.9% | 4.4% |
"Market" is the de-vigged bookmaker title odds. Other columns are the simulated probability of reaching each round at the current slider setting. The red circle marks the model's most likely champion. At blends below 100% the model can read above or below the Market column. That's its independent (Elo) view disagreeing with the bookmakers.
Highlighted = advance. The 8 best third-placed teams also qualify.
One concrete bracket where the higher-rated side wins every game. The % next to each team is its chance of reaching that round (across all simulations).
Because the tournament is played only once, a single prediction is dominated by chance. The tractable question is distributional: if this exact draw were replayed thousands of times, how often would each team win?
That is a Monte Carlo simulation. Given a realistic model of one match, the full 103-game bracket (the third-place playoff is omitted) is played out and the exercise repeated 100,000 times; the frequency of each outcome is its probability. A team that wins 16,400 of 100,000 runs has a 16.4% title chance.
Each team carries an Elo rating, the system originally devised for chess. A team gains points for a win and more for beating a strong opponent, and the gap between two ratings maps to a win probability, scaled so a 400-point edge is roughly 10-to-1:
Equal ratings give a coin flip; a 200-point edge implies about 76%. Host nations receive a modest home-field adjustment.
The simulation produces a scoreline, not just a winner, since the group stage is decided on goal difference. Each side's win-expectancy is converted into an expected goal count, and the score is drawn from a Poisson distribution, the standard model for counts of rare, roughly independent events:
Evenly matched sides average 1.178 goals each, and the favorite's mean rises accordingly. A Dixon-Coles correction adjusts low-scoring lines so the simulated draw rate matches the empirical one.
Spain (rating 2092) vs Croatia (rating 1931), neutral venue. The engine proceeds in four steps.
Step A, the rating gap
Step B, turn the gap into a win expectancy
Step C, turn that into expected goals
Step D, roll the dice (Poisson + draw correction)
The most likely scoreline is Spain 2–0 Croatia, yet Croatia avoids defeat 39% of the time, the per-match randomness from Step 4. A full tournament is this same calculation across 103 matches, repeated many times.
The two constants above (1.178 and 1.98) govern how strongly a rating edge determines a match, and therefore how much is left to chance. Set too high, the model is over-confident; too low, under-confident. Rather than choose them by hand, I fit them to 24,787 international matches since 2000 by maximum likelihood, selecting the values under which the observed results are most probable. What the data imply about football's inherent randomness:
| Matchup (by rating) | Win | Draw | Lose |
|---|---|---|---|
| Even (50/50) | 34% | 32% | 34% |
| Slight edge (60%) | 46% | 30% | 24% |
| Clear favorite (70%) | 59% | 26% | 15% |
| Big favorite (80%) | 70% | 21% | 9% |
| Huge favorite (90%) | 81% | 15% | 5% |
The betting market is the best-calibrated single forecast available; it aggregates many participants with money at stake and updates quickly to news. Two markets are selectable above the slider, a sportsbook futures board (ESPN) and the Kalshi exchange. Their implied prices sum to more than 100%; that margin (the vig, about 18% for the sportsbook but only 5% on the exchange) is removed first. The model is then calibrated by adjusting each team's rating until its simulated title odds match the chosen target.
The slider sets the blend. At 0% the forecast is the pure Elo model, free to diverge from the market; at 100% it is the market alone. The default of 85% market / 15% model leans on the market, historically the stronger single forecast, while retaining the model as a hedge.
Reproduce: python3 run.py,
python3 fit_variance.py, python3 backtest.py.