Model Methodology · FiveStat

How our models were built.

How our models work, what they get right, and exactly where we know they'll go wrong. No disclaimers - just the specifics.

2024/25 Backtest Results

Our walk-forward accuracy metrics across the full 2024/25 Premier League season - 379 matches, no future data used.

Outcome accuracy
51.7%
vs 40.9% home-always baseline
Moneyline accuracy
68.2%
286 decisive matches
Avg RPS
0.205
vs 0.2369 naive · +0.0319 gained
O/U 2.5 accuracy
56.5%
379 matches · 2024/25
Correct score acc
9.8%
Most probable scoreline
xG MAE
0.681
Goals vs predicted xG per team


The Methodology

The above metrics are from a 'walk-forward' backtest run over the full 2024/25 Premier League season (379 available matches). For each gameweek, the model was trained only on data available up to that point: season data from 2016 to the present plus the completed 2024/25 gameweeks. No future data is used in any prediction, mirroring how the model operates in production.

The Outcome accuracy - 51.7%

The model correctly predicted the most likely match outcome (home win, draw, or away win) in 51.7% of fixtures. By outcome: home win 71.6%, away win 64.1%. Key insight: a draw is almost never the single most likely outcome for any probabilistic model, because draws occupy the fewest cells in the scoreline matrix - only 8 of the 64 possible scorelines are draws.

The Ranked Probability Score - 0.205

RPS measures probabilistic accuracy across all three outcomes, rewarding confident correct predictions and penalising confident incorrect ones. Key: lower is better. An equal-weight model scores 0.2369 across the season. Our model's improvement of +0.0319 over that baseline reflects meaningful probability calibration beyond simple outcome picking.
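As an illustrative sketch (not the production code), the RPS for a single match can be computed from cumulative probability differences over the ordered outcomes home win, draw, away win. Note that the 0.2369 baseline quoted above is a season average over the actual outcome mix; a single equal-weight prediction scores 5/18 ≈ 0.278 against a home win and 1/9 ≈ 0.111 against a draw.

```python
def ranked_probability_score(probs, outcome):
    """RPS over ordered outcomes (home win, draw, away win).

    probs:   probabilities in outcome order, summing to 1
    outcome: index of the observed outcome (0 = home, 1 = draw, 2 = away)
    Lower is better; 0 is a perfect, fully confident prediction.
    """
    observed = [1.0 if i == outcome else 0.0 for i in range(len(probs))]
    cum_p = cum_o = total = 0.0
    # sum squared gaps between cumulative predicted and observed distributions
    for p, o in zip(probs[:-1], observed[:-1]):
        cum_p += p
        cum_o += o
        total += (cum_p - cum_o) ** 2
    return total / (len(probs) - 1)

# equal-weight prediction scored against an actual home win
baseline_home = ranked_probability_score([1/3, 1/3, 1/3], outcome=0)
```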

The Moneyline accuracy - 68.2%

Excluding draws and measuring only decisive fixtures, the model correctly identified the winning team in 68.2% of the 286 matches that produced a winner. This is the most relevant metric for win market betting.

The Correct score rate - 9.8%

This is the highest-probability scoreline in each heatmap; it matched the actual result in 9.8% of fixtures. The industry benchmark for correct-score models on EPL fixtures is around 5-8%, making this a strong result given the model uses only historical goals and xG data.

Match prediction model

01
Historical data

Premier League match results from 2016/17 to present are loaded. Goals scored and goals conceded are extracted per match and assigned home or away, forming each team's base ratings.

Note: Newly promoted teams are excluded from GW1 predictions. The model requires a minimum number of Premier League matches in the historical dataset to generate reliable ratings - promoted sides have no top-flight data and are added once they have played enough matches to produce stable estimates.

02
Base attack and defence ratings

ATT and DEF ratings are computed per team. ATT is the average of a team's home and away goals-scored rates; DEF is the same for goals conceded. These ratings represent the team's underlying quality.

ATT = (avg_home_goals_for + avg_away_goals_for) / 2
DEF = (avg_home_goals_against + avg_away_goals_against) / 2
03
Recent form adjustment

Form ratings are computed from the most recent 20 matches and blended with the base ratings. The blending parameter α (0.30) controls how much weight is placed on recent form versus long-run quality. At 0.30, the model places 70% weight on the team's long-run historical average and 30% on recent form - keeping predictions stable and avoiding overreaction to short-term runs.

blended_att = (1 − α) × base_att + α × recent_att
blended_def = (1 − α) × base_def + α × recent_def
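The blending step above can be sketched as a one-line helper (a minimal illustration; the rating values below are made up for the example):

```python
ALPHA = 0.30  # weight placed on recent form vs long-run quality

def blend(base: float, recent: float, alpha: float = ALPHA) -> float:
    """Blend a long-run base rating with a recent-form rating."""
    return (1 - alpha) * base + alpha * recent

# a team averaging 1.6 goals long-run but 2.2 over its last 20 matches
blended_att = blend(1.6, 2.2)  # 0.7 * 1.6 + 0.3 * 2.2 = 1.78
```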
04
Computing Expected goals (xG)

Team xG is derived using a multiplicative Dixon-Coles model - ATT rating × opponent DEF rating. Both ratings are MLE-fitted and normalised to a league mean of 1.0, so the product directly gives expected goals in the correct scale without any further calibration step. The xG is then adjusted for team-specific home field advantage (derived from the last 20 home vs away matches, capped at ±15%) and any manual adjustments for material squad changes mid-season.

xG = blended_att × opponent_blended_def × home_advantage_multiplier
05
Bivariate Poisson simulation

Scorelines are simulated using a bivariate Poisson distribution rather than two independent Poisson processes. Parameter λ₃ (0.05) introduces positive correlation between home and away goals, producing more realistic draw probabilities. Independent Poisson models can underestimate draws as they treat the outcomes as independent - we need to account for the tactics and dynamics of football (losing teams will push forward, level or ahead teams can sit deeper) which increases the likelihood of draws like 1-1 and 2-2.

P(X=i, Y=j) = Σₖ [ Poisson(i−k; λ₁) × Poisson(j−k; λ₂) × Poisson(k; λ₃) ]

where λ₁ = home_xg − λ₃, λ₂ = away_xg − λ₃, and the sum runs over k = 0..min(i,j). This yields an 8×8 scoreline probability matrix, which is normalised to sum to 1.
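A minimal sketch of the construction above, using the common representation X = X₁ + X₃, Y = X₂ + X₃ with independent Poisson components (the shared X₃ term is what induces the positive goal correlation):

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(K = k) for K ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def bivariate_poisson_matrix(home_xg, away_xg, lam3=0.05, max_goals=7):
    """Scoreline matrix m[i][j] = P(home = i, away = j), 0..max_goals each."""
    lam1, lam2 = home_xg - lam3, away_xg - lam3
    m = [[0.0] * (max_goals + 1) for _ in range(max_goals + 1)]
    for i in range(max_goals + 1):
        for j in range(max_goals + 1):
            # sum over the shared component k = 0..min(i, j)
            m[i][j] = sum(
                poisson_pmf(i - k, lam1) * poisson_pmf(j - k, lam2) * poisson_pmf(k, lam3)
                for k in range(min(i, j) + 1)
            )
    # truncation at max_goals leaves a tiny tail; renormalise to sum to 1
    total = sum(sum(row) for row in m)
    return [[p / total for p in row] for row in m]
```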

06
Outcome probability

Match outcome probabilities are derived from the scoreline matrix:

P(home win) = Σ P(i,j) where i (home) > j (away)
P(draw) = Σ P(i,j) where i = j
P(away win) = Σ P(i,j) where j > i

Clean sheet probabilities are derived as P(away goals = 0) for home clean sheet and P(home goals = 0) for away clean sheet. Over 2.5 probability is Σ P(i,j) where i+j > 2.
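The derivations above reduce to sums over regions of the scoreline matrix. A minimal sketch, assuming a normalised matrix m[i][j] = P(home = i, away = j):

```python
def outcome_probs(matrix):
    """Derive market probabilities from a normalised scoreline matrix."""
    home = draw = away = over25 = 0.0
    n = len(matrix)
    for i in range(n):
        for j in range(n):
            p = matrix[i][j]
            if i > j:
                home += p
            elif i == j:
                draw += p
            else:
                away += p
            if i + j > 2:
                over25 += p
    home_cs = sum(matrix[i][0] for i in range(n))  # away scores 0
    away_cs = sum(matrix[0][j] for j in range(n))  # home scores 0
    return {"home": home, "draw": draw, "away": away,
            "over_2_5": over25, "home_cs": home_cs, "away_cs": away_cs}
```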

League table simulation

01
Monte Carlo simulation

The remaining unplayed fixtures are simulated by drawing a random outcome from the model's home win / draw / away win probability distribution for each match. Points are awarded to each team based on these simulated results and stored.

02
10,000 runs

The full remaining season is simulated 10,000 times. Each run produces a final league table. The proportion of runs in which a team finishes in each position gives that team's final league position probability. Expected final points (xPTS) are calculated as current points plus the average points gained across all simulated runs.
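The two steps above can be sketched as follows. This is an illustrative simplification (the fixture tuple layout and team names are invented for the example; the production model also tracks positions, not just points):

```python
import random
from collections import defaultdict

def simulate_season(fixtures, current_points, n_runs=10_000, seed=42):
    """Monte Carlo over remaining fixtures.

    fixtures:       list of (home, away, p_home, p_draw, p_away) tuples
    current_points: dict of team -> points already banked
    Returns dict of team -> expected final points (xPTS).
    """
    rng = random.Random(seed)
    totals = defaultdict(float)
    for _ in range(n_runs):
        pts = dict(current_points)
        for home, away, p_home, p_draw, _p_away in fixtures:
            r = rng.random()  # sample an outcome from the match distribution
            if r < p_home:
                pts[home] += 3
            elif r < p_home + p_draw:
                pts[home] += 1
                pts[away] += 1
            else:
                pts[away] += 3
        for team, p in pts.items():
            totals[team] += p
    return {team: total / n_runs for team, total in totals.items()}
```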

03
Limitations

The simulation uses static match probabilities - it does not re-estimate team ratings between simulated fixtures. This means a team on a strong run whose ratings haven't yet absorbed that form will be slightly undervalued, and vice versa. Results should be read as a snapshot of the current probability distribution, not a forecast that updates dynamically as simulated rounds play out. In practice, this tends to compress the variance of final points distributions slightly toward the mean.

FPL Planner

01
Player Picks - projected xG & xA

Attacking players are ranked by their projected xG for the upcoming gameweek. A player's projected xG is their adjusted xG share (season xG share blended with recent-form performance) multiplied by the team xG the model has assigned to each fixture. Projected xA follows the same approach but uses a player's share of team shot-creating actions and expected assists from the shots data. Both metrics can be viewed across the next 1, 3, or 5 gameweeks, with the multi-GW values representing the sum across all fixtures in that window.

player_xG = adjusted_xg_share × team_xg_vs_opponent
player_xA = adjusted_xa_share × team_xg_vs_opponent
02
Team xG - Attacking Fixture Strength

Each team's expected goals output is taken directly from the match prediction model for their upcoming fixtures. Teams are ranked from highest to lowest xG, giving an at-a-glance view of which teams have the best attacking opportunity in the selected gameweek window. Green bars indicate a high expected output, yellow shows moderate, and grey shows poor.

03
Team xGA - Defensive Fixture Strength

xGA (expected goals against) is the xG the model assigns to a team's opponent, showing how many goals a team is expected to concede in a given fixture. It is derived by taking the opponent's model xG from the same match prediction. Teams are ranked from lowest to highest xGA, so the teams at the top of the chart face the easiest defensive fixtures. This is the defensive mirror of the Team xG chart and is most useful for identifying FPL defenders and goalkeepers with clean sheet potential.

team_xGA = opponent_xG (from match prediction model)
04
Clean Sheet Probability

Per-match clean sheet probabilities are taken directly from the scoreline matrix produced by the match prediction model, specifically P(opponent goals = 0). For multi-gameweek windows (Next 3, Next 5), these per-match probabilities are summed to give us expected clean sheets (xCS) in that window.

xCS (window) = Σ P(clean sheet) across fixtures in window
05
Fixture Difficulty Rating (FDR)

Each upcoming fixture is colour-coded by difficulty based on the model's win probability for the team in question. Easy (green) reflects a high win probability, Medium (yellow) a relatively competitive fixture, Hard (red) a low win probability, and Blank (grey) is a blank gameweek. Thresholds are set at win probability above 55% for Easy, 35-55% for Medium, and below 35% for Hard.
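The threshold mapping can be sketched as a small helper (an illustration of the banding described above, not the site's actual code):

```python
def fdr_colour(win_prob: float, is_blank: bool = False) -> str:
    """Map the model's win probability onto the FDR colour bands."""
    if is_blank:
        return "grey"    # blank gameweek
    if win_prob > 0.55:
        return "green"   # Easy
    if win_prob >= 0.35:
        return "yellow"  # Medium
    return "red"         # Hard
```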

Bet Value Finder

01
Bookie implied probability

Decimal odds from bookmakers are converted to implied probabilities and then margin-adjusted to remove the overround, giving a fair representation of what the bookmaker believes the true probability to be. Odds are sourced from CheckTheChance and refreshed regularly. Note that each sportsbook updates its prices as fixtures approach, so treat the displayed odds as a snapshot - they will move.

implied_prob = (1 / decimal_odds) / Σ (1 / all_market_odds)
02
Expected Value (EV) calculation

EV expresses the percentage edge our model's probability has over the bookie's margin-adjusted implied probability. A positive EV means our model believes the true probability is higher than the bookie is pricing, suggesting the market may be undervaluing that outcome. A negative EV means the bookie's price implies a higher probability than our model estimates.

EV (%) = (model_prob − bookie_implied_prob) × 100
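The two formulas above combine into a short sketch. This uses the multiplicative overround-removal method implied by the formula (the example odds are invented):

```python
def fair_probs(decimal_odds):
    """Remove the overround from a full market's decimal odds.

    Raw implied probabilities (1 / odds) sum to more than 1 because of the
    bookmaker's margin; dividing each by the total rescales them to sum to 1.
    """
    raw = [1 / o for o in decimal_odds]
    overround = sum(raw)
    return [r / overround for r in raw]

def ev_percent(model_prob: float, bookie_prob: float) -> float:
    """Percentage-point edge of the model over the margin-adjusted price."""
    return (model_prob - bookie_prob) * 100

# e.g. a 1X2 market priced at 2.10 / 3.40 / 3.60
fair = fair_probs([2.10, 3.40, 3.60])
```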
03
Limitations

EV is a model-driven signal and does not account for line movement or all bookmaker prices. It should be interpreted as an indicator of where our model disagrees with the market, not as a guarantee of profit. Critically, a positive EV bet losing - or even losing several times in a row - does not mean the signal was wrong. Short-run variance dominates any small sample of bets, and a genuine edge only becomes statistically meaningful across hundreds of outcomes. If you are using this tool for any real-money purpose, that distinction matters. Gamble Responsibly!

F1 race prediction model

01
Driver Pace Rating (DPR)

DPR measures a driver's true pace independent of the car they are in, using only intra-team comparisons. For each completed race with valid lap data, the model filters to clean representative laps (removing lap 1, pit entry/exit laps, and outliers via IQR per driver), takes each driver's median clean lap time, and computes the percentage delta versus their teammate. Because both drivers share the same car and the same circuit conditions on the same day, this removes the majority of equipment bias from the comparison.

per_race_delta = (driver_median − teammate_median) / teammate_median × 100

Negative = faster than teammate (better). These per-race deltas are then aggregated across the current season plus the two prior seasons, weighted by recency decay (0.85 per race back in time), to produce a single DPR value per driver. Reliability bands are applied: 8+ races = high confidence, 4–7 = caution, fewer than 4 = low.
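The recency-weighted aggregation can be sketched as below (a minimal illustration of the decay scheme described above; input ordering is an assumption):

```python
def recency_weighted_dpr(per_race_deltas, decay=0.85):
    """Aggregate per-race teammate deltas into a single DPR value.

    per_race_deltas: percentage deltas, most recent race first.
    Negative = faster than teammate. Each race back in time is
    down-weighted by a factor of `decay`.
    """
    weights = [decay ** i for i in range(len(per_race_deltas))]
    weighted = sum(d * w for d, w in zip(per_race_deltas, weights))
    return weighted / sum(weights)
```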

02
Constructor Pace Index (CPI)

CPI measures each constructor's outright pace relative to the full field. For each race, the field median lap time is computed from all clean laps. Each constructor's two drivers' medians are averaged to produce a constructor median. The delta versus the field median is then recency-weighted using the same 0.85 decay as DPR. A negative CPI means the team is running faster than the field average - lower is better.

CPI_delta = (constructor_median − field_median) / field_median × 100

CPI captures overall car performance across a season and is the primary input representing team competitiveness in the race prediction model.

03
Blending pace with grid position

For each driver, a combined pace score is derived by blending DPR and CPI. This score is then blended with the driver's qualifying grid position using a circuit-specific overtaking index. The overtaking index runs from 0.0 (grid position almost entirely determines result, e.g. Monaco at 0.10) to 1.0 (pure pace dominates, e.g. Monza at 0.80). This reflects the real-world constraint that a fast driver starting 15th at Monaco has far less opportunity to recover position than the same driver at Spa.

race_score = (1 − OI) × grid_score + OI × pace_score

When qualifying data is not yet available (pre-qualifying mode), grid positions are estimated from the pace ranking directly.

04
Monte Carlo simulation

The race is simulated 5,000 times. In each run, each driver draws a pace sample from a normal distribution centred on their race score with a standard deviation of 0.18, representing the inherent randomness of race conditions, safety car timing, tactical strategy, and general chaos. An 8% DNF probability is applied per driver per simulation, reflecting the historical F1 mechanical and incident failure rate. Win, podium (top 3), and points finish (top 10) probabilities are calculated as the proportion of simulations in which each driver achieves each outcome.

P(win) = simulations where driver finishes 1st / N_simulations
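A minimal sketch of the simulation loop, assuming a lower race score means faster (consistent with grid position and negative-is-faster pace deltas feeding into it; the driver names are placeholders):

```python
import random

def simulate_race(race_scores, n_sims=5000, sigma=0.18, dnf_prob=0.08, seed=7):
    """Monte Carlo race simulation.

    race_scores: dict of driver -> combined race score (lower = faster).
    Returns dict of driver -> {"win", "podium", "points"} probabilities.
    """
    rng = random.Random(seed)
    stats = {d: {"win": 0, "podium": 0, "points": 0} for d in race_scores}
    for _ in range(n_sims):
        finishers = []
        for driver, score in race_scores.items():
            if rng.random() < dnf_prob:
                continue  # DNF: driver excluded from this classification
            # pace sample: normal noise around the race score
            finishers.append((rng.gauss(score, sigma), driver))
        finishers.sort()  # lowest sampled score finishes first
        for pos, (_, driver) in enumerate(finishers, start=1):
            if pos == 1:
                stats[driver]["win"] += 1
            if pos <= 3:
                stats[driver]["podium"] += 1
            if pos <= 10:
                stats[driver]["points"] += 1
    return {d: {k: v / n_sims for k, v in s.items()} for d, s in stats.items()}
```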
05
Limitations

The model has no visibility of race strategy, weather forecasts, tyre allocation, or team orders. DPR reliability is lower for drivers new to a team or in their first season, since the teammate comparison requires both drivers to have raced in comparable conditions. Safety cars, VSC periods, and red flags introduce variance the model cannot price in pre-race. Treat the output as a pace-and-grid-based prior that will be overridden by race-day information the model cannot see.

F1 race analysis

01
Stint Efficiency Score

Stint efficiency measures how much faster or slower each driver ran relative to what the tyre compound would predict at that age. For each compound used in the race, a linear degradation model is fitted across all drivers' clean laps on that compound: lap time as a function of tyre age. Each driver's actual lap times are then compared to the model's predicted times lap-by-lap. The mean residual (predicted minus actual) gives the efficiency score for that stint - positive means the driver is outperforming the tyre model, negative means they are underperforming it.

efficiency = mean(predicted_lap_time − actual_lap_time) per stint

This separates tyre management skill from raw pace: a driver who consistently extracts more from an ageing tyre than the compound average will show a positive score regardless of their outright lap time.
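The fit-and-residual step can be sketched with a plain least-squares line (an illustration of the approach described above; the data shapes are assumptions):

```python
from statistics import mean

def stint_efficiency(compound_laps, driver_laps):
    """Mean residual vs a linear degradation model fitted to all clean laps.

    compound_laps: (tyre_age, lap_time) pairs across all drivers on a compound
    driver_laps:   (tyre_age, lap_time) pairs for the stint being scored
    Positive = the driver beat the compound-average degradation curve.
    """
    ages = [a for a, _ in compound_laps]
    times = [t for _, t in compound_laps]
    a_bar, t_bar = mean(ages), mean(times)
    # ordinary least squares: lap_time ~ slope * tyre_age + intercept
    slope = sum((a - a_bar) * (t - t_bar) for a, t in compound_laps) / \
            sum((a - a_bar) ** 2 for a in ages)
    intercept = t_bar - slope * a_bar
    # predicted minus actual, averaged over the stint
    return mean((slope * a + intercept) - t for a, t in driver_laps)
```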

02
Teammate pace delta

The per-race teammate comparison used to compute DPR is also surfaced directly on the race report page. This gives the raw head-to-head median clean lap delta for that specific race, independent of the multi-race weighted average. It provides a single-race lens on intra-team performance that the aggregate DPR smooths over.

race_delta = (driver_median − teammate_median) / teammate_median × 100

F1 Fantasy Planner

01
Expected Fantasy Points (xFP)

xFP projects a driver's expected fantasy points for an upcoming race using the race prediction model. Win, podium, and points-finish probabilities from the simulation are multiplied by the standard F1 fantasy scoring weights for each outcome and summed to produce a single expected points value per driver. xFP is not a guarantee of score - it represents the probability-weighted average outcome across all simulated races.

xFP = Σ (outcome_probability × fantasy_points_for_outcome)
02
Season projection

Season xFP sums actual fantasy points earned in completed races with projected xFP across all remaining races. This gives a full-season expected points total per driver, combining what has already happened with what the model expects to happen. Transfer targets over the next 1, 3, or 5 races are derived from xFP over that horizon, providing a window-adjusted view of who offers the best projected return in each planning period.

03
Limitations

xFP inherits all of the race prediction model's limitations - it does not know about weather, strategy, or team orders. Fantasy scoring can also be affected by fastest lap bonuses and qualifying positions which the model only partially captures. Treat xFP as a starting point for decision-making, not a definitive ranking.

Where the model struggles

Probability is not a promise

A prediction of 65% does not mean the favoured outcome will happen. It means that across a large number of similar matches, the favoured side wins roughly 65% of the time - and loses 35%. A single match resolving against the model's favourite is not evidence the model is wrong any more than a coin landing tails is evidence the coin is biased. The correct way to evaluate probabilistic predictions is over many outcomes, not individual ones. The xTable makes this explicit: teams sitting below their expected position are not necessarily being unlucky this week - they may simply be in the 35% of simulations where their opponents performed above expectation. Any match, betting decision, or fantasy pick that uses a model probability should be understood in those terms.

Newly promoted sides

Teams returning to the Premier League have limited historical data from which to generate ATT and DEF ratings. Until they have played enough matches to produce a reliable estimate - typically 6 to 8 gameweeks - their ratings are generated from average league values, which almost certainly misrepresents their true quality; we flag this under the relevant fixtures early in the season. Predictions for these fixtures should be interpreted with wider uncertainty than the output probability implies. This is a deliberate trade-off: initialising from lower-league data would introduce more noise than starting from league averages.

Mid-season manager changes

The model currently has no feature to detect or respond to managerial appointments. Team ratings are driven entirely by results, so a tactical reset following a new appointment won't register until it produces a sustained change in outcomes (if any). This is the single scenario most likely to produce a model miss on a team whose underlying quality has improved faster than the data can reflect.

Fixture Importance

The model has no concept of what a result will mean to either side. A team with nothing to play for after GW36 may press less intensely than their rating suggests, but the model still assigns them the same probability it would in a match where points still matter. This cuts in several directions. Teams chasing a title or a top-four finish routinely field full-strength sides in every Premier League fixture, often regardless of European commitments, and the model handles these reasonably well.

The harder cases are: sides with European football already secured who rotate ahead of a final; teams mathematically safe from relegation with six games left who visibly ease off; and mid-table sides with no upward or downward pressure who meet a team scrapping for a Champions League place - the model will likely underestimate the motivated side and overestimate the comfortable one. Relegation run-ins are particularly prone to this: a side one point above the drop will frequently outperform their season-average ratings in must-win fixtures, producing upsets the model systematically underweights.

European rotation compounds the issue for elite clubs - a team playing Thursday in Europe and Sunday in the Premier League will field a different eleven to the one their ATT and DEF ratings reflect, and the model cannot see that coming.

Draws

Draws are the hardest outcome to predict for any probabilistic model of football, and this one is no exception. The bivariate Poisson with Dixon-Coles correction improves draw calibration against a naive independent Poisson, but the model's average draw probability (~26%) still slightly underestimates the EPL's actual draw rate. The structural reason is that draws emerge from match dynamics - tactical adjustments, defensive shape after going ahead - that no pre-match model can capture. Treat draw probabilities as directionally useful rather than precisely calibrated.

Early season (GW1–GW6)

The form component of team ratings is computed from the last 20 matches. At the start of a new season, the current-season sample is small and the model leans heavily on historical data that may now be stale - squads change significantly in the summer transfer window. Predictions in the opening six gameweeks carry meaningfully more variance than the rest of the season, and the backtest metrics above should not be assumed to hold equally across all gameweeks. The walk-forward methodology mitigates this, but cannot eliminate it.

What this means for the backtest numbers

The performance metrics above are real and have not been cherry-picked - they reflect the full 2024/25 season under walk-forward conditions with no future data. But they are averages. The model performs best on mid-table fixtures between established sides with stable squads and no European distraction, and worst on promoted sides, rotation-heavy periods in November–March, and matches immediately following a manager change. If you are using the model's output for any purpose that requires you to understand its distributional accuracy rather than its average accuracy, those caveats matter.