London Marathon Pacing Analysis

Background

What This Is

390,000 finishers. 11 race years. One key question: does it matter how you pace the first half — not just how fast?

The short answer is yes — and the effect is larger than most runners expect. Runners who start 10% faster than their ability-group peers lose an average of 11 extra minutes in the second half, compared to those who start evenly. That finding holds across every finish band from sub-3:00 to 6:00+.

Built with Python and pandas for data engineering (scraping, cleaning, feature engineering), statistical regression in NumPy, and Chart.js for interactive visualisation. The key methodological choice: normalising each runner's first-half pace relative to their finish-time peers rather than using raw pace — this within-group approach is what unlocks the r = −0.59 signal and makes the predictor meaningful across all ability levels.
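A minimal pandas sketch of the within-group normalisation described above. The column names (`finish_min`, `half_min`), the 30-minute band width, and the sample rows are illustrative assumptions, not taken from the project's code.

```python
import pandas as pd

# Hypothetical sample: finish and halfway times in minutes.
runners = pd.DataFrame({
    "finish_min": [175.0, 178.0, 179.0, 242.0, 250.0, 265.0],
    "half_min":   [85.0, 88.0, 80.0, 112.0, 118.0, 106.0],
})

# Assumed banding: 30-minute finish-time bands (150-180, 240-270, ...).
runners["band"] = (runners["finish_min"] // 30 * 30).astype(int)

# First-half pace in min/km over the ~21.1 km to the halfway mat.
runners["half_pace"] = runners["half_min"] / 21.1

# Relative pace: each runner's first-half pace divided by the median
# first-half pace of their finish band. Below 1.0 means faster than peers.
band_median = runners.groupby("band")["half_pace"].transform("median")
runners["relative_pace"] = runners["half_pace"] / band_median
```

Using `transform("median")` keeps the result aligned row by row with the original frame, so the ratio can be computed without a separate merge step.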

Key Findings

What the Data Shows

Six robust findings from 390,000 finishers (n = 389,122 with complete split data) across 11 race years.

~95%

Run a positive split

The vast majority slow in the second half. Negative splits are genuinely rare — achieved by fewer than 6% of finishers.

~1.13×

Median split ratio

The typical runner's second half is ~13% slower than their first. The fastest finishers are near 1.03×.

30–35K

Where runners slow most

The biggest pace drop across the field occurs in the 30–35km segment — the classic marathon wall, quantified.

—

Split ratio gap: fast vs slow

The pacing gap between the fastest and slowest finish bands is large (roughly 1.02–1.03 for sub-3:00 runners versus 1.18–1.20 or higher for 5:00–6:00 finishers) and consistent across all 11 years.

1.12 vs 1.14

Women pace more evenly

Women's median split ratio is lower than men's within every finish time band — a consistent gap across the full dataset.

11 min

Cost of starting 10% too fast

Going out 10% faster than your ability group costs ~11 extra minutes in the second half. Raw pace is a weak predictor (r = 0.21) — relative pace is not (r = −0.59).
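The segment finding above (the largest slowdown at 30–35km) depends on per-segment paces derived from cumulative 5K splits. The sketch below shows one way to compute them for a single runner. The split times are invented, the function name is hypothetical, and "slowed most" is taken here to mean the largest pace increase over the previous segment, which may not match the project's exact definition.

```python
def segment_paces(cumulative_min, marks_km):
    """Return pace (min/km) for each segment between consecutive timing mats."""
    paces = []
    for i in range(1, len(marks_km)):
        dist = marks_km[i] - marks_km[i - 1]
        time = cumulative_min[i] - cumulative_min[i - 1]
        paces.append(time / dist)
    return paces

# Hypothetical cumulative times (minutes) at each mat, starting from 0 km.
marks_km = [0, 5, 10, 15, 20, 25, 30, 35, 40]
cumulative_min = [0, 27.0, 54.0, 81.5, 109.0, 137.5, 167.0, 200.0, 233.5]

paces = segment_paces(cumulative_min, marks_km)

# Pace change versus the previous segment; the largest increase marks
# where this runner slowed the most.
changes = [paces[i] - paces[i - 1] for i in range(1, len(paces))]
worst = changes.index(max(changes)) + 1
worst_segment = (marks_km[worst], marks_km[worst + 1])
```

For this invented runner, `worst_segment` is `(30, 35)`.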

Race Strategy Tool

Race Strategy Calculator

Based on the regression model (r = −0.59), this calculator shows the outcomes observed for runners who started at different paces relative to their ability group. Set your target time, adjust the slider, and see the predicted outcome.

Important: This is descriptive, not prescriptive. The model reflects what runners did, not what you should do. A faster predicted finish at a higher first-half pace does not mean going out hard is a good strategy — those runners were often faster to begin with. Use this to understand the trade-off, not as a race plan.

Based on linear regression across 390,000 finishers (r = −0.59, slope = −110 pp per unit relative pace). Predictions are population averages — individual results vary with fitness, conditions, and nutrition. "Ability group" means runners who finished in the same time band as your target.
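The arithmetic behind the quoted slope can be sketched as follows. Only the slope appears in the text, so this computes the change in second-half slowdown relative to an even start, not an absolute prediction. Treating "10% faster than peers" as a relative pace of 0.90 is an assumption, and the function name is hypothetical.

```python
SLOPE_PP_PER_UNIT = -110.0  # from the text: pp of slowdown per unit relative pace

def slowdown_change_pp(relative_pace):
    """Predicted change in second-half slowdown (percentage points)
    versus an even start (relative pace 1.0)."""
    return SLOPE_PP_PER_UNIT * (relative_pace - 1.0)

# Assumption: "10% faster than peers" corresponds to relative pace 0.90.
print(round(slowdown_change_pp(0.90), 1))  # 11.0
print(round(slowdown_change_pp(1.10), 1))  # -11.0
```

A full calculator would also need the regression intercept and the band's median pace, neither of which is stated here.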

Key Charts

The Three Core Findings

These three charts capture the main story. For the full breakdown across all 10 charts — gender, age, yearly trends, segment pacing, and pacing consistency — see the detailed findings page.

Finding 1

Almost No One Runs an Even Split

The distribution of split ratios (second-half ÷ first-half) peaks just above 1.0 with a long rightward tail. The median is ~1.13 — the typical runner's second half takes 13% longer than their first. Negative splits are achieved by fewer than 6% of finishers.

Takeaway: Even pacing is not the norm. Starting conservatively and holding pace is rare, not routine — even among experienced runners.

Hover any bar for exact count. Blue = negative split, red = positive split. Toggle Men / Women above to see how the distribution shifts.

Finding 2

Faster Finishers Pace More Evenly — By a Wide Margin

Sub-3:00 runners show split ratios near 1.02–1.03. Runners finishing in 5:00–6:00 regularly reach 1.18–1.20 or higher. The gap is large and consistent across all 11 years of data.

Caution: This is an association. Faster runners have greater aerobic capacity — the data cannot say whether even pacing causes faster finishing or simply reflects underlying fitness.

Hover any bar to see median, IQR, and runner count. Toggle Men / Women above to compare pacing by gender.

Finding 3 · Strongest Predictor

Going Out Too Fast — Relative to Your Peers — Predicts a Worse Second Half

Raw first-half pace is a weak predictor (r = 0.21). After controlling for ability, by expressing each runner's pace as a ratio to their finish-band median, the correlation strengthens dramatically to r = −0.59.

Going out 10% faster than your peer group is associated with ~11 extra percentage points of second-half slowdown. This is the project's strongest quantitative finding, but it remains a correlation: the data cannot prove that a slower start would have produced a better finish for those specific runners.
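The correlation and slope quoted in this finding are the kind of quantities a simple least-squares fit produces. A minimal NumPy sketch follows. The five data points are made up for illustration and do not come from the dataset, and `fit_line` is a hypothetical helper name.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares slope and intercept, plus Pearson r, for y against x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r

# Invented points: relative first-half pace vs second-half slowdown (pp).
relative_pace = [0.90, 0.95, 1.00, 1.05, 1.10]
slowdown_pp = [25.0, 17.0, 14.0, 9.0, 2.0]

slope, intercept, r = fit_line(relative_pace, slowdown_pp)
```

With these invented points both the slope and r are negative: a lower relative pace (a faster start than peers) goes with a larger slowdown, which is the direction reported for the real data.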

Methodology

How This Was Done

Data sources, processing decisions, and assumptions. Understanding the methodology is essential for interpreting the findings correctly.

Data Source

Results scraped from the official TCS London Marathon results platform (results.tcslondonmarathon.com), powered by mika:timing GmbH. Split times collected from individual runner detail pages.

What Is Included

Mass event finishers with a recorded finish time. Elite runners are excluded from mass charts. Virtual runners (different courses) are excluded entirely.

Pacing Metrics

Split ratio = second-half time ÷ first-half time. Values above 1.0 indicate a positive split. Half time is cumulative elapsed time at the halfway mat (~21.1km).
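As a small illustration of the metric, the sketch below derives the split ratio from the cumulative halfway time and the finish time, then summarises a handful of runners. The times are hypothetical and the function name is an assumption.

```python
from statistics import median

def split_ratio(half_min, finish_min):
    """Second-half time divided by first-half time.
    half_min is cumulative elapsed time at the halfway mat."""
    second_half = finish_min - half_min
    return second_half / half_min

# Hypothetical (half, finish) times in minutes.
times = [(88.0, 178.0), (115.0, 245.0), (120.0, 238.0), (130.0, 290.0)]
ratios = [split_ratio(h, f) for h, f in times]

# Share of runners with a positive split (ratio above 1.0).
positive_share = sum(r > 1.0 for r in ratios) / len(ratios)
median_ratio = median(ratios)
```

Because the halfway time is cumulative, the second-half time has to be obtained by subtraction from the finish time before forming the ratio.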

Statistical Choices

Median used throughout (not mean) because finish times are right-skewed. Regression used only where it adds meaning. No clustering applied without data-driven justification.

Known Limitations

— Half-marathon split times are missing for a small fraction of runners (<1%). Retained for finish-time analysis but excluded from split-ratio calculations. Missing data appears random.

— Year-on-year comparisons are confounded by field composition: weather, entry standards, charity places, and course conditions differ each year. Trends should be interpreted cautiously.

— The 2020 race was cancelled (COVID-19). A small virtual-only event ran in its place; those runners are excluded as they ran different courses under different conditions.

— Runner ID collection for 2019, 2021, and 2022 is estimated at ~75% of the full field due to a hard pagination cap (~21k results per query) on the results platform's search endpoint. This is a platform constraint, not a data processing issue.

Future Work

What Comes Next

The analysis is in good shape. These are the remaining open questions worth pursuing.

Data

Complete 5K split coverage for 2019, 2021, and 2022

Split times for these three years are based on ~75% of the field due to a platform pagination cap. Closing this gap would make the segment-pace and consistency charts fully representative across all 11 years.

Phase 2

Weather and course condition controls

Year-on-year comparisons are currently confounded by conditions that differ each race. Incorporating weather data (temperature, wind) would allow proper normalisation and make trend analysis more meaningful.

Model

Natural grouping investigation

The pacing predictor assumes a linear relationship between relative starting pace and second-half slowdown. Testing for natural subgroups (e.g. via BIC on a Gaussian mixture model) could reveal whether distinct pacing archetypes exist in the data.

Phase 2

Repeat-runner tracking

The dataset contains runners who competed in multiple years. Tracking the same runner across editions would allow genuine individual improvement analysis, rather than comparing different field compositions year on year.

Curiosities

Fun with the Data

A few lighter observations from the dataset — patterns that don't belong in the main analysis but are too interesting to leave out.

1.07×

The most common fade

More runners finish with a split ratio of 1.07 than any other value: 14,740 of them. If you had to pick one number that describes how London usually goes wrong, this is it: a second half about 7% slower than the first.

40–44

The best-paced age group

Runners in their early forties pace more evenly (1.117×) than those under 40 (1.132×). Every age group above 54 fades progressively more — but the 40–44 bracket quietly outperforms the youngest runners by a measurable margin.

+23 min

What a hot day costs

The 2018 race ran in 23°C+ heat. The median finish time was 23 minutes above the 10-year average, and 98.6% of runners posted a positive split — the highest rate in the dataset by a wide margin.

Where men and women diverge most

The gender pacing gap is small among fast finishers and peaks in the 4:30–5:00 band, where men fade 8.6 percentage points more than women on average. Hover to see the gap across every band.
