A price is a probability

If a market says 73%, what does that even mean — and how would you ever know if it's right? Here's how the pros grade a forecast: calibration, and the Brier score.

By now the core idea of this series is familiar: a market price is a probability. A contract that pays a dollar if the event happens and trades at 73¢ is the crowd saying 73%.¹ But a probability is a slippery kind of claim. If I tell you there's a 73% chance of rain and the sky stays dry, was I wrong? Not really: unlikely things are supposed to happen 27% of the time. You cannot grade a single probabilistic call on a single outcome.

So how do you tell a real forecaster from a lucky one — or a calibrated expert from a confident fraud? You stop staring at any one call and look at all of them. Two tools do the work: calibration and the Brier score. They're how you hold a number accountable.

A single prediction can't be graded. A thousand of them can.

The only way to grade a probability

A 73% forecast is not a promise that the thing will happen. It's a claim about frequency: of all the times I say 73%, the event should happen about 73 times in 100. That reframing is the whole trick. One call tells you nothing; a track record tells you almost everything. So gather every market that was priced near 73¢ at a comparable point before it resolved, and count how many resolved YES. Then do it for every price level. What you get is a calibration curve.
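The counting step can be sketched in a few lines of Python. This is a minimal illustration, not how any particular exchange or study does it: the function name `calibration_table`, the ten equal-width price bins, and the sample data are all assumptions made up for the example.

```python
from collections import defaultdict

def calibration_table(prices, outcomes, n_bins=10):
    """Group forecasts into equal-width price bins and compare each bin's
    average price with the share of its markets that resolved YES."""
    bins = defaultdict(list)
    for p, o in zip(prices, outcomes):
        # min() keeps a price of exactly 1.0 inside the top bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, o))

    rows = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_price = sum(p for p, _ in pairs) / len(pairs)
        yes_rate = sum(o for _, o in pairs) / len(pairs)
        rows.append((mean_price, yes_rate, len(pairs)))
    return rows

# Made-up data: market prices and how each market resolved (1 = YES).
prices = [0.72, 0.74, 0.73, 0.71, 0.28, 0.26, 0.27, 0.29]
outcomes = [1, 1, 0, 1, 0, 0, 1, 0]

for mean_price, yes_rate, n in calibration_table(prices, outcomes):
    print(f"said {mean_price:.3f}  happened {yes_rate:.2f}  (n={n})")
```

Each row is one point on the curve: what the market said, and how often it happened. With only four markets per bin the frequencies are noisy, which is the practical reason a track record has to be long before it means much.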

The calibration curve

Plot what the market said on one axis and what actually happened on the other. A perfect forecaster lands on the 45° line: when it says 30%, the event happens 30% of the time; when it says 90%, 90%. Real markets hug that line remarkably well — far better than pundits, who are reliably overconfident and pay nothing for it.²

A reliability diagram — illustrative. Closer to the diagonal means better calibrated.

Miscalibration is just distance from the line. A forecaster whose "90%" calls only happen 70% of the time is overconfident — their curve sags below the diagonal at the high end. The picture tells you not only that someone is off, but exactly how.

One number — the Brier score

A curve is a diagnosis; sometimes you want a single grade. The Brier score is the standard one. Write each outcome as $o_t = 1$ if the event happened and $o_t = 0$ if it didn't, and let $p_t$ be the price as a probability. The score is the average squared miss:³

$$ \text{BS} \;=\; \frac{1}{N}\sum_{t=1}^{N}\left(p_t - o_t\right)^2 $$

Lower is better. The score runs from 0 (you said 100% and were right, every time) to 1 (you said 100% and were wrong, every time). The number that matters is 0.25: what you score by saying "50%" to everything, the forecast of someone who knows nothing. Score below 0.25 and you're adding information; liquid markets beat it comfortably.

The squaring is the soul of it. Say 80% on something that happens and you eat $(0.8-1)^2 = 0.04$. Say 80% on something that doesn't, and you eat $(0.8-0)^2 = 0.64$ — sixteen times the penalty. Confidence is cheap only when you're right; the Brier score makes you pay for bluster.
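The formula is short enough to check by hand, or in a few lines of Python. The sketch below assumes prices are already expressed as probabilities between 0 and 1; the function name `brier_score` and the sample inputs are illustrative.

```python
def brier_score(prices, outcomes):
    """Mean squared gap between forecast probabilities and 0/1 outcomes."""
    if len(prices) != len(outcomes) or not prices:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(prices, outcomes)) / len(prices)

# The two penalties from the text: 80% on a hit versus 80% on a miss.
hit = brier_score([0.8], [1])    # (0.8 - 1)^2
miss = brier_score([0.8], [0])   # (0.8 - 0)^2

# Saying 50% to everything scores 0.25 whatever the outcomes are,
# because every miss is exactly 0.5 in one direction or the other.
coin_flip = brier_score([0.5] * 4, [1, 0, 0, 1])

print(round(hit, 2), round(miss, 2), coin_flip)  # 0.04 0.64 0.25
```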

The Brier score, illustrated — markets land well left of the no-skill line.

Why the market stays honest

Why are markets so well-calibrated? The same reason everything in this series comes back to: money. If a market reliably said 70% for things that happen half the time, that gap is free money. You sell at 70¢, pay out a dollar only about half the time, and keep the difference on average, while your selling drags the price back toward the truth. Miscalibration is a profit opportunity, and a liquid market competes it away. A pundit can be overconfident for a whole career; a trader is overconfident only until they're broke.
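The arithmetic behind that trade, as a sketch that ignores fees, spreads, and the cost of tying up capital (simplifying assumptions for the example): selling one contract at 70¢ on an event that happens half the time brings in 70¢ up front and costs a dollar in the half of cases that resolve YES.

$$ \mathbb{E}[\text{profit}] \;=\; 0.70 \;-\; 0.5 \times 1.00 \;=\; 0.20 $$

That is roughly 20¢ per contract on average, and it shrinks toward zero as sellers push the price down toward 50¢.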

What calibration won't tell you

Calibration is necessary, not sufficient. A forecaster who just parrots the base rate ("8% of startups become unicorns, so 8% for every startup") is perfectly calibrated and perfectly useless. Good forecasting needs calibration and sharpness: confident, differentiated calls that still land on the line. The Brier score splits along the same lines: a calibration term, a sharpness term, and a third term for how uncertain the outcomes were to begin with.⁴
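Murphy's partition can be written as Brier = reliability − resolution + uncertainty, where reliability measures distance from the calibration line (lower is better) and resolution measures how far the forecaster's groups stray from the overall base rate (higher is better). The sketch below groups forecasts by their exact stated probability, which makes the identity hold exactly. The function name, the 20% base rate, and both forecasters are invented for illustration.

```python
from collections import defaultdict

def murphy_decomposition(prices, outcomes):
    """Split the Brier score into reliability, resolution and uncertainty,
    grouping forecasts by their exact stated probability."""
    n = len(prices)
    base_rate = sum(outcomes) / n

    groups = defaultdict(list)
    for p, o in zip(prices, outcomes):
        groups[p].append(o)

    reliability = 0.0
    resolution = 0.0
    for p, outs in groups.items():
        freq = sum(outs) / len(outs)          # how often this group resolved YES
        reliability += len(outs) * (p - freq) ** 2
        resolution += len(outs) * (freq - base_rate) ** 2
    reliability /= n
    resolution /= n
    uncertainty = base_rate * (1 - base_rate)
    return reliability, resolution, uncertainty

# Made-up record: 10 events, 2 of which happen (base rate 20%).
outcomes = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
parrot = [0.2] * 10                    # the base rate for everything
sharp = [0.5] * 4 + [0.0] * 6          # differentiated, and still calibrated

for name, forecasts in [("parrot", parrot), ("sharp", sharp)]:
    rel, res, unc = murphy_decomposition(forecasts, outcomes)
    direct = sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(outcomes)
    print(f"{name}: reliability={rel:.2f} resolution={res:.2f} "
          f"uncertainty={unc:.2f} brier={direct:.2f}")
```

Both forecasters sit exactly on the calibration line, so both have zero reliability error. Only the second has any resolution, and that is the whole of its lower Brier score.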

Markets have one well-known wrinkle, too: the favorite–longshot bias. Longshots tend to be priced a touch too high and heavy favorites a touch too low — buy a basket of 5¢ lottery-ticket contracts and you'll slightly underperform their price.⁵ It's small in deep markets, but it's real — a reminder that "the price is a probability" is an excellent approximation, not a law of physics. And it holds for the sports contracts that make up most of the volume just as much as for elections: a calibrated number is a forecast even when people are trading it for the thrill.
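To put a rough size on it, suppose (purely as a hypothetical, not a measured figure) that contracts priced at 5¢ actually resolve YES 4.5% of the time. Each one pays a dollar when it hits, so the expected result of buying one is:

$$ \mathbb{E}[\text{payoff}] - \text{price} \;=\; 0.045 \times 1.00 \;-\; 0.05 \;=\; -0.005 $$

Half a cent per contract sounds like nothing, but it is a tenth of the 5¢ staked, which is why a small gap in probability can still be a noticeable drag on a basket of longshots.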

So "a price is a probability" was never a slogan. It's a claim you can test — and the market passes it better than almost any expert, pundit, or poll, because it's the only forecaster that goes broke when it's wrong. Next time a number flashes on a screen — 73% — you know what it means, and you know exactly how you'd check it. That is what separates a market from an opinion: the number is accountable.

Notes

¹ The price-equals-probability identity is developed in Parts 1–3 of this series: What is a prediction market?, Prediction markets, explained, and Why prediction markets matter.
² On the gap between expert confidence and actual accuracy, and the relative calibration of aggregated/market forecasts: Philip Tetlock, Expert Political Judgment (2005); and reliability studies of the Iowa Electronic Markets.
³ The Brier score: Glenn W. Brier, "Verification of Forecasts Expressed in Terms of Probability," Monthly Weather Review (1950). The 0.25 figure is the score of a constant 50% forecast on binary events.
⁴ The Brier score decomposes into reliability (calibration), resolution (sharpness), and uncertainty: Allan H. Murphy, "A New Vector Partition of the Probability Score" (1973).
⁵ Favorite–longshot bias: a long-documented regularity in betting and prediction markets; see Snowberg & Wolfers and the broader market-microstructure literature.