When AI trades the future

What happens when the traders setting the price aren't people? AI agents are already cheaper, faster forecasters than most humans — and prediction markets may become both their training ground and their scoreboard.

This whole series has treated a market price as a forecast made by people — a crowd of traders, each pushing a number with their own money until it settles on the truth. That picture is about to get more complicated. The traders are starting to include software. A language-model agent can read every filing, poll, and news wire before breakfast, hold a probability for ten thousand questions at once, and trade around the clock without sleeping, flinching, or getting bored. The interesting question is not whether AI can trade a prediction market. It's what a prediction market becomes when much of the trading is done by machines — and whether the number at the end still means what we've spent twenty parts saying it means.

I'll argue two things can be true at once. AI makes prediction markets better — deeper, broader, cheaper, sharper. And AI introduces a new failure mode the human crowd never had. Both follow from the same fact: an agent is a forecaster you can copy.

A market price was always a forecast. The new question is what it means when the forecaster is a machine you can run a thousand times.

AI is already a decent forecaster

Start with the capability, because it's the part people underrate. Forecasting future events is exactly the kind of task large models turned out to be unexpectedly good at: read a lot, weigh base rates, update on new evidence, output a number. Research over the past two years has put LLM agents head-to-head with human forecasters on real, unresolved questions, wiring them up to retrieve and summarize the news and then asking them for a probability. The better systems land in the neighborhood of a competent human crowd, and they do it for a few cents and a few seconds per question.¹ They are not oracles. They still trail the very best human superforecasters on the hardest questions, and they inherit every bias in their training data. But "roughly as good as the crowd, almost for free" is already an economic event.

An agent has three advantages a human trader can't match. It reads everything, so no public fact escapes it. It updates instantly, so a price moves the moment the world does, not when a person happens to look. And it never logs off, so the market is awake at 3 a.m. when the news breaks. Drop a few of these into a market and the immediate effect is the one every exchange wants: more participants quoting more often deepens the order book, tightens the spread, and pulls the price toward fair value faster. The thinnest, most-neglected markets — the ones a human crowd never bothered to price — are exactly where a tireless agent helps most.

Illustrative — humans and agents both quote into one book. More tireless quoters means more depth and a tighter spread.
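A toy simulation can show why extra quoters tighten the spread. This is a minimal sketch under made-up assumptions: fair value 0.60, and each hypothetical quoter posts a bid and an ask a fixed half-spread of 0.05 away from its own noisy estimate of fair value, with noise 0.01. It ignores order matching and sizes, so it models the spread only, not depth.

```python
import random
import statistics


def top_of_book(n_quoters, fair=0.60, half_spread=0.05, noise=0.01, rng=None):
    """Each hypothetical quoter posts a bid and an ask half_spread away from
    its own noisy estimate of fair value. Returns (best_bid, best_ask, spread)."""
    rng = rng if rng is not None else random.Random()
    estimates = [rng.gauss(fair, noise) for _ in range(n_quoters)]
    best_bid = max(e - half_spread for e in estimates)  # highest bid in the book
    best_ask = min(e + half_spread for e in estimates)  # lowest ask in the book
    # No order matching here: if quotes ever crossed, a real book would trade them.
    return best_bid, best_ask, best_ask - best_bid


def average_spread(n_quoters, trials=2000, seed=0):
    """Average top-of-book spread over many independently simulated books."""
    rng = random.Random(seed)
    return statistics.mean(top_of_book(n_quoters, rng=rng)[2] for _ in range(trials))


for n in (2, 10, 100):
    print(f"{n:>3} quoters: average spread {average_spread(n):.3f}")
```

With these invented parameters, the average spread falls from roughly 0.09 with two quoters to roughly 0.05 with a hundred. The best bid comes from the most optimistic quoter and the best ask from the most pessimistic one. The more quoters there are, the further apart those two extremes sit, and the narrower the gap between the best quotes becomes.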

Markets as the scoreboard for AI

Now flip the relationship around. We've spent this series insisting a price is a testable claim you can grade — gather a forecaster's calls, check how often the 70% ones come true, score the misses. That machinery doesn't care whether the forecaster is a person or a program. Which means a prediction market is, almost by accident, the cleanest benchmark we have ever had for forecasting skill: an open, adversarial, money-weighted test that resolves itself against reality and pays out to whoever was right.
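The "check how often the 70% ones come true" step is a calibration table. Here is a minimal sketch, assuming forecasts are probabilities in [0, 1] and outcomes are recorded as 1 (happened) or 0 (didn't). The ten-bucket layout and the function name are illustrative choices, not a standard API.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes, n_bins=10):
    """Bucket probability forecasts and compare each bucket's average forecast
    with how often the event actually happened (1 = yes, 0 = no)."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    buckets = defaultdict(list)
    for p, y in zip(forecasts, outcomes):
        # round() absorbs tiny float errors so, e.g., 0.7 is not pushed into the 0.6 bucket;
        # p == 1.0 is folded into the top bucket.
        idx = min(int(round(p * n_bins, 9)), n_bins - 1)
        buckets[idx].append((p, y))
    rows = []
    for idx in sorted(buckets):
        pairs = buckets[idx]
        mean_forecast = sum(p for p, _ in pairs) / len(pairs)
        hit_rate = sum(y for _, y in pairs) / len(pairs)
        rows.append((idx / n_bins, mean_forecast, hit_rate, len(pairs)))
    return rows


# A hypothetical forecaster's ten "70%" calls, seven of which came true.
for lower, mean_p, hit, n in calibration_table([0.7] * 10, [1] * 7 + [0] * 3):
    print(f"bucket from {lower:.1f}: forecast {mean_p:.2f}, happened {hit:.2f} (n={n})")
```

A well-calibrated forecaster, human or program, shows the forecast and "happened" columns tracking each other bucket by bucket.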

Most AI benchmarks are static question banks that leak into the next model's training set and quietly rot. A market can't be memorized, because the answer doesn't exist yet — it's a question about the future. So "how good is this model at forecasting?" gets a brutally honest answer: turn it loose on a stream of open markets, record its probabilities, and at resolution compute its Brier score against the outcomes.² Calibration becomes the metric. A model that beats the no-skill line — and beats the market — is demonstrably adding information, and you can prove it in dollars.
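The scoring step is simple enough to sketch. The Brier score is the mean squared gap between each probability and the 0/1 outcome. The two baselines below are assumptions about what "no-skill" means: a forecaster that always says 50%, and one that always says the in-sample base rate. Every number in the example is invented.

```python
def brier(forecasts, outcomes):
    """Mean squared gap between probabilities and 0/1 outcomes; lower is better."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equal-length, non-empty lists")
    return sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(forecasts)


def scoreboard(model, market, outcomes):
    """Score a model against the market price and two simple no-skill baselines."""
    n = len(outcomes)
    base_rate = sum(outcomes) / n
    scores = {
        "model": brier(model, outcomes),
        "market": brier(market, outcomes),
        "always_50pct": brier([0.5] * n, outcomes),  # always scores exactly 0.25
        "base_rate": brier([base_rate] * n, outcomes),  # in-sample, so slightly flattering
    }
    # Brier skill score vs. the market: above 0 means the model beat the price.
    if scores["market"] > 0:
        scores["skill_vs_market"] = 1 - scores["model"] / scores["market"]
    return scores


outcomes = [1, 0, 1, 1, 0]          # hypothetical resolutions
model = [0.8, 0.2, 0.7, 0.9, 0.3]   # hypothetical agent forecasts
market = [0.6, 0.4, 0.6, 0.7, 0.5]  # hypothetical market prices at forecast time
print(scoreboard(model, market, outcomes))
```

Five questions prove nothing statistically. A real comparison would need many more resolved markets before a positive skill score meant anything.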

There's a second, stranger version of this: markets for model quality. You don't only have to grade a model by a market — you can make a market about models. "Will model A beat model B on this eval suite next quarter?" is a perfectly good event contract: a crisp question, a future date, an objective resolution source. A price on it is the crowd's live probability that A wins, updating as benchmarks and leaks trickle out. We already let markets price elections and earnings; pricing which AI is better is the same move pointed at the thing the whole industry argues about.
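The three ingredients named above (a crisp question, a future date, an objective resolution source) map naturally onto a small record. This is a hypothetical sketch, not any exchange's contract format. The model names, eval suite, and date are placeholders. It also shows why a price acts as a probability: a trader whose belief differs from the price expects to profit by trading toward that belief.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BinaryContract:
    """Hypothetical event contract: one YES share pays 1.0 if the event happens, else 0.0."""
    question: str
    resolves_on: date
    resolution_source: str  # placeholder: whatever published result settles the question

    def can_resolve(self, today: date) -> bool:
        return today >= self.resolves_on

    @staticmethod
    def yes_payout(event_happened: bool) -> float:
        return 1.0 if event_happened else 0.0


def expected_profit_per_yes(price: float, belief: float) -> float:
    """Expected profit of buying one YES share at `price` if you believe the
    event has probability `belief`: belief * 1 + (1 - belief) * 0 - price."""
    return (belief * BinaryContract.yes_payout(True)
            + (1 - belief) * BinaryContract.yes_payout(False) - price)


contract = BinaryContract(
    question="Will model A outscore model B on eval suite X at the end of next quarter?",
    resolves_on=date(2030, 3, 31),  # hypothetical date
    resolution_source="eval suite X published results (hypothetical)",
)
print(contract.can_resolve(date(2030, 1, 1)))           # not yet resolvable
print(expected_profit_per_yes(price=0.40, belief=0.55))  # positive, so buy YES
```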

The new failure mode — reflexivity

Here's the catch, and it's a real one. A human prediction market works because it aggregates independent guesses — thousands of people who looked at different things, know different facts, and are wrong in uncorrelated ways. The errors cancel and the signal survives. That's the entire mathematical reason the crowd beats the expert. Independence is load-bearing.
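A standard back-of-envelope model shows why independence carries the weight. The assumptions are chosen for illustration: N traders each make an error e_i with mean zero and variance σ², and every pair of errors has the same correlation ρ. The crowd's error is their average, and its variance works out to:

```latex
\bar{e} = \frac{1}{N}\sum_{i=1}^{N} e_i,
\qquad
\operatorname{Var}(\bar{e})
  = \frac{1}{N^{2}}\Big(N\sigma^{2} + N(N-1)\,\rho\,\sigma^{2}\Big)
  = \rho\,\sigma^{2} + \frac{(1-\rho)\,\sigma^{2}}{N}.
```

With ρ = 0 this is σ²/N, which shrinks toward zero as the crowd grows. With any ρ > 0 it can never fall below ρσ², however many traders join. That floor is the shared error that averaging cannot remove.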

Agents threaten the independence. If a large share of the trading is done by models that were trained on overlapping data, fine-tuned in similar ways, and prompted with near-identical instructions, they don't make independent errors — they make the same error, together. A blind spot in the shared training data becomes a blind spot in the price. And because all that agreement looks like consensus, the market reports it as confidence — a tight, lopsided price that feels like the wisdom of crowds but is closer to one model, echoed a thousand times.³ Worse, agents can read the market and each other, so they start trading on what they think the other bots will do. The price stops aggregating knowledge about the world and starts reflecting models reading models — a hall of mirrors. That's the reflexivity risk: when the forecasters all watch the same screens and share the same blind spots, the number can be sharp and wrong at the same time.

Illustrative — independence is what makes the crowd smart. Agents trained alike can share a blind spot, and the price reports the agreement as false confidence.
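A small simulation makes the false-confidence effect concrete. Under invented assumptions, the true probability is 0.30. The human crowd is noisy but independent. The agent crowd is a set of near-clones that share a blind spot, modeled here as a common bias of +0.25. The dispersion of guesses stands in for how much disagreement the market would show.

```python
import random
import statistics


def crowd(n, truth, shared_bias, own_noise, seed):
    """n traders each guess truth + shared_bias + their own independent error.
    Returns (crowd average, standard deviation of the guesses)."""
    rng = random.Random(seed)
    guesses = [
        min(1.0, max(0.0, truth + shared_bias + rng.gauss(0.0, own_noise)))  # keep in [0, 1]
        for _ in range(n)
    ]
    return statistics.mean(guesses), statistics.stdev(guesses)


truth = 0.30  # hypothetical true probability
humans = crowd(1000, truth, shared_bias=0.00, own_noise=0.15, seed=1)  # noisy but independent
agents = crowd(1000, truth, shared_bias=0.25, own_noise=0.02, seed=2)  # near-clones, one blind spot
print("humans: price %.2f, disagreement %.2f" % humans)
print("agents: price %.2f, disagreement %.2f" % agents)
```

The human average lands near 0.30 despite wide disagreement. The agent average lands near 0.55 with almost none, which is sharp and wrong at the same time. Adding more agents would shrink their independent noise further but leave the shared bias untouched.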

What's hard, said plainly

So the honest frontier has three open problems, and none of them is solved:

Correlated-model risk. A monoculture of similar agents doesn't aggregate — it amplifies. The fix is genuine diversity: different base models, different data, different objectives, and enough independent humans in the mix to keep the errors from lining up. Diversity, not raw count, is what makes a crowd wise.
Adversarial agents. A bot that can read a market can be built to move one: spoofing depth, painting the tape, or feeding other models poisoned context to nudge their forecasts. We covered manipulation and self-correction back in "Can you trust the price?"⁴ Autonomous traders raise the speed and lower the cost of the attack, so surveillance has to get faster too.
Does a market of bots tell humans anything? The deepest one. If the price is just models reading models, it may be internally consistent and externally blind. A forecast is only worth something if, somewhere in the loop, it's anchored to reality — fresh evidence, human judgment, hard resolution. Cut that anchor and you've built a very liquid, very confident way to be wrong in unison.
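"Diversity, not raw count" has a precise form for squared error, known as Scott Page's diversity prediction theorem. It states that crowd error equals average individual error minus diversity, where diversity is the spread of guesses around the crowd's average. The sketch below checks the identity on two hypothetical groups. In the first, five traders are wrong in different directions. In the second, fifty clones are wrong in the same direction.

```python
def diversity_decomposition(estimates, truth):
    """Diversity prediction theorem for squared error:
    crowd error = average individual error - diversity (spread around the crowd mean)."""
    n = len(estimates)
    crowd = sum(estimates) / n
    crowd_error = (crowd - truth) ** 2
    avg_individual_error = sum((s - truth) ** 2 for s in estimates) / n
    diversity = sum((s - crowd) ** 2 for s in estimates) / n
    return crowd_error, avg_individual_error, diversity


truth = 0.5
diverse = [0.2, 0.5, 0.8, 0.3, 0.7]  # hypothetical: individually wrong in different directions
clones = [0.8] * 50                   # hypothetical: ten times as many traders, all alike
for name, group in (("diverse", diverse), ("clones", clones)):
    ce, ai, d = diversity_decomposition(group, truth)
    print(f"{name}: crowd error {ce:.3f} = individual {ai:.3f} - diversity {d:.3f}")
```

The five diverse traders average out to the truth exactly. The fifty clones have zero diversity, so their crowd is exactly as wrong as each member, no matter how many of them there are.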

The opportunity is the same size as the risk

I lead with the risks because they're real and underdiscussed — but the upside is genuinely large, and it's the reason to build through the hard parts rather than around them. An agent can make a market on anything: propose the question, seed both sides, quote a price, and resolve it. The bottleneck on prediction markets has never been demand for forecasts — it's the cost and effort of standing one up and keeping it liquid. Agents attack exactly that cost. Coverage that was uneconomic for a human crowd — every county race, every shipment ETA, every "will this ship by Q3" — becomes a market an agent can run for pennies.
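The essay doesn't say how an agent would "seed both sides" of a new market. One standard answer, swapped in here as an assumption, is an automated market maker such as Robin Hanson's logarithmic market scoring rule (LMSR). It always quotes a price, and the operator's worst-case loss is fixed up front. That bound is one way to put a number on the cost of standing a market up and keeping it liquid. The class below is a minimal two-outcome sketch, not any platform's implementation.

```python
import math


class LMSRMarket:
    """Two-outcome automated market maker using Hanson's logarithmic market
    scoring rule. The liquidity parameter b trades off price impact against
    the operator's worst-case loss, which is b * ln(2) for two outcomes."""

    def __init__(self, b: float):
        if b <= 0:
            raise ValueError("liquidity parameter b must be positive")
        self.b = b
        self.shares = [0.0, 0.0]  # outstanding shares sold: [YES, NO]

    def _cost(self, shares):
        # C(q) = b * ln(sum_i exp(q_i / b)), computed with the log-sum-exp trick.
        m = max(shares) / self.b
        return self.b * (m + math.log(sum(math.exp(q / self.b - m) for q in shares)))

    def price(self, outcome: int) -> float:
        """Instantaneous price of outcome 0 (YES) or 1 (NO); the two sum to 1."""
        m = max(self.shares) / self.b
        weights = [math.exp(q / self.b - m) for q in self.shares]
        return weights[outcome] / sum(weights)

    def buy(self, outcome: int, amount: float) -> float:
        """Sell `amount` shares of `outcome` to a trader and return what they pay."""
        new_shares = list(self.shares)
        new_shares[outcome] += amount
        cost = self._cost(new_shares) - self._cost(self.shares)
        self.shares = new_shares
        return cost

    def max_subsidy(self) -> float:
        """Most the operator can lose, however trading goes: b * ln(number of outcomes)."""
        return self.b * math.log(len(self.shares))


market = LMSRMarket(b=100.0)  # hypothetical liquidity setting
print(market.price(0))        # 0.5 before any trades
paid = market.buy(0, 50.0)    # an agent buys 50 YES shares
print(round(paid, 2), round(market.price(0), 3), round(market.max_subsidy(), 2))
```

Raising b makes the market deeper, since each trade moves the price less, but it also raises the subsidy the operator has to put at risk. An agent running many small markets would be choosing b per market.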

Which points at the real prize: the cost of a forecast falling toward zero.⁵ For most of history a good probability on a specific future event was expensive — you hired analysts, ran polls, convened experts. If a calibrated agent can produce one for a few cents, forecasting stops being a scarce service and starts being ambient infrastructure, priced into software the way a map or a weather API is today. The market is what keeps that cheap forecast honest: it's the scoreboard that grades the agent and the mechanism that pays it only when it's right.

That, quietly, is the same bet I make about AI everywhere — that the frontier is less about bigger models than about where the data and the feedback come from. A prediction market is a rare machine that generates its own labels: reality shows up, the contract resolves, and every agent that traded it gets a graded answer for free. A forecaster that can be scored, paid, and corrected by an objective scoreboard is exactly the kind of loop that gets better on its own.

None of this is here yet — this is a bet about where the lines cross, not a status report. Agents are not yet the marginal trader on the big exchanges, the correlated-error problem is unsolved, and "a calibrated agent for pennies" is still mostly a research result, not a product. But the direction is hard to miss. The traders setting the price are starting to include machines; the markets they trade are starting to grade those machines back. A price was always the crowd's best guess about the future. Soon part of that crowd won't be human — and the work, the same as it's always been, is making sure the number still tells us something true.

Notes

1. On LLM agents as probabilistic forecasters approaching human-crowd accuracy when given retrieval over current news: Halawi et al., "Approaching Human-Level Forecasting with Language Models" (2024); and Schoenegger et al. on LLM forecasting tournaments and ensembles. The best systems still trail elite human superforecasters on the hardest questions.
2. Calibration and the Brier score, the tools for grading a market price, are developed in Part 4 of this series, "A price is a probability." The score itself comes from Glenn W. Brier, "Verification of Forecasts Expressed in Terms of Probability," Monthly Weather Review (1950).
3. On correlated errors and "model monoculture" (when systems trained on overlapping data share failure modes, aggregation amplifies rather than cancels them): Kleinberg & Raghavan on algorithmic monoculture (2021); and the broader literature on homogenization in foundation models.
4. Manipulation, adversarial trading, insider information, and a market's self-correction are treated in "Can you trust the price?" Autonomous agents change the speed and cost of an attack, not the underlying mechanism. Forecasting and aggregation as the foundational case for markets is laid out in Part 1, "What is a prediction market?"
5. The "cost of a forecast toward zero" framing mirrors the economics of automation: when a unit of cognitive work gets cheap, it stops being a scarce service and becomes ambient infrastructure. Contrast the inverse asymmetry for physical-world data (linear cost, forever) in my AI thesis, "AI data for the physical world."