AI data for the physical world

The last decade of AI ran on the internet — a free, pre-recorded copy of human thought. The physical world was never written down. Recording it is the next platform — and unlike the internet, it has a map.

Every breakthrough of the last decade of AI was paid for in advance. The internet — thirty years of text, images, and video, posted by billions of people for their own reasons — was a civilization-scale dataset that nobody set out to build. Crawlers scooped it up for almost nothing, and the models ate all of it. We are now scraping the bottom of that barrel; the frontier labs talk openly about a data wall.

The physical world offers no such gift. There is no internet-sized record of a hand folding a shirt, a forklift threading a crowded port, or an arm sorting bruised fruit — because none of it was ever written down. The single richest input to intelligence, acting in the world, is the one input we never logged. So the binding constraint on physical-world AI is not architecture and not compute. It is data that does not yet exist and has to be manufactured, one example at a time.

That much is becoming consensus. Here is the part that isn't: unlike the internet, physical data has a geography. You can train a language model in any city with electricity. You can only train a robot where there is work to record — and that map looks nothing like the one the internet drew.

The internet was a copy of human thought we got for free. The physical world is a recording we have to make ourselves — one factory floor at a time.

There is no internet for the body

Scale was the whole story of the language-model era. Frontier systems trained on the order of ten trillion tokens of text plus billions of images — a corpus that large precisely because it was a byproduct. Nobody wrote the web for the machines; people wrote it for each other, and the machines inherited it for the cost of a crawl.

Nothing comparable exists for the body. The largest open robotics dataset, Open X-Embodiment, had to be painstakingly assembled: sixty existing datasets from thirty-four labs, pooled across twenty-two different robot types, just to pass a million episodes.⁴ Stacked against the web, that is a rounding error. And the richest, most valuable data (contact, friction, balance, the ten thousand ways a real task goes wrong) lives only in the physical world, where almost none of it has ever been captured.

Figure: web-scale training data set against recorded robot data. Illustrative, not to scale; at true scale the second bar would vanish. The web was a byproduct; physical data was never logged.

You can't crawl it — you have to capture it

The web was free because it was a byproduct. Physical data, so far, is nobody's byproduct: every example has to be produced on purpose, by someone, in real time. The gold standard is teleoperation, in which a human puppeteers the robot through a task while every joint angle and gripper position is recorded. It works, and it is brutally slow. A usable dataset for a single new manipulation task runs roughly 10 to 100 operator-hours, and scaling a robot dataset tenfold costs millions, not thousands.² There is no crawler for this. The economics are the inverse of the web's:

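$$ \underbrace{\frac{\text{Cost}_{\text{web}}}{N} \;\approx\; \frac{C_{\text{crawl}}}{N}}_{\to\, 0 \text{ as } N \to \infty} \qquad\quad \underbrace{\frac{\text{Cost}_{\text{physical}}}{N} \;\approx\; \frac{c \cdot N}{N} \;=\; c}_{\text{constant per example; total } c \cdot N \text{ is linear, forever}} $$

Here N is the number of examples, C_crawl is the one-off cost of a crawl, and c is the cost of one human demonstration.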

One crawl yields effectively unlimited examples, so the web's cost per example falls toward zero. In the physical world you pay a human wage for every demonstration, forever — the marginal example never gets free. That asymmetry is the whole problem, and the field is attacking it three ways at once: capture it more cheaply (teleop rigs, plus passive human video — Apple's EgoDex reads hands and fingers straight from egocentric footage³); pool it across labs and robot bodies (Open X-Embodiment); and generate it (world models like NVIDIA's Cosmos 3 synthesize action data and photoreal scenes to shrink the sim-to-real gap⁵). As one builder put it, the competition is no longer between robots or models — it is between data infrastructures.¹
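The asymmetry is easier to feel with numbers attached. What follows is a minimal sketch, not a cost study: the crawl budget, the operator wage, and the demonstration length are all hypothetical inputs I picked to make the two curves visible, and the function names are mine.

```python
def web_cost_per_example(crawl_cost: float, n_examples: int) -> float:
    """One-off crawl cost amortized over every example the crawl yields."""
    return crawl_cost / n_examples


def physical_cost_per_example(hourly_wage: float, minutes_per_demo: float) -> float:
    """Operator time paid for one demonstration; independent of dataset size."""
    return hourly_wage * minutes_per_demo / 60.0


# Hypothetical inputs, chosen only to show the shape of the two curves.
CRAWL_COST = 1_000_000.0   # assumed fixed cost of one crawl, in dollars
HOURLY_WAGE = 30.0         # assumed teleoperator wage, in dollars per hour
MINUTES_PER_DEMO = 2.0     # assumed length of one demonstration

for n in (10**3, 10**6, 10**9):
    web = web_cost_per_example(CRAWL_COST, n)
    phys = physical_cost_per_example(HOURLY_WAGE, MINUTES_PER_DEMO)
    print(
        f"N={n:>13,}  web=${web:,.6f}/example  "
        f"physical=${phys:,.2f}/example  physical total=${phys * n:,.0f}"
    )
```

Whatever values you plug in, the web column shrinks with every extra order of magnitude while the physical column does not move; only the total bill grows.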

The part the consensus misses — physical data has a geography

Internet data is placeless. A token scraped in Jakarta is identical to one scraped in San Francisco, so the entire language-model race could be run from a handful of GPU clusters anywhere on Earth. Physical-interaction data breaks that symmetry completely. It can only be recorded where the physical work is happening — and the physical work is not evenly distributed.

It is concentrated, heavily, in Asia. In 2024, Asia absorbed 74% of all new industrial-robot installations; China alone took 54% — some 295,000 machines — on an operational base above two million robots, roughly four and a half times Japan's. South Korea runs the highest robot density on the planet: 1,220 robots for every ten thousand workers.⁶ The machines that most need physical data, and the human work they could learn from, sit overwhelmingly on factory floors nowhere near the Bay Area.
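Those shares imply absolute numbers worth seeing side by side. The sketch below backs them out from the figures quoted above; because the percentages are rounded, the derived counts are my approximations, not IFR's published totals, and the helper name is hypothetical.

```python
def implied_installations(
    china_units: float, china_share: float, asia_share: float
) -> dict[str, float]:
    """Back out implied unit counts from one absolute figure and two shares."""
    world = china_units / china_share
    asia = world * asia_share
    return {
        "world": world,
        "asia": asia,
        "asia_ex_china": asia - china_units,
        "rest_of_world": world - asia,
    }


# Figures quoted in the text (IFR, World Robotics 2025). The shares are
# rounded, so every derived number is approximate.
estimate = implied_installations(
    china_units=295_000, china_share=0.54, asia_share=0.74
)
for region, units in estimate.items():
    print(f"{region:>14}: ~{units:,.0f} robots installed in 2024")
```

On these inputs the world total comes out a little under 550,000 machines, with only about a quarter of them installed outside Asia.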

Figure: where the robots, and the work they learn from, actually are. Source: IFR, World Robotics 2025.⁶

We have seen this movie before, at smaller scale. The work behind AI's last data boom (labeling, ranking, RLHF, content moderation) was quietly offshored to the Global South the moment it became a cost game. The data behind American frontier models was cleaned and rated by workers in Nairobi, Manila, and Hyderabad.⁷ But that was data you could at least move over fiber. Physical capture can't be moved at all: you cannot relocate a factory floor to a cheaper time zone, and you cannot pipe the act of bending metal across an ocean. The advantage accrues to whoever is physically present where the work, and the skilled-but-affordable operators, already are.

The asymmetry, in one line

You can train a language model anywhere there's electricity. You can only train a robot where there's work.

Why geography compounds

Presence turns into a flywheel. Sit where the work is, and you capture interaction data at the source — continuously, as a byproduct of operations that were happening anyway. Better data trains better models; better models earn more deployments; more deployments capture more data. A competitor an ocean away cannot crawl that loop. To match it they have to show up in person — and by then you are a generation of data ahead.
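The loop can be written down as a toy simulation. This is a sketch under loudly stated assumptions: model quality grows with the logarithm of captured data, deployments grow in proportion to quality, and every parameter value is invented for illustration. It is not a forecast; it only shows why a late start is hard to close.

```python
import math


def simulate_flywheel(
    years: int,
    start_year: int,
    initial_deployments: float = 10.0,       # assumed fleet on arrival
    episodes_per_deployment: float = 1_000.0,  # assumed yearly capture per robot
    growth_per_quality: float = 0.05,        # assumed deployment growth per quality point
) -> list[float]:
    """Return cumulative captured episodes at the end of each year."""
    deployments = 0.0
    data = 0.0
    history = []
    for year in range(years):
        if year == start_year:
            deployments = initial_deployments  # the player shows up on site
        data += deployments * episodes_per_deployment      # deployments capture data
        quality = math.log10(1.0 + data)                   # data improves the model
        deployments *= 1.0 + growth_per_quality * quality  # better model, more deployments
        history.append(data)
    return history


incumbent = simulate_flywheel(years=8, start_year=0)
latecomer = simulate_flywheel(years=8, start_year=3)
for year, (a, b) in enumerate(zip(incumbent, latecomer)):
    print(f"year {year}: incumbent={a:,.0f} episodes  latecomer={b:,.0f} episodes")
```

Both players run the identical loop; the only difference is the year they arrive. In this toy the latecomer never catches up, because its curve is the incumbent's shifted three years to the right.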

This reframes a story usually told about supply chains. The "China + 1" shift — factories spreading into Vietnam, India, and Mexico — is also a map of where physical-AI data will be born. The same density of hands-on work that made these places manufacturing hubs makes them the natural capture sites for the data that trains the next generation of machines. (It is, candidly, part of why building from Southeast Asia looks to me like an advantage rather than a handicap.) And the capital has started to notice:

$27.6B: robotics & physical-AI VC in 2025, about 2× the prior year

$2.4B → $11B: Physical Intelligence's valuation in ~16 months, with the latest round still in talks

74%: Asia's share of 2024 robot installations

The money is betting the bottleneck is solvable.⁸ ⁹

What would make me wrong

The strongest counter to all of this is synthetic data. If world models get good enough, you don't record the factory floor — you generate it. NVIDIA's Cosmos 3 already produces action data and rare-event scenes on demand, precisely to cut how much real-world capture you need.⁵ Push that far enough and geography stops mattering: you manufacture reality in a data center, anywhere.

I don't think it gets there alone, at least not soon. Synthetic data still has to be grounded in, and validated against, the real thing, and the sim-to-real gap is widest exactly where the value is highest: dexterous, contact-rich tasks in gloriously messy real environments, the hardest things to fake. Real capture stays the scarce, grounding input, and its geography holds. The honest caveats cut the other way, too: if capture migrates from factories to consumer wearables, everywhere becomes a capture site and the edge diffuses; and the labor questions from the last data boom were ugly enough that the next one deserves to be built better. But the central bet stands.

For thirty years the center of gravity in computing was wherever the data was — and the data was the internet, which belonged to no one and everyone. The next dataset is being recorded right now, by hand, in the places where the physical world actually gets work done. The winners in physical AI will be defined not only by their models and their GPUs, but by where they stand when the recording happens. That map is being drawn now, and it does not look like the internet's.

Notes

1. The framing that the real competition is "between data infrastructures": Michael Zhang, "Physical AI's Next Bottleneck Is Not Scale — But Data Infrastructure," Medium, 2026.
2. Teleoperation data economics: roughly 10–100 operator-hours per task variant, with ~10× dataset scaling running into the millions. IBM, "The data gap that's holding back robotics," plus industry estimates.
3. Apple EgoDex: large-scale egocentric video dataset with 3D hand and finger tracking, released May 2025, as a passively scalable source of human-demonstration data.
4. Open X-Embodiment: a cross-embodiment robotics dataset pooling 60 existing datasets from 34 labs across 22 robot types, with over 1 million trajectories; pooled, heterogeneous data transfers better to new hardware.
5. NVIDIA Cosmos 3 (June 2026): open physical-AI world-foundation model trained on ~20T multimodal tokens; generates action data (joint angles, gripper positions, trajectories) and photoreal scenes to reduce the sim-to-real gap. NVIDIA; Axios.
6. IFR, World Robotics 2025: Asia took 74% of 2024 industrial-robot installations; China 54% (≈295,000 units) on an operational stock above 2 million; South Korea highest density at 1,220 robots per 10,000 employees.
7. Reporting on AI data-labeling, RLHF, and moderation work concentrated in the Global South (Kenya, the Philippines, India) via firms such as Scale AI, Sama, and iMerit, 2023–2025.
8. Physical Intelligence funding: $70M seed (Mar 2024) → $400M Series A at $2.4B (Nov 2024) → $600M Series B at $5.6B (Nov 2025) → ~$1B at >$11B (in talks, Mar 2026). Bloomberg; TechCrunch; Sacra.
9. Robotics & physical-AI venture funding ≈ $27.6B in 2025, more than double 2024's $13.7B. PitchBook, Q4 2025 Robotics & Physical AI VC Trends.