Games Are the Best Long-Horizon Task
The most capable AI agents won't be trained on enterprise workflows. They'll be trained on games.
Most model companies are pouring resources into enterprise task training: teaching agents to navigate spreadsheets, draft emails, build decks. Enterprises pay. But this is a local maximum.
On Professional Tasks
When is a PowerPoint done? You know when a chess game is over. Someone wins, someone loses, someone draws. But a report? A slide deck? You're done when your manager stops asking for changes, or when you run out of time. “It depends” is a terrible training signal.
Consider writing a report. You have a vaguely defined goal (“make it comprehensive”), an action space that's effectively infinite, and an end state determined by human judgment. The verifier is a person, and people disagree. One manager wants bullet points. Another wants narrative. A third wants both, but shorter.
Each intermediate action (choosing a header, deciding which data to include, formatting a chart) exists in this vast, fuzzy space where “correct” is subjective. You can verify that the report exists. You can't easily verify that it's good.
When this space is defined by human preferences rather than objective outcomes, you train models to imitate humans. Humans satisfice. We anchor on the first approach that seems reasonable. We optimize for looking productive over being productive.
Games Are Long-Horizon
Games solve every structural problem that makes office tasks a poor source of long-horizon training signal.
Verifiable end states. Chess, Go, Donkey Kong, StarCraft. You know when you've won. No ambiguity, no “let's circle back.” The signal is clean.
Well-defined action spaces. In chess, you have a finite set of legal moves at each position. In StarCraft, a large but bounded set of actions per tick. Complex enough to be interesting, structured enough to learn from. Compare that to “write the next paragraph of this quarterly report,” where the action space is essentially all of language.
Dynamic environments. Most interesting games involve another player, or at minimum, a world that responds to your actions. Office tasks are mostly static. The spreadsheet doesn't fight back.
Deep sequential decisions. A game of Go runs 150+ moves. StarCraft can be thousands of actions. Genuinely deep decision problems where early moves compound.
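The first two properties are easy to make concrete. Here's a minimal sketch using tic-tac-toe as a stand-in for "a game" (the function names are our own, not from any particular RL library): the action space is a finite, enumerable set, and the end state is verifiable from the board alone.

```python
# Tic-tac-toe as a toy example of a clean training environment.
# Rows, columns, and diagonals of a 3x3 board stored as a flat list.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def legal_actions(board):
    """Well-defined action space: the finite set of open squares."""
    return [i for i, cell in enumerate(board) if cell == "."]

def winner(board):
    """Verifiable end state: the board itself says who won."""
    for a, b, c in WIN_LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return "draw" if "." not in board else None  # None = game still running

board = list("XXX...OO.")
print(legal_actions(board))  # [3, 4, 5, 8]
print(winner(board))         # X -- no human judge needed
```

No equivalent `winner()` exists for a slide deck; that asymmetry is the whole argument of this section.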
The Real Pareto
When DeepMind trained AlphaGo, it started by imitating human experts. The breakthrough came when it stopped imitating and started playing against itself. Move 37 against Lee Sedol, a move no human professional would have played and that commentators initially called a mistake, turned out to be brilliant. The model found something humans missed in thousands of years of play.
In games, you can train against true outputs. Not human-optimized outputs, but actually optimal ones. The objective function exists independently of human judgment. You don't need a person to tell you whether a move was good. The game tells you.
In office tasks, there is no “Move 37” for PowerPoint. The ceiling is always human taste, human convention, human approval. You can get very good at imitating the best humans. You cannot transcend them, because there's nothing to transcend toward.
Games provide that objective function. And the abstract reasoning skills needed to play Go at superhuman levels (pattern recognition, long-term planning, evaluation of complex states) aren't game-specific. They're the building blocks of general intelligence.
Humans are inherently inefficient optimizers. RL training that takes human demonstrations as the target will produce agents that are, at best, efficient imitators of inefficiency. The next Pareto frontier requires training on ground truth, and games are the only domain both complex enough to develop general reasoning and clean enough to supply that ground truth.
The Future Is in Games
Over the next few years, model companies will correctly keep training on professional tasks. There's massive value there.
The breakthroughs won't come from teaching a model to write better emails. They'll come from teaching one to win.