bonesso.io

The $10 Million Token Bill Is a Management Problem, Not an AI Problem

This week, Chamath Palihapitiya went on the All-In Podcast and described a situation that's about to become very common. His software startup, 8090, is "trending toward spending $10 million a year on AI costs." His costs have tripled since November. His revenues haven't kept up.

"Thank you to the VCs who will fund this all-you-can-eat token consumption," he wrote on X. His read: AI is being subsidized by venture capital and the real price hasn't hit yet.

The tech community reacted predictably. Skeptics treated it as confirmation that AI is unsustainably expensive. Consultants began drafting "AI cost governance" frameworks. CFOs started asking harder questions about those monthly inference bills.

All of them are missing the actual story.

The $10 million token bill isn't evidence that AI is too expensive. It's evidence that most engineering teams still don't know how to use it correctly.

1. What a Ralph Loop Actually Is

Palihapitiya gave the game away himself. He named the problem: "Ralph Wiggum loops."

The name comes from Ralph Wiggum, the beloved Simpsons character. In AI context, a Ralph loop means feeding the same prompt back into a model over and over until something eventually works. It's the AI equivalent of randomly clicking through a UI until a bug reproduces. Technically functional. Computationally catastrophic.

"Everybody has gotten infatuated with what we call these Ralph Wiggum loops," Palihapitiya said. "Just send the thing off and it'll figure something out. A, it never figures anything out. And B, you just get this ginormous bill."

Read that carefully. He identified the root cause in the same breath as the complaint. The cost problem isn't the models. It's the workflow built around them.

When a team points an agentic coding tool at a problem and says "go figure it out" without structure, without checkpoints, without human review gates, the result is exactly what Palihapitiya described: the model thrashes, consumes tokens at an astonishing rate, and produces results that require costly correction loops to fix. The bill arrives. The feature doesn't.
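The difference between a Ralph loop and a disciplined retry is a sketch's worth of code. Here is a minimal, hypothetical version of the structure described above: a hard attempt cap, a hard token budget, and an escalation to a human instead of another blind retry. The function names (`call_model`, `passes_checks`, `request_human_review`) are placeholders for whatever client and review gates a team actually uses, not a real API.

```python
# Hypothetical guardrail sketch: bounded retries with a token budget,
# as opposed to an open-ended "go figure it out" Ralph loop.

MAX_ATTEMPTS = 3          # hard cap: never loop indefinitely
TOKEN_BUDGET = 50_000     # hard cap: stop before the bill arrives, not after

def solve_with_guardrails(task, call_model, passes_checks, request_human_review):
    tokens_spent = 0
    for _ in range(MAX_ATTEMPTS):
        result, tokens_used = call_model(task)
        tokens_spent += tokens_used
        if passes_checks(result):          # automated checkpoint: tests, lint, schema
            return result
        if tokens_spent >= TOKEN_BUDGET:   # budget gate, not hope
            break
        # feed back a bounded summary of the failure, not the whole transcript
        task = task + f"\n\nPrevious attempt failed checks: {result[:200]}"
    # escalate to a human reviewer instead of burning more tokens
    return request_human_review(task, tokens_spent)
```

The point is not these particular numbers. It's that the loop has exits other than "the model eventually got it," which is exactly what the Ralph pattern lacks.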

This is not an AI failure. It's an engineering failure with a very expensive audit trail.

2. The Infrastructure Analogy Everyone Gets Wrong

There's a persistent narrative that AI costs follow the same arc as ridesharing. VCs subsidize cheap rides to build habits, then prices rise once the market is captured. Palihapitiya invoked exactly this comparison.

It's the wrong analogy. The right one is compute.

Every decade for fifty years, compute costs have fallen by roughly an order of magnitude. This is not a law of physics. It's a consequence of relentless engineering investment in hardware, compilers, and architectures. The same forces are compressing AI inference costs, and they're doing it faster than compute ever did.

In 2023, running a capable language model required either an enterprise API contract or a server rack worth of GPUs. In 2024, capable models ran on MacBook chips. In 2025, distillation, quantization, and speculative decoding collapsed the cost of a frontier-grade inference call by 80%. In 2026, the competition between inference providers is so intense that prices are approaching commodity levels on mid-tier models.

The teams spending $10 million on AI today are almost certainly using models that are far more powerful than they need for most tasks. They're routing every request to the heaviest model in the stack out of convenience or habit, the same way engineers in 1985 used mainframe cycles for tasks that a workstation could have handled. The inefficiency wasn't an argument against computers. It was an argument for better software engineering practices.

3. Model Selection Is the New Query Optimization

In the database era, every senior engineer learned to read a query plan. You profiled your queries. You found the expensive ones. You rewrote them or indexed around them. Nobody argued that databases were "too expensive." They argued that careless queries were expensive, and they were right.

The equivalent skill for the AI era is model routing. Not all requests need a frontier model. A task that classifies customer support tickets, generates a boilerplate test case, or summarizes a known document format does not require the same inference horsepower as multi-file architectural refactoring. Routing these requests appropriately can cut inference costs by 60 to 80 percent with no meaningful drop in output quality.

This is not a novel concept. It's basic economic optimization applied to a new resource. But because the AI tooling ecosystem is young, most teams haven't built the instrumentation to see where their tokens are going. They're flying blind into a $10 million bill.

The solution isn't "use less AI." The solution is "profile your usage, route appropriately, and stop running GPT-5 on tasks that a smaller model handles at a fifth of the cost."

4. The Cursor vs. Claude Code Swap Tells the Real Story

Palihapitiya's most revealing complaint wasn't about AI costs in general. It was specific: "We need to migrate off of Cursor. It's just too expensive vs Claude Code. The latter is equivalent and if you use the Pro plan, you eliminate huge Cursor bills for token consumption."

Strip away the drama and what you have is a straightforward market signal. Two tools, equivalent capability, meaningfully different cost structure. The rational response is to switch, which is exactly what Palihapitiya announced he would do.

This is the competitive market working correctly.

Cursor built a beautiful product and captured developer mindshare early. Claude Code entered with a different pricing model tied to Anthropic's Pro subscription. The market now has data on which structure is more economical at scale. Builders will optimize accordingly.

That dynamic, in which multiple capable tools compete on both price and capability, is the opposite of the captured-market ridesharing analogy. It's the early PC market. It's early cloud compute. It's what happens when infrastructure technology matures rapidly enough that the first movers face real competition before they can raise prices.

The teams that are building model-agnostic workflows right now, with abstraction layers that let them swap providers without rebuilding their toolchain, are positioning themselves to benefit from this competition indefinitely. Palihapitiya said it himself: "We need to gain more flexibility to swap between models without everything breaking." That's not a complaint about AI. That's a description of good architecture.
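The abstraction layer Palihapitiya is describing doesn't have to be elaborate. A minimal sketch, assuming each vendor SDK can be wrapped behind a single interface (the class and method names here are illustrative, not any real SDK's API):

```python
# Minimal provider abstraction: swapping vendors becomes a config change,
# not a toolchain rewrite. Names are illustrative, not a real SDK.
from typing import Protocol

class CompletionProvider(Protocol):
    def complete(self, prompt: str) -> str: ...

class StubProvider:
    """Stand-in adapter; a real one would wrap a specific vendor's SDK."""
    def __init__(self, name: str):
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"

def build_provider(name: str) -> CompletionProvider:
    # In practice this map would construct real vendor adapters.
    providers = {
        "vendor-a": StubProvider("vendor-a"),
        "vendor-b": StubProvider("vendor-b"),
    }
    return providers[name]
```

Everything downstream codes against `CompletionProvider`; when a cheaper equivalent appears, the migration is one line of configuration rather than a rebuild.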

5. The Revenue Gap Is the Actually Interesting Problem

Here is the part of Palihapitiya's complaint that deserves real scrutiny, and that the AI critics are ignoring in favor of the cost narrative.

"My costs are going up 3X every three months. My revenues are not."

If 8090 exists to replace and rewrite legacy software in the world, and they have access to increasingly capable AI tools, the question worth asking is not "why are the tools expensive?" but "why aren't the outputs translating to revenue at the same rate?"

There are three plausible answers.

The first is that the tools are genuinely producing value, but the sales cycle for enterprise software replacement is long. The revenue will arrive, but it lags the investment. This is the optimistic read, and it may be correct.

The second is that agentic AI tools are excellent at generating code and less excellent at generating the right code, and the delta between "code was written" and "problem was solved well enough to charge for" is larger than the tools' proponents acknowledge. This is the honest read, and it also may be correct.

The third is that Ralph loops aren't just expensive. They're also producing outputs that require so much human review and correction that the productivity gains are being consumed by the cleanup, and no one has built good enough instrumentation to see this clearly.

None of these three answers lead to "therefore AI is too expensive." They all lead to "therefore the craft of using AI effectively is not yet widely distributed, and the teams that develop it fastest will have a durable advantage."

6. Cost Curves Always Favor the Technology

Here is the historical fact that gets lost in every "this technology is too expensive" moment.

When railroads were first built, they were expensive to operate. The early operators who couldn't manage fuel and maintenance costs went bankrupt. This was not an argument against railroads. The cost curves fell. Operational knowledge accumulated. The surviving operators prospered at a scale that would have been impossible without the technology.

When cloud computing launched, it was expensive compared to on-premises hardware for many workloads. CFOs balked. Procurement teams pushed back. Security teams invented obstacles. And then the cost curves fell, the tooling matured, and organizations that delayed adoption found themselves at a structural disadvantage to the ones that pushed through the awkward early years.

The AI inference cost curve is following the same trajectory, but faster. The reason is competition. OpenAI, Anthropic, Google, xAI, Mistral, and a growing list of open-source alternatives are all racing to deliver more capability per dollar. None of them can afford to let a competitor establish a dominant cost advantage. The result is a deflationary spiral in inference pricing that benefits every team that builds on top of these tools.

The $10 million token bill that Chamath Palihapitiya is paying today will not be a $10 million bill in two years. The same workload, run more efficiently on better-priced models, will cost a fraction of that. The teams that used this period to develop disciplined workflows, good model routing, and real measurement of AI output quality will be running those efficient workloads. The ones that waited will be starting from scratch.

7. What Disciplined AI Usage Actually Looks Like

The teams that are getting this right are not the ones spending the most or the least on AI. They're the ones that have built the habits of a disciplined engineering culture around a new class of tools.

They instrument their AI calls the way they instrument their databases. They know which features generate the most tokens, which prompting patterns produce output that requires the least revision, and which models handle which tasks at what cost-quality tradeoff.
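The instrumentation can start as something as simple as a per-feature token ledger, the AI analogue of a slow-query log. This is a sketch under assumptions (the ledger shape and field names are mine, not any particular observability tool's):

```python
# Hypothetical per-feature token accounting, analogous to query profiling.
from collections import defaultdict

class TokenLedger:
    def __init__(self):
        self.usage = defaultdict(lambda: {"calls": 0, "tokens": 0})

    def record(self, feature: str, tokens: int):
        """Attribute a model call's token count to the feature that made it."""
        self.usage[feature]["calls"] += 1
        self.usage[feature]["tokens"] += tokens

    def top_spenders(self, n=5):
        """The AI analogue of 'show me the most expensive queries'."""
        return sorted(self.usage.items(),
                      key=lambda kv: kv[1]["tokens"], reverse=True)[:n]
```

Once every call is attributed to a feature, the $10 million question stops being "why is AI so expensive?" and becomes "which three features are generating 80 percent of the spend, and do they need the model they're using?"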

They use agents for work that benefits from autonomy and iteration, and they use direct API calls for work that is well-defined enough to not need it. They don't run a frontier model on a classification task, and they don't give an underpowered model a refactoring job that requires understanding of a 50-file codebase.

They review AI output the way they review code from a talented junior engineer: not with blind trust, not with reflexive rejection, but with calibrated judgment about where the tool is reliable and where it needs supervision.

And critically, they update these practices as the tools change. What was true about model reliability in mid-2025 is not true now. What is true now will not be true at the end of 2026. The teams that treat their AI workflows as static configurations rather than living practices will fall behind, not because the tools got worse, but because the teams didn't adapt as the tools got better.

The $10 million token bill is a signal. But the signal isn't "AI is too expensive." It's "we don't yet have the engineering practices to match this technology's speed of improvement."

That gap is closable. The teams that close it fastest will build things the rest of the market won't recognize as possible.