The Token Panic

There is a popular argument right now that AI costs are about to catch up with everyone. Token prices are subsidized. Current pricing is fake. Eventually the real cost of intelligence will show up.

[Image: Fresco-style panorama of small workshops powered by a central low-watt thinking machine.]

Some of that is true. Reasoning models do a surprising amount of hidden work before they respond. A cheap model can get expensive if it takes too many steps to finish. Providers will keep changing the packaging because the current packaging does not work.
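The point about a cheap model getting expensive is easy to see with rough arithmetic. The sketch below is ours, and every number in it is invented for illustration; none of it is real provider pricing. It only assumes that the cost of a task is the token price times the tokens used per step times the number of steps.

```python
def cost_per_task(price_per_million_tokens: float,
                  tokens_per_step: int,
                  steps: int) -> float:
    """Dollar cost of finishing one task, hidden reasoning tokens included."""
    return price_per_million_tokens * tokens_per_step * steps / 1_000_000


# Hypothetical figures, chosen only to show the shape of the problem.
# The "cheap" model is 10x cheaper per token but needs 15x more steps.
cheap = cost_per_task(price_per_million_tokens=0.50, tokens_per_step=4_000, steps=30)
strong = cost_per_task(price_per_million_tokens=5.00, tokens_per_step=4_000, steps=2)

print(f"cheap model:  ${cheap:.2f} per task")
print(f"strong model: ${strong:.2f} per task")
```

With these made-up inputs the model with the lower token price ends up costing more per finished task, which is why the price of one token is a poor guide on its own.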

But that is a pricing problem, not an intelligence problem.

The number that matters long-term is not the price of one token. It is how much useful work you get for a dollar, a watt, or a machine you can actually afford to run. That number keeps improving. Models are getting more capable at smaller sizes. Training is more efficient. Inference wastes less memory and less time. Hardware is being designed specifically for this workload. The whole stack is converging on more intelligence per unit of energy.

That does not mean every product gets cheaper every month. A subscription can get worse. A provider can raise prices. A frontier model can cost more because it is genuinely doing more work. But underneath all of that, the machinery keeps improving. Tasks that required the best model last year become routine. Work that required the cloud moves local. Work that required a team becomes something one person can direct.

The panic treats today's token price as if it were the underlying commodity. It is not. The commodity is useful thought. And useful thought per dollar is going up.

There is another assumption baked into the panic that seems wrong: that everyone will always need frontier models.

For some things, yes. The hardest research problems will use the best available intelligence. Large institutions will need frontier systems for problems that are too broad, too high-stakes, or too complex for anything smaller. There will probably always be a top tier that is expensive, scarce, and mostly cloud-based.

But most economic work is not frontier work. Most useful work is scoped. A company needs help with its own documents, its own customers, its own codebase. A person needs to turn their own judgment into more output. A business needs a system that understands the specific shape of its work well enough to make the next step easier. That does not always require the smartest model in the world. It requires enough intelligence, enough context, and enough reliability.

We will be honest about where we are on this. We still depend on frontier cloud models for the hardest parts of our own work. Local models are not good enough yet for everything we need. But we feel less anxiety about frontier models staying cloud-only than we used to. As smaller models get denser and more capable, more real work falls below the frontier line. Not all of it. Not overnight. But enough to matter.

At some point, probably sooner than most people expect, a local or cheaply hosted model will handle a large share of scoped business work. It will understand enough code. It will follow instructions well enough. It will use tools well enough. It will hold enough context. It will be cheap enough to leave running. That is a fundamentally different world from one where every meaningful act of building requires renting the top cloud model.

The frontier can stay expensive while the floor rises. Both things can be true at the same time.

This also means total AI spending can increase while the unit cost of intelligence drops. If a useful workflow gets cheaper, people run more workflows. They automate things they used to ignore. They explore ideas that were not worth staffing. They ask for more because asking for more becomes rational. Rising total spend is not proof that costs are out of control. It is what happens when a tool gets useful enough to absorb more demand.
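A small worked example of that claim, with numbers we made up purely for illustration: if the cost of one workflow falls by 4x and people respond by running 10x as many, total spend goes up even though each unit of work got cheaper.

```python
def total_spend(cost_per_workflow: float, workflows_run: int) -> float:
    """Total dollars spent across all workflows in a period."""
    return cost_per_workflow * workflows_run


# Hypothetical before/after figures, not measurements.
before = total_spend(cost_per_workflow=2.00, workflows_run=100)
after = total_spend(cost_per_workflow=0.50, workflows_run=1_000)

unit_cost_change = 0.50 / 2.00   # each workflow costs a quarter of what it did
spend_change = after / before    # yet total spend is higher

print(f"unit cost is {unit_cost_change:.2f}x the old cost")
print(f"total spend is {spend_change:.1f}x the old spend")
```

Read in isolation, the second figure looks like costs running away. Read next to the first, it looks like demand expanding into a cheaper tool.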

The practical response is not to ignore cost. It is to build systems that assume the model market will keep moving. Use the expensive model where it matters. Use cheaper models where they are good enough, and test that assumption regularly, because what counts as good enough keeps expanding. Cache what you can. Keep workflows portable. Own the things around the model: the data, the context, the evaluations, the handoffs. The model is a commodity input. Everything around it is yours.
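One way to picture that advice is the minimal sketch below. It is an illustration under our own assumptions, not a description of any particular system: `Router`, `pass_rate`, and the stub models are hypothetical names, a model is treated as any function from prompt to answer, and the evaluation uses exact string matching, which is far cruder than a real evaluation would be.

```python
from typing import Callable, Dict, List, Set, Tuple

# Any provider or local model, wrapped as prompt -> answer.
# Keeping this interface plain is what keeps the workflow portable.
Model = Callable[[str], str]


class Router:
    """Send hard task kinds to the strong model, everything else to the cheap one."""

    def __init__(self, cheap: Model, strong: Model, hard_kinds: Set[str]):
        self.cheap = cheap
        self.strong = strong
        self.hard_kinds = set(hard_kinds)
        self.cache: Dict[Tuple[str, str], str] = {}

    def run(self, kind: str, prompt: str) -> str:
        key = (kind, prompt)
        if key not in self.cache:  # only pay for a model call on a cache miss
            model = self.strong if kind in self.hard_kinds else self.cheap
            self.cache[key] = model(prompt)
        return self.cache[key]


def pass_rate(model: Model, cases: List[Tuple[str, str]]) -> float:
    """Fraction of (prompt, expected) cases the model answers exactly."""
    if not cases:
        raise ValueError("need at least one evaluation case")
    passed = sum(1 for prompt, expected in cases if model(prompt) == expected)
    return passed / len(cases)


# Stub models standing in for real ones (assumptions for the example).
def cheap_stub(prompt: str) -> str:
    return prompt.upper()


def strong_stub(prompt: str) -> str:
    return f"[careful] {prompt}"


router = Router(cheap_stub, strong_stub, hard_kinds={"architecture"})
summary = router.run("summary", "ship it")
design = router.run("architecture", "split the service")
```

Rerunning `pass_rate` on your own evaluation cases whenever a new cheap model appears is the "test that assumption regularly" step: when the cheap model clears your bar for a task kind, that kind comes out of `hard_kinds`. The cases, the cache, and the routing rules stay with you when the model changes.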

The mistake is building around a temporary pricing quirk. The other mistake is assuming a temporary pricing quirk tells you the long-term price of intelligence.

We think most people and most companies will be able to do more for the same money over time. More software, more analysis, more automation, more small internal systems, more individual leverage. The very top may stay expensive. The cloud frontier will remain its own market. That does not change the bigger picture.

The frontier is not the whole economy. For most people, what matters is the floor. And the floor is rising.