The short version
Good-enough AI is commoditizing the model business from below. The moment a cheaper model clears the quality bar for a real job, the frontier model stops earning a premium on that job, and most jobs don't need the best model. The money stays with the hard problems: frontier reasoning, high-stakes work, long-horizon agents.
- The cost of running a GPT-3.5-level model fell 280-fold in about 18 months, per Stanford HAI
- This is textbook low-end disruption, the pattern Clayton Christensen named
- The premium tier shrinks to the jobs where being wrong is expensive
Most jobs you’d hand an AI model don’t need the best model on Earth. They need one that clears the bar. Once a cheaper option clears that same bar, the price you’d pay for the frontier model on that job goes to roughly zero, because the extra quality changes nothing you can measure. That is the whole argument. Good-enough AI keeps getting cheaper, the bar keeps falling within reach of cheaper models, and the premium model business gets squeezed into the narrow band of work that really can’t tolerate a worse answer.
This isn’t a vibe.
It is commoditization from below, the oldest pattern in the disruption playbook.
Here is the number that makes it concrete. The cost of running a model that scores at GPT-3.5 level dropped about 280-fold in roughly 18 months, per Stanford HAI, from around 20 dollars per million tokens in late 2022 to about 7 cents by late 2024. Tom’s Hardware reported the same figure. When a capability gets 280 times cheaper, it stops being a product you charge a premium for.
It becomes plumbing.
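As a quick sanity check on that figure, here is the arithmetic as a small Python sketch. The two prices are the ones quoted above. The implied monthly rate assumes a steady decline over the cited 18-month window, which is my simplification and not something the report claims.

```python
# Prices quoted above, in dollars per million tokens.
price_late_2022 = 20.00
price_late_2024 = 0.07

# How many times cheaper the later price is.
fold_reduction = price_late_2022 / price_late_2024

# Assumption: a steady geometric decline over the cited ~18-month window.
months = 18
monthly_factor = (price_late_2024 / price_late_2022) ** (1 / months)

print(f"Fold reduction: about {fold_reduction:.0f}x")
print(f"Implied price change per month: {(monthly_factor - 1) * 100:.0f}%")
```

The ratio of the two quoted prices comes out a little above 280, which matches the headline number.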
Why good enough wins the volume
Good-enough AI wins the volume because, for most real work, the marginal quality of the frontier model doesn’t move the outcome you can measure. Picture an AI summarizing a support ticket, drafting a meeting recap, tagging a document, or extracting an invoice total. The best model on the leaderboard and a year-old open-weight model both get those right. The reader can’t tell which one wrote the summary, and no downstream metric shifts based on which one did. So the only thing left to compete on is cost and speed, and on cost and speed the cheaper model wins by default. This is the heart of commoditization from below: when two options clear the same bar, buyers stop paying for the difference, because there is no difference they can feel.
The frontier model keeps its crown and loses the work. Most jobs are this kind of job, which is why the cheap tier captures most of the volume the moment it clears the bar.
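The buying rule in that argument is simple enough to write down. Here is a minimal sketch, assuming you score each model on your own evaluation for a given job. The model names, quality scores, and prices are made up for illustration and do not describe real products.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelOption:
    name: str                       # hypothetical label, not a real product
    quality: float                  # score on your own eval for this job, 0 to 1
    cost_per_million_tokens: float  # dollars


def cheapest_that_clears(options: list[ModelOption], bar: float) -> Optional[ModelOption]:
    """Return the lowest-cost option whose quality meets the bar, or None."""
    eligible = [m for m in options if m.quality >= bar]
    return min(eligible, key=lambda m: m.cost_per_million_tokens, default=None)


# Made-up numbers for illustration only.
options = [
    ModelOption("frontier", quality=0.98, cost_per_million_tokens=15.00),
    ModelOption("mid-tier", quality=0.95, cost_per_million_tokens=1.00),
    ModelOption("small-open-weight", quality=0.93, cost_per_million_tokens=0.07),
]

routine = cheapest_that_clears(options, bar=0.90)  # e.g. ticket summaries
hard = cheapest_that_clears(options, bar=0.97)     # a job only the frontier clears

print(routine.name if routine else "nothing clears the bar")
print(hard.name if hard else "nothing clears the bar")
```

Quality only acts as a filter here. Once an option clears the bar, extra quality earns nothing and price decides, which is the squeeze described above.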
Clayton Christensen called this low-end disruption decades ago, and his words map onto AI almost too cleanly. A cheaper, good-enough option takes root at the bottom of a market, then moves upmarket, and the incumbent keeps retreating to the higher-margin work because the bottom isn’t worth fighting for. His old example was steel. Integrated mills let the scrappy mini-mills have low-end rebar because the mini-mills could make it about 20 percent cheaper and the big mills earned only about 7 percent margins there. Rational to walk away. Then the mini-mills climbed, grade by grade, until they were eating the premium steel too.
Swap rebar for ticket summaries and you have the AI model market in 2026.
I keep going back and forth on how fast this plays out, mind you. The counter-case is real and I’ll get to it. But the direction of travel is hard to argue with when the floor rises this quickly.
After 10 or so years building workflow tools, the thing I have learned to watch isn’t the demo. It is the boring high-volume task that runs ten thousand times a day. That task doesn’t care about a two-point bump on a reasoning benchmark. It cares whether the answer is right often enough and cheap enough to run at that volume. Cheap and right-enough beats brilliant-and-expensive every time the volume is high and the stakes per call are low. Most enterprise AI work is exactly that.
Follow the money up the stack
Not every job commoditizes. The premium doesn’t vanish.
It concentrates.
Three kinds of work still pay full freight for the best model available, and they’re worth naming because they’re where the model labs will end up living.
The first is frontier reasoning. Hard math and novel code, the multi-step problems where a wrong intermediate step poisons everything after it. Here the quality gap isn’t cosmetic. The best model finishes the proof and the good-enough one wanders off. People will pay for that gap because the cheaper option doesn’t actually clear the bar.
Second is high-stakes work where being wrong is expensive or dangerous. A medical triage suggestion. A legal clause your client relies on. When the cost of one bad answer dwarfs the cost of a thousand queries, you buy the best model and you don’t blink at the bill. The math flips. Suddenly the premium is cheap insurance.
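To see the flip in numbers, here is a rough sketch that compares expected cost per task: the price of the call plus the average damage from wrong answers. Every figure below is hypothetical, chosen only to show the crossover. The error rates and dollar amounts are not measurements of any real model.

```python
def expected_cost(price_per_call: float, error_rate: float, cost_per_error: float) -> float:
    """Price of the call plus the average damage from wrong answers."""
    return price_per_call + error_rate * cost_per_error


# Hypothetical numbers, chosen only to show the flip.
CHEAP = {"price_per_call": 0.001, "error_rate": 0.05}
FRONTIER = {"price_per_call": 0.05, "error_rate": 0.02}


def cheaper_choice(cost_per_error: float) -> str:
    """Name the option with the lower expected cost at this level of stakes."""
    cheap = expected_cost(cost_per_error=cost_per_error, **CHEAP)
    frontier = expected_cost(cost_per_error=cost_per_error, **FRONTIER)
    return "cheap" if cheap <= frontier else "frontier"


print(cheaper_choice(cost_per_error=0.10))       # a slightly off meeting recap
print(cheaper_choice(cost_per_error=10_000.00))  # a bad legal clause
```

With these made-up inputs the cheap model wins when a mistake costs ten cents and the frontier model wins when a mistake costs ten thousand dollars. The crossover point depends entirely on the error rates you actually measure.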
Third is long-horizon agentic work. An agent that runs for an hour, takes forty actions, and has to stay coherent the whole way compounds tiny error rates into total failure. A model that’s 2 percent better per step is dramatically better over forty steps. That compounding is where the next premium hides, and it’s why every lab is racing toward agents rather than chat.
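The compounding is easy to check. This sketch assumes every step succeeds or fails independently and that one failed step sinks the whole run, which is a simplification. It also reads "2 percent better" as two percentage points, and the 95 and 97 percent per-step rates are illustrative, not measured.

```python
def task_success_rate(per_step_success: float, steps: int) -> float:
    """Chance that every step succeeds, assuming independent steps."""
    return per_step_success ** steps


steps = 40  # the forty-action agent run described above

# Assumed per-step success rates, two percentage points apart.
baseline = task_success_rate(0.95, steps)
better = task_success_rate(0.97, steps)

print(f"Baseline model finishes the run: {baseline:.1%}")
print(f"Better model finishes the run:   {better:.1%}")
print(f"Advantage: {better / baseline:.1f}x")
```

Under these assumed numbers, a two-point edge per step more than doubles the chance of finishing all forty steps.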
Everything else? Drifting toward free. The summarizing, the classifying, the routine extraction, the first-draft writing. That is the commoditized base of the market, and it’s enormous, and almost nobody will pay a premium for it within a couple of years.
So where does the money go when good-enough is everywhere? Up the stack and into the hard problems. The labs that survive will either own the frontier-reasoning and agent tier outright, or sell something other than raw tokens: tooling, distribution, trust, the workflow around the model. The token itself is on its way to being a commodity, priced like one.
What open weights do to the floor
They raise it, fast, which is the part that should worry anyone selling tokens. The reason good-enough keeps getting better isn’t only that the labs cut prices. It is that open-weight models now run on your own hardware, or on the edge, for the cost of the electricity. When the good-enough tier is also free to self-host, the premium provider loses the volume floor outright, because the customer doesn’t even pay them the 7 cents.
The convergence is measurable. On the Chatbot Arena leaderboard tracked by the 2025 AI Index, the best open-weight model trailed the best closed model by 8.04 percent in early 2024. A year later that gap had shrunk to 1.70 percent. The open tier hasn’t caught the frontier, to be clear. But it doesn’t have to. It only has to clear the bar for the commodity jobs, and it cleared that bar a while ago.
This is where the argument parts ways with the buyer-side cost-of-ownership question, which belongs in a different post about whether self-hosting actually saves you money once you count the ops staff. That is the buyer’s spreadsheet. This is the seller’s problem. Even if running open weights is a pain for the buyer, the mere existence of a free good-enough option caps what the seller can charge. The price ceiling is set by the cheapest thing that clears the bar.
A practitioner I read, writing under the handle UncoverAlpha, put the conclusion better than I can: most of the economy will not run on the best model. It will run on the cheapest model that’s good enough.
There is a real-world version of this I keep bumping into. A company with a one-off question, the kind you ask once and never again, has zero reason to wire up premium tooling. Cheap and dirty, one shot, good enough, done. Multiply that by every routine question every company asks, and you get the commodity base of the market.
Will the frontier keep paying off?
Maybe. And this is where I might be wrong, so let me say it plainly. The whole argument assumes the bar for “good enough” stays roughly fixed while cheap models climb to meet it. If the bar keeps rising faster than cheap models can climb, the premium holds. New capabilities, agents that actually work end to end, reasoning that opens up jobs nobody could automate before, those reset the bar upward and re-create scarcity at the top.
So which is it? Both, at different speeds for different jobs. The commodity base commoditizes. The frontier sprints ahead and mints a new premium tier, which then commoditizes a year or two later, while a newer frontier opens above it.
It is a treadmill.
The premium business doesn’t die. It keeps having to run faster to stay in the same place, and the floor under it keeps rising.
Does that mean the model labs are doomed? No. It means the easy money, charging a premium for tokens that do ordinary work, is evaporating. What is left is harder and more interesting: be the frontier, own the agent layer, or sell the thing around the model that has actual unit economics. Raw inference is becoming a race to the bottom, and races to the bottom have one winner, the cheapest provider, and a lot of bruised egos.
I built Tallyfy on a similar bet years before any of this, that the boring repeatable work that runs at high volume is where the real value sits, not the flashy one-off. Watching the AI model market rediscover that lesson the expensive way has been quietly satisfying.
The best model is a brilliant thing. I use the frontier ones every day for the hard 10 percent of my work, and they earn their keep there. But the other 90 percent I throw at a model is mundane, and for the mundane stuff I cannot tell you which model did it. So I will not pay extra for the fancy one. Neither will you, once you check.
For most of what people do with these tools, brilliant isn’t what they’re buying. Good enough, cheap, and right often enough is the product. The rest is a rounding error, and rounding errors do not command a premium for long.





