The argument in four lines
- The hourly rate everyone compares (managed runtime vs a VM) is the smallest number on the page.
- Tokens are the real bill, and you pay the same tokens whoever owns the server.
- The thing that actually decides it is who runs the box at 2am.
- So self-hosting pays off later than the spreadsheet says. Usually much later.
There is a particular kind of cost analysis that I have watched go wrong the same way a dozen times. Someone opens a spreadsheet, puts Anthropic’s managed-agent runtime in one cell ($0.08 per session-hour) and a cheap cloud VM in the next ($0.0168 an hour for the smallest box), and concludes that self-hosting is roughly five times cheaper. The arithmetic is correct. The conclusion is close to worthless, because the two cells they compared are the two cells that barely matter.
I want to do the version of this that holds up. Real prices, checked against the providers in June 2026, and an honest answer to the only question worth asking: at what point does running your own agent infrastructure actually beat paying someone to run it? There is a crossover. It is just nowhere near where the hourly rate suggests.
A quick note on what this is not. I have written before about whether to self-host or use a managed agent as a governance decision, and separately about what Anthropic’s managed agents actually are as a product. Both of those deliberately set cost aside. This is the cost piece they kept pointing at. If your data rules already force self-hosting, read the governance one first, because no price beats “not allowed.”
Related reading
This is the cost half of a set. What managed agents are covers the product. Self-hosted vs managed is a governance call covers what your data rules allow. Read those first if you have not decided whether you even can self-host.
The cost question the governance post skipped
Build versus buy, for ordinary software, really does turn on price. A managed database and a self-hosted one store the same rows, so you pick on cost and effort and move on. Agents break that habit, and they break it in a way that matters for the money.
An agent is not one cost. It is at least three, stacked, and they are wildly different sizes. There is the model usage (the tokens the agent burns thinking and writing). There is the runtime (the compute the agent occupies while it works). And there is the operations cost (the human time to keep the whole thing patched, secured, credentialed, logged, and alive). When people say “let us price out self-hosting,” they almost always mean the middle one. The middle one is the sliver.
Actually, let me back up, because this is the move that fixes the whole analysis. Tokens are billed identically no matter where the agent runs. Anthropic charges the same per-token rate whether the loop executes on its infrastructure, on a VM in your own cloud account, or on a Raspberry Pi under your desk. So tokens cancel out of any managed-versus-self-hosted comparison. They are large, but they are a wash. What is left to actually compare is runtime against operations, and that is where the surprise lives.
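A minimal sketch of that cancellation, in case it helps to see it as arithmetic. The function name and every figure below are placeholders I made up for illustration, not prices from any provider; amounts are in cents so the comparison is exact.

```python
def monthly_gap_cents(tokens: int, managed_runtime: int, managed_ops: int,
                      self_runtime: int, self_ops: int) -> int:
    """Self-hosted total minus managed total, in cents per month."""
    managed_total = tokens + managed_runtime + managed_ops
    self_hosted_total = tokens + self_runtime + self_ops
    return self_hosted_total - managed_total

# Hypothetical figures. Only the token line differs between the two calls.
small_token_bill = monthly_gap_cents(10_000, 5_000, 0, 3_000, 40_000)
large_token_bill = monthly_gap_cents(900_000, 5_000, 0, 3_000, 40_000)

# The gap between the two paths does not move when the token bill does.
print(small_token_bill == large_token_bill)
```

However large the token line gets, it appears on both sides and drops out of the difference.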
What you actually pay on each path
Here are the numbers, current as of June 2026, for a small two-vCPU class of machine in a US region. I have left tokens out of every row on purpose, for the reason above.
| Path | Runtime cost | If it runs 24/7 | Who patches it |
|---|---|---|---|
| Anthropic managed agents | $0.08 per session-hour, billed to the millisecond, only while running | ~$58/mo | Anthropic |
| AWS t4g.small (Graviton) | $0.0168/hr flat, idle or busy | ~$12/mo | You |
| Azure B2s | $0.0416/hr flat | ~$30/mo | You |
| Azure D2s_v5 (production-grade) | $0.096/hr flat | ~$70/mo | You |
| Google e2-small | $0.0168/hr flat | ~$12/mo | You |
| AWS Bedrock AgentCore | $0.0895/vCPU-hr + $0.00945/GB-hr, active only | varies | AWS |
| Google Vertex Agent Engine | $0.0864/vCPU-hr + $0.0090/GB-hr, free tier first | varies | Google |
| AWS Fargate (serverless) | $0.04048/vCPU-hr + $0.004445/GB-hr | ~$36/mo | AWS |
| Google Cloud Run (serverless) | $0.000024/vCPU-second, only during a request | cents, if bursty | Google |
A few things fall out of this table that the usual comparison never reaches.
The managed runtime is metered. You pay $0.08 only for the wall-clock seconds the agent is genuinely running; idle time, waiting-for-you time, and finished time are free. A VM is the opposite. You rent it by the hour whether it is grinding or asleep, which is why an always-on cheap box and a busy managed agent can land in the same neighbourhood despite the 5x sticker gap.
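The two billing shapes can be sketched in a few lines. This is a rough model, not a billing calculator: the function names are mine, and I am assuming 730 hours in an average month for the flat rental and a 30-day month for the metered one.

```python
HOURS_PER_MONTH = 730  # assumption: average hours in a calendar month

def flat_monthly(hourly_rate: float) -> float:
    """A VM bills every hour of the month, idle or busy."""
    return hourly_rate * HOURS_PER_MONTH

def metered_monthly(session_hour_rate: float, active_hours_per_day: float,
                    days: int = 30) -> float:
    """A metered runtime bills only the hours the agent actually runs."""
    return session_hour_rate * active_hours_per_day * days

print(round(flat_monthly(0.0168), 2))        # cheapest VM, always on
print(round(metered_monthly(0.08, 24), 2))   # managed agent that never stops
print(round(metered_monthly(0.08, 2), 2))    # managed agent busy 2 hours a day
```

The first two lines reproduce the roughly $12 and $58 figures in the table. The third is the case the sticker comparison ignores: a lightly used managed agent costs less than the cheapest always-on box.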
The cloud-native agent services are the quiet trap here. AWS Bedrock AgentCore and Google Vertex Agent Engine look like Anthropic alternatives, and they are priced in the same ballpark per active hour. But they still bill you the same tokens on top, so they save nothing on the part that costs the most, and they pull your agent off the Claude platform you were probably already standardized on and into a second vendor’s tooling. I keep going back and forth on whether to even recommend them, and I land on: only if you are already deep in that cloud for other reasons. This is close to the feature-lag and premium story I dug into for Claude on Vertex versus the native API, where the managed convenience came with a tax you felt later.
And if your work is bursty (an agent that wakes up, does ninety seconds of work, and sleeps), serverless container runtimes like Cloud Run are almost free, because they only charge during the request. That is a real option people forget exists between “managed agent” and “my own VM.”
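To put a rough number on "almost free", here is a sketch using the Cloud Run vCPU-second rate from the table. The twenty runs a day is a figure I picked for illustration, and the model deliberately ignores memory charges, per-request fees, and any free tier, so treat the result as an order of magnitude only.

```python
CLOUD_RUN_VCPU_SECOND = 0.000024  # rate from the table above

def bursty_monthly(runs_per_day: int, seconds_per_run: float,
                   vcpus: float = 1.0, days: int = 30) -> float:
    """CPU-only cost of an agent that is billed just while a request runs."""
    billed_seconds = runs_per_day * seconds_per_run * vcpus * days
    return billed_seconds * CLOUD_RUN_VCPU_SECOND

# Hypothetical workload: 20 wake-ups a day, 90 seconds of work each.
cost = bursty_monthly(runs_per_day=20, seconds_per_run=90)
print(f"${cost:.2f} per month")
```

Under those assumptions the CPU bill lands a little over a dollar a month, well under even the $12 always-on box.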
For the token side of the bill (the part I keep insisting dominates), I have written separately on the unit economics of generative AI, which is where the money in any agent program actually goes.
Where the crossover really sits
Now the question with a real answer. If you only look at runtime, when does an always-on VM get cheaper than a metered managed agent?
It is a straight line against a flat line. Managed runtime per month is about $0.08 times the hours per day the agent actually runs, times thirty. A VM is a flat monthly cost no matter what. They cross where the agent is busy enough to out-spend the rental.
Against the cheapest burstable box, the lines cross at about five active hours a day. Against a normal mid-size VM, around thirteen. Against a production-grade machine, a managed agent running every minute of every day is still cheaper. So even on the runtime layer alone, the “5x cheaper” claim only holds for an agent that sits almost entirely idle on an almost-free box. Push the agent harder, or size the box realistically, and the gap closes or flips.
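Those three crossover points come from one division. A small sketch, using the rounded monthly VM figures from the table and a 30-day month; the function name is mine.

```python
def crossover_hours_per_day(vm_monthly: float, managed_rate: float = 0.08,
                            days: int = 30) -> float:
    """Active hours per day at which metered spend equals the flat VM bill."""
    return vm_monthly / (managed_rate * days)

for label, vm_monthly in [("burstable", 12), ("mid-size", 30), ("production", 70)]:
    hours = crossover_hours_per_day(vm_monthly)
    # More than 24 hours a day is impossible, so the lines never cross.
    verdict = "never crosses" if hours > 24 else f"{hours:.1f} active hours/day"
    print(f"{label}: {verdict}")
```

The production-grade box comes out at about 29 hours a day, which is why managed stays cheaper there even at full utilization.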
But here is the part that the line chart cannot show, and it is the whole point. That chart is a fight over the sliver. Remember the tokens. Anthropic’s own worked example puts a one-hour coding session at about seventy cents, of which the runtime is eight cents. Eight cents out of seventy. The runtime is eleven percent of the bill, and that eleven percent is the only part self-hosting can touch. You can win the runtime fight outright and still have moved barely a tenth of your actual spend.
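The ceiling on what self-hosting can save is easy to check. This sketch uses the seventy-cent session and the two hourly rates already quoted; it assumes the VM is fully busy for that hour, which is the most flattering case for self-hosting.

```python
SESSION_TOTAL = 0.70     # worked example: one-hour coding session
RUNTIME_MANAGED = 0.08   # managed runtime for that hour
RUNTIME_VM = 0.0168      # cheapest VM for the same hour, assumed fully busy

runtime_share = RUNTIME_MANAGED / SESSION_TOTAL
best_case_saving = (RUNTIME_MANAGED - RUNTIME_VM) / SESSION_TOTAL

print(f"runtime share of the bill: {runtime_share:.1%}")
print(f"best-case saving from the cheapest VM: {best_case_saving:.1%}")
```

Even with the cheapest box and perfect utilization, the saving is about nine percent of the session bill, before any operations cost is counted.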
OK, so here is where it gets interesting, and where I think most people stop too early. There is a real economic case for self-hosting, and it is not the runtime rate. It is packing. One VM can hold many agents at once. Ten light agents sharing a single box cost a tenth of that box each, and now the per-agent number genuinely undercuts managed. The catch is that you only get the packing benefit once you have a crowd of agents to pack. A single agent on its own VM is just an expensive agent with chores.
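The packing effect, sketched as arithmetic. I am assuming a "light" agent means about two active hours a day and that a $30 box can genuinely hold ten of them; both are illustrative guesses, not measurements.

```python
def per_agent_share(vm_monthly: float, agents_on_box: int) -> float:
    """Each agent's share of one flat VM bill."""
    if agents_on_box < 1:
        raise ValueError("need at least one agent on the box")
    return vm_monthly / agents_on_box

def managed_monthly(active_hours_per_day: float, rate: float = 0.08,
                    days: int = 30) -> float:
    """Metered managed runtime for one agent."""
    return rate * active_hours_per_day * days

light_managed = managed_monthly(2)   # hypothetical light agent, 2 hours a day
packed = per_agent_share(30, 10)     # ten agents sharing one $30 box
solo = per_agent_share(30, 1)        # one agent alone on the same box

print(f"managed: ${light_managed:.2f}, packed: ${packed:.2f}, solo: ${solo:.2f}")
```

Packed ten to a box, each agent undercuts managed; alone on the same box, it costs several times more.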
Which brings up the chore. Somebody has to run that box.
Take five agents, each busy about four hours a day. On managed, the runtime is around $48 a month. Self-hosted, you can pack all five onto one $30 VM, so the compute is actually cheaper. And then a person has to patch the OS, rotate the credentials, wire up the secret vault, ship the logs somewhere you can query them, and be reachable when an agent wedges at an unsociable hour. Call it two hours a week at a loaded engineering rate. That is roughly $680 a month, and it dwarfs everything else on the page. The $18 you saved on compute is gone many times over before lunch.
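Here is that example as a small ledger, so the proportions are visible. Every number comes from the paragraph above; the $680 operations figure is taken as given rather than derived from an hourly rate.

```python
AGENTS = 5
ACTIVE_HOURS_PER_DAY = 4
MANAGED_RATE = 0.08    # per session-hour
DAYS = 30
VM_MONTHLY = 30.0      # one box holding all five agents
OPS_MONTHLY = 680.0    # human time, as estimated above

managed_total = AGENTS * ACTIVE_HOURS_PER_DAY * DAYS * MANAGED_RATE
self_hosted_total = VM_MONTHLY + OPS_MONTHLY
compute_saving = managed_total - VM_MONTHLY
ops_multiple = OPS_MONTHLY / compute_saving

print(f"managed runtime:      ${managed_total:.0f}/mo")
print(f"self-hosted all-in:   ${self_hosted_total:.0f}/mo")
print(f"compute saving:       ${compute_saving:.0f}/mo")
print(f"ops cost vs saving:   {ops_multiple:.0f}x")
```

The operations line is well over thirty times the compute saving, which is the whole argument in one ratio.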
That operations work is not optional and it is not one-time. It is the standing cost of owning the thing, and it is the same work whether you are running one agent or fifty. I have gone deep on what that ongoing burden looks like in building reliable AI agents; the short of it is that a self-hosted agent platform with no clear owner does not stay reliable, it slowly rots. In building Tallyfy, the cloud bills I could predict. The one that surprised me, every time, was the human time to keep the machinery honest. Nobody puts that cell in the spreadsheet, and it is the cell that decides the answer.
So the real crossover is not five hours a day of one agent. It is the point where you have enough steady agents to pack a box densely, and enough scale that someone is already doing the operations work for other reasons, so the marginal cost of one more agent is close to zero. Below that, managed wins on total cost even though it loses on the sticker. Above it, self-hosting wins, and the win compounds.
The four things the spreadsheet leaves out
Money is the easy axis. These four decide whether a self-hosted agent works at all, and in conversations I have had with teams pricing this out, they are what actually sink the plan.
The first is access. An agent is useless until it can reach the systems it acts on: the database, the CRM, the file share, whatever it is meant to touch. That means real credentials, scoped tightly, held in a vault, ideally one short-lived service identity per agent rather than one shared key that can do everything. This is fiddly on any platform, but it is the part where self-hosting earns its keep, because the secrets and the access never leave your perimeter. It is also the part where, done lazily, you hand a looping program your production keys and hope.
The second is observability. When something goes sideways, can you reconstruct what the agent did and what it touched? On a managed runtime you get the session record the vendor exposes, and no more. On your own infrastructure the logs are yours by construction, flowing into the same tooling you already use for everything else. For a workload that will face an auditor, that difference can outweigh the entire cost question on its own.
The third is the one people genuinely do not see coming: interactivity. A lot of agent work assumes a human is there to answer a mid-run question. Anyone who has used an interactive coding agent knows the rhythm: it stops and asks you to pick an option, you pick, it continues. Now take that same agent and schedule it to run headless at 3am with nobody watching. What happens when it hits the point where it would have asked? If you have not designed an answer (a default, a safe pause, a handoff to a queue), the agent either stalls forever or guesses, and guessing is how a fast start becomes an incident. This is not a hosting-cost question, but it is the thing most likely to make a “we will just run it ourselves” plan fall over in week two.
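One possible shape for that designed answer, as a sketch only. None of these names come from any agent SDK; `Question`, `Fallback`, and `resolve_headless` are hypothetical, and the rule that a default is only taken when the action is reversible is my own assumption about what "safe" should mean.

```python
from dataclasses import dataclass
from enum import Enum
from queue import Queue
from typing import Optional

class Fallback(Enum):
    USE_DEFAULT = "use_default"   # answer automatically and continue
    HAND_OFF = "hand_off"         # park the question for a human
    PAUSE = "pause"               # stop cleanly and wait

@dataclass
class Question:
    prompt: str
    options: list[str]
    safe_default: Optional[str] = None
    reversible: bool = False      # assumption: defaults only for undoable actions

def resolve_headless(question: Question,
                     review_queue: Optional[Queue] = None
                     ) -> tuple[Fallback, Optional[str]]:
    """Decide what a scheduled agent does when nobody is there to answer."""
    if question.safe_default is not None and question.reversible:
        return Fallback.USE_DEFAULT, question.safe_default
    if review_queue is not None:
        review_queue.put(question)   # a human picks it up in the morning
        return Fallback.HAND_OFF, None
    return Fallback.PAUSE, None      # never guess on an irreversible step
```

The point of the sketch is that guessing is not one of the branches: every path either uses a pre-approved answer or stops.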
The fourth is the workload shape, the honest “where does this not work” list. Managed agents are a poor fit for a single quick prompt-and-response, because you pay session overhead for something a plain API call does in one shot. They are a poor fit for an agent whose core logic is unusual, because the managed harness is opinionated and an odd agent spends its life fighting those opinions. And self-hosting is a poor fit for a team that does not already run infrastructure, because you are signing up for the 2am pager, not just the VM. Where each shines is the inverse: managed for ordinary, long-running, mostly-unattended jobs you would rather not babysit; self-hosting for a fleet, for hard data rules, or for a loop weird enough that you need to own the machinery.
How to actually decide
Forget the hourly rate. It is the wrong starting line. Decide on shape first, and let the shape pick the path.
If you are running a handful of agents, use managed, and stop optimizing. The runtime cost is small, the operations cost is zero on your side, and the engineering hours you would spend building and babysitting your own harness are worth far more pointed at the product. When I explain this to people who are sure self-hosting will save them money, the question I ask back is: who owns the box in two years? If the answer is a shrug, the savings were never real.
If you are running many agents but they are not yet busy around the clock, look at serverless containers before you look at a VM fleet. Cloud Run and its cousins charge during the request and nothing the rest of the time, which fits a swarm of light, bursty agents far better than either a metered managed session or an always-on rental.
And if you are running many agents at high, sustained utilization, that is the moment self-hosting earns its place. Pack them densely onto reserved or ARM instances, take the committed-use discounts, and lean on the operations capability you already have at that scale. The crossover is real. It just sits at fleet scale, not at agent number one. The more I look at it, the more I think the entire managed-versus-self-hosted cost debate is people arguing about the eight cents while the rest of the seventy cents and the human on call decide the outcome. Price those two honestly and the answer usually picks itself.
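The decision order above can be written down as a tiny function. The thresholds are placeholders: I never gave a number for "a handful" or for "sustained", so `handful=5` and `sustained_hours=12` are arbitrary defaults you should replace with your own, and `pick_path` is a hypothetical name.

```python
def pick_path(agent_count: int, active_hours_per_day: float,
              has_ops_team: bool,
              data_rules_require_self_hosting: bool = False,
              *, handful: int = 5, sustained_hours: float = 12) -> str:
    """Shape first, price second. Thresholds are illustrative assumptions."""
    if data_rules_require_self_hosting:
        return "self-hosted"            # no price beats "not allowed"
    if agent_count <= handful:
        return "managed"                # stop optimizing
    if active_hours_per_day < sustained_hours:
        return "serverless containers"  # many agents, bursty work
    if has_ops_team:
        return "self-hosted"            # fleet scale with an owner for the box
    return "managed"                    # nobody to carry the pager

print(pick_path(3, 4, has_ops_team=False))
print(pick_path(40, 2, has_ops_team=True))
print(pick_path(40, 20, has_ops_team=True))
```

Note that price never appears as an input. By the time the shape is known, the cost comparison has mostly been made.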





