AI Build vs Buy Calculator (API vs Self-Host)

Enter your hosted-API price per million tokens, the fixed monthly cost of running the model yourself, and how many tokens you expect to process each month. The calculator shows both monthly bills side by side, tells you which is cheaper at your volume, and computes the break-even usage where the two lines cross. It is a first-order cost screen, not a full total-cost-of-ownership model.

Costs & usage

Mtok = millions of tokens. Self-hosting is treated as a fixed monthly cost (GPU + ops); the API scales per token.

Break-even volume

150.00 Mtok/mo

Hosted API is cheaper at 100 Mtok/mo — saves $500.00/mo

Hosted API / mo

$1,000.00

Self-host / mo

$1,500.00

Cost detail
Hosted API monthly: $1,000.00
Self-host monthly: $1,500.00
Break-even volume: 150.00 Mtok/mo
Cheaper option: Hosted API is cheaper

[Chart: Monthly cost at 100 Mtok, Hosted API vs. Self-host]

How it works

The hosted-API cost is purely usage-based: it equals your monthly token volume (in millions of tokens, Mtok) multiplied by the provider's price per million tokens. Every extra token adds to the bill, but you pay nothing when idle and you carry no infrastructure. This is the classic pay-as-you-go curve: a straight line that starts at zero and rises in proportion to volume.

The self-hosting cost is treated as a fixed monthly figure: GPU rental (or amortized hardware) plus the operational overhead of keeping the model serving. Within the volume range you are considering it does not move with token volume, so on the chart it is a flat horizontal line. Reserved GPU capacity is paid for whether you send it one request or a billion, which is exactly why heavy, steady workloads tend to favor owning the compute.
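The two curves described above can be tabulated with a short Python sketch. The $10 per Mtok price and $1,500 fixed cost are assumptions inferred from the example result on this page ($1,000 at 100 Mtok, break-even at 150 Mtok), and the function names are illustrative rather than the calculator's own code.

```python
def api_cost(volume_mtok: float, price_per_mtok: float) -> float:
    """Usage-based bill: starts at zero and rises linearly with volume."""
    return volume_mtok * price_per_mtok


def self_host_cost(volume_mtok: float, fixed_monthly: float) -> float:
    """Fixed bill: identical at every volume, so volume_mtok is ignored."""
    return fixed_monthly


# Assumed example inputs (inferred from the displayed results, not stated inputs).
PRICE_PER_MTOK = 10.0
FIXED_MONTHLY = 1500.0

# One row per sample volume: (volume in Mtok, API bill, self-host bill).
rows = [
    (v, api_cost(v, PRICE_PER_MTOK), self_host_cost(v, FIXED_MONTHLY))
    for v in (0, 50, 100, 150, 200)
]

for volume, api, hosted in rows:
    print(f"{volume:>4} Mtok  API ${api:>9,.2f}  self-host ${hosted:>9,.2f}")
```

Under these assumed inputs the API column grows by $500 for every 50 Mtok while the self-host column stays flat, which is the sloped line and the horizontal line on the chart.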

The break-even volume, in Mtok per month, is the fixed self-hosting cost divided by the API price per million tokens: below that usage the pay-as-you-go API is cheaper, above it the fixed self-hosting cost wins. At exactly the break-even point the two monthly bills are equal. The calculator reports both costs at your entered volume, the cheaper option, the monthly difference, and the crossover point so you can see how far your usage sits from the tipping point.
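The calculation can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation: the `Comparison` class, the `compare` function, and the example inputs ($10 per Mtok, $1,500 fixed, 100 Mtok) are assumptions chosen to match the example result shown on this page.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    api_monthly: float        # hosted-API bill at the entered volume
    self_host_monthly: float  # fixed self-hosting bill
    break_even_mtok: float    # volume where the two bills are equal
    cheaper: str              # "Hosted API", "Self-host", or "Equal"
    monthly_savings: float    # absolute difference between the two bills


def compare(price_per_mtok: float, fixed_monthly: float, volume_mtok: float) -> Comparison:
    if price_per_mtok <= 0:
        raise ValueError("price_per_mtok must be positive")
    api_monthly = volume_mtok * price_per_mtok
    # Break-even: fixed cost divided by the per-Mtok API price.
    break_even = fixed_monthly / price_per_mtok
    if api_monthly < fixed_monthly:
        cheaper = "Hosted API"
    elif api_monthly > fixed_monthly:
        cheaper = "Self-host"
    else:
        cheaper = "Equal"
    return Comparison(
        api_monthly=api_monthly,
        self_host_monthly=fixed_monthly,
        break_even_mtok=break_even,
        cheaper=cheaper,
        monthly_savings=abs(fixed_monthly - api_monthly),
    )


# Assumed example inputs, inferred from the results displayed on this page.
result = compare(price_per_mtok=10.0, fixed_monthly=1500.0, volume_mtok=100.0)
print(result)
```

With these assumed inputs the sketch reproduces the example: a $1,000 API bill, a $1,500 self-host bill, a 150 Mtok break-even, and the hosted API ahead by $500.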

Frequently asked questions

Why is self-hosting shown as a fixed cost and the API as per-token?

It reflects how each option is actually billed. A hosted API charges per token processed, so its cost scales directly with usage and drops to zero when you are idle. Self-hosting means you reserve or buy GPU capacity that is paid for whether it is busy or not, so within a given serving setup the cost is effectively fixed each month. That structural difference — variable versus fixed — is what creates a break-even volume, and it is the core relationship this calculator models. Real deployments have some variability on both sides, but the fixed-versus-per-token framing captures the dominant dynamic.

What hidden costs of self-hosting does this leave out?

Quite a few, and they usually push the true break-even higher than the raw GPU rental suggests. Self-hosting carries MLOps and engineering time (deployment, autoscaling, monitoring, upgrades, security patching), on-call and reliability burden, redundancy for high availability, idle capacity you provision for peak load but rarely use, and model-update work each time a better open model ships. The fixed figure you enter should be your all-in monthly cost including staff time, not just the sticker price of the instance. If you only count the GPU bill, self-hosting will look cheaper than it really is.
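To see how much an all-in figure can move the crossover, the Python sketch below compares a break-even based on the GPU bill alone with one that includes overheads. Every dollar amount here is a hypothetical placeholder chosen for round numbers, not data from the calculator or from any provider, and the function names are illustrative.

```python
from typing import Mapping


def all_in_fixed_cost(gpu_monthly: float, overheads: Mapping[str, float]) -> float:
    """Sum the GPU bill and every other monthly self-hosting cost."""
    return gpu_monthly + sum(overheads.values())


def break_even_mtok(fixed_monthly: float, price_per_mtok: float) -> float:
    """Volume (Mtok/mo) at which the API bill equals the fixed cost."""
    return fixed_monthly / price_per_mtok


# Hypothetical figures for illustration only.
gpu_monthly = 1500.0
overheads = {
    "engineering_time": 2000.0,
    "on_call": 500.0,
    "redundancy": 1000.0,
}
price_per_mtok = 10.0

gpu_only = break_even_mtok(gpu_monthly, price_per_mtok)
all_in = break_even_mtok(all_in_fixed_cost(gpu_monthly, overheads), price_per_mtok)

print(f"Break-even on GPU bill alone: {gpu_only:.2f} Mtok/mo")
print(f"Break-even on all-in cost:    {all_in:.2f} Mtok/mo")
```

In this made-up scenario the overheads raise the fixed cost from $1,500 to $5,000, so the break-even moves from 150 to 500 Mtok per month. The direction of the shift is the point; the size depends entirely on your own staffing and reliability costs.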

Does this account for quality, latency, or model capability differences?

No — it compares cost only, and assumes the two options meet your requirements equally, which is rarely exactly true. A frontier hosted model may deliver higher answer quality, lower latency at low volume, and instant access to the newest releases, while a self-hosted open model gives you data control, customization, and predictable spend. Those are real trade-offs a pure cost number cannot capture. Use this tool to understand the economics, then weigh quality, latency, privacy, and maintenance capacity separately before deciding.
