Question 1

Why are input and output tokens priced differently?

Accepted Answer

Providers price the two separately because generating output is more compute-intensive than reading input: a prompt can be processed in parallel in a single pass, while output tokens are generated one at a time. Output (completion) tokens are usually several times more expensive than input (prompt) tokens. For example, GPT-4o is $2.50 per million input tokens but $10.00 per million output tokens, a 4x difference. That asymmetry is why this calculator reports the two costs separately: for chat and generation workloads that return long responses, output can dominate the bill even when your prompt is long, so optimizing prompt size alone may not cut costs as much as you expect.
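To make the asymmetry concrete, here is a minimal sketch of the per-token arithmetic, using the GPT-4o list rates quoted above. The function name `estimate_cost` and the workload figures are assumptions for illustration only, not the calculator's actual code.

```python
# GPT-4o list rates quoted in the answer above (USD per one million tokens).
# Verify against the provider's pricing page before relying on them.
INPUT_RATE_PER_M = 2.50
OUTPUT_RATE_PER_M = 10.00


def estimate_cost(input_tokens: int, output_tokens: int) -> dict:
    """Return input, output, and total list-price cost in USD."""
    input_cost = input_tokens / 1_000_000 * INPUT_RATE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_RATE_PER_M
    return {
        "input": input_cost,
        "output": output_cost,
        "total": input_cost + output_cost,
    }


# Hypothetical workload: 10,000 requests, each with a 2,000-token prompt
# and a 1,000-token response.
requests = 10_000
costs = estimate_cost(2_000 * requests, 1_000 * requests)
print(costs)
```

In this hypothetical workload the prompts contain twice as many tokens as the responses, yet the output side costs $100 against $50 for input.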

Question 2

Does this include caching, batch, or volume discounts?

Accepted Answer

No. This is a straight list-price estimate. Most providers offer discounts this tool does not model: prompt caching (a reduced rate for repeated prompt prefixes), batch APIs (often around 50% off for asynchronous jobs), and negotiated or committed-use volume pricing. There are also extra-cost features (long-context surcharges, vision or audio tokens, and fine-tuned model rates) that fall outside a simple per-token estimate. Treat the figure here as a list-price baseline rather than a bill: the real total can be lower once discounts apply, or higher if you use any of the extra-cost features.
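If you want a rough idea of what a discount would do to a list-price figure, you can apply it yourself. The sketch below assumes a flat 50% batch discount, matching the "often around 50%" figure above. The helper `apply_discount` and the $150.00 starting estimate are hypothetical, and real discount terms vary by provider and model.

```python
def apply_discount(list_price: float, discount_fraction: float) -> float:
    """Reduce a list-price estimate by a fractional discount (0.0 to 1.0)."""
    if not 0.0 <= discount_fraction <= 1.0:
        raise ValueError("discount_fraction must be between 0.0 and 1.0")
    return list_price * (1.0 - discount_fraction)


# Hypothetical list-price estimate in USD.
list_price = 150.00

# Assumed flat 50% batch discount; not any specific provider's rate.
batch_price = apply_discount(list_price, 0.50)
print(f"list: ${list_price:.2f}, batch: ${batch_price:.2f}")
```

This treats the discount as applying uniformly to the whole bill. Prompt caching typically reduces only the cached part of the input, so it would need separate handling.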

Question 3

How current are these prices?

Accepted Answer

The rates come from a versioned 2025 price table and are labelled accordingly throughout the tool. LLM pricing changes often — providers cut rates, launch new model versions, and retire old ones on a regular basis — so a table that was accurate when it was written can drift out of date within months. Before you budget or compare vendors, verify each rate against the provider's official pricing page (linked in the sources below). This calculator is for quick estimation and comparison, not a billing guarantee.

LLM API Cost Calculator (GPT-4o, Claude, Gemini)

How it works

Frequently asked questions

Related tools

Sources