Calculate exact savings from prompt caching on Claude, GPT-5, and Gemini. Enter your system prompt size and daily requests — see monthly savings instantly.
Select provider · Enter prompt size and volume · See exact monthly savings
How much cached tokens cost vs standard input tokens. June 2026.
| Provider | Model | Standard Input | Cache Write | Cache Read | Discount |
|---|---|---|---|---|---|
| Anthropic | Claude Sonnet 4.6 | $3.00/1M | $3.75/1M | $0.30/1M | 90% OFF |
| Anthropic | Claude Haiku 4.5 | $1.00/1M | $1.25/1M | $0.10/1M | 90% OFF |
| Anthropic | Claude Opus 4.8 | $5.00/1M | $6.25/1M | $0.50/1M | 90% OFF |
| OpenAI | GPT-5.4 | $2.50/1M | auto | $1.25/1M | 50% OFF |
| OpenAI | GPT-5.4 mini | $0.75/1M | auto | $0.375/1M | 50% OFF |
| OpenAI | GPT-5.4 nano | $0.20/1M | auto | $0.10/1M | 50% OFF |
| Gemini 3.5 Flash | $1.50/1M | $1.00/1M | $0.375/1M | 75% OFF | |
| Gemini 3.1 Pro | $2.00/1M | $1.00/1M | $0.50/1M | 75% OFF |
Anthropic's 90% cache read discount is the most aggressive in the industry. For workloads with a large static system prompt (RAG context, agent instructions), switching from OpenAI to Anthropic just for caching can cut prompt costs by 3× vs OpenAI's 50% discount.
Real-world scenarios and monthly savings estimates.
| Use Case | Cached Tokens | Requests/day | Without Cache | With Cache (Sonnet) | Saves |
|---|---|---|---|---|---|
| RAG with large doc corpus | 8,000 | 1,000 | $720 | $81 | $639/mo |
| Agent with long instructions | 3,000 | 500 | $135 | $18 | $117/mo |
| Customer support bot | 1,500 | 2,000 | $270 | $36 | $234/mo |
| Code review tool | 500 | 200 | $9 | $3 | $6/mo |
On Anthropic, you pay 25% extra for cache writes but 90% less for reads. Break-even is at 1.4 reuses of the same prefix — any more than 2 requests with the same system prompt and caching saves money. On OpenAI, caching is automatic (no write cost) so it's always beneficial.
Prompt caching lets you reuse the beginning of a prompt (system prompt, document context) across multiple requests. Instead of re-processing the same tokens every call, the provider caches the KV state. Anthropic charges 90% less for cached tokens ($0.30/1M vs $3.00/1M on Sonnet). OpenAI charges 50% less. Google charges 75% less for Gemini cached tokens.
Anthropic prompt caching: cache write costs 25% more than standard input ($3.75/1M on Sonnet), but cache reads cost 90% less ($0.30/1M on Sonnet). Break-even is typically 2 requests using the same prefix. For 10+ requests with the same system prompt, savings are 80–88%.
Yes. OpenAI automatically caches prompt prefixes longer than 1024 tokens at 50% discount — no code change needed. GPT-5.4 cached input drops from $2.50/1M to $1.25/1M. Cache TTL is a few minutes for active sessions.
Caching helps only when the same prefix is reused. If every request has a unique system prompt, caching has no benefit. It also has minimal effect for short prompts under 512 tokens. Best use cases: RAG with a fixed document corpus, agents with a long static system prompt, and multi-turn conversations.