Last updated: June 2026 · Current pricing

Prompt Caching Cost Calculator

Calculate exact savings from prompt caching on Claude, GPT-5, and Gemini. Enter your system prompt size and daily requests — see monthly savings instantly.

Advertisement
Key Takeaways

💾 Prompt Caching Savings Calculator

Select provider · Enter prompt size and volume · See exact monthly savings

Monthly savings
$0
— % reduction in prompt costs
Without caching / month
Cache write cost / month
With caching / month
Monthly savings
Annual savings

Prompt Caching Rates by Provider

How much cached tokens cost vs standard input tokens. June 2026.

ProviderModelStandard InputCache WriteCache ReadDiscount
AnthropicClaude Sonnet 4.6$3.00/1M$3.75/1M$0.30/1M90% OFF
AnthropicClaude Haiku 4.5$1.00/1M$1.25/1M$0.10/1M90% OFF
AnthropicClaude Opus 4.8$5.00/1M$6.25/1M$0.50/1M90% OFF
OpenAIGPT-5.4$2.50/1Mauto$1.25/1M50% OFF
OpenAIGPT-5.4 mini$0.75/1Mauto$0.375/1M50% OFF
OpenAIGPT-5.4 nano$0.20/1Mauto$0.10/1M50% OFF
GoogleGemini 3.5 Flash$1.50/1M$1.00/1M$0.375/1M75% OFF
GoogleGemini 3.1 Pro$2.00/1M$1.00/1M$0.50/1M75% OFF
💡 Anthropic has the best caching deal

Anthropic's 90% cache read discount is the most aggressive in the industry. For workloads with a large static system prompt (RAG context, agent instructions), switching from OpenAI to Anthropic just for caching can cut prompt costs by 3× vs OpenAI's 50% discount.

Advertisement

When Caching Saves the Most

Real-world scenarios and monthly savings estimates.

Use CaseCached TokensRequests/dayWithout CacheWith Cache (Sonnet)Saves
RAG with large doc corpus8,0001,000$720$81$639/mo
Agent with long instructions3,000500$135$18$117/mo
Customer support bot1,5002,000$270$36$234/mo
Code review tool500200$9$3$6/mo
📌 Break-even point

On Anthropic, you pay 25% extra for cache writes but 90% less for reads. Break-even is at 1.4 reuses of the same prefix — any more than 2 requests with the same system prompt and caching saves money. On OpenAI, caching is automatic (no write cost) so it's always beneficial.

Frequently Asked Questions

What is prompt caching and how does it save money?+

Prompt caching lets you reuse the beginning of a prompt (system prompt, document context) across multiple requests. Instead of re-processing the same tokens every call, the provider caches the KV state. Anthropic charges 90% less for cached tokens ($0.30/1M vs $3.00/1M on Sonnet). OpenAI charges 50% less. Google charges 75% less for Gemini cached tokens.

How much does Anthropic prompt caching cost?+

Anthropic prompt caching: cache write costs 25% more than standard input ($3.75/1M on Sonnet), but cache reads cost 90% less ($0.30/1M on Sonnet). Break-even is typically 2 requests using the same prefix. For 10+ requests with the same system prompt, savings are 80–88%.

Does OpenAI support prompt caching?+

Yes. OpenAI automatically caches prompt prefixes longer than 1024 tokens at 50% discount — no code change needed. GPT-5.4 cached input drops from $2.50/1M to $1.25/1M. Cache TTL is a few minutes for active sessions.

When does prompt caching NOT help?+

Caching helps only when the same prefix is reused. If every request has a unique system prompt, caching has no benefit. It also has minimal effect for short prompts under 512 tokens. Best use cases: RAG with a fixed document corpus, agents with a long static system prompt, and multi-turn conversations.

Related Calculators