Prompt Caching Cost Calculator 2026 — Anthropic, OpenAI, Google

Key Takeaways

Anthropic gives 90% off cached reads — the highest prompt caching discount of any major AI provider
OpenAI auto-caches at 50% off with no extra API calls — discount applied automatically on prompts over 1,024 tokens
Google Gemini caches at 75% off but requires a minimum 32,768 token context to enable caching
For a 10K-token system prompt with 10K monthly requests, caching saves ≈ $270/month on Claude Sonnet 4.6

💾 Prompt Caching Savings Calculator

Select provider · Enter prompt size and volume · See exact monthly savings

Provider

Model

System prompt / cached prefix (tokens)

Requests per day using same prefix

Cache hit rate (%)

Monthly savings

— % reduction in prompt costs

Without caching / month—

Cache write cost / month—

With caching / month—

Monthly savings—

Annual savings—

Prompt Caching Rates by Provider

How much cached tokens cost vs standard input tokens. June 2026.

Provider	Model	Standard Input	Cache Write	Cache Read	Discount
Anthropic	Claude Sonnet 4.6	$3.00/1M	$3.75/1M	$0.30/1M	90% OFF
Anthropic	Claude Haiku 4.5	$1.00/1M	$1.25/1M	$0.10/1M	90% OFF
Anthropic	Claude Opus 4.8	$5.00/1M	$6.25/1M	$0.50/1M	90% OFF
OpenAI	GPT-5.4	$2.50/1M	auto	$1.25/1M	50% OFF
OpenAI	GPT-5.4 mini	$0.75/1M	auto	$0.375/1M	50% OFF
OpenAI	GPT-5.4 nano	$0.20/1M	auto	$0.10/1M	50% OFF
Google	Gemini 3.5 Flash	$1.50/1M	$1.00/1M	$0.375/1M	75% OFF
Google	Gemini 3.1 Pro	$2.00/1M	$1.00/1M	$0.50/1M	75% OFF

💡 Anthropic has the best caching deal

Anthropic's 90% cache read discount is the most aggressive in the industry. For workloads with a large static system prompt (RAG context, agent instructions), switching from OpenAI to Anthropic just for caching can cut prompt costs by 3× vs OpenAI's 50% discount.

When Caching Saves the Most

Real-world scenarios and monthly savings estimates.

Use Case	Cached Tokens	Requests/day	Without Cache	With Cache (Sonnet)	Saves
RAG with large doc corpus	8,000	1,000	$720	$81	$639/mo
Agent with long instructions	3,000	500	$135	$18	$117/mo
Customer support bot	1,500	2,000	$270	$36	$234/mo
Code review tool	500	200	$9	$3	$6/mo

📌 Break-even point

On Anthropic, you pay 25% extra for cache writes but 90% less for reads. Break-even is at 1.4 reuses of the same prefix — any more than 2 requests with the same system prompt and caching saves money. On OpenAI, caching is automatic (no write cost) so it's always beneficial.

Frequently Asked Questions

What is prompt caching and how does it save money?+

Prompt caching lets you reuse the beginning of a prompt (system prompt, document context) across multiple requests. Instead of re-processing the same tokens every call, the provider caches the KV state. Anthropic charges 90% less for cached tokens ($0.30/1M vs $3.00/1M on Sonnet). OpenAI charges 50% less. Google charges 75% less for Gemini cached tokens.

How much does Anthropic prompt caching cost?+

Anthropic prompt caching: cache write costs 25% more than standard input ($3.75/1M on Sonnet), but cache reads cost 90% less ($0.30/1M on Sonnet). Break-even is typically 2 requests using the same prefix. For 10+ requests with the same system prompt, savings are 80–88%.

Does OpenAI support prompt caching?+

Yes. OpenAI automatically caches prompt prefixes longer than 1024 tokens at 50% discount — no code change needed. GPT-5.4 cached input drops from $2.50/1M to $1.25/1M. Cache TTL is a few minutes for active sessions.

When does prompt caching NOT help?+

Caching helps only when the same prefix is reused. If every request has a unique system prompt, caching has no benefit. It also has minimal effect for short prompts under 512 tokens. Best use cases: RAG with a fixed document corpus, agents with a long static system prompt, and multi-turn conversations.

Related Calculators

🤖

LLM Cost Calculator

Full token pricing for 15+ models

📦

Batch API Cost

50% off async processing savings

🔢

Embedding API Cost

OpenAI vs Cohere vs Voyage

⚡

AI Agent Cost

Multi-step pipeline pricing