LLM API Cost Calculator 2026 — GPT-5.5, Claude Opus 4.8, Gemini 3.5 Flash Pricing

Key Takeaways

GPT-5.5 costs $5.00 per million input tokens and $30.00 per million output tokens as of June 2026
A typical request (1,500 input + 500 output tokens) costs approximately $0.023 with GPT-5.5
GPT-5.4 nano is 96% cheaper than GPT-5.5 on input at $0.20/1M, which makes it the best fit for high-volume classification and routing
Use Batch API for 50% off async workloads, or prompt caching to save 75–90% on repeated system prompts

LLM API Prices, June 2026

Model	Provider	Input / 1M tokens	Output / 1M tokens
GPT-5.5	OpenAI	$5.00	$30.00
GPT-5.4	OpenAI	$2.50	$15.00
GPT-5.4 mini	OpenAI	$0.75	$4.50
GPT-5.4 nano	OpenAI	$0.20	$1.25
o4-mini	OpenAI	$0.55	$2.20
Claude Fable 5	Anthropic	$10.00	$50.00
Claude Opus 4.8	Anthropic	$5.00	$25.00
Claude Sonnet 4.6	Anthropic	$3.00	$15.00
Claude Haiku 4.5	Anthropic	$1.00	$5.00
Gemini 3.5 Flash	Google	$1.50	$9.00
Gemini 3.1 Pro	Google	$2.00	$12.00
Gemini 3.1 Flash-Lite	Google	$0.25	$1.50

2026 LLM API Pricing Table

All prices in USD per 1 million tokens, pay-as-you-go, June 2026. Source: official provider pricing pages.

Model	Input / 1M	Output / 1M	Context	Best for
GPT-5.4 nano (cheapest)	$0.20	$1.25	1M	Ultra-high volume, lowest cost
Gemini 3.1 Flash-Lite	$0.25	$1.50	1M	High-volume budget workloads
o4-mini	$0.55	$2.20	200K	Budget reasoning tasks
GPT-5.4 mini	$0.75	$4.50	1M	Cost-efficient production
Claude Haiku 4.5	$1.00	$5.00	200K	Quality on a budget
Gemini 3.5 Flash	$1.50	$9.00	1M	Frontier speed, long context
Gemini 3.1 Pro	$2.00	$12.00	1M	Flagship reasoning
GPT-5.4	$2.50	$15.00	1M	Balanced flagship quality
Claude Sonnet 4.6	$3.00	$15.00	1M	Coding & reasoning balanced
Claude Opus 4.8	$5.00	$25.00	1M	Complex agentic tasks
GPT-5.5	$5.00	$30.00	1M	Flagship intelligence
Claude Fable 5	$10.00	$50.00	1M	Long-running agents
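
The rates in the table translate directly into a per-request estimate. The sketch below is a minimal illustration, assuming the June 2026 prices listed above; `PRICES` and `cost_per_request` are hypothetical names chosen for this example and are not part of any provider SDK.

```python
# USD per 1M tokens as (input, output), copied from the pricing table above.
PRICES = {
    "GPT-5.5": (5.00, 30.00),
    "GPT-5.4": (2.50, 15.00),
    "GPT-5.4 mini": (0.75, 4.50),
    "GPT-5.4 nano": (0.20, 1.25),
    "Claude Sonnet 4.6": (3.00, 15.00),
}


def cost_per_request(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request at pay-as-you-go rates."""
    input_rate, output_rate = PRICES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# A request with 1,500 input tokens and 500 output tokens on GPT-5.5.
print(f"${cost_per_request('GPT-5.5', 1500, 500):.4f}")  # $0.0225
```

Input and output tokens are priced separately, so both counts are needed; a single "tokens per request" figure is not enough to price a call.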

💡 50% off with Batch API

OpenAI, Anthropic, and Google all offer batch processing at 50% discount for async workloads (up to 24h latency). Zero quality difference.
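
To see what the discount means for a bill, the following sketch applies a flat 50% reduction to a monthly token cost. It assumes the discount applies equally to input and output tokens, which should be checked against each provider's batch pricing; `monthly_cost` and `BATCH_DISCOUNT` are illustrative names, and the workload figures are example inputs.

```python
BATCH_DISCOUNT = 0.50  # 50% off, as stated above (assumed to cover input and output)


def monthly_cost(requests, input_tokens, output_tokens,
                 input_rate, output_rate, batch=False):
    """USD cost for a month of requests; rates are per 1M tokens."""
    cost = requests * (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    return cost * (1 - BATCH_DISCOUNT) if batch else cost


# Example: 15,000 documents per month, 4,000 input + 600 output tokens each,
# at $3.00 / $15.00 per 1M tokens (Claude Sonnet 4.6 rates).
standard = monthly_cost(15_000, 4_000, 600, 3.00, 15.00)
batched = monthly_cost(15_000, 4_000, 600, 3.00, 15.00, batch=True)
print(f"standard ${standard:.2f}, batch ${batched:.2f}")  # standard $315.00, batch $157.50
```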

Real-World Cost Examples

Monthly cost for common production workloads.

Workload	Volume / month	GPT-5.4 nano	GPT-5.4 mini	Claude Sonnet 4.6
Customer support bot	150K req · 800+200 tok	$61.50	$225	$810
Document summarizer	15K docs · 4K+600 tok	$23.25	$85.50	$315
Code review assistant	6K req · 3K+1K tok	$11.10	$40.50	$144
RAG Q&A system	60K req · 2K+400 tok	$54	$198	$720
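
The figures in this table can be reproduced from the per-token rates. The sketch below is one way to do it, assuming each request uses exactly the input and output token counts shown; `RATES`, `WORKLOADS`, and `workload_cost` are names invented for this example.

```python
# USD per 1M tokens as (input, output), from the pricing table in this article.
RATES = {
    "GPT-5.4 nano": (0.20, 1.25),
    "GPT-5.4 mini": (0.75, 4.50),
    "Claude Sonnet 4.6": (3.00, 15.00),
}

# (requests per month, input tokens per request, output tokens per request)
WORKLOADS = {
    "Customer support bot": (150_000, 800, 200),
    "Document summarizer": (15_000, 4_000, 600),
    "Code review assistant": (6_000, 3_000, 1_000),
    "RAG Q&A system": (60_000, 2_000, 400),
}


def workload_cost(workload: str, model: str) -> float:
    """Monthly USD cost of a workload on a given model."""
    requests, tokens_in, tokens_out = WORKLOADS[workload]
    rate_in, rate_out = RATES[model]
    return requests * (tokens_in * rate_in + tokens_out * rate_out) / 1_000_000


# Print one row per workload, one column per model.
for workload in WORKLOADS:
    cells = "  ".join(f"${workload_cost(workload, m):>8,.2f}" for m in RATES)
    print(f"{workload:<22} {cells}")
```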

How to Use This Calculator

Estimate your exact LLM API bill in under 60 seconds — no spreadsheets required.

1. Select your model. The dropdown is split into Current Models (active, pay-as-you-go) and Legacy (older generations). Start with the model you plan to deploy. If you're still evaluating, try GPT-5.4 nano for budget workloads and Claude Sonnet 4.6 or GPT-5.4 for quality-first production use.
2. Enter input tokens per request. Input tokens include your system prompt, conversation history, and the user's message. A 200-word system prompt plus a 50-word user question is roughly 330 tokens. A RAG application that retrieves five 600-word document chunks sends roughly 4,000 tokens of retrieved text per request (3,000 words × 1.33), plus the user question and system prompt. When in doubt, use the rule of thumb: 1 word ≈ 1.33 tokens.
3. Enter output tokens per request. Output tokens are the model's response. A one-sentence answer is about 30 tokens; a code function, roughly 400 tokens; a detailed summary, 600–1,000 tokens. For the models priced above, output tokens cost 4–6.25× more per token than input tokens, so reducing output length is usually the most effective cost optimization.
4. Set requests per day. Estimate realistically: a medium SaaS with 500 active users making 2 LLM calls each per day = 1,000 requests/day. A document pipeline processing 500 documents per day with 3 LLM passes each = 1,500 requests/day. For a new product, starting with 100–500 requests/day is typical in early beta.
5. Choose your period and read the breakdown. Switch between Day, Month, and Year. The result panel shows total cost, cost per request, input vs. output cost split, and total tokens consumed. Pay attention to the input/output split: if output dominates, consider shortening responses with max_tokens. If input dominates, look at prompt caching or trimming your system prompt.
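
The five steps above amount to a small calculation, sketched below. This is not the calculator's actual code: `estimate`, `Breakdown`, and `PERIOD_DAYS` are hypothetical names, and the 30-day month and 365-day year are assumptions.

```python
from dataclasses import dataclass

# Assumption: a month is 30 days and a year is 365 days.
PERIOD_DAYS = {"day": 1, "month": 30, "year": 365}


@dataclass
class Breakdown:
    total_cost: float
    cost_per_request: float
    input_cost: float
    output_cost: float
    total_tokens: int


def estimate(input_tokens, output_tokens, requests_per_day,
             input_rate, output_rate, period="month") -> Breakdown:
    """Project cost over a period; rates are USD per 1M tokens."""
    requests = requests_per_day * PERIOD_DAYS[period]
    input_cost = requests * input_tokens * input_rate / 1_000_000
    output_cost = requests * output_tokens * output_rate / 1_000_000
    total = input_cost + output_cost
    return Breakdown(total, total / requests, input_cost, output_cost,
                     requests * (input_tokens + output_tokens))


# 1,500 input + 500 output tokens, 1,000 requests/day, GPT-5.5 rates.
b = estimate(1_500, 500, 1_000, input_rate=5.00, output_rate=30.00)
print(f"${b.total_cost:,.2f} per month, output share {b.output_cost / b.total_cost:.0%}")
# $675.00 per month, output share 67%
```

In this example output accounts for two thirds of the bill even though it is only a quarter of the tokens, which is the situation where capping response length pays off most.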

📌 Real example

A customer support bot with a 500-token system prompt, 200-token average user message, and 300-token responses running 1,000 requests/day sends 700 input and 300 output tokens per request. On GPT-5.4 nano ($0.20 input / $1.25 output per 1M), that's $0.000515 per request × 30,000 requests/month = $15.45/month. The same workload on Claude Sonnet 4.6 ($3.00 / $15.00 per 1M) costs $0.0066 per request, or $198/month: a 12.8× difference.

Pricing Methodology

How we source and maintain the pricing data in this calculator.

Source: All prices shown are official, published pay-as-you-go rates from OpenAI's API pricing page, Anthropic's pricing page, and Google AI's pricing documentation, as of June 2026. We do not estimate, interpolate, or infer rates — every figure has a direct source URL we verify before publishing.

Update cadence: We review all pricing monthly and update within 48 hours of any provider announcement. When a price change is confirmed, we update both the static HTML table and the /pricing-data.json file that powers the interactive calculator simultaneously, so both surfaces stay in sync.

What we include: Standard, on-demand (pay-as-you-go) rates only. Enterprise contracts, committed-use discounts, and academic grants are not included because they vary by negotiation and are not available to all developers.

What we exclude: Batch API discounts (50% off for async jobs) and prompt caching discounts (up to 90% off on Anthropic cached tokens) are excluded from the base calculation but documented in the tips and FAQ sections. We list context window sizes for reference because exceeding a model's context limit causes an error — not an additional charge.

Accuracy disclaimer: LLM API pricing changes frequently. Always verify current rates directly with the provider before signing contracts or publishing cost estimates to investors or customers.

Frequently Asked Questions

How much does GPT-5.5 cost per 1 million tokens?

GPT-5.5 costs $5.00 per million input tokens and $30.00 per million output tokens (pay-as-you-go, June 2026). For budget workloads, GPT-5.4 nano at $0.20/$1.25 per 1M is OpenAI's cheapest option — 25× less on input.

What is the cheapest LLM API in 2026?

GPT-5.4 nano at $0.20/$1.25 per million tokens is the cheapest model listed here and OpenAI's lowest-priced option. Gemini 3.1 Flash-Lite ($0.25/$1.50) is Google's budget option. Both handle high-volume tasks well.

How much does Claude Sonnet 4.6 cost?

Claude Sonnet 4.6 costs $3.00 per million input tokens and $15.00 per million output tokens via the Anthropic API. It offers 1M token context and excels at coding and reasoning tasks.

How do I reduce LLM API costs?

Use cheaper models (GPT-5.4 nano vs GPT-5.5 is 25× cheaper on input). Enable Batch API for 50% off async workloads. Cache repeated prompts (Anthropic: 90% off cached tokens). Trim system prompts. Set max_tokens explicitly to avoid unnecessarily long responses.

What is a token in LLM pricing?

A token is roughly 4 characters or 0.75 words. A 1,000-word document is approximately 1,333 tokens. Most LLM APIs charge separately for input (prompt) and output (completion) tokens. Code and non-English text tend to use more tokens per word.
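
For quick budgeting, the two rules of thumb above can be written as helper functions. This is only a heuristic sketch: `tokens_from_words` and `tokens_from_chars` are illustrative names, and real counts depend on the provider's tokenizer, so measured values from API responses should replace these estimates once available.

```python
def tokens_from_words(words: int) -> int:
    """Estimate tokens from a word count, assuming 1 token is about 0.75 words."""
    return round(words / 0.75)


def tokens_from_chars(chars: int) -> int:
    """Estimate tokens from a character count, assuming about 4 characters per token."""
    return round(chars / 4)


print(tokens_from_words(1_000))  # 1333
print(tokens_from_words(250))    # 333
print(tokens_from_chars(len("How much does one request cost?")))
```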

Does the Batch API really cut costs in half?

Yes — OpenAI, Anthropic, and Google all offer 50% discounts for asynchronous batch processing (results returned within 24 hours). The trade-off is latency: batch jobs are not suitable for real-time user interactions. They are ideal for overnight document processing, dataset annotation, report generation, and embeddings computation where speed is not critical.

How does prompt caching reduce Anthropic API costs?

Anthropic's prompt caching lets you mark the beginning of your prompt (system prompt, document context) as cacheable. On subsequent requests that reuse that exact prefix, cached input tokens cost 90% less. At Claude Sonnet 4.6's $3.00/1M input rate, cached tokens cost $0.30/1M. This is most valuable when your system prompt is 500+ tokens and consistent across requests — common in RAG and agent applications.
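
A rough way to size the saving is to price the shared prefix at the discounted rate and everything else at the normal rate. The sketch below makes simplifying assumptions that are not from the source: every request hits the cache, the cache never expires, and any surcharge for writing to the cache is ignored. `input_cost` is a hypothetical helper, and the token counts are example inputs.

```python
def input_cost(prefix_tokens, fresh_tokens, requests, input_rate,
               cached=False, cache_discount=0.90):
    """Monthly USD input cost; input_rate is per 1M tokens.

    Assumes every request is a cache hit when cached=True and ignores
    cache-write surcharges and cache expiry.
    """
    prefix_rate = input_rate * (1 - cache_discount) if cached else input_rate
    per_request = prefix_tokens * prefix_rate + fresh_tokens * input_rate
    return requests * per_request / 1_000_000


# 2,000-token shared prefix + 200 fresh tokens, 30,000 requests/month,
# at Claude Sonnet 4.6's $3.00 per 1M input rate.
without_cache = input_cost(2_000, 200, 30_000, 3.00)
with_cache = input_cost(2_000, 200, 30_000, 3.00, cached=True)
print(f"${without_cache:.2f} vs ${with_cache:.2f}")  # $198.00 vs $36.00
```

The saving applies to input cost only; output tokens are billed at the normal rate regardless of caching.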

Why are output tokens more expensive than input tokens?

Generating output is sequential: each new token depends on the previous one, so the model runs one forward pass per generated token and cannot parallelize across output positions. Reading input tokens (the "prefill" phase) can be parallelized across the GPU, making it far cheaper per token. This is why output tokens cost 4–6.25× as much as input tokens for the models priced above.
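
The output-to-input price ratio can be checked against the rates published in this article. The sketch below computes it for every listed model; `PRICES` and `ratios` are names chosen for this example.

```python
# USD per 1M tokens as (input, output), from the pricing table in this article.
PRICES = {
    "GPT-5.5": (5.00, 30.00),
    "GPT-5.4": (2.50, 15.00),
    "GPT-5.4 mini": (0.75, 4.50),
    "GPT-5.4 nano": (0.20, 1.25),
    "o4-mini": (0.55, 2.20),
    "Claude Fable 5": (10.00, 50.00),
    "Claude Opus 4.8": (5.00, 25.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Haiku 4.5": (1.00, 5.00),
    "Gemini 3.5 Flash": (1.50, 9.00),
    "Gemini 3.1 Pro": (2.00, 12.00),
    "Gemini 3.1 Flash-Lite": (0.25, 1.50),
}

# Output price divided by input price for each model.
ratios = {model: out / inp for model, (inp, out) in PRICES.items()}

for model, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{model:<22} {ratio:.2f}x")

print(f"range: {min(ratios.values()):.2f}x to {max(ratios.values()):.2f}x")
# range: 4.00x to 6.25x
```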
