Compare GPT-4.1, Claude 4, Gemini 2.5, and 10+ models. Enter your token volume and request count to see exact monthly costs. Free, no signup, runs in your browser.
All prices in USD per 1 million tokens, pay-as-you-go, June 2026. Source: official provider pricing pages.
| Model | Input / 1M | Output / 1M | Context | Best for |
|---|---|---|---|---|
| Gemini 2.5 Flash-Lite (cheapest) | $0.10 | $0.40 | 1M | Ultra-high volume |
| GPT-4o mini | $0.15 | $0.60 | 128K | High-volume, low-cost |
| Gemini 2.5 Flash | $0.30 | $2.50 | 1M | Balanced speed/cost |
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K | Quality on a budget |
| Gemini 2.5 Pro | $1.25 | $10.00 | 1M | Long-context tasks |
| GPT-4.1 | $2.00 | $8.00 | 1M | General flagship 2026 |
| o3 | $2.00 | $20.00 | 200K | Hard reasoning |
| GPT-4o | $2.50 | $10.00 | 128K | Multimodal tasks |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1M | Coding & reasoning |
| Claude Opus 4.8 | $5.00 | $25.00 | 1M | Complex agentic tasks |
OpenAI, Anthropic, and Google all offer batch processing at a 50% discount for asynchronous workloads (up to 24h latency). Batch requests run on the same models, so output quality is unchanged.
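As a rough illustration of what the batch discount does to a bill, the sketch below applies the 50% rate only to the share of spend that can tolerate asynchronous processing. The function name `blended_cost` and the `async_share` parameter are hypothetical, introduced for this example only.

```python
BATCH_DISCOUNT = 0.50  # 50% off, the batch rate quoted above


def blended_cost(full_price_cost, async_share):
    """Cost after routing `async_share` (0.0 to 1.0) of spend through batch processing."""
    if not 0.0 <= async_share <= 1.0:
        raise ValueError("async_share must be between 0 and 1")
    sync_part = full_price_cost * (1 - async_share)  # billed at full price
    batch_part = full_price_cost * async_share * (1 - BATCH_DISCOUNT)  # billed at half price
    return round(sync_part + batch_part, 2)


# A $1,000/month workload where 40% of requests can wait up to 24 hours.
print(blended_cost(1000.0, 0.40))  # 800.0
```

Only the asynchronous share benefits, so the overall saving is half of that share: 20% in this example.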
Monthly cost for common production workloads, assuming a 30-day month and the list prices above. Volume is requests per day, with input + output tokens per request.
| Workload | Volume | GPT-4o mini | GPT-4.1 | Claude Sonnet 4.6 |
|---|---|---|---|---|
| Customer support bot | 10K req/day · 1K+300 tok | $99.00 | $1,320 | $2,250 |
| Document summarizer | 1K docs/day · 4K+800 tok | $32.40 | $432 | $720 |
| Code review assistant | 500 req/day · 3K+1K tok | $15.75 | $210 | $360 |
| RAG Q&A system | 5K req/day · 2K+500 tok | $90.00 | $1,200 | $2,025 |
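Monthly cost for a workload follows from request volume, tokens per request, and the per-million-token prices. The sketch below shows that arithmetic; it is not the calculator's actual code. The names `PRICES` and `monthly_cost` are hypothetical, the prices are copied from the pricing table, and a 30-day month is assumed.

```python
# USD per 1 million tokens as (input, output), copied from the pricing table.
PRICES = {
    "GPT-4o mini": (0.15, 0.60),
    "GPT-4.1": (2.00, 8.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
}


def monthly_cost(model, requests_per_day, input_tokens, output_tokens, days=30):
    """Estimated monthly spend in USD for one workload on one model."""
    input_price, output_price = PRICES[model]
    requests = requests_per_day * days
    input_cost = requests * input_tokens / 1_000_000 * input_price
    output_cost = requests * output_tokens / 1_000_000 * output_price
    return round(input_cost + output_cost, 2)


# Customer support bot: 10K requests/day, 1K input + 300 output tokens each.
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 10_000, 1_000, 300):,.2f}")
# GPT-4o mini: $99.00
# GPT-4.1: $1,320.00
# Claude Sonnet 4.6: $2,250.00
```

Output tokens cost four to five times as much as input tokens on these models, so response length often drives the bill more than prompt length.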
GPT-4o costs $2.50 per million input tokens and $10.00 per million output tokens (pay-as-you-go, June 2026). The newer GPT-4.1 is cheaper at $2.00/$8.00 per 1M and often a better choice.
Gemini 2.5 Flash-Lite, at $0.10/$0.40 per million tokens, is the cheapest model in this comparison. GPT-4o mini ($0.15/$0.60) is the cheapest OpenAI option. Both suit high-volume workloads.
Claude Sonnet 4.6 costs $3.00 per million input tokens and $15.00 per million output tokens via the Anthropic API. It offers 1M token context and excels at coding tasks.
Use a cheaper model where quality allows: at list prices, GPT-4o mini is about 16.7× cheaper than GPT-4o. Use the Batch API for 50% off async workloads. Cache repeated prompts with prompt caching (Anthropic: 90% off cached input tokens). Trim system prompts. Set max_tokens explicitly to cap output length.
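To see how prompt caching changes the input side of the bill, here is a minimal sketch. It assumes cached input tokens are billed at 10% of the normal input price (the 90% discount quoted above for Anthropic) and ignores any extra charge for writing to the cache, which providers may bill separately. `input_cost_with_cache` is a hypothetical helper, not part of any provider SDK.

```python
def input_cost_with_cache(total_input_tokens, cached_fraction, input_price_per_m,
                          cache_discount=0.90):
    """Input-token cost in USD when part of each prompt is served from cache."""
    cached = total_input_tokens * cached_fraction
    uncached = total_input_tokens - cached
    cached_price = input_price_per_m * (1 - cache_discount)  # discounted rate per 1M
    cost = (uncached * input_price_per_m + cached * cached_price) / 1_000_000
    return round(cost, 2)


# 300M input tokens/month at $3.00 per 1M (Claude Sonnet 4.6 input price),
# where 70% of each prompt is a repeated, cacheable prefix.
print(input_cost_with_cache(300_000_000, 0.70, 3.00))  # 333.0
print(input_cost_with_cache(300_000_000, 0.00, 3.00))  # 900.0 (no caching)
```

The saving scales with the fraction of each prompt that repeats, which is why long, stable system prompts benefit most.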
A token is roughly 4 characters or 0.75 words of English text. A 1,000-word document is therefore approximately 1,333 tokens (1,000 ÷ 0.75). Most LLM APIs charge separately for input (prompt) and output (completion) tokens.
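The rule of thumb above translates directly into a rough estimator. This is only a heuristic sketch with hypothetical function names; real counts depend on each provider's tokenizer and on the language of the text.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb from the text above
CHARS_PER_TOKEN = 4     # rule of thumb from the text above


def tokens_from_words(word_count):
    """Approximate token count from a word count."""
    return round(word_count / WORDS_PER_TOKEN)


def tokens_from_text(text):
    """Approximate token count from the character length of a string."""
    return round(len(text) / CHARS_PER_TOKEN)


print(tokens_from_words(1000))  # 1333
print(tokens_from_text("How much does GPT-4.1 cost per month?"))  # 9
```

For billing-grade numbers, count tokens with the provider's own tokenizer or read the usage fields returned with each API response.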