50% off for async processing — OpenAI, Anthropic, and Google all offer it. Enter your token volume and see exactly how much you save by switching to batch mode.
Select provider · Enter token volume · See exact monthly savings vs real-time API
Standard vs Batch API pricing per 1M tokens. June 2026.
| Provider | Model | Standard Input | Batch Input | Standard Output | Batch Output |
|---|---|---|---|---|---|
| OpenAI | GPT-5.4 nanoCHEAPEST | $0.20 | $0.10 | $1.25 | $0.625 |
| OpenAI | GPT-5.4 mini | $0.75 | $0.375 | $4.50 | $2.25 |
| OpenAI | GPT-5.4 | $2.50 | $1.25 | $15.00 | $7.50 |
| OpenAI | GPT-5.5 | $5.00 | $2.50 | $30.00 | $15.00 |
| Anthropic | Claude Haiku 4.5 | $1.00 | $0.50 | $5.00 | $2.50 |
| Anthropic | Claude Sonnet 4.6 | $3.00 | $1.50 | $15.00 | $7.50 |
| Anthropic | Claude Opus 4.8 | $5.00 | $2.50 | $25.00 | $12.50 |
| Gemini 3.1 Flash-Lite | $0.25 | $0.125 | $1.50 | $0.75 | |
| Gemini 3.5 Flash | $1.50 | $0.75 | $9.00 | $4.50 |
Batch API uses the exact same model weights as the real-time API. You get identical output quality — the only trade-off is latency (up to 24 hours). For overnight jobs, annotation pipelines, and document processing, this is free money.
Workloads where 24h latency is acceptable and savings are significant.
| Use Case | Volume / month | Standard Cost | Batch Cost | Saves |
|---|---|---|---|---|
| Document summarization | 50K docs · 4K+600 tok | $285 | $142 | $143/mo |
| Dataset annotation | 500K items · 800+200 tok | $750 | $375 | $375/mo |
| Embedding generation | 10M texts · 500 tok | $100 | $50 | $50/mo |
| Report generation | 10K reports · 3K+1K tok | $225 | $112 | $113/mo |
Real-time chatbots, live user interactions, streaming responses, or any workload where the user is waiting. Batch API is for background processing only.
The Batch API lets you submit requests asynchronously (results returned within 24 hours) at 50% of the standard per-token price. OpenAI, Anthropic, and Google all offer this discount. No quality difference — same model, same outputs, half the price.
Latency. Batch jobs can take up to 24 hours to complete. For real-time user interactions, you cannot use Batch API. For background jobs where speed is not critical, the 50% discount is essentially free money.
OpenAI Batch API supports GPT-5.4, GPT-5.4 mini, GPT-5.4 nano, GPT-5.5, and embedding models. Reasoning models like o4 require synchronous streaming and are not available in batch mode.
Anthropic's Message Batches API accepts up to 10,000 requests per batch. Results are available within 24 hours. All Claude models (Haiku, Sonnet, Opus) support batches at 50% input and output token discount.