OpenAI Batch API Pricing 2026
The Batch API applies a flat 50% discount to all supported models. Input and output token prices are both halved, with no tier minimum and no change in model quality. The tradeoff is latency: results arrive asynchronously, within a 24-hour window.
| Model | Standard Input | Standard Output | Batch Input | Batch Output | Savings |
|---|---|---|---|---|---|
| GPT-4o | $2.50 / 1M | $10.00 / 1M | $1.25 / 1M | $5.00 / 1M | 50% |
| GPT-4o mini | $0.15 / 1M | $0.60 / 1M | $0.075 / 1M | $0.30 / 1M | 50% |
| o3 | $10.00 / 1M | $40.00 / 1M | $5.00 / 1M | $20.00 / 1M | 50% |
| o4-mini | $1.10 / 1M | $4.40 / 1M | $0.55 / 1M | $2.20 / 1M | 50% |
| GPT-4.1 | $2.00 / 1M | $8.00 / 1M | $1.00 / 1M | $4.00 / 1M | 50% |
GPT-4o mini via the Batch API ($0.075 input / $0.30 output per 1M tokens) is the cheapest option in the table above.
How the OpenAI Batch API Works
The Batch API processes requests asynchronously within a completion window of up to 24 hours. You submit a JSONL file containing many requests, OpenAI processes them in bulk as capacity allows, and you retrieve the results when the job finishes.
The three-step workflow:
- Upload: Create a JSONL file where each line is a complete API request, then upload it via the Files API with purpose set to batch
- Submit: Create a batch job that references the uploaded file ID, the target endpoint, and a 24h completion window
- Retrieve: Poll the batch status and download the output file when complete
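The Upload and Submit steps can be sketched in TypeScript as follows. BatchSubmitClient and submitBatch are hypothetical names; the interface is modeled on the openai Node SDK's files.create and batches.create methods, and the client is injected so the flow can be exercised with a fake. Treat it as a sketch and check the parameter types in your SDK version.

```typescript
// Hypothetical interface modeled on the two openai Node SDK methods this
// step needs (files.create and batches.create); verify against your SDK.
interface BatchSubmitClient {
  files: {
    create(args: { file: unknown; purpose: "batch" }): Promise<{ id: string }>;
  };
  batches: {
    create(args: {
      input_file_id: string;
      endpoint: string;
      completion_window: "24h";
    }): Promise<{ id: string; status: string }>;
  };
}

// Upload + Submit: returns the batch ID to poll in the Retrieve step.
async function submitBatch(
  client: BatchSubmitClient,
  jsonlFile: unknown, // e.g. a file stream or File holding the JSONL input
  endpoint = "/v1/chat/completions",
): Promise<string> {
  // purpose "batch" marks the uploaded file as Batch API input.
  const file = await client.files.create({ file: jsonlFile, purpose: "batch" });
  const batch = await client.batches.create({
    input_file_id: file.id,
    endpoint, // every request in the file must target this endpoint
    completion_window: "24h", // currently the only supported window
  });
  return batch.id;
}
```

In a real script you would pass your OpenAI client and a readable stream or File for the JSONL, then hand the returned batch ID to your polling code.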
Each line in your JSONL file must include a unique custom_id. Output lines are not guaranteed to come back in input order, so the custom_id is how you match each response to its request.
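A minimal sketch of building input lines and matching output lines back to their requests. The field names (custom_id, method, url, body on input; custom_id, response, error on output) follow OpenAI's batch file format; chatLine, toJsonl, and indexByCustomId are hypothetical helpers, and the model and max_tokens values are placeholders.

```typescript
// One line of Batch API input: custom_id plus the HTTP method, endpoint URL,
// and the same body you would send to the synchronous endpoint.
interface BatchInputLine {
  custom_id: string;
  method: "POST";
  url: string;
  body: Record<string, unknown>;
}

// One line of Batch API output (only the fields used here).
interface BatchOutputLine {
  custom_id: string;
  response: { status_code: number; body: unknown } | null;
  error: unknown;
}

// Hypothetical helper: builds one chat-completions request line.
function chatLine(customId: string, prompt: string): BatchInputLine {
  return {
    custom_id: customId, // must be unique within the batch
    method: "POST",
    url: "/v1/chat/completions",
    body: {
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: prompt }],
      max_tokens: 300,
    },
  };
}

// JSONL: one JSON object per line.
function toJsonl(lines: BatchInputLine[]): string {
  return lines.map((line) => JSON.stringify(line)).join("\n");
}

// Output order can differ from input order, so index results by custom_id.
function indexByCustomId(outputJsonl: string): Map<string, BatchOutputLine> {
  const byId = new Map<string, BatchOutputLine>();
  for (const raw of outputJsonl.split("\n")) {
    if (raw.trim() === "") continue; // tolerate a trailing newline
    const line = JSON.parse(raw) as BatchOutputLine;
    byId.set(line.custom_id, line);
  }
  return byId;
}
```

If the batch also reports an error file, it uses the same line format and can be indexed the same way.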
Best Workloads for Batch API
The Batch API suits any workload that does not need a real-time response. If no user is waiting on the result and it can arrive any time within 24 hours, batch it.
High-value use cases
- Document processing: Contracts, invoices, reports, PDFs — extract, classify, summarize at scale
- Content generation: Product descriptions, SEO meta tags, email copy, social posts — generate overnight
- Data extraction: Structured entity extraction from unstructured text across large datasets
- Content moderation: Classify user-generated content for policy violations, sentiment, or spam
- Dataset labeling: Label training data for fine-tuning or evaluation
- Embedding generation: Generate embeddings for vector database ingestion in bulk
- Analytics and reporting: Nightly summarization jobs, weekly digest generation
- Translation: Translate large volumes of content across multiple languages
Not a fit for batching: real-time chat, voice assistants, live search features, or any user-facing interaction where latency matters. For those, use the standard API; the extra cost is the price of a synchronous response.
Real Cost Savings Examples
Example 1 — Legal-tech document processor
Processing 5,000 contracts/month with GPT-4o. Average: 2,000 input tokens + 800 output tokens per contract.
- Standard API: (10M input × $2.50 + 4M output × $10.00) / 1M = $65/month
- Batch API: (10M × $1.25 + 4M × $5.00) / 1M = $32.50/month
- SAVES $32.50/month — $390/year
Example 2 — E-commerce product description generator
Generating descriptions for 50,000 products using GPT-4o mini. Average: 500 input + 300 output tokens.
- Standard API: (25M × $0.15 + 15M × $0.60) / 1M = $12.75
- Batch API: (25M × $0.075 + 15M × $0.30) / 1M = $6.38
- SAVES $6.375 (about $6.38) per 50K-product run
Example 3 — AI startup, nightly analytics
Running 100,000 classification requests per month, submitted as nightly jobs, with o4-mini. Average: 400 input + 100 output tokens.
- Standard API: (40M × $1.10 + 10M × $4.40) / 1M = $88/month
- Batch API: (40M × $0.55 + 10M × $2.20) / 1M = $44/month
- SAVES $44/month — $528/year
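The arithmetic in the three examples can be reproduced with a small helper. runCost is a hypothetical name; prices are the standard per-1M rates from the table, and the batch price is modeled as a flat 0.5 multiplier on the total, which matches the halved input and output prices above.

```typescript
// Prices in USD per 1M tokens (standard API), copied from the table above.
interface Pricing {
  inputPerM: number;
  outputPerM: number;
}

const GPT_4O: Pricing = { inputPerM: 2.5, outputPerM: 10 };
const GPT_4O_MINI: Pricing = { inputPerM: 0.15, outputPerM: 0.6 };
const O4_MINI: Pricing = { inputPerM: 1.1, outputPerM: 4.4 };

const BATCH_MULTIPLIER = 0.5; // Batch API = 50% of the standard price

// Cost in USD for `requests` requests with the given average token counts.
function runCost(
  price: Pricing,
  requests: number,
  avgInputTokens: number,
  avgOutputTokens: number,
  batch: boolean,
): number {
  const inputMillions = (requests * avgInputTokens) / 1_000_000;
  const outputMillions = (requests * avgOutputTokens) / 1_000_000;
  const standard = inputMillions * price.inputPerM + outputMillions * price.outputPerM;
  return batch ? standard * BATCH_MULTIPLIER : standard;
}

// Example 1: 5,000 contracts, 2,000 input + 800 output tokens each.
const standard1 = runCost(GPT_4O, 5_000, 2_000, 800, false); // 65
const batch1 = runCost(GPT_4O, 5_000, 2_000, 800, true); // 32.5
```

Passing the Example 2 inputs (GPT_4O_MINI, 50,000 requests, 500 input and 300 output tokens) gives $12.75 standard and $6.375 batch before rounding.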
Anthropic Batch API: Same 50% Discount for Claude
Anthropic offers an equivalent feature called the Message Batches API, also with a 50% discount on all Claude models.
| Model | Standard Input | Standard Output | Batch Input | Batch Output |
|---|---|---|---|---|
| Claude Sonnet 4.6 | $3.00 / 1M | $15.00 / 1M | $1.50 / 1M | $7.50 / 1M |
| Claude Haiku 4.5 | $0.80 / 1M | $4.00 / 1M | $0.40 / 1M | $2.00 / 1M |
| Claude Opus 4.8 | $15.00 / 1M | $75.00 / 1M | $7.50 / 1M | $37.50 / 1M |
Anthropic's batch API also has a 24-hour completion window. For teams already using Claude, moving eligible workloads to batch yields the same 50% saving with no quality difference.
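For comparison, a Message Batches request pairs each custom_id with the same params you would send to the Messages API. The sketch below only builds that request body; buildMessageBatch is a hypothetical helper, and the model ID is a placeholder to replace with the exact string from Anthropic's model list. The resulting object is what you would pass to the Anthropic TypeScript SDK's messages.batches.create.

```typescript
// Shape of one Message Batches entry: a custom_id plus Messages API params.
interface MessageBatchRequest {
  custom_id: string;
  params: {
    model: string;
    max_tokens: number;
    messages: { role: "user" | "assistant"; content: string }[];
  };
}

// Hypothetical helper: one single-turn request per (custom_id, prompt) pair.
function buildMessageBatch(
  model: string,
  prompts: Record<string, string>, // custom_id -> prompt text
  maxTokens = 1024,
): { requests: MessageBatchRequest[] } {
  return {
    requests: Object.entries(prompts).map(
      ([customId, prompt]): MessageBatchRequest => ({
        custom_id: customId,
        params: {
          model,
          max_tokens: maxTokens,
          messages: [{ role: "user", content: prompt }],
        },
      }),
    ),
  };
}
```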
Batch API vs. Prompt Caching: Which Saves More?
These are complementary — not competing — strategies. The choice depends on your workload type:
- Use Batch API when you have a large volume of requests that don't need real-time responses. The 50% discount applies to both input and output tokens.
- Use Prompt Caching when you send a large, repeated system prompt or static context (RAG documents, instructions) on every request. Cache hits on that repeated prefix are billed at a reduced input rate: 10% of the base input price on Claude, and 25% to 50% of it on the OpenAI models above (GPT-4o mini cached input, for example, is $0.075 per 1M vs. $0.15).
- Use both together for maximum savings: a long, cached system prompt processed via the Batch API. Anthropic documents that the two discounts stack, so a cache read inside a Claude batch costs 5% of the base input price.
Combining Batch API (50% off), prompt caching on repeated context, and GPT-4o mini (the cheapest model in the table above) gives the lowest per-request cost among the models listed here. Model choice does most of the work: GPT-4o mini via Batch API costs 3% of GPT-4o on the standard API, a 97% reduction before any caching savings.
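To see how the discounts combine on a single request, the sketch below prices a cached prefix, fresh input, and output separately. requestCost and the token counts are illustrative assumptions: it uses Claude's 0.1 cache-read multiplier, assumes the batch and cache discounts multiply (which Anthropic documents for Claude), and ignores cache-write surcharges.

```typescript
// Assumptions: cache reads bill at cacheReadMultiplier x the input price,
// the batch discount multiplies the cached rate, and one-time cache-write
// surcharges are ignored.
interface CachePricing {
  inputPerM: number; // USD per 1M input tokens
  outputPerM: number; // USD per 1M output tokens
  cacheReadMultiplier: number; // e.g. 0.1 for Claude cache reads
}

function requestCost(
  price: CachePricing,
  cachedInputTokens: number,
  freshInputTokens: number,
  outputTokens: number,
  batch: boolean,
): number {
  const inputRate = price.inputPerM / 1_000_000; // USD per input token
  const outputRate = price.outputPerM / 1_000_000; // USD per output token
  const cost =
    cachedInputTokens * inputRate * price.cacheReadMultiplier +
    freshInputTokens * inputRate +
    outputTokens * outputRate;
  return batch ? cost * 0.5 : cost;
}

// Illustrative classification call on Claude Sonnet 4.6 (prices from the
// Anthropic table): 4,000-token cached system prompt, 200 fresh, 10 output.
const SONNET: CachePricing = { inputPerM: 3, outputPerM: 15, cacheReadMultiplier: 0.1 };
const naive = requestCost(SONNET, 0, 4_200, 10, false);
const optimized = requestCost(SONNET, 4_000, 200, 10, true);
const savings = 1 - optimized / naive; // fraction saved vs. uncached standard calls
```

With these numbers the uncached standard call costs $0.01275 and the cached batch call $0.000975, a saving of roughly 92%. Cache rates differ by provider and model, so set cacheReadMultiplier per model.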
Batch API Limits and Constraints
- Completion window: 24 hours maximum, and 24h is currently the only value the API accepts. Jobs often finish sooner depending on load, but only the 24-hour window is guaranteed.
- Request limit per batch: 50,000 requests per batch file
- File size limit: 200 MB per batch file
- Token limit: Enqueued tokens vary by tier — check your usage limits in the OpenAI dashboard
- Supported endpoints: /v1/responses, /v1/chat/completions, /v1/embeddings, /v1/completions (legacy); each batch targets a single endpoint
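- Expiry: Batches that don't finish within 24 hours expire. Unfinished requests are cancelled, results for completed requests remain available in the output file, and you're charged only for the requests that completed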
Split very large workloads into multiple batch jobs of 10,000–25,000 requests each. Smaller batches complete faster and are easier to retry if something goes wrong.
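A minimal sketch of that splitting step. chunkRequests is a hypothetical helper; it defaults to 25,000 requests per chunk and refuses sizes above the 50,000-request cap, but it does not check the 200 MB file limit, which you would still need to verify for large prompts.

```typescript
// Hard cap from the limits list above: 50,000 requests per batch file.
const MAX_REQUESTS_PER_BATCH = 50_000;

// Split requests into consecutive chunks of at most chunkSize items each.
// Does NOT check the 200 MB file-size limit.
function chunkRequests<T>(requests: readonly T[], chunkSize = 25_000): T[][] {
  if (!Number.isInteger(chunkSize) || chunkSize < 1 || chunkSize > MAX_REQUESTS_PER_BATCH) {
    throw new RangeError(`chunkSize must be an integer from 1 to ${MAX_REQUESTS_PER_BATCH}`);
  }
  const chunks: T[][] = [];
  for (let start = 0; start < requests.length; start += chunkSize) {
    chunks.push(requests.slice(start, start + chunkSize));
  }
  return chunks;
}
```

Each chunk then becomes its own JSONL file and batch job, so a failed or expired job only requires resubmitting that chunk.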
Frequently Asked Questions
Does the Batch API produce lower-quality output?
No. The Batch API uses the same models with the same parameters; the only difference is asynchronous delivery. Output quality and token accounting match the standard API, though streaming is not available because results are returned as a file.
How do I know when my batch is done?
Poll the batch with the retrieve endpoint, or set up a webhook. The batch object's status field moves from validating → in_progress → finalizing → completed, or ends in failed, expired, or cancelled.
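A polling loop only needs to know which statuses are final. This sketch treats completed, failed, expired, and cancelled as terminal; waitForBatch is a hypothetical helper, and retrieve and sleep are injected so it stays independent of any SDK. With the openai Node SDK, retrieve could wrap openai.batches.retrieve.

```typescript
// Every status a batch can report; TERMINAL_STATUSES lists the final ones.
type BatchStatus =
  | "validating"
  | "in_progress"
  | "finalizing"
  | "completed"
  | "failed"
  | "expired"
  | "cancelling"
  | "cancelled";

interface BatchSnapshot {
  id: string;
  status: BatchStatus;
  output_file_id?: string | null;
  error_file_id?: string | null;
}

const TERMINAL_STATUSES: ReadonlySet<BatchStatus> = new Set<BatchStatus>([
  "completed",
  "failed",
  "expired",
  "cancelled",
]);

// Hypothetical helper: re-fetch the batch until its status is terminal.
async function waitForBatch(
  batchId: string,
  retrieve: (id: string) => Promise<BatchSnapshot>,
  sleep: (ms: number) => Promise<void>,
  intervalMs = 60_000, // default: poll once per minute
): Promise<BatchSnapshot> {
  for (;;) {
    const batch = await retrieve(batchId);
    if (TERMINAL_STATUSES.has(batch.status)) return batch;
    await sleep(intervalMs);
  }
}
```

A real sleep can be written as (ms) => new Promise((resolve) => setTimeout(resolve, ms)). Once the status is completed, download output_file_id (and error_file_id, if set) to read the results.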
Can I cancel a batch?
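Yes. Call openai.batches.cancel(batchId); the batch moves to cancelling (for up to 10 minutes) and then to cancelled. You're only charged for requests that were already processed before cancellation, and their results remain available.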
Is there a free tier for the Batch API?
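No. There is no separate free quota: Batch API usage is billed to your normal account, and the discount is built into the per-token price. Batch jobs do run under their own rate limits (the enqueued-token limit above), separate from your per-model limits for synchronous requests.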
We build free, privacy-first cost calculators for developers and AI engineers. Pricing data is sourced directly from official provider documentation and verified monthly.
Last updated: July 5, 2026.