How much does the OpenAI Batch API cost?

The OpenAI Batch API costs exactly 50% of standard API pricing for all supported models. GPT-4o standard is $2.50 input / $10.00 output per 1M tokens. Batch API: $1.25 input / $5.00 output per 1M tokens. o3 standard is $10.00 / $40.00 — batch: $5.00 / $20.00.

What is the OpenAI Batch API?

The OpenAI Batch API processes large groups of requests asynchronously. You submit a JSONL file of requests, and OpenAI processes them within 24 hours at 50% of standard pricing. It supports the same models and parameters as the standard API — only the delivery mechanism differs.

Does the Batch API support GPT-4o mini?

Yes. GPT-4o mini standard is $0.15 input / $0.60 output per 1M tokens. Batch API: $0.075 input / $0.30 output. It is already the cheapest model covered here, and batch pricing halves it again.

What workloads are best for Batch API?

Any workload that doesn't need a real-time response: document processing, content moderation, data extraction, dataset labeling, overnight analytics, email personalization, SEO content generation, embedding generation, and report creation.

Anthropic also has a Batch API — how does it compare?

Anthropic's Message Batches API also offers 50% off on all Claude models. Claude Sonnet 4.6 standard is $3.00 / $15.00 per 1M tokens. Batch: $1.50 / $7.50. Both providers offer the same 50% discount, so choose based on your preferred model.

OpenAI Batch API Cost 2026: Save 50% on GPT-4o & o3 Bills

The OpenAI Batch API is one of the easiest cost cuts available to developers using LLMs at scale: a flat 50% discount on every supported model, with identical output quality, in fewer than 30 lines of code. If you're running an async workload without it, you're paying double.

OpenAI Batch API Pricing 2026

The Batch API applies a flat 50% discount to all supported models. Input and output token prices are both halved — there's no catch, no tier minimum, and no degraded model quality.

Model	Standard Input	Standard Output	Batch Input	Batch Output	Savings
GPT-4o	$2.50 / 1M	$10.00 / 1M	$1.25 / 1M	$5.00 / 1M	50%
GPT-4o mini	$0.15 / 1M	$0.60 / 1M	$0.075 / 1M	$0.30 / 1M	50%
o3	$10.00 / 1M	$40.00 / 1M	$5.00 / 1M	$20.00 / 1M	50%
o4-mini	$1.10 / 1M	$4.40 / 1M	$0.55 / 1M	$2.20 / 1M	50%
GPT-4.1	$2.00 / 1M	$8.00 / 1M	$1.00 / 1M	$4.00 / 1M	50%
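The table reduces to a single formula: multiply token counts by the per-1M price, then apply the 50% batch multiplier. The sketch below is illustrative only. The `estimateCost` helper and the `STANDARD_PRICES` map are hypothetical names, and the prices are copied from the table above rather than fetched from OpenAI.

```javascript
// Per-1M-token standard prices in USD, copied from the table above.
const STANDARD_PRICES = {
  'gpt-4o': { input: 2.5, output: 10.0 },
  'gpt-4o-mini': { input: 0.15, output: 0.6 },
  'o3': { input: 10.0, output: 40.0 },
  'o4-mini': { input: 1.1, output: 4.4 },
  'gpt-4.1': { input: 2.0, output: 8.0 },
};

const BATCH_DISCOUNT = 0.5; // flat 50% off, per the table

// Hypothetical helper: standard vs. batch cost in USD for one workload.
function estimateCost(model, inputTokens, outputTokens) {
  const price = STANDARD_PRICES[model];
  if (!price) throw new Error(`No price listed for model: ${model}`);
  const standard =
    (inputTokens * price.input + outputTokens * price.output) / 1_000_000;
  const batch = standard * (1 - BATCH_DISCOUNT);
  return { standard, batch, savings: standard - batch };
}

// 5,000 requests at 2,000 input + 800 output tokens each on GPT-4o
console.log(estimateCost('gpt-4o', 5000 * 2000, 5000 * 800));
```

Keeping prices in one map makes it easy to update them when the provider changes its rates.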

Tip

GPT-4o mini via Batch API ($0.075 input / $0.30 output per 1M tokens) is the cheapest option in the table above, at half the price of the same model on the standard API.

How the OpenAI Batch API Works

The Batch API processes requests asynchronously with up to a 24-hour completion window. You submit a JSONL file containing multiple requests, OpenAI processes them in bulk during off-peak periods, and you retrieve the results when ready.

The three-step workflow:

Upload: Create a JSONL file where each line is a complete API request, then upload it via the Files API
Submit: Create a batch job referencing the uploaded file
Retrieve: Poll the batch status and download the output file when complete

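import fs from 'node:fs';
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Step 1: upload your requests as a JSONL file
const file = await openai.files.create({
  file: fs.createReadStream('requests.jsonl'),
  purpose: 'batch',
});

// Step 2: create the batch job
let batch = await openai.batches.create({
  input_file_id: file.id,
  endpoint: '/v1/chat/completions',
  completion_window: '24h',
});

// Step 3: poll until the batch reaches a terminal state, then retrieve.
// output_file_id is only set once the batch has finished.
while (!['completed', 'failed', 'expired', 'cancelled'].includes(batch.status)) {
  await new Promise((resolve) => setTimeout(resolve, 60_000)); // wait 1 minute
  batch = await openai.batches.retrieve(batch.id);
}
if (batch.status !== 'completed') throw new Error(`Batch ended as ${batch.status}`);
const result = await openai.files.content(batch.output_file_id);
const outputJsonl = await result.text(); // one JSON result per line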
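Once the output file has been downloaded, each line has to be matched back to the request that produced it, because results are not guaranteed to arrive in input order. The following is a minimal sketch of that step, assuming the Chat Completions output shape (a `custom_id`, plus `response.body` on success or `error` on failure). `parseBatchOutput` is a hypothetical helper and the sample lines are made up for illustration.

```javascript
// Parse the text of a batch output file into a Map keyed by custom_id.
// Assumes each line has `custom_id`, plus `response.body` on success
// or `error` on failure.
function parseBatchOutput(jsonlText) {
  const results = new Map();
  for (const line of jsonlText.split('\n')) {
    if (!line.trim()) continue; // skip blank lines (e.g. a trailing newline)
    const record = JSON.parse(line);
    const ok = Boolean(record.response && record.response.status_code === 200);
    results.set(record.custom_id, {
      ok,
      content: ok ? record.response.body.choices[0].message.content : null,
      error: ok ? null : record.error ?? record.response?.body?.error ?? null,
    });
  }
  return results;
}

// Two made-up output lines, deliberately out of order
const sample = [
  JSON.stringify({
    custom_id: 'req-002',
    response: {
      status_code: 200,
      body: { choices: [{ message: { content: 'Clause A, Clause B' } }] },
    },
    error: null,
  }),
  JSON.stringify({
    custom_id: 'req-001',
    response: null,
    error: { code: 'batch_expired', message: 'Request expired' },
  }),
].join('\n');

const parsed = parseBatchOutput(sample);
console.log(parsed.get('req-002').content);
console.log(parsed.get('req-001').ok);
```

Keying by `custom_id` rather than line position means failed or expired requests can be collected and resubmitted in a follow-up batch.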

Each line in your JSONL file must include a custom_id for matching requests to responses:

requests.jsonl (one JSON object per line):
{"custom_id":"req-001","method":"POST","url":"/v1/chat/completions","body":{"model":"gpt-4o","messages":[{"role":"user","content":"Summarize this contract: ..."}],"max_tokens":500}}
{"custom_id":"req-002","method":"POST","url":"/v1/chat/completions","body":{"model":"gpt-4o","messages":[{"role":"user","content":"Extract key clauses from: ..."}],"max_tokens":300}}
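In practice the request file is usually generated rather than written by hand. Here is one possible sketch that produces lines in the format shown above. `buildBatchJsonl`, its options, and the zero-padded `req-NNN` ID scheme are assumptions chosen for illustration; any ID works as long as it is unique within the file.

```javascript
// Hypothetical helper: turn a list of prompts into Batch API JSONL text.
function buildBatchJsonl(prompts, { model = 'gpt-4o', maxTokens = 500 } = {}) {
  return prompts
    .map((prompt, i) =>
      JSON.stringify({
        custom_id: `req-${String(i + 1).padStart(3, '0')}`, // unique per line
        method: 'POST',
        url: '/v1/chat/completions',
        body: {
          model,
          messages: [{ role: 'user', content: prompt }],
          max_tokens: maxTokens,
        },
      })
    )
    .join('\n');
}

const jsonl = buildBatchJsonl([
  'Summarize this contract: ...',
  'Extract key clauses from: ...',
]);
console.log(jsonl);
// To save it: require('node:fs').writeFileSync('requests.jsonl', jsonl + '\n');
```

Using `JSON.stringify` per request guarantees each object stays on a single line, even when a prompt contains newlines or quotes.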

Best Workloads for Batch API

The Batch API is ideal for any workload where a real-time response is not required. If your users aren't waiting for the result in the next few seconds, batch it.

High-value use cases

Document processing: Contracts, invoices, reports, PDFs — extract, classify, summarize at scale
Content generation: Product descriptions, SEO meta tags, email copy, social posts — generate overnight
Data extraction: Structured entity extraction from unstructured text across large datasets
Content moderation: Classify user-generated content for policy violations, sentiment, or spam
Dataset labeling: Label training data for fine-tuning or evaluation
Embedding generation: Generate embeddings for vector database ingestion in bulk
Analytics and reporting: Nightly summarization jobs, weekly digest generation
Translation: Translate large volumes of content across multiple languages

Not suitable for

Real-time chat, voice assistants, live search features, or any user-facing interaction where latency matters. For those, use the standard API — the extra cost is the price of synchronous response.

Real Cost Savings Examples

Example 1 — Legal-tech document processor

Processing 5,000 contracts/month with GPT-4o. Average: 2,000 input tokens + 800 output tokens per contract.

Standard API: (10M input × $2.50 + 4M output × $10.00) / 1M = $65/month
Batch API: (10M × $1.25 + 4M × $5.00) / 1M = $32.50/month
SAVES $32.50/month — $390/year

Example 2 — E-commerce product description generator

Generating descriptions for 50,000 products using GPT-4o mini. Average: 500 input + 300 output tokens.

Standard API: (25M × $0.15 + 15M × $0.60) / 1M = $12.75
Batch API: (25M × $0.075 + 15M × $0.30) / 1M = $6.38
SAVES $6.37 per 50K run

Example 3 — AI startup, nightly analytics

Running 100,000 classification requests per month as nightly jobs with o4-mini. Average: 400 input + 100 output tokens.

Standard API: (40M × $1.10 + 10M × $4.40) / 1M = $88/month
Batch API: (40M × $0.55 + 10M × $2.20) / 1M = $44/month
SAVES $44/month — $528/year

Anthropic Batch API: Same 50% Discount for Claude

Anthropic offers an equivalent feature called the Message Batches API, also with a 50% discount on all Claude models.

Model	Standard Input	Standard Output	Batch Input	Batch Output
Claude Sonnet 4.6	$3.00 / 1M	$15.00 / 1M	$1.50 / 1M	$7.50 / 1M
Claude Haiku 4.5	$0.80 / 1M	$4.00 / 1M	$0.40 / 1M	$2.00 / 1M
Claude Opus 4.8	$15.00 / 1M	$75.00 / 1M	$7.50 / 1M	$37.50 / 1M

Anthropic's batch API also has a 24-hour completion window. For teams already using Claude, switching eligible workloads to batch is an identical 50% saving with no quality difference.

Batch API vs. Prompt Caching: Which Saves More?

These are complementary — not competing — strategies. The choice depends on your workload type:

Use Batch API when you have a large volume of requests that don't need real-time responses. The 50% discount applies to the entire request.
Use Prompt Caching when you have a large, repeated system prompt or static context (RAG documents, instructions) sent on every request. Cache hits on repeated context are billed at a reduced rate, as low as 10% of the normal input price depending on the provider and model.
Use both together for maximum savings: a long system prompt with caching enabled, processed via Batch API. Where the provider lets the two discounts stack, you get 50% off the total cost plus the cached-token discount on repeated context.

Maximum savings combo

Batch API (50% off) + Prompt Caching (90% off repeated context) + GPT-4o mini (already cheapest) = the lowest possible per-request cost on OpenAI's platform. For async classification workloads with a large system prompt, this combination can cut costs by 80–95% vs. naive GPT-4o standard API usage.
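To see how the two discounts interact, it helps to work the arithmetic. The sketch below assumes the discounts multiply, that is, cached input tokens are billed at the cached rate and the batch multiplier is then applied to the whole bill. That stacking behavior is an assumption, not something confirmed here, so check your provider's billing documentation. `effectiveCost` and its parameter names are hypothetical.

```javascript
// ASSUMPTION: the batch discount and the cached-token discount multiply.
// Verify against your provider's billing docs before relying on this.
function effectiveCost({
  inputTokens,
  outputTokens,
  inputPrice,             // USD per 1M input tokens, standard rate
  outputPrice,            // USD per 1M output tokens, standard rate
  cachedFraction = 0,     // share of input tokens served from cache (0 to 1)
  cachedMultiplier = 0.1, // cached tokens billed at 10% (the figure cited above)
  batchMultiplier = 0.5,  // batch requests billed at 50%
}) {
  const cached = inputTokens * cachedFraction;
  const uncached = inputTokens - cached;
  const inputCost = (uncached + cached * cachedMultiplier) * inputPrice;
  const outputCost = outputTokens * outputPrice;
  return ((inputCost + outputCost) * batchMultiplier) / 1_000_000;
}

// GPT-4o mini, 40M input tokens (75% cached) and 10M output tokens
const workload = {
  inputTokens: 40_000_000,
  outputTokens: 10_000_000,
  inputPrice: 0.15,
  outputPrice: 0.6,
};
const baseline = effectiveCost({ ...workload, batchMultiplier: 1 });
const combined = effectiveCost({ ...workload, cachedFraction: 0.75 });
console.log({ baseline, combined });
```

Note that caching only reduces the input side of the bill, so output-heavy workloads gain relatively little from it, while the batch discount applies to both sides.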

Batch API Limits and Constraints

Completion window: 24 hours maximum. Most jobs complete in 1–4 hours during off-peak times.
Request limit per batch: 50,000 requests per batch file
File size limit: 200 MB per batch file
Token limit: Enqueued tokens vary by tier — check your usage limits in the OpenAI dashboard
Supported endpoints: /v1/chat/completions, /v1/embeddings, /v1/completions (legacy)
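Expiry: Batches that do not finish within the 24-hour window move to an expired state. Unfinished requests are cancelled and not billed; requests that completed before expiry are returned in the output file and charged as usual.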

Tip

Split very large workloads into multiple batch jobs of 10,000–25,000 requests each. Smaller batches complete faster and are easier to retry if something goes wrong.
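Splitting can be automated with a small chunking routine. The sketch below is one way to do it, under the assumption that "200 MB" means 200,000,000 bytes (the conservative reading). `splitIntoBatches` and its option names are hypothetical; the default of 25,000 requests follows the tip above.

```javascript
// Split JSONL request lines into batches that respect the limits above.
// ASSUMPTION: "200 MB" is read conservatively as 200,000,000 bytes.
function splitIntoBatches(
  lines,
  { maxRequests = 25_000, maxBytes = 200_000_000 } = {}
) {
  const batches = [];
  let current = [];
  let currentBytes = 0;
  for (const line of lines) {
    const lineBytes = Buffer.byteLength(line, 'utf8') + 1; // +1 for the newline
    if (lineBytes > maxBytes) throw new Error('Single request exceeds maxBytes');
    const full =
      current.length >= maxRequests || currentBytes + lineBytes > maxBytes;
    if (full && current.length > 0) {
      batches.push(current); // close the current batch and start a new one
      current = [];
      currentBytes = 0;
    }
    current.push(line);
    currentBytes += lineBytes;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

// Example: five tiny requests, at most two per batch
const chunks = splitIntoBatches(['{"a":1}', '{"a":2}', '{"a":3}', '{"a":4}', '{"a":5}'], {
  maxRequests: 2,
});
console.log(chunks.map((c) => c.length));
```

Each resulting array can be joined with newlines, uploaded as its own file, and submitted as a separate batch job.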

Frequently Asked Questions

Does the Batch API produce lower-quality output?

No. The Batch API uses identical models with identical parameters. The only difference is asynchronous delivery. Quality, token counts, and supported features are identical to the standard API.

How do I know when my batch is done?

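Poll the batch status endpoint or set up a webhook. The batch object has a status field that normally moves from validating → in_progress → finalizing → completed. A batch can also end as failed, expired, or cancelled, so treat those as terminal states too.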
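A reusable polling helper might look like the sketch below. It is illustrative rather than official: `waitForBatch` and `isTerminal` are hypothetical names, and `retrieve` is any async function you supply that returns the batch object for an ID, for example `(id) => openai.batches.retrieve(id)`. Passing it in as a parameter keeps the helper testable without network access.

```javascript
// Statuses after which a batch will not change again.
const TERMINAL_STATUSES = new Set(['completed', 'failed', 'expired', 'cancelled']);

function isTerminal(status) {
  return TERMINAL_STATUSES.has(status);
}

// Hypothetical helper: poll until the batch is terminal or the timeout passes.
async function waitForBatch(
  retrieve,
  batchId,
  { intervalMs = 60_000, timeoutMs = 24 * 60 * 60 * 1000 } = {}
) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const batch = await retrieve(batchId);
    if (isTerminal(batch.status)) return batch;
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Batch ${batchId} still ${batch.status} at timeout`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Callers should still check the returned `status`, since a terminal batch is not necessarily a completed one.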

Can I cancel a batch?

Yes — call openai.batches.cancel(batchId). You're only charged for requests that were already processed before cancellation.

Is there a free tier for the Batch API?

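No. There is no separate free tier: Batch API usage is billed to your account like any other API usage. The discount is built into the per-token pricing, not offered as a free quota. Batch jobs do draw on their own enqueued-token limits rather than your synchronous rate limits.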

APICalculators Team

We build free, privacy-first cost calculators for developers and AI engineers. Pricing data is sourced directly from official provider documentation and verified monthly.

Last updated: July 5, 2026.
