Vector Databases · Updated 2026-06-11

Qdrant vs Weaviate: Pricing & Performance 2026

Two open-source giants with very different philosophies: Qdrant optimises for raw throughput in Rust, Weaviate optimises for AI-native expressiveness in GraphQL. Here is how the numbers and architecture actually compare.

🔷 Choose Qdrant if you…

  • Need the lowest possible query latency at high QPS
  • Want gRPC for high-throughput ingestion pipelines
  • Plan to build a custom retrieval pipeline in code
  • Need sparse-dense hybrid with fine-grained score tuning
  • Are running on memory-constrained hardware
  • Need the absolute lowest P99 latency at 10M+ vectors

🟢 Choose Weaviate if you…

  • Want built-in text vectorization (no separate embed step)
  • Need generative search (RAG) as a first-class feature
  • Use GraphQL and want composable query pipelines
  • Need multi-tenancy with cold-storage offloading
  • Want server-side LLM module integration
  • Prefer a GraphQL interface for queries

Pricing Breakdown

All prices June 2026. Qdrant Cloud managed tiers vs Weaviate Cloud tiers vs self-hosted estimates.

Metric | Qdrant Cloud | Qdrant Self-Hosted | Weaviate Cloud | Weaviate Self-Hosted
Free tier | 1 cluster · 1GB RAM · 0.5 CPU | Unlimited (infra cost) | Sandbox · ~1K objects | Unlimited (infra cost)
Starter | $9/mo (1GB RAM, ~100K vecs) | ~$7/mo (VPS) | $25/mo (~500K vecs) | ~$9/mo (1GB RAM VPS)
Standard / Performance | $65–220/mo | ~$35–72/mo | $150/mo (~5M vecs) | ~$65/mo
Scale tier (100M vecs) | ~$480/mo | ~$280/mo (cheapest) | ~$450/mo | ~$320/mo
Per-operation fees | None | None | None | None
Open source | ✅ Apache 2.0 | ✅ Apache 2.0 | ✅ BSD-3-Clause | ✅ BSD-3-Clause
gRPC API | ✅ | ✅ | |
GraphQL API | ❌ | ❌ | ✅ | ✅
Built-in LLM modules | ❌ | ❌ | ✅ | ✅
Sparse vector support | ✅ (v1.7+) | ✅ (v1.7+) | ✅ (BM25 hybrid) | ✅ (BM25 hybrid)
Named / multi-vector | ✅ (named vectors) | ✅ (named vectors) | |
Multi-tenancy | Via collections | Via collections | ✅ Native (v1.20+) | ✅ Native

Cost at 3 Scales

1536-dim vectors (OpenAI text-embedding-3-small). Monthly totals including cluster, storage, and ops. June 2026 prices.

🌱 Startup

1M vectors · 100K queries/month · 10K writes/day
Qdrant Cloud Starter: $9/mo
Qdrant self-hosted (VPS): $7/mo
Weaviate Cloud Starter: $25/mo
Winner: Qdrant (either option)

📈 Growth

10M vectors · 1M queries/month · 100K writes/day
Qdrant Cloud Performance: $220/mo
Qdrant self-hosted: $72/mo
Weaviate Cloud Standard: $150/mo
Winner: Qdrant self-hosted

🏢 Scale

100M vectors · 10M queries/month · 500K writes/day
Qdrant Cloud Scale: $480/mo
Qdrant self-hosted: $280/mo
Weaviate Cloud Enterprise: $450/mo
Winner: Qdrant self-hosted
ℹ Both databases have zero per-operation fees

Unlike Pinecone's serverless model, neither Qdrant nor Weaviate charges per read, write, or storage unit. You pay a fixed cluster cost and get unlimited operations within that cluster's capacity. This makes budget forecasting straightforward — your bill does not spike during high-ingest periods or large batch re-embedding jobs.
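Because both bills are fixed, the comparison reduces to simple arithmetic. The sketch below copies the monthly prices and query volumes from the three scenarios above and derives the cheapest option and an effective cost per 1,000 queries. The dictionary layout and function names are illustrative choices, not part of either vendor's tooling.

```python
# Monthly prices (USD) and query volumes copied from the three scenarios above.
SCENARIOS = {
    "startup": {"Qdrant Cloud": 9, "Qdrant self-hosted": 7, "Weaviate Cloud": 25},
    "growth": {"Qdrant Cloud": 220, "Qdrant self-hosted": 72, "Weaviate Cloud": 150},
    "scale": {"Qdrant Cloud": 480, "Qdrant self-hosted": 280, "Weaviate Cloud": 450},
}
QUERIES_PER_MONTH = {"startup": 100_000, "growth": 1_000_000, "scale": 10_000_000}


def cheapest(prices: dict) -> str:
    """Return the name of the option with the lowest monthly price."""
    return min(prices, key=prices.get)


def cost_per_1k_queries(monthly_usd: float, queries: int) -> float:
    """Spread a fixed monthly bill over the month's query volume."""
    return monthly_usd / (queries / 1000)


if __name__ == "__main__":
    for name, prices in SCENARIOS.items():
        option = cheapest(prices)
        unit = cost_per_1k_queries(prices[option], QUERIES_PER_MONTH[name])
        print(f"{name}: {option} at ${prices[option]}/mo (${unit:.3f} per 1K queries)")
```

The per-query figure falls as volume rises because the cluster price is fixed, which is the forecasting property the paragraph above describes.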

Performance and Architecture Deep-Dive

Qdrant's Rust-Based HNSW: Concrete Performance Numbers

Qdrant's query engine is written entirely in Rust with no garbage collection pauses and a memory-mapped segment architecture that keeps hot data in RAM while spilling cold data to NVMe with minimal latency impact. Published benchmarks from the ann-benchmarks suite (June 2025) show Qdrant achieving 8,400 queries per second at 99% recall on the 10M-vector deep-1B subset using a 32-core, 64GB RAM node — with P99 latency of 7.2ms at ef=128. Weaviate on the same hardware achieves approximately 4,100 QPS at the same recall target, with P99 latency around 13.8ms.

The gap matters far less at lower QPS. If your application never exceeds 200 queries per second, both databases have more than enough headroom on modest hardware, and the performance difference is imperceptible to end users. The Rust advantage becomes decisive only at sustained high-throughput workloads: recommendation systems, real-time document ranking, or anything generating tens of millions of queries per day.
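To see where a workload sits relative to that 200 QPS threshold, it helps to convert monthly or daily query counts into an average rate. This is a minimal sketch that assumes a 30-day month and evenly spread traffic; real traffic has peaks, so treat the result as a lower bound on peak QPS.

```python
SECONDS_PER_DAY = 86_400


def average_qps(queries: int, period_days: float) -> float:
    """Average queries per second over a period, assuming evenly spread load."""
    return queries / (period_days * SECONDS_PER_DAY)


def queries_per_day(qps: float) -> float:
    """Daily query volume implied by a sustained rate."""
    return qps * SECONDS_PER_DAY


if __name__ == "__main__":
    # The Scale scenario above: 10M queries per month.
    print(f"{average_qps(10_000_000, 30):.2f} QPS")    # 3.86 QPS
    # A sustained 200 QPS expressed as a daily volume.
    print(f"{queries_per_day(200):,.0f} queries/day")  # 17,280,000 queries/day
```

Under these assumptions, even the 10M-queries-per-month scenario averages under 4 QPS, while a sustained 200 QPS corresponds to roughly 17 million queries per day.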

gRPC vs HTTP: The Ingestion Bottleneck

Qdrant exposes both a REST API (port 6333) and a gRPC API (port 6334). For high-volume ingestion pipelines (loading 50M vectors from a batch job, for example), gRPC's binary protocol and HTTP/2 multiplexing reduce ingestion time by 30–50% compared to REST/JSON, according to Qdrant's own benchmarks. At that rate, a 10M vector load that takes 4.5 hours over REST completes in roughly 2.25 to 3.15 hours over gRPC on the same hardware. Weaviate also exposes a gRPC API (default port 50051) alongside REST and GraphQL, and its v4 Python client uses it for batch imports and queries, so gRPC support alone does not separate the two. For teams with large initial dataset loads, the operational point is to confirm that the ingestion path actually runs over gRPC rather than REST/JSON.
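A minimal sketch of batched ingestion over Qdrant's gRPC port using the official qdrant-client package. The collection name "docs", the batch size of 256, and the (id, vector, payload) tuple shape are assumptions for illustration. The client import is deferred so the batching helper can be used and tested without the package installed.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Point = Tuple[int, List[float], dict]  # (id, vector, payload), an assumed shape


def batched(points: Iterable[Point], size: int) -> Iterator[List[Point]]:
    """Yield successive lists of at most `size` points."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(points)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def ingest_over_grpc(points: Iterable[Point], collection: str = "docs",
                     batch_size: int = 256) -> None:
    """Upsert points in batches, preferring gRPC (port 6334) over REST."""
    # Imported here so the helper above works without qdrant-client installed.
    from qdrant_client import QdrantClient
    from qdrant_client.models import PointStruct

    client = QdrantClient(host="localhost", port=6333, grpc_port=6334,
                          prefer_grpc=True)
    for chunk in batched(points, batch_size):
        client.upsert(
            collection_name=collection,
            points=[PointStruct(id=i, vector=v, payload=p) for i, v, p in chunk],
        )
```

The collection must already exist with a matching vector size. Batch size is worth tuning per workload, since very large batches can hit message-size limits and very small ones waste round trips.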

Weaviate's Module Ecosystem: Server-Side Intelligence

Weaviate's module architecture builds embedding and inference directly into the database server. Configuring text2vec-openai at the class level means every object you insert is automatically vectorized server-side: you send raw text, Weaviate calls OpenAI's embedding API on your behalf, and it stores the resulting vector. This eliminates an entire step from your ingestion pipeline. You never write embeddings = openai.embed(text) in your application code; Weaviate handles it. Changing your embedding model later is a schema-level change followed by re-vectorizing the stored objects, not a change to every ingestion code path.
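A minimal sketch of the class-level configuration described above, expressed as the JSON body that Weaviate's REST schema endpoint (POST /v1/schema) accepts. The class name Article, its two properties, and the model value are illustrative assumptions; check the module documentation for the options your Weaviate version supports.

```python
import json


def article_class_schema(model: str = "text-embedding-3-small") -> dict:
    """Build a class definition that delegates vectorization to text2vec-openai."""
    return {
        "class": "Article",               # hypothetical class name
        "vectorizer": "text2vec-openai",  # server-side vectorizer module
        "moduleConfig": {
            "text2vec-openai": {"model": model},  # assumed model setting
        },
        "properties": [
            {"name": "title", "dataType": ["text"]},
            {"name": "body", "dataType": ["text"]},
        ],
    }


if __name__ == "__main__":
    # Print the body you would send to POST /v1/schema.
    print(json.dumps(article_class_schema(), indent=2))
```

With a class like this in place, inserts carry only title and body, and no vector is computed in application code.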

The reranker-cohere and reranker-transformers modules extend this pattern to retrieval. A GraphQL query can specify rerank as a pipeline stage, and Weaviate sends retrieved candidates to a cross-encoder for score refinement entirely server-side. With Qdrant, re-ranking requires retrieving candidates in your application, calling the cross-encoder model yourself, and re-sorting the results before presenting them to the user — more control, but more code to maintain.
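The client-side pattern described for Qdrant can be sketched as a small re-ranking function. The scorer here is a toy token-overlap function standing in for a real cross-encoder call, and all names are hypothetical. In practice, the candidates would come from a Qdrant search and score_fn would call your model.

```python
from typing import Callable, List, Sequence, Tuple


def rerank(query: str, candidates: Sequence[str],
           score_fn: Callable[[str, str], float],
           top_k: int) -> List[Tuple[str, float]]:
    """Score each retrieved candidate against the query and keep the best top_k."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def token_overlap(query: str, doc: str) -> float:
    """Toy stand-in for a cross-encoder: fraction of query tokens found in doc."""
    query_tokens = set(query.lower().split())
    doc_tokens = set(doc.lower().split())
    return len(query_tokens & doc_tokens) / len(query_tokens) if query_tokens else 0.0


if __name__ == "__main__":
    retrieved = ["bond yields rose", "cat pictures", "bond market yields and rates"]
    for doc, score in rerank("bond market yields", retrieved, token_overlap, top_k=2):
        print(f"{score:.2f}  {doc}")
```

This is the extra code the paragraph refers to: a few lines to write, but also a model call, error handling, and latency budget that your application owns.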

Payload Filtering: Different Implementations, Similar Results

Both databases support pre-filtering (apply metadata filter before ANN search) and post-filtering (apply filter to ANN results). Qdrant's payload filter uses a typed DSL defined in the API schema: {"must": [{"key": "category", "match": {"value": "finance"}}]}. This is validated at the client-SDK level, so type errors are caught before the network call. Weaviate's GraphQL where filter is similarly typed and validated, but expressed in GraphQL syntax: where: {path: ["category"], operator: Equal, valueText: "finance"}. Both approaches work well; the preference is largely a matter of which query language your team is already comfortable with.
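The two filter syntaxes quoted above can be generated side by side. This sketch builds a Qdrant REST search body and a Weaviate GraphQL where clause for the same equality condition. The function names are hypothetical, and the Weaviate helper only covers text equality.

```python
import json
from typing import List


def qdrant_search_body(vector: List[float], key: str, value: str,
                       limit: int = 10) -> dict:
    """JSON body for a filtered Qdrant vector search over REST."""
    return {
        "vector": vector,
        "filter": {"must": [{"key": key, "match": {"value": value}}]},
        "limit": limit,
    }


def weaviate_where(path: str, value: str) -> str:
    """GraphQL `where` argument for a text equality filter."""
    return (f"where: {{path: [{json.dumps(path)}], operator: Equal, "
            f"valueText: {json.dumps(value)}}}")


if __name__ == "__main__":
    print(json.dumps(qdrant_search_body([0.1, 0.2], "category", "finance")))
    print(weaviate_where("category", "finance"))
```

Using json.dumps for the string values keeps quoting and escaping correct if a value contains quotes.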

One noteworthy difference: Qdrant supports filtering on nested JSON payload keys, including arrays, with its nested filter condition. Weaviate supports filtering on properties of cross-referenced objects, which is a more relational model but requires explicitly defining cross-references in your schema. For documents with deeply nested metadata, Qdrant's schemaless payload approach is more flexible because it needs no schema changes.
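A short sketch of Qdrant's nested condition, which requires all inner conditions to hold for the same array element. The builder functions produce the filter JSON, and matches_nested is a local illustration of the semantics rather than Qdrant code. The payload keys (authors, name, role) are made up for the example.

```python
from typing import Any, Dict, List


def match(key: str, value: Any) -> dict:
    """A single key/value match condition."""
    return {"key": key, "match": {"value": value}}


def nested_condition(array_key: str, conditions: List[dict]) -> dict:
    """Wrap conditions so they are evaluated per element of payload[array_key]."""
    return {"nested": {"key": array_key, "filter": {"must": conditions}}}


def matches_nested(payload: Dict[str, Any], array_key: str,
                   required: Dict[str, Any]) -> bool:
    """Local illustration: some single element must satisfy every requirement."""
    return any(
        all(item.get(k) == v for k, v in required.items())
        for item in payload.get(array_key, [])
    )


if __name__ == "__main__":
    flt = {"must": [nested_condition("authors", [match("name", "Ada"),
                                                 match("role", "author")])]}
    doc = {"authors": [{"name": "Ada", "role": "editor"},
                       {"name": "Bo", "role": "author"}]}
    print(flt)
    print(matches_nested(doc, "authors", {"name": "Ada", "role": "author"}))
```

The example document does not match, because no single author is both named Ada and has the role author, even though each value appears somewhere in the array.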

Memory Footprint: Where Qdrant Has a Structural Edge

Weaviate's object store keeps a copy of the original object properties in an embedded key-value store alongside the HNSW graph. This means Weaviate's memory footprint scales with both vector dimensions and the size of stored object properties. Qdrant's payload store is optimised separately from the vector index — you can configure which payload fields are indexed for filtering and which are stored cold, giving fine-grained control over RAM usage. At 10M vectors with 1536 dimensions and modest metadata (~500 bytes per object), Weaviate typically requires about 15% more RAM than Qdrant for the same dataset, based on community benchmarks from the Qdrant and Weaviate Discord servers.

At the 100M vector scale, this difference compounds: Qdrant's mmap-backed segments can handle datasets larger than available RAM by paging cold segments to NVMe, accepting a latency penalty only for cold queries. Weaviate does support lazy loading of segments in v1.24+, but Qdrant's segment management tooling is more mature for datasets that exceed memory limits.
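A back-of-the-envelope estimate shows why mmap matters at this scale. The sketch assumes float32 vectors (4 bytes per dimension) and counts only raw vector and payload bytes. It ignores HNSW graph links, indexes, and any quantization, so real usage will differ.

```python
BYTES_PER_FLOAT32 = 4


def raw_footprint_gb(num_vectors: int, dims: int, payload_bytes: int = 0) -> float:
    """Raw vector plus payload size in decimal GB, excluding index overhead."""
    return num_vectors * (dims * BYTES_PER_FLOAT32 + payload_bytes) / 1e9


if __name__ == "__main__":
    # 10M vectors, 1536 dims, ~500 bytes of metadata each.
    print(f"{raw_footprint_gb(10_000_000, 1536, 500):.2f} GB")  # 66.44 GB
    # 100M vectors, 1536 dims, vectors only.
    print(f"{raw_footprint_gb(100_000_000, 1536):.1f} GB")      # 614.4 GB
```

Under these assumptions, 100M vectors need about 614 GB for the raw vectors alone, several times the 128GB of RAM quoted for the self-hosted cluster in the FAQ, so serving that dataset depends on paging segments from disk or on compressing vectors.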

Developer Experience: A Tale of Two Query Languages

The teams behind each database have made a clear bet on their query interface. Qdrant's REST API is JSON-first and imperative — you describe what you want precisely, with no abstraction layer between your intent and the wire format. Engineers who prefer explicit control and minimal magic tend to find Qdrant's API satisfying. Weaviate's GraphQL interface is declarative and composable — you describe a retrieval pipeline as a nested query structure, and the server executes each stage in order. Engineers building complex RAG pipelines with multiple retrieval stages tend to find Weaviate's approach more readable once the initial learning curve is behind them.

Both provide first-class Python and TypeScript/JavaScript SDKs. Qdrant additionally provides a Rust SDK (as expected) and a Go client. Weaviate's Python client has excellent async support and a fluent query builder API that partially abstracts the GraphQL syntax for engineers who prefer method chaining over raw GraphQL strings.

Frequently Asked Questions

Is Qdrant faster than Weaviate?

In controlled benchmarks on identical hardware, Qdrant's Rust-based HNSW engine consistently outperforms Weaviate's Go-based engine on raw query throughput. Qdrant achieves sub-5ms P99 latency on 10M 1536-dim vectors with ef=128 on a 32GB RAM node. Weaviate on the same hardware typically lands at 8–14ms P99 due to Go runtime garbage collection and the additional object-store abstraction layer. Weaviate's performance has improved significantly in v1.24+ with better segment management, so the gap is narrower for most workloads than raw benchmark suites suggest.

How do Weaviate modules compare to a custom Qdrant pipeline?

Weaviate modules (text2vec-openai, generative-openai, reranker-cohere, etc.) are plugins that wire directly into the server-side query pipeline, removing the need for client-side orchestration code. A full retrieve-rerank-generate RAG call is a single GraphQL request. With Qdrant, you build that orchestration yourself in application code — typically using a framework like LlamaIndex or LangChain. Qdrant's approach is more flexible (you control every step) while Weaviate's modules are more productive for standard RAG patterns where the module ecosystem covers your use case.

Can I run both locally for development?

Yes, both provide official Docker images. Qdrant: docker run -p 6333:6333 qdrant/qdrant. Weaviate: use the official docker-compose.yml from weaviate.io which includes the modules you need (text2vec-transformers for local embeddings, for example). Both have in-memory modes suitable for CI test environments where you don't want data to persist between runs. Qdrant's Docker image is about 60MB; Weaviate's base image with a text2vec module is typically 3–5GB due to bundled model weights.
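Once the containers are up, a CI job usually needs to wait until each server accepts requests. This is a minimal readiness check using only the standard library. It assumes the default local ports (6333 for Qdrant, 8080 for Weaviate) and the readiness endpoints each project documents. Adjust the URLs if your compose file maps ports differently.

```python
from urllib.error import URLError
from urllib.request import urlopen

# Assumed default local endpoints; change these if your ports differ.
READINESS_URLS = {
    "qdrant": "http://localhost:6333/readyz",
    "weaviate": "http://localhost:8080/v1/.well-known/ready",
}


def is_ready(url: str, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with HTTP 200, False otherwise."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    for name, url in READINESS_URLS.items():
        print(f"{name}: {'ready' if is_ready(url) else 'not ready'}")
```

In a test setup you would typically call is_ready in a short retry loop before running the suite.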

Which scales cheaper to 100M vectors?

Qdrant self-hosted is the cheapest option at 100M vectors: a dedicated cluster of two high-memory nodes (128GB RAM total) runs approximately $280/month on commodity cloud hardware, fitting 100M 1536-dim vectors with mmap-backed segments. Weaviate self-hosted on equivalent hardware costs slightly more, about $300–350/month, due to higher memory overhead per vector. On managed cloud, Qdrant Cloud's Scale tier runs ~$480/month vs Weaviate Cloud Enterprise at ~$450/month, effectively a tie at that scale. The biggest savings come from self-hosting either option rather than from choosing between them.
