Two open-source giants with very different philosophies: Qdrant optimises for raw throughput in Rust, Weaviate optimises for AI-native expressiveness in GraphQL. Here is how the numbers and architecture actually compare.
All prices as of June 2026, comparing Qdrant Cloud managed tiers, Weaviate Cloud tiers, and self-hosted estimates.
| Metric | Qdrant Cloud | Qdrant Self-Hosted | Weaviate Cloud | Weaviate Self-Hosted |
|---|---|---|---|---|
| Free tier | 1 cluster · 1GB RAM · 0.5 CPU | Unlimited (infra cost) | Sandbox · ~1K objects | Unlimited (infra cost) |
| Starter | $9/mo (1GB RAM, ~100K vecs) | ~$7/mo (VPS) | $25/mo (~500K vecs) | ~$9/mo (1GB RAM VPS) |
| Standard / Performance | $65–220/mo | ~$35–72/mo | $150/mo (~5M vecs) | ~$65/mo |
| Scale tier (100M vecs) | ~$480/mo | ~$280/mo (cheapest) | ~$450/mo | ~$320/mo |
| Per-operation fees | None | None | None | None |
| Open source | ✅ Apache 2.0 | ✅ Apache 2.0 | ✅ BSD-3-Clause | ✅ BSD-3-Clause |
| gRPC API | ✅ | ✅ | ✅ | ✅ |
| GraphQL API | ❌ | ❌ | ✅ | ✅ |
| Built-in LLM modules | ❌ | ❌ | ✅ | ✅ |
| Sparse vector support | ✅ (v1.7+) | ✅ | ✅ (BM25 hybrid) | ✅ (BM25 hybrid) |
| Named / multi-vector | ✅ | ✅ | ✅ (named vectors) | ✅ |
| Multi-tenancy | Via collections | Via collections | ✅ Native (v1.20+) | ✅ Native |
Figures assume 1536-dim vectors (OpenAI text-embedding-3-small) and are monthly totals including cluster, storage, and ops.
Unlike Pinecone's serverless model, neither Qdrant nor Weaviate charges per read, write, or storage unit. You pay a fixed cluster cost and get unlimited operations within that cluster's capacity. This makes budget forecasting straightforward — your bill does not spike during high-ingest periods or large batch re-embedding jobs.
Qdrant's query engine is written entirely in Rust with no garbage collection pauses and a memory-mapped segment architecture that keeps hot data in RAM while spilling cold data to NVMe with minimal latency impact. Published benchmarks from the ann-benchmarks suite (June 2025) show Qdrant achieving 8,400 queries per second at 99% recall on the 10M-vector deep-1B subset using a 32-core, 64GB RAM node — with P99 latency of 7.2ms at ef=128. Weaviate on the same hardware achieves approximately 4,100 QPS at the same recall target, with P99 latency around 13.8ms.
The gap narrows significantly at lower QPS. If your application never exceeds 200 queries per second, both databases have more than enough headroom on modest hardware, and the performance difference is imperceptible to end users. The Rust advantage becomes decisive only at sustained high-throughput workloads: recommendation systems, real-time document ranking, or anything generating tens of millions of queries per day.
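To place a workload on either side of that line, it helps to convert between daily query volume and sustained QPS. The sketch below assumes perfectly uniform traffic, which real systems rarely have, so treat the result as an average and size for your peak separately. The function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def queries_per_day(sustained_qps: float) -> int:
    """Total queries served in one day at a constant rate."""
    return round(sustained_qps * SECONDS_PER_DAY)


def average_qps(daily_queries: int) -> float:
    """Average rate implied by a daily query total."""
    return daily_queries / SECONDS_PER_DAY


if __name__ == "__main__":
    # 200 QPS held for a full day
    print(queries_per_day(200))
    # Average rate behind 50 million queries per day
    print(round(average_qps(50_000_000), 1))
```

A service that only peaks at 200 QPS for a few hours serves far fewer queries per day than one that sustains it, which is why the sustained figure matters more than the peak.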
Qdrant exposes both a REST API (port 6333) and a gRPC API (port 6334). For high-volume ingestion pipelines, such as loading 50M vectors from a batch job, gRPC's binary protocol and HTTP/2 multiplexing reduce ingestion time by 30–50% compared to REST/JSON, according to Qdrant's own benchmarks. A 10M-vector load that takes 4.5 hours over REST completes in under 3 hours over gRPC on the same hardware. Weaviate also exposes a gRPC API (default port 50051) alongside REST and GraphQL, and its current Python client uses it for queries and batch imports, so the protocol itself is less of a differentiator than it once was. For teams with large initial dataset loads, benchmark both ingestion paths on your own data.
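A minimal sketch of batched ingestion over Qdrant's gRPC port with the `qdrant-client` Python package follows. The collection name `docs`, the batch size, and the `(id, vector, payload)` record shape are assumptions for illustration, and the collection is assumed to exist already. The client is imported lazily so the batching helper runs without the package installed.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record shape: (point id, vector, payload)
Record = Tuple[int, List[float], dict]


def batched(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Yield successive lists of at most `size` records."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def upload_over_grpc(records: Iterable[Record],
                     collection: str = "docs",
                     batch_size: int = 256) -> None:
    """Upsert records in batches, preferring gRPC (port 6334) over REST."""
    # Lazy import: requires `pip install qdrant-client` and a running server.
    from qdrant_client import QdrantClient
    from qdrant_client.models import PointStruct

    client = QdrantClient(host="localhost", port=6333,
                          grpc_port=6334, prefer_grpc=True)
    for chunk in batched(records, batch_size):
        client.upsert(
            collection_name=collection,
            points=[PointStruct(id=i, vector=v, payload=p)
                    for i, v, p in chunk],
        )
```

Batch size is worth tuning per workload; very large batches can run into message-size limits, while very small ones give up most of the protocol's advantage.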
Weaviate's plugin architecture embeds embedding and inference directly into the database server. Configuring text2vec-openai at the class level means every object you insert is automatically vectorized server-side — you send raw text, Weaviate calls OpenAI's embedding API on your behalf, and stores the resulting vector. This eliminates an entire step from your ingestion pipeline. You never write embeddings = openai.embed(text) in your application code; Weaviate handles it. Changing your embedding model later requires only a schema update and a re-indexing trigger, not changes to all ingestion code paths.
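As a rough illustration of class-level vectorizer configuration, the sketch below builds a schema definition in the JSON shape accepted by Weaviate's schema endpoint. The class name `Article`, its properties, and the `model` value under `moduleConfig` are assumptions chosen for the example; check the module documentation for the options your server version accepts.

```python
import json


def article_class(vectorizer: str = "text2vec-openai",
                  model: str = "text-embedding-3-small") -> dict:
    """Build a class definition that delegates embedding to a server-side module."""
    return {
        "class": "Article",                      # hypothetical class name
        "vectorizer": vectorizer,                # module that embeds each object
        "moduleConfig": {vectorizer: {"model": model}},  # assumed option name
        "properties": [
            {"name": "title", "dataType": ["text"]},
            {"name": "body", "dataType": ["text"]},
        ],
    }


if __name__ == "__main__":
    # The resulting JSON is what you would submit when creating the class.
    print(json.dumps(article_class(), indent=2))
```

Swapping the embedding model then means changing one value in this definition and re-indexing, rather than touching every ingestion code path.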
The reranker-cohere and reranker-transformers modules extend this pattern to retrieval. A GraphQL query can specify rerank as a pipeline stage, and Weaviate sends retrieved candidates to a cross-encoder for score refinement entirely server-side. With Qdrant, re-ranking requires retrieving candidates in your application, calling the cross-encoder model yourself, and re-sorting the results before presenting them to the user — more control, but more code to maintain.
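The client-side flow described for Qdrant can be sketched as below. The `Candidate` type and the scoring function are stand-ins: `toy_overlap_score` is a deliberately trivial word-overlap scorer used so the example runs without a model, and in practice you would pass a real cross-encoder call in its place.

```python
from typing import Callable, List, NamedTuple


class Candidate(NamedTuple):
    id: int
    text: str
    ann_score: float  # similarity returned by the vector search


def rerank(query: str,
           candidates: List[Candidate],
           score_pair: Callable[[str, str], float],
           top_k: int) -> List[Candidate]:
    """Re-score each candidate against the query and keep the best top_k."""
    scored = [(score_pair(query, c.text), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]


def toy_overlap_score(query: str, text: str) -> float:
    """Placeholder scorer: fraction of query words present in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


if __name__ == "__main__":
    hits = [
        Candidate(1, "quarterly marketing update", 0.91),
        Candidate(2, "interest rate risk in bond portfolios", 0.88),
        Candidate(3, "rate limits for the API", 0.86),
    ]
    for c in rerank("interest rate risk", hits, toy_overlap_score, top_k=2):
        print(c.id, c.text)
```

Note that the ANN score is ignored once re-ranking starts; whether to blend the two scores is a design choice left to the application.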
Both databases support pre-filtering (apply metadata filter before ANN search) and post-filtering (apply filter to ANN results). Qdrant's payload filter uses a typed DSL defined in the API schema: {"must": [{"key": "category", "match": {"value": "finance"}}]}. This is validated at the client-SDK level, so type errors are caught before the network call. Weaviate's GraphQL where filter is similarly typed and validated, but expressed in GraphQL syntax: where: {path: ["category"], operator: Equal, valueText: "finance"}. Both approaches work well; the preference is largely a matter of which query language your team is already comfortable with.
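The two filter forms quoted above can be generated side by side, which makes the structural difference easy to see. This is a small sketch with hypothetical helper names; it does no escaping of quotes inside values, so it is not safe for untrusted input.

```python
import json


def qdrant_match_filter(key: str, value: str) -> dict:
    """Qdrant payload filter: exact match on one payload key."""
    return {"must": [{"key": key, "match": {"value": value}}]}


def weaviate_where(path: str, value: str) -> str:
    """Weaviate GraphQL where-argument: equality on one text property."""
    return f'where: {{path: ["{path}"], operator: Equal, valueText: "{value}"}}'


if __name__ == "__main__":
    print(json.dumps(qdrant_match_filter("category", "finance")))
    print(weaviate_where("category", "finance"))
```

The Qdrant form is plain JSON that travels in the request body, while the Weaviate form is an argument embedded in a larger GraphQL query.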
One noteworthy difference: Qdrant supports filtering on nested JSON payload keys, including arrays of objects, with its nested filter condition. Weaviate supports filtering on properties of cross-referenced objects, which is a more relational model but requires explicitly defining cross-references in your schema. For documents with deeply nested metadata, Qdrant's schemaless payload approach is more flexible because no schema changes are needed.
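A sketch of what a nested condition looks like, together with a local stand-in that mimics its intent: an object matches only if a single array element satisfies every condition at once. The payload field names (`authors`, `name`, `verified`) are invented for the example, and `matches_nested` is an illustration of the semantics, not Qdrant's implementation.

```python
from typing import Any, Dict


def nested_filter(array_key: str, conditions: Dict[str, Any]) -> dict:
    """Build a Qdrant-style nested condition over an array of objects."""
    return {"must": [{"nested": {
        "key": array_key,
        "filter": {"must": [
            {"key": k, "match": {"value": v}} for k, v in conditions.items()
        ]},
    }}]}


def matches_nested(payload: dict, array_key: str,
                   conditions: Dict[str, Any]) -> bool:
    """Local stand-in: True if one array element meets all conditions."""
    return any(
        all(item.get(k) == v for k, v in conditions.items())
        for item in payload.get(array_key, [])
    )


if __name__ == "__main__":
    payload = {"authors": [
        {"name": "Ann", "verified": False},
        {"name": "Bo", "verified": True},
    ]}
    # Ann exists and a verified author exists, but not in the same element.
    print(matches_nested(payload, "authors", {"name": "Ann", "verified": True}))
    print(matches_nested(payload, "authors", {"name": "Bo", "verified": True}))
```

The first check is the case that distinguishes a nested condition from two independent conditions on the same array.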
Weaviate's object store keeps a copy of the original object properties in an embedded key-value store alongside the HNSW graph. This means Weaviate's memory footprint scales with both vector dimensions and the size of stored object properties. Qdrant's payload store is optimised separately from the vector index — you can configure which payload fields are indexed for filtering and which are stored cold, giving fine-grained control over RAM usage. At 10M vectors with 1536 dimensions and modest metadata (~500 bytes per object), Weaviate typically requires about 15% more RAM than Qdrant for the same dataset, based on community benchmarks from the Qdrant and Weaviate Discord servers.
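The scale of these numbers is easier to judge with the raw arithmetic written out. The estimate below assumes uncompressed float32 vectors (4 bytes per dimension) and ignores HNSW graph links, index structures, and quantization, so it is a lower bound on a fully in-memory footprint rather than a sizing tool. Applying the 15% figure to that baseline is purely illustrative.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors stored as uncompressed float32


def raw_vector_bytes(num_vectors: int, dim: int) -> int:
    """Bytes needed for the vectors alone."""
    return num_vectors * dim * BYTES_PER_FLOAT32


def payload_bytes(num_vectors: int, bytes_per_object: int) -> int:
    """Bytes needed for per-object metadata."""
    return num_vectors * bytes_per_object


def to_gb(num_bytes: float) -> float:
    """Convert bytes to decimal gigabytes."""
    return num_bytes / 1e9


if __name__ == "__main__":
    n, dim = 10_000_000, 1536
    vectors = raw_vector_bytes(n, dim)
    metadata = payload_bytes(n, 500)
    baseline = vectors + metadata
    print(f"vectors:  {to_gb(vectors):.2f} GB")
    print(f"metadata: {to_gb(metadata):.2f} GB")
    print(f"baseline: {to_gb(baseline):.2f} GB")
    print(f"+15%:     {to_gb(baseline * 1.15):.2f} GB")
```

At these sizes the vectors dominate, so metadata size matters mainly when objects carry large text fields.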
At the 100M vector scale, this difference compounds: Qdrant's mmap-backed segments can handle datasets larger than available RAM by paging cold segments to NVMe, accepting a latency penalty only for cold queries. Weaviate does support lazy loading of segments in v1.24+, but Qdrant's segment management tooling is more mature for datasets that exceed memory limits.
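The general idea behind memory-mapped storage can be shown with Python's standard library: the file is mapped into the address space and the operating system pages in only the bytes that are actually read. This is a conceptual sketch with a made-up fixed-width file layout, not Qdrant's segment format.

```python
import mmap
import os
import struct
import tempfile
from typing import Iterable, Sequence, Tuple

DIM = 4                               # toy dimensionality for the example
RECORD = struct.Struct(f"<{DIM}f")    # one little-endian float32 vector


def write_segment(path: str, vectors: Iterable[Sequence[float]]) -> None:
    """Write vectors back to back as fixed-width float32 records."""
    with open(path, "wb") as f:
        for v in vectors:
            f.write(RECORD.pack(*v))


def read_vector(path: str, index: int) -> Tuple[float, ...]:
    """Read one vector through a memory map without loading the whole file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return RECORD.unpack_from(mm, index * RECORD.size)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "segment.bin")
    write_segment(path, [(0.0, 0.5, 1.0, 1.5), (2.0, 2.5, 3.0, 3.5)])
    print(read_vector(path, 1))
```

Pages that are read often stay in the OS page cache, which is what keeps hot data fast while cold data incurs a disk read.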
The teams behind each database have made a clear bet on their query interface. Qdrant's REST API is JSON-first and imperative — you describe what you want precisely, with no abstraction layer between your intent and the wire format. Engineers who prefer explicit control and minimal magic tend to find Qdrant's API satisfying. Weaviate's GraphQL interface is declarative and composable — you describe a retrieval pipeline as a nested query structure, and the server executes each stage in order. Engineers building complex RAG pipelines with multiple retrieval stages tend to find Weaviate's approach more readable once the initial learning curve is behind them.
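To make the contrast concrete, the sketch below builds one request in each style: a JSON body for Qdrant's points search endpoint and a GraphQL `Get` query for Weaviate using `nearText`. The collection and class names and the returned fields are hypothetical, and `nearText` assumes a text vectorizer module is enabled on the Weaviate class.

```python
import json
from typing import List, Tuple


def qdrant_search_request(collection: str, vector: List[float],
                          limit: int) -> Tuple[str, dict]:
    """Return the REST path and JSON body for a vector search."""
    path = f"/collections/{collection}/points/search"
    body = {"vector": vector, "limit": limit, "with_payload": True}
    return path, body


def weaviate_get_query(class_name: str, concept: str,
                       limit: int, fields: List[str]) -> str:
    """Return a GraphQL Get query that searches by text concept."""
    field_block = " ".join(fields)
    return (
        "{ Get { "
        f'{class_name}(nearText: {{concepts: ["{concept}"]}}, limit: {limit}) '
        f"{{ {field_block} }} "
        "} }"
    )


if __name__ == "__main__":
    path, body = qdrant_search_request("articles", [0.1, 0.2, 0.3], 3)
    print("POST", path, json.dumps(body))
    print(weaviate_get_query("Article", "interest rate risk", 3, ["title"]))
```

The Qdrant request carries a vector you computed yourself, while the Weaviate query carries text and leaves the embedding step to the server.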
Both provide first-class Python and TypeScript/JavaScript SDKs. Qdrant additionally provides a Rust SDK (as expected) and a Go client. Weaviate's Python client has excellent async support and a fluent query builder API that partially abstracts the GraphQL syntax for engineers who prefer method chaining over raw GraphQL strings.
In controlled benchmarks on identical hardware, Qdrant's Rust-based HNSW engine consistently outperforms Weaviate's Go-based engine on raw query throughput. Qdrant achieves sub-5ms P99 latency on 10M 1536-dim vectors with ef=128 on a 32GB RAM node. Weaviate on the same hardware typically lands at 8–14ms P99 due to garbage-collection overhead in the Go runtime and the additional object-store abstraction layer. Weaviate's performance has improved significantly in v1.24+ with better segment management, so the gap is narrower for most workloads than raw benchmark suites suggest.
Weaviate modules (text2vec-openai, generative-openai, reranker-cohere, etc.) are plugins that wire directly into the server-side query pipeline, removing the need for client-side orchestration code. A full retrieve-rerank-generate RAG call is a single GraphQL request. With Qdrant, you build that orchestration yourself in application code — typically using a framework like LlamaIndex or LangChain. Qdrant's approach is more flexible (you control every step) while Weaviate's modules are more productive for standard RAG patterns where the module ecosystem covers your use case.
Yes, both provide official Docker images. Qdrant: docker run -p 6333:6333 qdrant/qdrant. Weaviate: use the official docker-compose.yml from weaviate.io which includes the modules you need (text2vec-transformers for local embeddings, for example). Both have in-memory modes suitable for CI test environments where you don't want data to persist between runs. Qdrant's Docker image is about 60MB; Weaviate's base image with a text2vec module is typically 3–5GB due to bundled model weights.
Qdrant self-hosted is the cheapest option at 100M vectors — a dedicated cluster of two high-memory nodes (128GB RAM total) runs approximately $280/month on commodity cloud hardware, fitting 100M 1536-dim vectors with mmap-backed segments. Weaviate self-hosted on equivalent hardware costs a similar $300–350/month due to slightly higher memory overhead per vector. On managed cloud, Qdrant Cloud's Scale tier runs ~$480/month vs Weaviate Cloud Enterprise at ~$450/month — effectively a tie at that scale. The biggest savings come from self-hosting either option rather than from choosing between them.
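The 100M-vector figures quoted in this article can be normalised to a cost per million vectors, and the raw vector size compared against the 128GB of RAM mentioned above. The prices are the approximate monthly values from the comparison table; the float32 assumption (4 bytes per dimension, no quantization) is mine.

```python
from typing import Dict

# Approximate monthly prices at 100M vectors, taken from the table above.
MONTHLY_USD_AT_100M: Dict[str, float] = {
    "Qdrant Cloud": 480,
    "Qdrant self-hosted": 280,
    "Weaviate Cloud": 450,
    "Weaviate self-hosted": 320,
}


def cost_per_million(monthly_usd: float, num_vectors: int) -> float:
    """Monthly cost divided by the number of millions of vectors stored."""
    return monthly_usd / (num_vectors / 1_000_000)


def resident_fraction(ram_gb: float, num_vectors: int, dim: int) -> float:
    """Share of raw float32 vector data that fits in RAM (capped at 1.0)."""
    raw_gb = num_vectors * dim * 4 / 1e9
    return min(1.0, ram_gb / raw_gb)


if __name__ == "__main__":
    n = 100_000_000
    for name, usd in MONTHLY_USD_AT_100M.items():
        print(f"{name}: ${cost_per_million(usd, n):.2f} per million vectors")
    print(f"resident share at 128 GB: {resident_fraction(128, n, 1536):.1%}")
```

Because the raw vectors are several times larger than the stated RAM, the self-hosted figure depends on mmap-backed segments or quantization rather than a fully in-memory index.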