ClickHouse vs Druid vs Pinot 2026

Real-time OLAP is now its own product category. The big three — ClickHouse, Apache Druid, Apache Pinot — have spent the last several years converging on what user-facing analytics needs: sub-second queries, high-cardinality dimensions, streaming ingest, joins, and operational stories that engineering teams can actually run. The fourth credible answer, DuckDB at the edge, plays a different game entirely. The decision in 2026 is rarely about raw speed; it is about which operational model and which workload shape fits your team.

The four contenders

ClickHouse is the column store that ate real-time analytics. ClickHouse Inc. is the main vendor and runs the managed ClickHouse Cloud service. Altinity is the long-standing, community-friendly alternative for managed and supported self-hosted deployments. Strong SQL surface, MergeTree variants for every workload shape, mature dbt integration, increasingly good join support.

Apache Druid is the original real-time OLAP database, built at Metamarkets. Imply, founded by Druid's original creators, is its main commercial backer, and Imply Polaris is the managed cloud. Druid was designed from day one for streaming ingest and time-series analytical queries. Queries use Druid SQL, with the native JSON query language available for advanced cases.

Apache Pinot is the LinkedIn-originated real-time OLAP database, now backed commercially by StarTree (StarTree Cloud). Closest to Druid architecturally but with better support for upserts, joins, and user-facing analytics at hyperscale. Used at LinkedIn, Uber, Stripe, and many others.

DuckDB is the in-process analytical database that you embed in your application. MotherDuck is the managed cloud play. DuckDB is not a streaming engine or a real-time ingest store. It is a credible “BI on OLTP” alternative when your analytics fit on a single machine.

Architecture in one paragraph each

ClickHouse stores data in MergeTree tables: sorted, partitioned columnar parts that are merged in the background. Clusters scale out with shards plus replicas. ClickHouse Keeper (or ZooKeeper) coordinates replication and distributed DDL. Reads parallelize across shards. Writes either go to the local MergeTree table on each shard or through a Distributed table, which routes rows to those local tables.
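The part-and-merge idea can be made concrete with a toy in-memory model in Python. This is a sketch, not ClickHouse internals. The `ToyMergeTree` class and its `partition_by` argument are hypothetical, and the sort key is simply the whole row tuple. Each insert writes one sorted part per partition it touches, and a merge only combines parts within the same partition.

```python
import heapq
from collections import defaultdict
from typing import Callable, Hashable


class ToyMergeTree:
    """Toy model of a MergeTree table: inserts create immutable sorted parts,
    and merges combine parts that belong to the same partition."""

    def __init__(self, partition_by: Callable[[tuple], Hashable]):
        self.partition_by = partition_by
        # partition key -> list of parts; each part is a sorted list of rows
        self.parts: dict = defaultdict(list)

    def insert(self, rows: list) -> None:
        # One INSERT becomes one new sorted part per partition it touches.
        by_partition = defaultdict(list)
        for row in rows:
            by_partition[self.partition_by(row)].append(row)
        for key, part_rows in by_partition.items():
            self.parts[key].append(sorted(part_rows))

    def merge(self) -> None:
        # Background merge: k-way merge of already-sorted parts, never across partitions.
        for key, parts in self.parts.items():
            if len(parts) > 1:
                self.parts[key] = [list(heapq.merge(*parts))]

    def part_count(self) -> int:
        return sum(len(parts) for parts in self.parts.values())
```

Frequent tiny inserts therefore leave many small parts behind until merges catch up. That is one reason ClickHouse prefers bulk writes.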

Druid splits ingestion (MiddleManagers and their Peons, or Indexers) from query (Brokers, Historicals) and from coordination (Coordinator, Overlord). Data is partitioned by time into immutable segments, persisted to deep storage, and loaded onto Historicals for querying. Streaming and batch ingestion both run as tasks on the ingestion tier, and both end up as segments. There are lots of moving parts, but they are well documented.
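Time partitioning lets a Broker prune work before it contacts any Historical. Only segments whose interval overlaps the query's time filter are worth asking about. Below is a minimal sketch, assuming half-open `[start, end)` intervals. `Segment` and `segments_for_query` are hypothetical names, not Druid APIs.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Segment:
    start: datetime  # inclusive
    end: datetime    # exclusive
    server: str      # hypothetical Historical that has the segment loaded


def segments_for_query(segments, q_start: datetime, q_end: datetime):
    """Keep only segments whose [start, end) interval overlaps [q_start, q_end)."""
    return [s for s in segments if s.start < q_end and q_start < s.end]
```

A query for 01:30 to 02:15 over hourly segments touches only the 01:00 and 02:00 segments. Everything else is skipped without reading a byte.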

Pinot is architecturally similar to Druid, with Controllers, Brokers, Servers, and Minions. Segments are typically time-partitioned and immutable once sealed. Real-time tables consume directly from Kafka (and other streams) through Pinot's stream ingestion. The headline differences from Druid are:

- upsert tables (real CDC-style updates, not just appends);
- better join support (lookup, broadcast, distributed);
- the StarTree index for high-cardinality multi-dimensional queries.
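The consuming-then-sealed segment lifecycle behind real-time tables can be sketched as follows. It is a simplified model under stated assumptions. The hypothetical `ToyRealtimeTable` seals a segment purely on row count, loosely mirroring Pinot's row-based flush threshold (`realtime.segment.flush.threshold.rows` in the stream config). Real Pinot can also flush on time and segment-size limits.

```python
class ToyRealtimeTable:
    """A consuming segment that seals into an immutable segment at a row threshold."""

    def __init__(self, flush_threshold_rows: int):
        self.flush_threshold_rows = flush_threshold_rows
        self.consuming: list = []  # mutable, still receiving stream events
        self.sealed: list = []     # immutable segments, stored as tuples

    def consume(self, event) -> None:
        self.consuming.append(event)
        if len(self.consuming) >= self.flush_threshold_rows:
            # Seal: freeze the buffered rows and start a fresh consuming segment.
            self.sealed.append(tuple(self.consuming))
            self.consuming = []

    def queryable_rows(self) -> list:
        # Queries cover the sealed segments and the consuming one,
        # so an event is visible as soon as it has been consumed.
        return [e for seg in self.sealed for e in seg] + self.consuming
```

Querying the in-flight consuming segment is what makes sub-second freshness possible without waiting for a segment to seal.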

DuckDB is single-process. Embeds in your application (Python, Node, Rust, etc.) and reads directly from Parquet, CSV, JSON, or its own native format. No cluster, no coordination, no servers. The MotherDuck product extends this with cloud storage and shared databases.

Ingest rates and freshness

All three of the “real” real-time engines (ClickHouse, Druid, Pinot) can handle millions of events per second per cluster with sub-second visibility, given adequate sizing. Where they differ:

ClickHouse does best with bulk inserts, plus the Kafka table engine for streaming. Native streaming ingest improved meaningfully in 2024-2025 with ClickHouse Cloud’s S3-based ingestion path. Sub-second visibility is real but takes tuning.
Druid is designed for streaming first. The Kafka indexing service ingests directly from topics. Rows are queryable while still held by the ingesting task, so sub-second freshness is the default.
Pinot is similar to Druid for streaming ingest. The upsert support is the headline advantage when your stream has updates rather than append-only events.
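Bulk inserts matter for ClickHouse because every INSERT creates at least one new part. A stream of single-row inserts can generate parts faster than background merges absorb them. A common client-side fix is to batch by size or age.

The sketch below uses a hypothetical `InsertBatcher` with an injectable clock so it can be tested. The `flush` callback stands in for whatever client call performs the bulk insert. ClickHouse's server-side `async_insert` setting is an alternative that buffers on the server instead.

```python
import time
from typing import Callable, Optional


class InsertBatcher:
    """Buffers rows and hands them to `flush` as one bulk insert when the batch
    is large enough or its oldest row is old enough."""

    def __init__(self, flush: Callable[[list], None], max_rows: int, max_age_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.flush_fn = flush
        self.max_rows = max_rows
        self.max_age_s = max_age_s
        self.clock = clock
        self.buffer: list = []
        self.oldest: Optional[float] = None  # arrival time of the oldest buffered row

    def add(self, row) -> None:
        if not self.buffer:
            self.oldest = self.clock()
        self.buffer.append(row)
        too_big = len(self.buffer) >= self.max_rows
        too_old = self.clock() - self.oldest >= self.max_age_s
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        # Send the whole buffer as a single insert, then start a new one.
        if self.buffer:
            self.flush_fn(self.buffer)
            self.buffer = []
            self.oldest = None
```

This version only checks age when a row arrives. A production batcher would also flush from a timer, so a quiet stream does not strand rows in the buffer.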

For pure append-only event streams, all three work. For change-data-capture style streams with updates and deletes, Pinot is the cleanest answer.
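Append-only and upsert semantics are easiest to compare side by side. The sketch below is a simplified model, not Pinot's implementation:

- an upsert table keeps one row per primary key;
- the event with the highest comparison value wins (Pinot uses the time column by default);
- a winning delete marker hides the key.

The field names `order_id`, `ts`, and `is_deleted` are hypothetical.

```python
def append_view(events: list) -> list:
    """Append-only table: every event is its own row."""
    return list(events)


def upsert_view(events: list, key: str = "order_id", comparison: str = "ts",
                deleted: str = "is_deleted") -> list:
    """Upsert table: one row per primary key, latest comparison value wins."""
    latest: dict = {}
    for e in events:
        k = e[key]
        # A late-arriving event with an older comparison value does not overwrite.
        if k not in latest or e[comparison] >= latest[k][comparison]:
            latest[k] = e
    # Keys whose winning event is a delete marker disappear from query results.
    return [e for e in latest.values() if not e.get(deleted, False)]
```

With a CDC stream of order events, the append view counts every status change as a row. The upsert view shows one current row per order, which is usually what a dashboard wants.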

Query latency and the tail

For point queries on indexed dimensions over time-bounded ranges, all three are in the same neighborhood — single-digit to low double-digit milliseconds at moderate scale.

For complex aggregations over large data:

ClickHouse dominates on raw scan speed and SQL flexibility. Materialized views are the canonical pattern for fast user-facing aggregates.
Druid is fast at time-series filters and group-bys, slower at full-table aggregations.
Pinot is fast at all of the above, with the StarTree index closing the gap for high-cardinality group-bys.
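A ClickHouse materialized view behaves like an insert trigger. It transforms each newly inserted block and writes the result into a target table, so user-facing queries read a small pre-aggregated table instead of scanning raw events. The Python model below shows only that incremental idea; `IncrementalRollup` and its fields are hypothetical.

In real ClickHouse, the target is often a SummingMergeTree or AggregatingMergeTree table. Its rows are combined lazily during merges, so queries still aggregate with `GROUP BY`.

```python
from collections import defaultdict


class IncrementalRollup:
    """Insert-triggered rollup: each new block is aggregated once and its
    partial results are added to a small target table."""

    def __init__(self):
        self.raw: list = []  # source table of raw events
        # target table: (customer, day) -> [event_count, revenue]
        self.rollup = defaultdict(lambda: [0, 0.0])

    def insert(self, block: list) -> None:
        self.raw.extend(block)
        # Only the newly inserted block is processed, never the whole source table.
        for row in block:
            agg = self.rollup[(row["customer"], row["day"])]
            agg[0] += 1
            agg[1] += row["amount"]

    def dashboard_query(self, customer: str) -> dict:
        # The user-facing read touches the rollup, not the raw events.
        return {day: tuple(agg) for (cust, day), agg in self.rollup.items() if cust == customer}
```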

For joins, the story matters more in 2026 than it used to:

ClickHouse has had real joins for a while. Performance is workload-dependent; large hash joins still want enough RAM.
Druid has lookup joins (small dimension tables) but is weakest of the three on large-fact-to-large-fact joins.
Pinot has the best join story of the three: lookup, broadcast, and distributed joins are all supported.
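Lookup and broadcast joins are cheap for the same reason. The dimension side is small enough to copy to every shard, so each shard builds a hash table and probes it with its local fact rows. Nothing on the fact side has to move. Below is a minimal sketch of a broadcast hash join (inner join); `broadcast_hash_join` is a hypothetical helper.

```python
def broadcast_hash_join(fact_shards: list, dim_rows: list, key: str) -> list:
    """Inner join where the small dimension table is broadcast to every shard."""
    results = []
    for shard in fact_shards:
        # Each shard receives a full copy of the dimension rows and builds its own hash table.
        lookup = {d[key]: d for d in dim_rows}
        for f in shard:
            d = lookup.get(f[key])
            if d is not None:  # inner join: drop fact rows with no match
                results.append({**f, **d})
    return results
```

A large-fact-to-large-fact join cannot broadcast either side cheaply. It has to repartition both sides by the join key, and that is where memory and network costs show up.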

The tail latency story for user-facing analytics (P99, not just median) is where StarTree Cloud and ClickHouse Cloud both put real engineering effort. Self-hosted clusters can match these but it takes work.
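Two small calculations explain why P99 gets its own engineering budget. The nearest-rank percentile shows how a few slow queries vanish from the median. The fan-out formula 1 - (1 - p)^n gives the chance that a query touching n shards hits at least one shard in its slowest p fraction, assuming shards are independent. Both helpers are illustrative and not taken from any vendor.

```python
import math


def percentile(samples_ms: list, p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples_ms or not 0 < p <= 100:
        raise ValueError("need at least one sample and 0 < p <= 100")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def chance_any_shard_slow(per_shard_tail_prob: float, fanout: int) -> float:
    """Probability that at least one of `fanout` independent shards is slow."""
    return 1 - (1 - per_shard_tail_prob) ** fanout
```

With a 1% per-shard tail and a 20-shard fan-out, roughly 18% of queries wait on at least one slow shard. This is why a healthy median can hide a poor P99.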

Operational shape

ClickHouse has the smallest operational footprint of the three real-time engines. ClickHouse Cloud shrinks it further. Self-hosting with the Altinity Operator on Kubernetes is well understood.

Druid has more moving parts. Imply Polaris removes most of that. Self-hosted Druid on Kubernetes via the Druid Operator works but the learning curve is real.

Pinot is similarly multi-process. StarTree Cloud is the managed answer. Self-hosted Pinot is doable but you should expect a real ramp.

DuckDB has no operational story by design. MotherDuck adds collaboration and cloud storage but the local-first model is the headline.

For teams that want the smallest ops surface, ClickHouse Cloud or StarTree Cloud are the right starting points. For teams that already operate Kafka and Kubernetes at scale, self-hosting any of the three real-time engines is tractable.

The “BI on OLTP” alternative

A pattern we have shipped several times in 2025-2026: skip the real-time OLAP database entirely and run analytical queries directly against the transactional store.

Postgres with a logical replica for analytical queries, plus aggressive indexing, plus materialized views.
TiDB or YugabyteDB with HTAP capabilities (TiFlash, Yugabyte’s columnar option).
DuckDB reading Parquet exports from the OLTP store on a 5-minute cadence.
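The Postgres variant of this pattern comes down to an indexed transactional table plus a summary rebuilt on a schedule. On Postgres that means `CREATE MATERIALIZED VIEW` and `REFRESH MATERIALIZED VIEW`. The sketch below uses Python's built-in `sqlite3` only so that it runs anywhere. SQLite has no materialized views, so the hypothetical `refresh_summary` helper rebuilds a plain table inside one transaction.

```python
import sqlite3


def refresh_summary(conn: sqlite3.Connection) -> None:
    """Rebuild the daily summary from the transactional table in one transaction.
    Stand-in for REFRESH MATERIALIZED VIEW; SQLite has no materialized views."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM daily_revenue")
        conn.execute(
            "INSERT INTO daily_revenue (day, orders, revenue) "
            "SELECT day, COUNT(*), SUM(amount) FROM orders GROUP BY day"
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, day TEXT, amount REAL)")
conn.execute("CREATE INDEX orders_by_day ON orders (day)")  # index the filter/group column
conn.execute("CREATE TABLE daily_revenue (day TEXT PRIMARY KEY, orders INTEGER, revenue REAL)")
with conn:
    conn.executemany(
        "INSERT INTO orders (day, amount) VALUES (?, ?)",
        [("2026-01-01", 20.0), ("2026-01-01", 5.0), ("2026-01-02", 12.5)],
    )
refresh_summary(conn)  # in production this would run on a schedule, e.g. every 5 minutes
summary = conn.execute("SELECT day, orders, revenue FROM daily_revenue ORDER BY day").fetchall()
# [('2026-01-01', 2, 25.0), ('2026-01-02', 1, 12.5)]
```

Dashboards read the summary table, so they see data as of the last refresh. A 5-minute cadence therefore means up to roughly 5 minutes of staleness, plus the time the refresh itself takes.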

This works when your data volume is modest (under a few hundred GB hot), your query patterns are predictable, and you do not want to operate a second database. It does not work when you need true sub-second user-facing analytics at high concurrency.

DuckDB is increasingly the answer for “analytics, but small.” MotherDuck makes the collaboration story real. We have shipped DuckDB-based analytics dashboards for clients whose entire dataset fit in a 10 GB Parquet file.

When to pick each

ClickHouse when SQL flexibility and raw scan speed matter, your team has SQL skills, and ClickHouse Cloud’s managed offering fits your compliance posture.

Druid when streaming ingest is the headline, your queries are time-series-shaped, and Imply Polaris fits your model.

Pinot when you need upserts in a real-time OLAP store, joins are part of the workload, and user-facing analytics at very high concurrency is the goal. The LinkedIn/Uber/Stripe heritage is real.

DuckDB or BI-on-OLTP when your scale does not actually justify a separate real-time OLAP engine. This is the right answer more often than vendors will tell you.
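The rules of thumb above can be written down as a first-pass filter to run before any benchmarking. This is a rough, hypothetical encoding. The `Workload` fields and the 300 GB hot-data cutoff (standing in for “a few hundred GB”) are assumptions. The output is a candidate to benchmark, not a verdict.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    hot_data_gb: float
    needs_upserts: bool
    joins_central: bool
    time_series_shaped: bool
    high_concurrency_user_facing: bool


HOT_DATA_CUTOFF_GB = 300  # hypothetical stand-in for "a few hundred GB hot"


def first_candidate(w: Workload) -> str:
    """Map the rules of thumb to a first engine to benchmark."""
    # Small and not high-concurrency user-facing: a separate OLAP engine is not justified.
    if w.hot_data_gb < HOT_DATA_CUTOFF_GB and not w.high_concurrency_user_facing:
        return "DuckDB / BI-on-OLTP"
    # Upserts or join-heavy workloads point at Pinot.
    if w.needs_upserts or w.joins_central:
        return "Pinot"
    # Streaming-first, time-series-shaped queries point at Druid.
    if w.time_series_shaped:
        return "Druid"
    # Otherwise: SQL flexibility and raw scan speed.
    return "ClickHouse"
```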

The cost story

ClickHouse Cloud and StarTree Cloud are roughly comparable for typical workloads, with Imply Polaris in the same neighborhood. Self-hosted is dramatically cheaper at scale, more expensive in engineering time.
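Once engineering time is priced in, managed versus self-hosted becomes a simple break-even. The sketch below uses hypothetical parameter names and provides no default figures. Plug in your own vendor quotes and loaded salary costs.

```python
def self_hosted_monthly_usd(infra_usd: float, engineer_fte: float,
                            loaded_cost_per_fte_month_usd: float) -> float:
    """Infrastructure bill plus the share of engineering time spent operating the cluster."""
    return infra_usd + engineer_fte * loaded_cost_per_fte_month_usd


def cheaper_option(managed_bill_usd: float, infra_usd: float, engineer_fte: float,
                   loaded_cost_per_fte_month_usd: float) -> str:
    self_hosted = self_hosted_monthly_usd(infra_usd, engineer_fte, loaded_cost_per_fte_month_usd)
    return "self-hosted" if self_hosted < managed_bill_usd else "managed"


def break_even_fte(managed_bill_usd: float, infra_usd: float,
                   loaded_cost_per_fte_month_usd: float) -> float:
    """Ops headcount self-hosting can absorb before it stops being cheaper."""
    return (managed_bill_usd - infra_usd) / loaded_cost_per_fte_month_usd
```

The break-even FTE figure is usually the most useful output. If running the cluster realistically takes more engineers than that, the managed service wins, whatever the infrastructure savings look like.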

DuckDB and MotherDuck are by far the cheapest for workloads that fit the model.

Where pdpspectra fits

We have shipped ClickHouse for user-facing SaaS analytics, Pinot for a real-time logistics dashboard with upsert requirements, and DuckDB-on-Parquet for a client whose analytics did not need a server. Each was the right call for that workload.

If you are picking a real-time analytics engine, our data engineering team will benchmark the candidates against your real query patterns and pick the one we would ship. The answer depends on freshness requirements, query concurrency, join complexity, and your team’s operational shape.

Real-time OLAP is a real product category now. Picking inside it is mostly a workload conversation. Tell us about your queries and we will help you choose.
