We use ClickHouse on roughly a third of data engagements at Valiotti Data. This article is our decision framework: what makes ClickHouse the right answer, what makes it the wrong one, and what a real production deployment looks like.
We don’t recommend ClickHouse by default. We use it this often because it’s the right answer for a specific shape of problem, one that shows up often enough to be worth understanding well.
ClickHouse in one paragraph
The performance story is the columnar layout. A row-oriented database stores each row contiguously, so a scan reads every column of every row it visits, whether or not the query needs it. ClickHouse stores each column in its own file, so a query that touches only three columns out of forty reads roughly 1/13 of the data the row store would (3/40, assuming columns of similar width). Compression amplifies this: typical real-world tables compress 5–10× because the values within a single column resemble one another and so have low entropy. The result is sub-second response on queries that would take minutes in PostgreSQL.
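The arithmetic behind that estimate can be sketched in a few lines. This is a simplified model, assuming all columns are the same width and that one compression ratio applies uniformly. The function name and the 8× ratio (picked from inside the 5–10× range) are illustrative assumptions, not measurements.

```python
def bytes_read_fraction(columns_touched: int, total_columns: int,
                        compression_ratio: float = 1.0) -> float:
    """Fraction of a row store's bytes that a column store would read.

    Simplifying assumptions: every column has the same width, and the
    compression ratio is the same for every column.
    """
    if not 0 < columns_touched <= total_columns:
        raise ValueError("columns_touched must be between 1 and total_columns")
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return columns_touched / total_columns / compression_ratio


# Layout alone: 3 of 40 columns.
layout_only = bytes_read_fraction(3, 40)

# Layout plus a hypothetical 8x compression ratio.
with_compression = bytes_read_fraction(3, 40, compression_ratio=8)

print(f"layout only: 1/{1 / layout_only:.1f} of the bytes")
print(f"with compression: 1/{1 / with_compression:.1f} of the bytes")
```

Real tables have columns of very different widths, so the actual saving depends on which three columns the query touches.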
Beyond the storage layout, the engine offers:
- Materialized views: queries that run on each newly inserted block and write their results into a physical table, so stored aggregates stay up to date as new rows arrive.
- Massively parallel processing (MPP): queries fan out across cores and shards.
- The MergeTree family of engines: partitioning, replication, sampling, deduplication, and TTL-based retention.
- Aggressive compression: codecs tuned per column type (Delta, DoubleDelta, Gorilla, ZSTD) that shrink data several-fold.
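To see why a per-column codec helps, here is a small sketch of delta encoding applied to a timestamp column before general-purpose compression. It is not ClickHouse's implementation: `zlib` from the Python standard library stands in for ZSTD, and the column of one-per-second timestamps is a made-up example.

```python
import struct
import zlib


def pack_int64(values):
    """Serialize integers as little-endian 64-bit values, like a fixed-width column."""
    return struct.pack(f"<{len(values)}q", *values)


def delta_encode(values):
    """Keep the first value, then store each value as the difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


# Hypothetical column: one event per second for 10,000 seconds.
timestamps = [1_700_000_000 + i for i in range(10_000)]

raw_size = len(zlib.compress(pack_int64(timestamps)))
delta_size = len(zlib.compress(pack_int64(delta_encode(timestamps))))

print(f"compressed without delta: {raw_size} bytes")
print(f"compressed with delta:    {delta_size} bytes")
```

After delta encoding the column is almost entirely the same small number repeated, which is the kind of input a general-purpose compressor handles best. The same reasoning does not apply to a row of mixed types, which is why the codec is chosen per column.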
The trade-offs are real:
- Limited ACID transactions. Recent versions added experimental transactional support, but the design assumes append-mostly workloads. Point updates and deletes are expensive.
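- Poor at row-level operations. Single-row lookups by primary key are slow compared to a row store, because the primary index is sparse: it points to a block of rows (8,192 by default), not to an individual row, and the engine reads the whole block to return one. ClickHouse is built for aggregations over millions of rows, not for serving an OLTP API.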
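The cost of a point lookup follows from how a sparse primary index works. The sketch below is a toy model, not ClickHouse internals: it assumes unique, sorted keys held in a Python list, and all function names are hypothetical. The one real constant is the granule size of 8,192 rows, which is the default `index_granularity` for MergeTree tables.

```python
from bisect import bisect_right

GRANULE_SIZE = 8192  # default index_granularity in MergeTree


def build_sparse_index(sorted_keys, granule_size=GRANULE_SIZE):
    """Keep only the first key of each granule, as a sparse index does."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule_size)]


def point_lookup(sorted_keys, sparse_index, key, granule_size=GRANULE_SIZE):
    """Return (found, rows_scanned) for a single-key lookup.

    The index only says which granule could contain the key, so the whole
    granule has to be read and searched.
    """
    granule = bisect_right(sparse_index, key) - 1
    if granule < 0:
        return False, 0  # key sorts before the first granule
    start = granule * granule_size
    block = sorted_keys[start:start + granule_size]
    return key in block, len(block)


# Hypothetical table: one million rows with even-numbered keys.
keys = list(range(0, 2_000_000, 2))
sparse_index = build_sparse_index(keys)

print(len(sparse_index), "index entries for", len(keys), "rows")
print(point_lookup(keys, sparse_index, 123_456))
```

The index stays small enough to sit in memory even for very large tables, which is what makes range scans cheap. The price is that fetching one row means reading thousands, where a row store's B-tree would go straight to it.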
Cloud or on-premise
| Factor | Cloud-managed | Self-hosted |
|---|---|---|
| Deployment | Provider handles installation, replication, upgrades. | Your team manages the cluster, hardware, and coordination layer. |
| Performance | Bounded by the provider’s instance types and architecture. | Bounded by the hardware you buy. |
| Security | Provider-controlled encryption and access. Suitable for most workloads. | Full control of the data path. Often mandatory in regulated industries. |
| Cost | Subscription, no hardware capex. Predictable monthly bill. | Capex on hardware or rented VMs. Operational cost in engineering time. |
Cloud (ClickHouse Cloud, Altinity.Cloud, Yandex Cloud Managed ClickHouse, AWS-hosted via ClickHouse Cloud) is the default for projects that need a fast start and don’t have someone on staff who’s run ClickHouse in production before. Self-hosted is the default for projects that need full infrastructure control, have regulatory constraints on where the data lives, or have already invested in a team that can operate it.
Where we reach for ClickHouse: betPawa
The case that shows the engine at its best is betPawa, an African sports-betting platform. We’ve been on the data side of betPawa since 2022 and the ClickHouse stack we built there is still in production.
The problem. Their original analytics ran on MySQL, with Airflow scheduling the ETL jobs. As traffic grew the ETL windows kept stretching — a job that started overnight wouldn’t be done by the next morning. The product side needed yesterday’s player behaviour, payment trends, and game-level numbers at the start of the work day, and the analytics side couldn’t deliver them. Player actions, payments, and transfers were all high-velocity event streams that didn’t fit the batch-oriented MySQL pipeline.
The architecture. We rebuilt the pipeline around ClickHouse. The flow goes:
- Sources. Operational MySQL (with a backup replica), Kafka topics from the betting and platform services, and three `agi_*` databases for partner-specific data. Wazdan game-rate data and Google Ads spend come in as separate feeds via Airflow.
- Ingestion. Maxwell’s Daemon streams MySQL binlog events into Kafka. ETL Schedulers pull from MySQL and Kafka into the ClickHouse cluster.
- Storage. Two ClickHouse nodes (`01`, `02`) in a replicated configuration, coordinated by a ZooKeeper ensemble of three.
- Transformation. A Stage layer consolidates the multi-source streams. Most data transforms within 3–5 minutes of arrival; a handful of slowly-changing tables run with a one-day delay because they don’t need fresher data.
- Serving. Redash sits on top of the ClickHouse cluster for analyst access; a separate PostgreSQL instance serves the customer-facing portion of the BI layer.
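To make the ingestion step concrete, here is a sketch of how one Maxwell change event might be flattened into a row for an append-only staging table. It is not the production code from this engagement. The top-level fields (`database`, `table`, `type`, `ts`, `data`, `old`) follow Maxwell's JSON output format, but the `betting.payments` table, its columns, and the underscore-prefixed metadata columns are hypothetical.

```python
import json
from typing import Optional

ROW_EVENT_TYPES = {"insert", "update", "delete"}


def maxwell_event_to_row(raw_message: str) -> Optional[dict]:
    """Flatten one Maxwell JSON event into an append-only staging row.

    Returns None for messages that are not row changes (for example DDL events).
    """
    event = json.loads(raw_message)
    if event.get("type") not in ROW_EVENT_TYPES:
        return None

    # "data" carries the row image after the change; copy it so the
    # original event is left untouched.
    row = dict(event.get("data", {}))

    # Hypothetical metadata columns, so later layers can order and
    # deduplicate changes to the same source row.
    row["_source_table"] = f'{event["database"]}.{event["table"]}'
    row["_op"] = event["type"]
    row["_event_ts"] = event["ts"]
    return row


# A made-up update event in Maxwell's format.
sample = json.dumps({
    "database": "betting", "table": "payments", "type": "update",
    "ts": 1700000000, "xid": 42, "commit": True,
    "data": {"id": 7, "status": "settled"},
    "old": {"status": "pending"},
})

print(maxwell_event_to_row(sample))
```

Every change, including updates and deletes, becomes a new appended row rather than an in-place modification, which suits the append-mostly design described earlier. Resolving the latest state per key is then left to a downstream layer.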
The result. The team moved from overnight batch processing to near-real-time analytics. Dashboards refresh continuously throughout the day. The morning meetings that used to start with “we don’t have yesterday’s numbers yet” don’t anymore.
The delivery model. Continuous infrastructure work — ClickHouse upgrades, schema migrations, capacity planning, the occasional ZooKeeper hiccup — meant we placed a full-time data engineer on the engagement rather than running it as a project. That’s the right shape when the system is mission-critical and constantly evolving. For one-off migrations or proof-of-concepts we’d run it as a fixed-scope project instead.
When ClickHouse is the wrong answer
A few patterns we steer clients away from:
- OLTP traffic on an OLAP engine. Per-user-action updates, single-row reads, transactional consistency across multiple tables — these are PostgreSQL’s job. We’ve inherited at least two clusters where a team moved their application database to ClickHouse “for performance” and spent the next year migrating back.
- Self-hosting without operational depth. Replication, backups, failover, and ZooKeeper coordination need someone who has run them in production. If you don’t have that person, use a managed service. The cost difference disappears the first time you avoid a 4 a.m. failover.
- ACID-heavy workloads. If your domain genuinely requires “all-or-nothing” writes across multiple tables (financial postings, inventory reservations), this is the wrong engine. The append-mostly design isn’t a flaw; it’s just a different problem class.
For the workloads it does fit — high-volume, mostly-append, latency-sensitive analytics — ClickHouse is one of the cleanest tools available in 2026. The betPawa engagement is the case study we point at when prospective clients ask “why ClickHouse and not Snowflake here?” The answer is usually some combination of cost at volume, control over the operational layer, and the need for sub-second response on dashboards that thousands of internal users hit every hour.