ClickHouse in Data Consulting Practice: How and Where We Use This DBMS
ClickHouse is an open-source columnar DBMS designed for fast online analytical query processing. It excels at handling big data, processes millions of rows per second, and supports horizontal scaling. It is an ideal choice for scenarios where data is already abundant and continuously growing, including:
- Analytics platforms – systems for gathering and analyzing business metrics, reporting, and dashboards.
- Monitoring systems – logs, security events, and server metrics.
- Advertising and marketing platforms – tracking user behavior and campaign performance.
- Financial services – transaction analysis, anomaly detection, and forecasting.
We often use ClickHouse in our data consulting projects, and in this article I will explain its main features, advantages, and disadvantages, and walk through a real-world example from our practice.
ClickHouse: A Quick Overview
What drives ClickHouse's performance and speed? These qualities are built into its architecture, and you can learn more in the official documentation.
The key feature of ClickHouse is its columnar storage format, which allows faster data retrieval and processing than row-based systems. Where a row-based DBMS has to read entire rows, a columnar DBMS reads only the columns a query actually references, skipping unnecessary work. This makes ClickHouse not only faster but also lighter on storage I/O, because “redundant” reads are avoided.
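To make the column-orientation point concrete, here is a minimal sketch using the clickhouse-connect Python client; the `events` table, its columns, and the localhost connection are hypothetical, not taken from any project described here.

```python
# Minimal sketch: the clickhouse-connect client (pip install clickhouse-connect),
# a local server, and a hypothetical wide `events` table are all assumptions.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost", port=8123)

# Even if `events` has dozens of columns, ClickHouse reads only `event_date`
# and `revenue` from disk -- this is where the columnar speedup comes from.
result = client.query("""
    SELECT event_date, sum(revenue) AS daily_revenue
    FROM events
    GROUP BY event_date
    ORDER BY event_date
""")

for event_date, daily_revenue in result.result_rows:
    print(event_date, daily_revenue)
```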
ClickHouse offers several features that enable real-time analytics and data stream analysis, such as:
- Materialized views – physical tables that store the results of SQL queries and are updated as new data arrives.
- Massively Parallel Processing (MPP) – queries run in parallel across CPU cores and cluster nodes, making full use of computational resources.
- MergeTree engine and its derivatives – enabling incremental data writes, background merging, partitioning, replication, and sampling (a combined sketch of MergeTree and materialized views follows this list).
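As a rough illustration of how MergeTree and materialized views fit together, the sketch below creates a MergeTree table and a materialized view that maintains daily aggregates; the `page_views` tables, their columns, and the local connection are invented for the example.

```python
# Sketch only: table names, columns, and the local connection are illustrative.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost", port=8123)

# MergeTree table: data is written in small immutable parts that are merged in
# the background; PARTITION BY and ORDER BY control pruning and sort order.
client.command("""
    CREATE TABLE IF NOT EXISTS page_views (
        ts      DateTime,
        site_id UInt32,
        user_id UInt64,
        url     String
    )
    ENGINE = MergeTree
    PARTITION BY toYYYYMM(ts)
    ORDER BY (site_id, ts)
""")

# Materialized view: keeps a per-day aggregate up to date as rows are inserted,
# so dashboards read a small summary table instead of the raw event stream.
client.command("""
    CREATE MATERIALIZED VIEW IF NOT EXISTS page_views_daily
    ENGINE = SummingMergeTree
    ORDER BY (site_id, day)
    AS SELECT
        site_id,
        toDate(ts) AS day,
        count()    AS views
    FROM page_views
    GROUP BY site_id, day
""")
```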
Thanks to efficient compression algorithms, ClickHouse significantly reduces storage requirements.
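Compression can also be tuned per column. The sketch below assumes a hypothetical `sensor_readings` table and uses standard ClickHouse column codecs (DoubleDelta, Gorilla, ZSTD); the actual savings always depend on the data.

```python
# Sketch only: the `sensor_readings` table is hypothetical; the codecs are
# standard ClickHouse column codecs, but the right choice depends on the data.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost", port=8123)

client.command("""
    CREATE TABLE IF NOT EXISTS sensor_readings (
        ts     DateTime CODEC(DoubleDelta, ZSTD),  -- near-monotonic timestamps compress very well
        sensor UInt32   CODEC(ZSTD),
        value  Float64  CODEC(Gorilla, ZSTD)       -- Gorilla suits slowly changing floats
    )
    ENGINE = MergeTree
    ORDER BY (sensor, ts)
""")

# Compressed vs. uncompressed size per column is visible in system.columns
# once data has been inserted.
rows = client.query("""
    SELECT name,
           formatReadableSize(data_compressed_bytes)   AS on_disk,
           formatReadableSize(data_uncompressed_bytes) AS raw
    FROM system.columns
    WHERE table = 'sensor_readings' AND database = currentDatabase()
""").result_rows
print(rows)
```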
Despite all these advantages, ClickHouse has its drawbacks:
- Limited transaction support and challenges with ACID compliance.
- Challenges with point operations on individual rows or columns due to its focus on large data volumes (illustrated briefly below).
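To show what “challenges with point operations” means in practice, here is a sketch of the usual workaround: an asynchronous mutation via ALTER TABLE ... UPDATE. The `events` table and its columns are again hypothetical.

```python
# Sketch only: the `events` table and its columns are hypothetical.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost", port=8123)

# There is no classic row-level UPDATE; changes are expressed as asynchronous
# "mutations" that rewrite whole data parts in the background. This is fine for
# occasional corrections, but not for OLTP-style per-row workloads.
client.command("""
    ALTER TABLE events
    UPDATE revenue = 0
    WHERE user_id = 42 AND event_date = '2024-01-01'
""")

# Mutation progress can be checked in system.mutations.
pending = client.query(
    "SELECT command, is_done FROM system.mutations WHERE table = 'events'"
).result_rows
print(pending)
```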
ClickHouse in the Cloud or On-Premise
ClickHouse can be deployed both locally and in the cloud. The best choice depends on the specifics of the project and the client’s capabilities. While this topic could fill a separate article, here are the key points:
- Deployment and Scaling:
In the cloud, the provider handles deployment, configuration, and scaling. They install the system, perform updates, allocate additional resources, and add new nodes. If ClickHouse is installed locally, the project team is responsible for deployment, including purchasing hardware for expansion.
- Performance:
In the cloud, performance depends on the provider’s infrastructure and network architecture. With on-premise deployment, performance depends entirely on the client’s infrastructure, allowing complete control over hardware (e.g., CPU, RAM, disks).
- Security:
In the cloud, security is managed by the provider and typically includes encryption, access management, and attack protection. However, in some industries (finance, healthcare), public cloud usage is restricted by regulations. On-premise offers complete control over data security.
- Cost:
Typically, cloud deployment follows a subscription-based pricing model, freeing users from purchasing and configuring hardware. For on-premise deployment, you either need to rent a virtual machine or invest in hardware.
Conclusion: Cloud is better for a fast start and for minimizing installation and administration overhead. On-premise suits clients who require full control over infrastructure and data.
ClickHouse in Our Practice: Fast Processing of Constantly Changing Data
One of our best ClickHouse use cases involves betPawa, a betting service that needed to handle massive, continuously updated datasets. This case perfectly demonstrates why ClickHouse is the right choice: huge amounts of constantly changing data.
When the client approached us, their analytics system was very slow. Airflow was collecting data into MySQL, but the ETL processing could take the entire night or longer. As a result, they couldn’t access the previous day’s data in the morning. The system dealt with large amounts of constantly changing data, such as player actions (registrations, bets), payments, and transfers, all requiring real-time updates.
We needed to propose a new infrastructure that was more efficient and scalable, capable of handling growing data volumes.
We worked with this client in an outstaffing model — one (and later two) of our engineers worked full-time on the project. This was necessary due to the large number of tasks and the need for constant infrastructure updates.
The Solution:
- All data is ingested into ClickHouse directly from MySQL or through Kafka topics. The data is sent to Kafka from MySQL via Maxwell's Daemon or directly from other sources (a simplified sketch of the Kafka path follows this list).
- Most data goes through a Stage layer, where data from multiple databases is merged and transformed before being loaded into fact tables and analytics dashboards. Some data skips the Stage layer and directly enters the fact tables.
- Some tables show data with a 1-day or several-hour delay, but many now lag only 3-5 minutes.
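For illustration, here is a simplified sketch of the Kafka-to-fact-table pattern described above, using a Kafka engine table and a materialized view; the topic name, broker address, and table schemas are invented for the example and are not the client's actual configuration.

```python
# Simplified sketch of the Kafka -> fact-table pattern; topic name, broker
# address, and schemas are invented and are not the client's real setup.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost", port=8123)

# Kafka engine table: ClickHouse consumes the topic continuously.
client.command("""
    CREATE TABLE IF NOT EXISTS bets_queue (
        bet_id    UInt64,
        player_id UInt64,
        amount    Decimal(18, 2),
        placed_at DateTime
    )
    ENGINE = Kafka
    SETTINGS kafka_broker_list = 'kafka:9092',
             kafka_topic_list  = 'bets',
             kafka_group_name  = 'clickhouse_bets',
             kafka_format      = 'JSONEachRow'
""")

# Fact table that analysts and dashboards actually query.
client.command("""
    CREATE TABLE IF NOT EXISTS fact_bets (
        bet_id    UInt64,
        player_id UInt64,
        amount    Decimal(18, 2),
        placed_at DateTime
    )
    ENGINE = MergeTree
    PARTITION BY toYYYYMM(placed_at)
    ORDER BY (player_id, placed_at)
""")

# Materialized view: moves rows from the Kafka consumer into the fact table as
# they arrive, which is what keeps the lag down to minutes rather than hours.
client.command("""
    CREATE MATERIALIZED VIEW IF NOT EXISTS bets_consumer TO fact_bets
    AS SELECT bet_id, player_id, amount, placed_at FROM bets_queue
""")
```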

Thus, we transitioned from processes that could take an entire night to real-time analytics with minimal delays.
If you want to implement ClickHouse in your project or need help setting up real-time data processing, leave a request—we’ll get back to you shortly!