How to Install and Run ClickHouse

Note: This guide is written using Ubuntu 22.04 LTS.

Description: Learn how to quickly set up and run the ClickHouse database with our simple step-by-step guide. We’ll walk you through the installation process, show you how to start, and share practical troubleshooting tips to keep everything running smoothly.

When you have big data to work with, whether you are creating an IT project from scratch or looking for an alternative to your current setup, you will need to decide which database to choose. A popular choice at the moment is ClickHouse.

It’s a fast, open-source, column-oriented database management system built for online analytical processing (OLAP). It produces data reports in real time using SQL queries, including complex ones.

So, if you’re looking for a powerful column-oriented database that can handle large volumes of data, consider using ClickHouse as your analytic DBMS of choice. In this ClickHouse tutorial, you’ll learn how to install the ClickHouse server and client on your machine.

Why Use ClickHouse on Amazon Web Services

While there are serverless solutions for using ClickHouse, some teams prefer to deploy it on their own server. Not everyone is able or willing to purchase and maintain hardware for this, so renting virtual servers is a popular solution these days. One of the largest providers of such services is Amazon Web Services (AWS).

You don’t need much to get started: just create an AWS account and set up a suitable machine.

Below, we will describe how to deploy ClickHouse on your AWS server.

How to Run and Install ClickHouse: A Step-by-Step Guide

To install the ClickHouse server, follow the steps below. For convenience, we’ve divided the instructions into four parts.

Part 1. Repository

  1. Connect to your Amazon EC2 instance.
  2. Add the GPG repository key so that you can securely download verified ClickHouse packages:
    sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 8919F6BD2B48D754
  3. Add the ClickHouse repository:
    echo "deb https://packages.clickhouse.com/deb stable main" | sudo tee /etc/apt/sources.list.d/clickhouse.list
  4. Update the package information:
    sudo apt update
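
On Ubuntu 22.04, apt-key still works but prints a deprecation warning. As a hedged alternative, the sketch below registers the repository with a dedicated keyring and a signed-by entry. The key URL and the keyring path are assumptions based on ClickHouse packaging conventions, and repo_line and setup_repo are illustrative helper names, so check them against the official installation instructions before use.

```shell
#!/usr/bin/env bash
# Sketch: register the ClickHouse APT repository with a dedicated keyring
# instead of apt-key. KEY_URL and KEYRING are assumptions; verify them first.
KEY_URL='https://packages.clickhouse.com/rpm/lts/repodata/repomd.xml.key'
KEYRING='/usr/share/keyrings/clickhouse-keyring.gpg'

# Print the sources.list entry for a given keyring path and architecture.
repo_line() {
  printf 'deb [signed-by=%s arch=%s] https://packages.clickhouse.com/deb stable main\n' "$1" "$2"
}

# Download the key, write the repository entry, and refresh the package index.
setup_repo() {
  curl -fsSL "$KEY_URL" | sudo gpg --dearmor -o "$KEYRING"
  repo_line "$KEYRING" "$(dpkg --print-architecture)" \
    | sudo tee /etc/apt/sources.list.d/clickhouse.list
  sudo apt-get update
}

# Change the system only when the script is run with the "apply" argument.
if [ "${1:-}" = "apply" ]; then
  setup_repo
fi
```

Saved as a script, it does nothing unless called with the apply argument, which makes it safe to review the generated repository line first.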

Part 2. ClickHouse Server Installation

  1. Install the ClickHouse server and ClickHouse client with the following command:
    sudo apt-get install clickhouse-server clickhouse-client
  2. During the installation process, you are prompted to enter a password for the default user (optional step).
  3. After successful installation, start the ClickHouse server first and then the ClickHouse client, according to the advice in the output of the previous command:
    sudo clickhouse start
    clickhouse-client --password
    Note: omit the --password flag if you did not set a password in step 2.
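
Before opening the client, it can help to confirm that the server actually came up. The following is a minimal sketch, assuming the clickhouse and clickhouse-client binaries installed by the packages above. The RUN variable and the helper names are illustrative and not part of ClickHouse.

```shell
# Set RUN=echo to print the commands instead of executing them (dry run).
RUN="${RUN:-}"

# Start the server, then report whether it is running.
start_and_check() {
  $RUN sudo clickhouse start
  $RUN sudo clickhouse status
}

# Ask the running server for its version; add --password if you set one.
server_version() {
  $RUN clickhouse-client --query "SELECT version()"
}
```

If server_version prints a version string, the server is accepting connections on the local machine.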

Part 3. Using ClickHouse Server

  1. Next, we can look at the list of databases, create a new one, select it, create a table, fill it in, and see the results:
    • SHOW DATABASES;
    • CREATE DATABASE test;
    • USE test;
    • CREATE TABLE test_table (int_column Int64, string_column String) ENGINE = MergeTree() ORDER BY int_column;
    • INSERT INTO test_table VALUES (1, 'test1'), (2, 'test2'), (3, 'test3');
    • SELECT * FROM test_table;
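
The same statements can also be run non-interactively, which is handy for scripts. The sketch below passes each statement to the client with --query. The CH_CLIENT variable and the ch_query and demo helpers are illustrative names, and the sketch assumes that no password is set for the default user.

```shell
# Client binary; override CH_CLIENT if the client lives elsewhere
# (assumption: clickhouse-client is on PATH).
CH_CLIENT="${CH_CLIENT:-clickhouse-client}"

# Run one SQL statement non-interactively and print the result.
ch_query() {
  "$CH_CLIENT" --query "$1"
}

# Recreate the demo table from this part and read it back.
# Defined only; call demo yourself when the server is running.
demo() {
  ch_query "CREATE DATABASE IF NOT EXISTS test"
  ch_query "CREATE TABLE IF NOT EXISTS test.test_table (int_column Int64, string_column String) ENGINE = MergeTree() ORDER BY int_column"
  ch_query "INSERT INTO test.test_table VALUES (1, 'test1'), (2, 'test2'), (3, 'test3')"
  ch_query "SELECT * FROM test.test_table ORDER BY int_column"
}
```

Note that calling demo twice inserts the three rows twice, because only the CREATE statements are guarded with IF NOT EXISTS.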

Part 4. Connections

  1. If you want to allow connecting to the ClickHouse server externally, uncomment the following line in the configuration file /etc/clickhouse-server/config.xml (open it using sudo):
    <!-- <listen_host>0.0.0.0</listen_host> -->
    (or this line to also accept IPv6 connections: <!-- <listen_host>::</listen_host> -->)
  2. After that, you will need to restart the ClickHouse server with the following command:
    sudo clickhouse restart
  3. The ClickHouse server listens on port 8123 for HTTP connections and on port 9000 for native protocol connections, which clickhouse-client uses. Allow access to both ports for a specific IP address, or for all addresses at once by replacing <ip>/32 with any (equivalent to 0.0.0.0/0):
    sudo ufw allow from <ip>/32 to any port 8123
    sudo ufw allow from <ip>/32 to any port 9000
  4. You also need to make sure that the security group of your AWS instance allows inbound traffic on these ports.
  5. Finally, you can connect to your server externally using clickhouse-client.
  6. Alternatively, you can connect using DBeaver.
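
The last steps can be sketched from the command line as follows. This is an illustration under assumptions, not the only way to do it: it presumes the AWS CLI is installed and configured, the security group ID and addresses you pass in are your own, and the RUN variable and helper names are hypothetical.

```shell
# Set RUN=echo to print the commands instead of executing them (dry run).
RUN="${RUN:-}"

# Build the URL of the HTTP health-check endpoint for a host.
ping_url() {
  printf 'http://%s:8123/ping\n' "$1"
}

# Allow inbound TCP traffic on one port from one CIDR block.
# usage: open_aws_port <security-group-id> <port> <cidr>
open_aws_port() {
  $RUN aws ec2 authorize-security-group-ingress \
    --group-id "$1" --protocol tcp --port "$2" --cidr "$3"
}

# Check the HTTP interface; a healthy server answers "Ok." on /ping.
check_http() {
  $RUN curl -fsS "$(ping_url "$1")"
}

# Open an interactive session over the native protocol (port 9000).
connect_native() {
  $RUN clickhouse-client --host "$1" --port 9000 --user default --password
}
```

A typical sequence is open_aws_port for ports 8123 and 9000, then check_http with the public address of the instance, and finally connect_native once the health check succeeds.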

Congratulations! Now you have successfully installed a ClickHouse server that you can confidently use.

Beyond Installation: ClickHouse Ecosystem

Once you’ve installed and started running ClickHouse, it’s worth knowing about the tools and services that extend its capabilities. Here are two you may find especially useful:

ClickHouse Python 

Use Python libraries like clickhouse-connect to run queries, insert data, and integrate ClickHouse into data pipelines or machine learning workflows.

ClickHouse Cloud 

A fully managed, cloud-hosted version of ClickHouse with automatic scaling, high availability, and no server management. It lets you use the speed of ClickHouse without handling servers, setup, or maintenance.

Troubleshooting and Other Considerations

Here are some major considerations:

Connection Issues

If you see a message about a broken connection, simply repeat the query. If that doesn’t help, check the ClickHouse server logs for errors. If you start the ClickHouse client with the stack-trace parameter (--stacktrace), ClickHouse returns the stack trace together with the error description. Other troubleshooting instructions for self-managed ClickHouse are available in the official ClickHouse documentation. If you don’t have a connection to a running ClickHouse service, you can use clickhouse-local, which gives you access to ClickHouse features and functions without a server.
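
The checks above can be sketched as a few shell helpers. The log path is the default location used by the deb packages and is an assumption if you changed the logging configuration. The RUN variable and the function names are illustrative.

```shell
# Set RUN=echo to print the commands instead of executing them (dry run).
RUN="${RUN:-}"

# Show the last lines of the server error log (default: 50 lines).
show_server_errors() {
  $RUN sudo tail -n "${1:-50}" /var/log/clickhouse-server/clickhouse-server.err.log
}

# Re-run a failing query and print the stack trace if it raises an exception.
query_with_stacktrace() {
  $RUN clickhouse-client --stacktrace --query "$1"
}

# Run a query without any server by using clickhouse-local.
local_query() {
  $RUN clickhouse-local --query "$1"
}
```

For example, local_query "SELECT 1" is a quick way to confirm that the ClickHouse binary itself works even when the server is unreachable.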

How to Run ClickHouse with a Small Amount of RAM

To run ClickHouse on a smaller amount of RAM, manage the amount of data processed in queries. The size of temporary data can be estimated based on the operations you use (GROUP BY, DISTINCT, JOIN, etc.), which then allows you to calculate the required RAM.
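
One way to keep a query within a memory budget is to cap it with query-level settings. The sketch below passes max_memory_usage and max_bytes_before_external_group_by on the client command line, so that large GROUP BY operations spill to disk instead of failing. The 2 GiB and 1 GiB values are placeholders chosen for illustration, and the gib and limited_query helpers are hypothetical names.

```shell
# Set RUN=echo to print the command instead of executing it (dry run).
RUN="${RUN:-}"

# Convert a whole number of GiB to bytes.
gib() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Run a query with a 2 GiB memory cap; GROUP BY state above 1 GiB
# is written to temporary files on disk.
limited_query() {
  $RUN clickhouse-client \
    --max_memory_usage="$(gib 2)" \
    --max_bytes_before_external_group_by="$(gib 1)" \
    --query "$1"
}
```

Setting the external GROUP BY threshold to roughly half of the memory cap is a common starting point. Tune both values to the RAM actually available on your instance.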

Keep in mind that the installation sets up only a minimal default configuration. To use the ClickHouse server in production, you will most likely need additional configuration and a different configuration file than the one in this example, tailored to your workload and your analytics requirements.

Takeaway

Hopefully, this ClickHouse tutorial will make it easier for you to get started with ClickHouse. We recommend following each step precisely to avoid potential issues when launching and operating ClickHouse.

TL;DR

ClickHouse is a fast, open-source column-oriented database ideal for real-time analytics on large datasets. This guide covers installing the server and client, running queries, connecting externally, and troubleshooting common issues. You can also extend its capabilities with ClickHouse Python or use ClickHouse Cloud for a fully managed solution.

FAQ

What is ClickHouse?

ClickHouse is an open-source, column-oriented database management system (DBMS) designed for fast analytical processing of large volumes of data. It’s widely used for business intelligence, real-time analytics, event data processing, and log analysis because it can handle billions of rows and, for well-designed tables and queries, often returns results in well under a second.

How does ClickHouse work?

ClickHouse works by storing data in columns instead of rows, which makes it extremely efficient for analytical queries that scan, filter, and aggregate large datasets. Because of this design, it can process billions of records quickly, making it ideal for tasks like dashboards, log analysis, and real-time reporting.

What is ClickHouse Python?

ClickHouse Python refers to the Python libraries, such as clickhouse-connect, that let you connect Python applications to a ClickHouse database. You can run queries, insert data, and integrate ClickHouse into data pipelines or analytics workflows directly from Python.

What is ClickHouse Cloud?

ClickHouse Cloud is the fully managed, cloud-hosted version of ClickHouse. It provides automatic scaling, high availability, and no server maintenance, letting you focus on analytics while benefiting from ClickHouse’s speed and efficiency.

What are the main use cases for ClickHouse?

ClickHouse is ideal for real-time analytics, business intelligence, log and event data analysis, and any scenario that requires fast queries on massive datasets.