Note: This guide is written using Ubuntu 22.04 LTS.
When you work with big data, whether you are building an IT project from scratch or looking for an alternative to your current setup, you need to decide which database to use. A popular choice at the moment is ClickHouse.
It is a fast, open-source, column-oriented database management system built for online analytical processing (OLAP). It produces data reports in real time using SQL queries, including complex ones.
So, if you’re looking for a powerful column-oriented database that can handle large volumes of data, consider using ClickHouse as your analytic DBMS of choice. In this tutorial, you’ll learn how to install the ClickHouse server and client on your machine.
While there are serverless solutions for using ClickHouse, some prefer to deploy the tools on their own server. Not everyone is able or willing to purchase hardware to create and maintain their own server to use ClickHouse, so renting virtual servers is a popular solution these days. One of the largest providers of such services is Amazon Web Services (AWS).
For this process, you don't need much: just create an AWS account and set up a suitable machine.
Below, we will describe how to deploy ClickHouse on your AWS server.
For convenience, we’ve divided instructions into four parts.
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 8919F6BD2B48D754
echo "deb https://packages.clickhouse.com/deb stable main" | sudo tee /etc/apt/sources.list.d/clickhouse.list
sudo apt update
sudo apt-get install clickhouse-server clickhouse-client
sudo clickhouse start
clickhouse-client --password
Note: the --password flag should not be entered if you did not set a password in step 2.2.
SHOW DATABASES;
CREATE DATABASE test;
USE test;
CREATE TABLE test_table (int_column Int64, string_column String) ENGINE = MergeTree() ORDER BY int_column;
INSERT INTO test_table VALUES (1, 'test1'), (2, 'test2'), (3, 'test3');
SELECT * FROM test_table;
/etc/clickhouse-server/config.xml (open it using sudo):
<!-- <listen_host>0.0.0.0</listen_host> -->
(or this line: <!-- <listen_host>::</listen_host> -->)
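Assuming the step above means uncommenting the IPv4 listen_host line, the edit can also be scripted. The following is only a sketch: enable_listen_host is a hypothetical helper name, not a ClickHouse tool, and it assumes the line appears in the config exactly as shown above.

```shell
#!/usr/bin/env bash
# Sketch: uncomment <listen_host>0.0.0.0</listen_host> in a ClickHouse config.
# enable_listen_host is a hypothetical helper, not part of ClickHouse.
enable_listen_host() {
    local config_file="$1"
    # Keep a backup copy next to the original before changing anything.
    cp "$config_file" "$config_file.bak" || return 1
    # Remove the XML comment markers around the 0.0.0.0 entry only;
    # every other line (including the "::" variant) is left untouched.
    sed 's|<!-- *\(<listen_host>0\.0\.0\.0</listen_host>\) *-->|\1|' \
        "$config_file.bak" > "$config_file"
}

# Typical use from a root shell on the server:
#   enable_listen_host /etc/clickhouse-server/config.xml
```

The backup file makes it easy to compare or roll back the change before restarting the server.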
sudo clickhouse restart
clickhouse-client. Let's allow access to both ports for a specific IP or all at once (for example, by specifying 0.0.0.0/0 or any):
sudo ufw allow from <ip>/32 to any port 8123
sudo ufw allow from <ip>/32 to any port 9000
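Since the same rule is needed for both ports (8123 for the HTTP interface and 9000 for the native protocol), a small loop can generate the commands. This is a sketch only: allow_clickhouse_ports is a hypothetical helper, and 203.0.113.10 is a placeholder address from the documentation range. It prints the commands for review instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: print the ufw rules for both ClickHouse ports for one source range.
# allow_clickhouse_ports is a hypothetical helper name.
allow_clickhouse_ports() {
    local source_cidr="$1"
    local port
    for port in 8123 9000; do
        # Printed, not executed, so the rules can be checked first.
        echo "sudo ufw allow from ${source_cidr} to any port ${port}"
    done
}

# Placeholder client address; replace it with your own.
allow_clickhouse_ports 203.0.113.10/32
```

Once the printed rules look right, they can be pasted into the terminal or piped to a shell.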
clickhouse-client:
Congratulations! You have now successfully installed a ClickHouse server that you can use with confidence.
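To confirm that the server is reachable from another machine, one option is the HTTP interface's /ping endpoint on port 8123, followed by a client connection on port 9000. The sketch below only builds the URL; clickhouse_ping_url is a hypothetical helper and 203.0.113.20 is a placeholder server address.

```shell
#!/usr/bin/env bash
# Sketch: build the health-check URL for a ClickHouse server.
# clickhouse_ping_url is a hypothetical helper name.
clickhouse_ping_url() {
    local host="$1"
    local port="${2:-8123}"   # 8123 is the HTTP port opened earlier
    echo "http://${host}:${port}/ping"
}

# Example checks from a remote machine (placeholder address):
#   curl --silent --max-time 5 "$(clickhouse_ping_url 203.0.113.20)"
#   clickhouse-client --host 203.0.113.20 --port 9000 --password
```

A reachable server normally answers the ping request with a short "Ok." response. If the request times out, recheck the firewall rules and the listen_host setting.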
Here are some major considerations:
If you see a message about a broken connection, simply repeat the query. If that doesn't help, check the server logs for errors. If you start the client with the --stack-trace parameter, ClickHouse returns the server stack trace along with the error description. Other troubleshooting instructions for self-managed ClickHouse are available in the official ClickHouse documentation.
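As a rough illustration of checking the logs, the sketch below filters the most recent error entries. It assumes the default log location used by the package install (/var/log/clickhouse-server/clickhouse-server.err.log) and that error lines carry an <Error> tag. recent_clickhouse_errors is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: show the last few <Error> entries from the ClickHouse error log.
# recent_clickhouse_errors is a hypothetical helper; the default path is
# an assumption based on a standard package install.
recent_clickhouse_errors() {
    local log_file="${1:-/var/log/clickhouse-server/clickhouse-server.err.log}"
    local count="${2:-20}"
    # Keep only error-level lines, then the newest $count of them.
    grep '<Error>' "$log_file" | tail -n "$count"
}

# Typical use on the server (reading the log may require root):
#   recent_clickhouse_errors
# To see the server stack trace for a failing query in the client:
#   clickhouse-client --stack-trace --password
```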
If you don't have a connection to a running ClickHouse service, you can use clickhouse-local, which lets you use ClickHouse features and functions on local data without a server.
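A minimal sketch of clickhouse-local in use: it reads CSV rows from standard input and counts them with a SQL query. The column layout mirrors test_table from the earlier steps. count_csv_rows is a hypothetical helper, and it assumes clickhouse-local is on the PATH (it is normally installed alongside the server packages).

```shell
#!/usr/bin/env bash
# Sketch: count CSV rows from stdin with clickhouse-local (no server needed).
# count_csv_rows is a hypothetical helper name.
count_csv_rows() {
    if ! command -v clickhouse-local >/dev/null 2>&1; then
        echo "clickhouse-local is not installed" >&2
        return 127
    fi
    # With --structure given, stdin is exposed as a table named "table".
    clickhouse-local \
        --structure "int_column Int64, string_column String" \
        --input-format CSV \
        --query "SELECT count() FROM table"
}

# Example: feed two rows shaped like test_table.
#   printf '1,test1\n2,test2\n' | count_csv_rows
```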
To run ClickHouse on a smaller amount of RAM, manage the amount of data processed in queries. The size of temporary data can be estimated based on the operations you use (GROUP BY, DISTINCT, JOIN, etc.), which then allows you to calculate the required RAM.
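One way to apply this is through per-query settings. The sketch below assumes the max_memory_usage and max_bytes_before_external_group_by settings and follows the common guideline of letting GROUP BY spill to disk at about half of the memory limit. memory_limit_flags is a hypothetical helper, and the 4 GB budget is only an example value.

```shell
#!/usr/bin/env bash
# Sketch: turn a per-query memory budget (in bytes) into client flags.
# memory_limit_flags is a hypothetical helper name.
memory_limit_flags() {
    local max_bytes="$1"
    # Spill GROUP BY state to disk at roughly half of the memory limit.
    local spill_bytes=$(( max_bytes / 2 ))
    echo "--max_memory_usage=${max_bytes} --max_bytes_before_external_group_by=${spill_bytes}"
}

# Example budget of about 4 GB per query.
memory_limit_flags 4000000000

# Possible use with the client (flags are passed as settings):
#   clickhouse-client $(memory_limit_flags 4000000000) --password \
#       --query "SELECT string_column, count() FROM test.test_table GROUP BY string_column"
```

Suitable values depend on the RAM available on your machine and on how many queries run concurrently.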
Keep in mind that only a minimal configuration was set during the installation process. To use the ClickHouse server in production, you will most likely need additional configuration and a different configuration file than in the example, taking into account the characteristics of your activity and your requirements for web analytics.
Hopefully, this guide will make it easier for you to get started with ClickHouse. We recommend following each step precisely to avoid potential issues when launching and operating ClickHouse.