Circular Replication Cluster Topology in ClickHouse
Data distribution
refers to splitting a very large dataset into multiple shards (smaller portions of the dataset) that are stored on different servers. Each shard holds and processes a part of the data; the query results from the shards are then combined to produce the final result.

Data replication
refers to keeping a copy of the data on other server nodes to ensure availability in case of a node failure. It can also improve performance by allowing multiple servers to process queries in parallel.

Target:
1. Create a cluster of 3 nodes with 3 shards and 2 replicas of each shard.
2. The second replica of each shard is placed on the next node in a circular manner, i.e. Shard1 Replica1 on Node1 gets its copy on Node2 as Shard1 Replica2.
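The circular layout described in the target can be sketched as a small mapping (an illustration only, not ClickHouse code; node and shard names follow the cluster configured below):

```python
# Sketch of the circular replica layout: shard s keeps its first replica
# on node s and its second replica on the next node, wrapping around.
NODES = ["chnode1", "chnode2", "chnode3"]

def circular_layout(nodes):
    """Map each shard to the pair of nodes that host its two replicas."""
    n = len(nodes)
    return {
        f"shard{s + 1}": {
            "replica1": nodes[s],            # primary copy on its own node
            "replica2": nodes[(s + 1) % n],  # circular copy on the next node
        }
        for s in range(n)
    }

for shard, replicas in circular_layout(NODES).items():
    print(shard, replicas)
```

Running this prints shard1's second replica on chnode2, shard2's on chnode3, and shard3's wrapping around to chnode1, which is exactly the pairing used in the table definitions below.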
Procedure to achieve the above requirement
Cluster Architecture and Implementation

In this architecture we need to create 3 databases on every node. The first database holds the tables for that node's own shard data, the second holds the tables for the replica it stores, and the third holds the distributed tables used to access the data from all shards.
Cluster name: sample_cluster
Database names: sample_shard / sample_replica / sample_dist
Nodes in cluster: chnode1 (192.168.1.11) / chnode2 (192.168.1.12) / chnode3 (192.168.1.13)
Procedure:
Step 1: Install the ClickHouse software on all the nodes
https://clickhouse.com/docs/en/getting-started/install/
Step 2: Install the ZooKeeper software on all the nodes and configure the ZooKeeper cluster. ZooKeeper is required to coordinate the replication of data between replicas.
Version: https://archive.apache.org/dist/zookeeper/zookeeper-3.5.6/apache-zookeeper-3.5.6-bin.tar.gz
https://phoenixnap.com/kb/install-apache-zookeeper
Step 3: Configure the following files on every node
1. /etc/clickhouse-server/config.d/cluster.xml
2. /etc/clickhouse-server/config.d/zookeeper.xml
3. /etc/clickhouse-server/config.d/macro.xml
Note: Samples of all the configuration files are given at the bottom.
Step 4: Check the configuration
SELECT * FROM system.macros;
SELECT * FROM system.clusters WHERE cluster = 'sample_cluster';
Step 5: Create the databases on all the nodes in the cluster
#chnode1 (192.168.1.11), chnode2 (192.168.1.12), chnode3 (192.168.1.13)
clickhouse-client --query "CREATE DATABASE IF NOT EXISTS sample_shard"
clickhouse-client --query "CREATE DATABASE IF NOT EXISTS sample_replica"
clickhouse-client --query "CREATE DATABASE IF NOT EXISTS sample_dist"
Step 6: Create the tables on all nodes with the following options. Note that the Distributed table leaves its database argument empty (''), so each shard uses the default_database configured for its replicas in cluster.xml.
#chnode1
clickhouse-client --query "CREATE TABLE IF NOT EXISTS sample_shard.x (n UInt8) ENGINE = ReplicatedMergeTree('/zookeeper/sample_cluster/sample_s1/x', 'r1') ORDER BY (n)"
clickhouse-client --query "CREATE TABLE IF NOT EXISTS sample_replica.x (n UInt8) ENGINE = ReplicatedMergeTree('/zookeeper/sample_cluster/sample_s3/x', 'r2') ORDER BY (n)"
clickhouse-client --query "CREATE TABLE IF NOT EXISTS sample_dist.x (n UInt8) ENGINE = Distributed('sample_cluster', '', 'x', rand())"
#chnode2
clickhouse-client --query "CREATE TABLE IF NOT EXISTS sample_shard.x (n UInt8) ENGINE = ReplicatedMergeTree('/zookeeper/sample_cluster/sample_s2/x', 'r1') ORDER BY (n)"
clickhouse-client --query "CREATE TABLE IF NOT EXISTS sample_replica.x (n UInt8) ENGINE = ReplicatedMergeTree('/zookeeper/sample_cluster/sample_s1/x', 'r2') ORDER BY (n)"
clickhouse-client --query "CREATE TABLE IF NOT EXISTS sample_dist.x (n UInt8) ENGINE = Distributed('sample_cluster', '', 'x', rand())"
#chnode3
clickhouse-client --query "CREATE TABLE IF NOT EXISTS sample_shard.x (n UInt8) ENGINE = ReplicatedMergeTree('/zookeeper/sample_cluster/sample_s3/x', 'r1') ORDER BY (n)"
clickhouse-client --query "CREATE TABLE IF NOT EXISTS sample_replica.x (n UInt8) ENGINE = ReplicatedMergeTree('/zookeeper/sample_cluster/sample_s2/x', 'r2') ORDER BY (n)"
clickhouse-client --query "CREATE TABLE IF NOT EXISTS sample_dist.x (n UInt8) ENGINE = Distributed('sample_cluster', '', 'x', rand())"
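As a simplified model (not ClickHouse internals), the rand() sharding key of the Distributed table can be pictured as the key taken modulo the total shard weight; with three equal-weight shards, inserted rows land on the three shards roughly uniformly:

```python
import random

# Simplified model of Distributed('sample_cluster', '', 'x', rand()):
# each row's sharding-key value, modulo the number of (equal-weight)
# shards, selects the shard that receives the row.
def pick_shard(key, num_shards=3):
    return key % num_shards + 1  # shard numbers 1..3

random.seed(0)  # deterministic for the demonstration
counts = {1: 0, 2: 0, 3: 0}
for _ in range(9000):
    counts[pick_shard(random.getrandbits(32))] += 1
print(counts)  # roughly 3000 rows per shard
```

The near-even counts illustrate why rand() is a reasonable default sharding key when rows have no natural distribution column.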
The ZooKeeper paths configured in the tables above are listed per node below for easy comparison:
chnode1: ('/zookeeper/sample_cluster/sample_s1/x', 'r1') and ('/zookeeper/sample_cluster/sample_s3/x', 'r2')
chnode2: ('/zookeeper/sample_cluster/sample_s2/x', 'r1') and ('/zookeeper/sample_cluster/sample_s1/x', 'r2')
chnode3: ('/zookeeper/sample_cluster/sample_s3/x', 'r1') and ('/zookeeper/sample_cluster/sample_s2/x', 'r2')
/etc/clickhouse-server/config.d/cluster.xml
Note: internal_replication is set to true on every shard so that the Distributed table writes each row to only one replica and ReplicatedMergeTree handles copying it to the other.
<?xml version="1.0"?>
<yandex>
    <listen_host>::</listen_host>
    <remote_servers>
        <sample_cluster>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <default_database>sample_shard</default_database>
                    <host>chnode1</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <default_database>sample_replica</default_database>
                    <host>chnode2</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <default_database>sample_shard</default_database>
                    <host>chnode2</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <default_database>sample_replica</default_database>
                    <host>chnode3</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <default_database>sample_shard</default_database>
                    <host>chnode3</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <default_database>sample_replica</default_database>
                    <host>chnode1</host>
                    <port>9000</port>
                </replica>
            </shard>
        </sample_cluster>
    </remote_servers>
</yandex>
/etc/clickhouse-server/config.d/zookeeper.xml
<?xml version="1.0"?>
<yandex>
    <zookeeper>
        <node index="1">
            <host>chnode1</host>
            <port>2181</port>
        </node>
        <node index="2">
            <host>chnode2</host>
            <port>2181</port>
        </node>
        <node index="3">
            <host>chnode3</host>
            <port>2181</port>
        </node>
    </zookeeper>
</yandex>
/etc/clickhouse-server/config.d/macro.xml
Note: Macros are usually server-specific, so it is better to keep them in a separate config file on each node.

chnode1:
<?xml version="1.0"?>
<yandex>
    <macros>
        <cluster01>sample_cluster</cluster01>
        <shard01>s1</shard01>
        <shard02>s3</shard02>
        <replica01>chnode1_s1_r1</replica01>
        <replica02>chnode1_s3_r2</replica02>
    </macros>
</yandex>

chnode2:
<?xml version="1.0"?>
<yandex>
    <macros>
        <cluster01>sample_cluster</cluster01>
        <shard01>s1</shard01>
        <shard02>s2</shard02>
        <replica01>chnode2_s2_r1</replica01>
        <replica02>chnode2_s1_r2</replica02>
    </macros>
</yandex>

chnode3:
<?xml version="1.0"?>
<yandex>
    <macros>
        <cluster01>sample_cluster</cluster01>
        <shard01>s2</shard01>
        <shard02>s3</shard02>
        <replica01>chnode3_s3_r1</replica01>
        <replica02>chnode3_s2_r2</replica02>
    </macros>
</yandex>
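The tables in Step 6 use literal ZooKeeper paths, but macros exist so the same DDL can be written once with {macro} placeholders, e.g. ReplicatedMergeTree('/zookeeper/{cluster01}/sample_{shard01}/x', '{replica01}'), and ClickHouse expands them from each node's macro.xml. A minimal sketch of that substitution (illustrative only, using chnode1's macro values from above):

```python
# Illustration of {macro} expansion in a ReplicatedMergeTree path,
# using chnode1's macros from macro.xml.
macros = {
    "cluster01": "sample_cluster",
    "shard01": "s1",
    "replica01": "chnode1_s1_r1",
}

def expand(template, macros):
    """Replace every {name} placeholder with its macro value."""
    for name, value in macros.items():
        template = template.replace("{" + name + "}", value)
    return template

path = expand("/zookeeper/{cluster01}/sample_{shard01}/x", macros)
replica = expand("{replica01}", macros)
print(path, replica)  # /zookeeper/sample_cluster/sample_s1/x chnode1_s1_r1
```

With per-node macros, the same CREATE TABLE statement produces the correct per-node path, which is exactly why server-specific values belong in each node's own config file.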