Replica Introduction
ReplicatedMergeTree relies on ZooKeeper to coordinate communication between multiple replica instances.
Replica Characteristics
As the primary carrier for data replicas, ReplicatedMergeTree has some design considerations:
- ZooKeeper Dependency: When executing INSERT and ALTER queries, ReplicatedMergeTree uses ZooKeeper's distributed coordination to synchronize the replicas. ZooKeeper is not involved, however, when querying (SELECT) a replica.
- Table-level Replicas: Replicas are defined at the table level, so each table’s replica configuration can be customized based on actual needs, including number of replicas and their distribution in the cluster.
- Multi-Master Architecture: INSERT and ALTER queries can be executed on any replica with the same effect; the operations are distributed to every replica for local execution via ZooKeeper coordination.
- Block Data Blocks: When writing data with an INSERT command, the data is split into several Block data blocks according to max_insert_block_size (default 1048576 rows). Blocks are the basic unit of data writing and carry atomicity and uniqueness guarantees.
- Atomicity: During data writing, data within a Block either all succeeds or all fails.
- Uniqueness: Before a Block is written, a hash digest is computed and recorded from the data order, row count, and size within the Block. If a Block to be written has the same hash digest as a previously written Block (same data order, size, and rows), that Block is ignored. This design prevents duplicate Block writes caused by retries after exceptions.
ZK Configuration
<yandex>
    <zookeeper-servers>
        <node index="1">
            <host>h121.wzk.icu</host>
            <port>2181</port>
        </node>
        <node index="2">
            <host>h122.wzk.icu</host>
            <port>2181</port>
        </node>
        <node index="3">
            <host>h123.wzk.icu</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>
</yandex>
Enable ZK
Enable it in the main config file:
vim /etc/clickhouse-server/config.xml
# Point config.xml at the external ZooKeeper config (add if not already present)
<include_from>/etc/clickhouse-server/config.d/metrika.xml</include_from>
# Reference the zookeeper-servers section defined above (add if missing)
<zookeeper incl="zookeeper-servers" optional="true" />
Restart Service
systemctl restart clickhouse-server
Verify
# Connect to ClickHouse
clickhouse-client -m --host h121.wzk.icu --port 9001 --user default --password clickhouse@wzk.icu
-- Check whether the connection to ZooKeeper works
SELECT * FROM system.zookeeper WHERE path = '/';
Replica Definition
Create New Table
CREATE TABLE replicated_sales_5(
`id` String,
`price` Float64,
`create_time` DateTime
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/01/replicated_sales_5', 'h121.wzk.icu')
PARTITION BY toYYYYMM(create_time)
ORDER BY id;
- /clickhouse/tables - conventional root path for replicated tables
- /01/ - shard number
- replicated_sales_5 - table name; by convention the same as the physical table name
- h121.wzk.icu - replica name in ZooKeeper; by convention the server's host name
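To avoid hard-coding the shard number and replica name in every CREATE TABLE, the path arguments can use ClickHouse macro substitution. This is a sketch: it assumes the standard {shard} and {replica} macros have been defined per server in a <macros> section of the config (the values shown are illustrative).

```sql
-- Assumes each server's config defines its own macros, e.g. on h121.wzk.icu:
--   <macros><shard>01</shard><replica>h121.wzk.icu</replica></macros>
-- The same statement can then be run unchanged on every replica.
CREATE TABLE replicated_sales_5(
    `id` String,
    `price` Float64,
    `create_time` DateTime
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/replicated_sales_5', '{replica}')
PARTITION BY toYYYYMM(create_time)
ORDER BY id;
```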
ReplicatedMergeTree Principle
Data Structure
[zk: localhost:2181(CONNECTED) 7] ls /clickhouse/tables/01/replicated_sales_5
[alter_partition_version, block_numbers, blocks, columns, leader_election, log, metadata, mutations, nonincrement_block_numbers, part_moves_shard, pinned_part_uuids, quorum, replicas, table_shared_id, temp, zero_copy_hdfs, zero_copy_s3]
Metadata:
- metadata: Metadata - primary key, sampling expression, partition key
- columns: Column field data types and names
- replicas: Replica names
Flags:
- leader_election: path used to elect the master (leader) replica
- blocks: hash digests of written Blocks, recorded per partition_id; used to detect and discard duplicate inserts (a Block holds at most max_insert_block_size = 1048576 rows)
- block_numbers: issue order of Block numbers within the same partition
- quorum: quorum counter - the number of replicas that must acknowledge a write before it counts as successful
Operations:
- log: regular operation log entries (log-0000000000, ...) that drive replica synchronization
- mutations: mutation entries for ALTER DELETE and ALTER UPDATE operations
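These znodes can also be inspected from within ClickHouse itself, without a zkCli session, via the system.zookeeper table (a sketch; the path matches the table created above):

```sql
-- List the child znodes of the table's ZooKeeper path.
SELECT name FROM system.zookeeper
WHERE path = '/clickhouse/tables/01/replicated_sales_5';

-- Read the stored table metadata (primary key, partition key, etc.).
SELECT value FROM system.zookeeper
WHERE path = '/clickhouse/tables/01/replicated_sales_5' AND name = 'metadata';
```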
Create Table 1
CREATE TABLE a1(
id String,
price Float64,
create_time DateTime
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/01/a1', 'h121.wzk.icu')
PARTITION BY toYYYYMM(create_time)
ORDER BY id;
- Initialize all ZK nodes based on zk_path
- Register its own replica instance under the replicas node
- Start a listener task watching the LOG node
- Participate in replica election; the first replica to register typically becomes the master (leader) replica
Create Table 2
Create second replica instance:
clickhouse-client -m --host h122.wzk.icu --port 9001 --user default --password clickhouse@wzk.icu
CREATE TABLE a1(
id String,
price Float64,
create_time DateTime
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/01/a1', 'h122.wzk.icu')
PARTITION BY toYYYYMM(create_time)
ORDER BY id;
At this point the new replica joins the election; h121.wzk.icu, having registered first, remains the master replica.
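Which replica won the election can be checked from either node in system.replicas (a sketch; is_leader is 1 on the master replica):

```sql
SELECT database, table, replica_name, is_leader, total_replicas, active_replicas
FROM system.replicas
WHERE table = 'a1';
```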
Insert Data 1
insert into table a1 values('A001',100,'2024-08-20 08:00:00');
View Results
After execution, view data on ZK:
ls /clickhouse/tables/01/a1/blocks
After the INSERT command executes, the local partition directory is written first, and then the block_id for that partition is recorded in ZooKeeper:
[zk: localhost:2181(CONNECTED) 6] ls /clickhouse/tables/01/a1/blocks
[202408_16261221490105862188_1058020630609096934]
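The uniqueness guarantee can be demonstrated right here: re-running the identical insert produces no new Block and no new row, because its hash digest already exists under /blocks (a sketch, assuming a1 currently holds only this one row).

```sql
-- Re-insert the identical batch: same data order, rows, and size -> same block_id,
-- which is already recorded under /blocks, so the Block is silently discarded.
insert into table a1 values('A001',100,'2024-08-20 08:00:00');
select count() from a1; -- still 1 row, not 2
```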
View Logs
Next, the h121.wzk.icu replica pushes an operation log entry to the LOG node:
[zk: localhost:2181(CONNECTED) 7] ls /clickhouse/tables/01/a1/log
[log-0000000000]
Insert another piece of data:
insert into table a1 values('A002',200,'2024-08-21 08:00:00');
View LOG:
ls /clickhouse/tables/01/a1/log
get /clickhouse/tables/01/a1/log/log-0000000000
get /clickhouse/tables/01/a1/log/log-0000000001
Output:
[zk: localhost:2181(CONNECTED) 14] ls /clickhouse/tables/01/a1/log
[log-0000000000, log-0000000001]
[zk: localhost:2181(CONNECTED) 13] get /clickhouse/tables/01/a1/log/log-0000000000
format version: 4
create_time: 2024-08-01 17:10:35
source replica: h121.wzk.icu
block_id: 202408_16261221490105862188_1058020630609096934
get
202408_0_0_0
part_type: Compact
[zk: localhost:2181(CONNECTED) 16] get /clickhouse/tables/01/a1/log/log-0000000001
format version: 4
create_time: 2024-08-01 17:16:37
source replica: h121.wzk.icu
block_id: 202408_3260633639629896920_11326802927295833243
get
202408_1_1_0
part_type: Compact
Pull Logs
Next, the second replica pulls the LOG. h122.wzk.icu continuously watches the /log node for changes; when h121.wzk.icu pushes /log/log-0000000000 and log-0000000001, h122.wzk.icu triggers its log-pull task, copies the new entries into its own replication queue for local execution, and advances its log_pointer.
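The pull progress can be verified on either replica (a sketch): log_pointer shows how far this replica has consumed the shared /log, and queue_size counts entries not yet executed locally.

```sql
SELECT replica_name, log_pointer, queue_size, absolute_delay
FROM system.replicas
WHERE table = 'a1';

-- On h122.wzk.icu, the pulled data should now be queryable locally:
SELECT * FROM a1 ORDER BY id;
```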
Error Quick Reference
| Symptom | Root Cause | Location | Fix |
|---|---|---|---|
| Not connected to ZooKeeper | ZK address/permission/network | telnet zk 2181 / system.zookeeper | Fix config, firewall, restart |
| Long-time absolute_delay > 0 | Slow log pulling/backlog | system.replication_queue | SYSTEM SYNC REPLICA; check disk/CPU |
| Replica … already exists | Duplicate registration | system.replicas | DROP/DETACH then ATTACH or change {replica} |
| Too many parts read-only | Too many small parts | system.parts | Adjust batch/merge strategy, reduce write fragmentation |
| Transaction/DDL inconsistent | DDL not executed on all replicas | system.query_log | Run DDL with ON CLUSTER so it reaches every replica |
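Two of the fixes from the table, sketched as statements (my_cluster is a placeholder: substitute a cluster name from your own remote_servers config / system.clusters):

```sql
-- Force a lagging replica to catch up with the shared log (blocks until synced).
SYSTEM SYNC REPLICA a1;

-- Run DDL on every replica at once; 'my_cluster' is a placeholder cluster name.
ALTER TABLE a1 ON CLUSTER my_cluster ADD COLUMN remark String DEFAULT '';

-- Inspect the replication queue when absolute_delay stays above zero.
SELECT type, create_time, num_tries, last_exception
FROM system.replication_queue
WHERE table = 'a1';
```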