Big Data 135 - ClickHouse Cluster Connectivity Self-Check & Data Types Guide | Run ON CLUSTER in 10 Minutes
TL;DR
- Scenario: Three-node cluster already installed, how to verify connectivity/replication/sharding/distributed query are all normal within 10 minutes?
- Conclusion: Following SOP in this article, one script runs through connectivity check → create ReplicatedMergeTree/Distributed → cross-shard write/read stress test.
- Output: cluster_check.sql (system table check), ddl_on_cluster.sql (create table), mini_bench.sql (stress test).
Version Matrix
| Component | Verified | Note |
|---|---|---|
| ClickHouse | 24.x/25.x | Native port 9000 (native), HTTP 8123 |
| OS | Ubuntu 22.04/24.04 | Either works |
| Coordination Service | ClickHouse Keeper/ZooKeeper | Correctly configured macros {shard}/{replica} |
| Cluster Name | perftest_3shards_1replicas | Consistent with config.xml |
Minimum Runnable Example
-- Connect to any node (change to your host/port/password)
clickhouse-client-m \
--host <node1> --port 9000 \
--user <user> --password <password>
-- 1) Cluster view should return 3 rows (3 shards × 1 replica)
SELECT cluster, shard_num, replica_num, host_name, host_address, port
FROM system.clusters
WHERE cluster = 'perftest_3shards_1replicas'
ORDER BY shard_num, replica_num;
-- 2) Basic connectivity (each node reports itself)
SELECT hostName() AS node, version() AS ch_version;
-- 3) Replicated table & distributed table (create on entire cluster at once)
CREATE DATABASE IF NOT EXISTS demo ON CLUSTER perftest_3shards_1replicas;
CREATE TABLE demo.events ON CLUSTER perftest_3shards_1replicas
(
id UUID,
ts DateTime,
v Decimal(18,4)
)
ENGINE = ReplicatedMergeTree(
'/clickhouse/tables/{shard}/demo.events','{replica}'
)
PARTITION BY toYYYYMM(ts)
ORDER BY (ts, id);
CREATE TABLE demo.events_all ON CLUSTER perftest_3shards_1replicas
AS demo.events
ENGINE = Distributed(perftest_3shards_1replicas, demo, events, rand());
-- 4) Write 100k rows to distributed table (randomly distributed to shards)
INSERT INTO demo.events_all
SELECT generateUUIDv4(), now(), number/10.0
FROM numbers(100000);
-- 5) Quick check: row distribution across shards
SELECT hostName() AS node, count() AS cnt
FROM demo.events
GROUP BY node
ORDER BY node;
-- 6) Cross-shard aggregation (verify distributed query)
SELECT toStartOfMinute(ts) AS m, sum(v) AS s, count() c
FROM demo.events_all
GROUP BY m
ORDER BY m DESC
LIMIT 5;
Test Connection
clickhouse-client -m --host h121.wzk.icu --port 9000 --user default --password clickhouse@wzk.icu
If you can connect from 122 to 121, your service is set up. You can test whether the three nodes are all interconnected, ensuring services can communicate properly.
Check Cluster
Connect to any node for testing.
h121
clickhouse-client -m --host h121.wzk.icu --port 9000 --user default --password clickhouse@wzk.icu
SELECT * FROM system.clusters WHERE cluster = 'perftest_3shards_1replicas';
h122
clickhouse-client -m --host h122.wzk.icu --port 9001 --user default --password clickhouse@wzk.icu
h123
clickhouse-client -m --host h123.wzk.icu --port 9001 --user default --password clickhouse@wzk.icu
Data Types
Brief Introduction
As a high-performance column-oriented database for OLAP scenarios, ClickHouse has unique designs in Data Manipulation Language (DML) support. Different from traditional relational databases, ClickHouse optimizes data processing performance through:
-
Composite Data Type Support:
- Provides composite data types like Array, Tuple, Nested
- Supports semi-structured data formats like Map, JSON
- Example:
Array(String)can store string arrays,Tuple(Int32, String)can store combined data of different types
-
Data Modification Operations Implementation:
- Update and Delete operations are implemented through ALTER TABLE statement variants
- Example:
ALTER TABLE table_name DELETE WHERE condition - Also:
ALTER TABLE table_name UPDATE column=value WHERE condition - This design avoids traditional database locking mechanisms, improving efficiency of batch operations
-
Performance Optimization Features:
- Uses batch processing instead of single-row operations
- Suitable for analysis scenarios with large batch data
- Implements efficient data modification through MergeTree engine
- Modification operations are asynchronous, generating new data parts instead of in-place modifications
This design allows ClickHouse to maintain high performance while supporting necessary data modification operations, especially suitable for data warehouse and analytical application scenarios.
Start Testing
Using h121 machine service here, but you can use any other machine’s service.
clickhouse-client -m --host h121.wzk.icu --port 9000 --user default --password clickhouse@wzk.icu
Integer Types
Fixed-length integer types, including signed and unsigned integers.
| Type | Range |
|---|---|
| Int8 | -128~127 |
| Int16 | -32768~32767 |
| Int32 | -2147483648~2147483647 |
| Int64 | -9223372036854775808~9223372036854775807 |
| UInt8 | 0~255 |
| UInt16 | 0~65535 |
| UInt32 | 0~4294967295 |
| UInt64 | 0~18446744073709551615 |
SELECT 255;
SELECT -128;
Floating Point Types
| Type | Precision | Note |
|---|---|---|
| Float32 | float | |
| Float64 | double |
It is recommended to store data as integers as much as possible, for example, convert fixed-precision numbers to integer values, like time represented in milliseconds, because floating-point calculation behaviors may cause rounding errors.
SELECT 1-0.9;
SELECT 1/0;
SELECT 0/0;
Decimal
For higher precision, can choose Decimal type, format: Decimal(P,S)
- P: Represents precision, determines total digits (positive part + decimal part), range 0-38
- S: Represents scale, determines decimal places, range 0-P
ClickHouse provides three shortcuts for Decimal:
- Decimal32
- Decimal64
- Decimal128
Addition/Subtraction - Precision Takes Max
SELECT toDecimal32(2, 4) + toDecimal32(2, 2);
SELECT toDecimal32(4, 4) + toDecimal32(2, 2);
Multiplication - Precision Takes Sum
SELECT toDecimal32(2, 2) * toDecimal32(4, 4)
Division - Precision Takes Dividend
SELECT toDecimal32(4, 4) / toDecimal32(2, 2)
String Types
String
String can be of any length, it can contain any character set, including null bytes.
FixedString(N)
Fixed-length string of N, N must be a strict positive natural number. When server reads strings shorter than N, it pads with null bytes at the end to reach N bytes. When server reads strings longer than N, returns error.
SELECT toFixedString('abc', 5), LENGTH(toFixedString('abc', 5)) AS LENGTH;
UUID
ClickHouse directly makes UUID, which serves as primary key type in traditional databases, into a data type.
Create Table
CREATE TABLE UUID_TEST(
`c1` UUID,
`c2` String
)ENGINE = memory;
Insert Data
INSERT INTO UUID_TEST SELECT generateUUIDv4(), 't1';
INSERT INTO UUID_TEST(c2) VALUES('t2');
Query Results
SELECT * FROM UUID_TEST;
Enum Types
Includes Enum8 and Enum16 types, store string = integer mapping.
- Enum8 uses String = Int8 pairs
- Enum16 uses String = Int16 pairs
Create Table
CREATE TABLE t_enum (
x Enum8('hello' = 1, 'word' = 2)
) ENGINE = TinyLog;
Note: This column x can only store values defined in the list, “hello” or “word”. Inserting other values will throw error.
Insert Data
INSERT INTO t_enum VALUES ('hello'), ('word'), ('hello');
Query Data
SELECT * FROM t_enum;
To see corresponding numeric values, must convert Enum to integer type.
SELECT CAST(x, 'Int8') FROM t_enum;
Array
Array(T): Array composed of elements of type T. T can be any type, including array type, but multi-dimensional arrays are not recommended, ClickHouse has limited support for multi-dimensional arrays. For example, cannot store multi-dimensional arrays in MergeTree tables.
Create Array
SELECT array(1, 2.0) AS x, toTypeName(x);
SELECT [1, 2] AS x, toTypeName(x);
When declaring fields, need to specify data type:
CREATE TABLE Array_test (
`c1` Array(String)
) ENGINE = Memory;
Tuple
Tuple(T1, T2): Tuple, each element has its own type.
Create tuple example:
SELECT tuple(1, 'a') AS x, toTypeName(x);
Note: When defining table fields, also need to specify data types.
Date, DateTime
Date type, stored in two bytes, represents date value from 1970-01-01 (unsigned) to current date.
Boolean
No separate type to store boolean values, can use UInt8 type with value limited to 0 or 1.
Error Quick Reference
| Symptom | Possible Root Cause | Quick Locating Method | Handling Suggestion |
|---|---|---|---|
| is_readonly=1 replica read-only | system.replicas | Check disk/permissions/config, restart after fixing config | |
| Distributed table finds no data | Wrote to local table | Check FROM table name unified to events_all | Ensure query and write table names are consistent |
| lag continuously > 0 | Replication backlog | View system.mutations | Wait or scale up/limit write speed |
| Create table failed | Macro/path inconsistent | Check {shard}/{replica} | Fix config.xml and restart service |