Big Data 135 - ClickHouse Cluster Connectivity Self-Check & Data Types Guide | Run ON CLUSTER in 10 Minutes

TL;DR

  • Scenario: Three-node cluster already installed, how to verify connectivity/replication/sharding/distributed query are all normal within 10 minutes?
  • Conclusion: Following SOP in this article, one script runs through connectivity check → create ReplicatedMergeTree/Distributed → cross-shard write/read stress test.
  • Output: cluster_check.sql (system table check), ddl_on_cluster.sql (create table), mini_bench.sql (stress test).

Version Matrix

ComponentVerifiedNote
ClickHouse24.x/25.xNative port 9000 (native), HTTP 8123
OSUbuntu 22.04/24.04Either works
Coordination ServiceClickHouse Keeper/ZooKeeperCorrectly configured macros {shard}/{replica}
Cluster Nameperftest_3shards_1replicasConsistent with config.xml

Minimum Runnable Example

-- Connect to any node (change to your host/port/password)
clickhouse-client-m \
  --host <node1> --port 9000 \
  --user <user> --password <password>

-- 1) Cluster view should return 3 rows (3 shards × 1 replica)
SELECT cluster, shard_num, replica_num, host_name, host_address, port
FROM system.clusters
WHERE cluster = 'perftest_3shards_1replicas'
ORDER BY shard_num, replica_num;

-- 2) Basic connectivity (each node reports itself)
SELECT hostName() AS node, version() AS ch_version;

-- 3) Replicated table & distributed table (create on entire cluster at once)
CREATE DATABASE IF NOT EXISTS demo ON CLUSTER perftest_3shards_1replicas;

CREATE TABLE demo.events ON CLUSTER perftest_3shards_1replicas
(
  id UUID,
  ts DateTime,
  v  Decimal(18,4)
)
ENGINE = ReplicatedMergeTree(
  '/clickhouse/tables/{shard}/demo.events','{replica}'
)
PARTITION BY toYYYYMM(ts)
ORDER BY (ts, id);

CREATE TABLE demo.events_all ON CLUSTER perftest_3shards_1replicas
AS demo.events
ENGINE = Distributed(perftest_3shards_1replicas, demo, events, rand());

-- 4) Write 100k rows to distributed table (randomly distributed to shards)
INSERT INTO demo.events_all
SELECT generateUUIDv4(), now(), number/10.0
FROM numbers(100000);

-- 5) Quick check: row distribution across shards
SELECT hostName() AS node, count() AS cnt
FROM demo.events
GROUP BY node
ORDER BY node;

-- 6) Cross-shard aggregation (verify distributed query)
SELECT toStartOfMinute(ts) AS m, sum(v) AS s, count() c
FROM demo.events_all
GROUP BY m
ORDER BY m DESC
LIMIT 5;

Test Connection

clickhouse-client -m --host h121.wzk.icu --port 9000 --user default --password clickhouse@wzk.icu

If you can connect from 122 to 121, your service is set up. You can test whether the three nodes are all interconnected, ensuring services can communicate properly.

Check Cluster

Connect to any node for testing.

h121

clickhouse-client -m --host h121.wzk.icu --port 9000 --user default --password clickhouse@wzk.icu
SELECT * FROM system.clusters WHERE cluster = 'perftest_3shards_1replicas';

h122

clickhouse-client -m --host h122.wzk.icu --port 9001 --user default --password clickhouse@wzk.icu

h123

clickhouse-client -m --host h123.wzk.icu --port 9001 --user default --password clickhouse@wzk.icu

Data Types

Brief Introduction

As a high-performance column-oriented database for OLAP scenarios, ClickHouse has unique designs in Data Manipulation Language (DML) support. Different from traditional relational databases, ClickHouse optimizes data processing performance through:

  1. Composite Data Type Support:

    • Provides composite data types like Array, Tuple, Nested
    • Supports semi-structured data formats like Map, JSON
    • Example: Array(String) can store string arrays, Tuple(Int32, String) can store combined data of different types
  2. Data Modification Operations Implementation:

    • Update and Delete operations are implemented through ALTER TABLE statement variants
    • Example: ALTER TABLE table_name DELETE WHERE condition
    • Also: ALTER TABLE table_name UPDATE column=value WHERE condition
    • This design avoids traditional database locking mechanisms, improving efficiency of batch operations
  3. Performance Optimization Features:

    • Uses batch processing instead of single-row operations
    • Suitable for analysis scenarios with large batch data
    • Implements efficient data modification through MergeTree engine
    • Modification operations are asynchronous, generating new data parts instead of in-place modifications

This design allows ClickHouse to maintain high performance while supporting necessary data modification operations, especially suitable for data warehouse and analytical application scenarios.

Start Testing

Using h121 machine service here, but you can use any other machine’s service.

clickhouse-client -m --host h121.wzk.icu --port 9000 --user default --password clickhouse@wzk.icu

Integer Types

Fixed-length integer types, including signed and unsigned integers.

TypeRange
Int8-128~127
Int16-32768~32767
Int32-2147483648~2147483647
Int64-9223372036854775808~9223372036854775807
UInt80~255
UInt160~65535
UInt320~4294967295
UInt640~18446744073709551615
SELECT 255;
SELECT -128;

Floating Point Types

TypePrecisionNote
Float32float
Float64double

It is recommended to store data as integers as much as possible, for example, convert fixed-precision numbers to integer values, like time represented in milliseconds, because floating-point calculation behaviors may cause rounding errors.

SELECT 1-0.9;
SELECT 1/0;
SELECT 0/0;

Decimal

For higher precision, can choose Decimal type, format: Decimal(P,S)

  • P: Represents precision, determines total digits (positive part + decimal part), range 0-38
  • S: Represents scale, determines decimal places, range 0-P

ClickHouse provides three shortcuts for Decimal:

  • Decimal32
  • Decimal64
  • Decimal128

Addition/Subtraction - Precision Takes Max

SELECT toDecimal32(2, 4) + toDecimal32(2, 2);
SELECT toDecimal32(4, 4) + toDecimal32(2, 2);

Multiplication - Precision Takes Sum

SELECT toDecimal32(2, 2) * toDecimal32(4, 4)

Division - Precision Takes Dividend

SELECT toDecimal32(4, 4) / toDecimal32(2, 2)

String Types

String

String can be of any length, it can contain any character set, including null bytes.

FixedString(N)

Fixed-length string of N, N must be a strict positive natural number. When server reads strings shorter than N, it pads with null bytes at the end to reach N bytes. When server reads strings longer than N, returns error.

SELECT toFixedString('abc', 5), LENGTH(toFixedString('abc', 5)) AS LENGTH;

UUID

ClickHouse directly makes UUID, which serves as primary key type in traditional databases, into a data type.

Create Table

CREATE TABLE UUID_TEST(
  `c1` UUID,
  `c2` String
)ENGINE = memory;

Insert Data

INSERT INTO UUID_TEST SELECT generateUUIDv4(), 't1';
INSERT INTO UUID_TEST(c2) VALUES('t2');

Query Results

SELECT * FROM UUID_TEST;

Enum Types

Includes Enum8 and Enum16 types, store string = integer mapping.

  • Enum8 uses String = Int8 pairs
  • Enum16 uses String = Int16 pairs

Create Table

CREATE TABLE t_enum (
  x Enum8('hello' = 1, 'word' = 2)
) ENGINE = TinyLog;

Note: This column x can only store values defined in the list, “hello” or “word”. Inserting other values will throw error.

Insert Data

INSERT INTO t_enum VALUES ('hello'), ('word'), ('hello');

Query Data

SELECT * FROM t_enum;

To see corresponding numeric values, must convert Enum to integer type.

SELECT CAST(x, 'Int8') FROM t_enum;

Array

Array(T): Array composed of elements of type T. T can be any type, including array type, but multi-dimensional arrays are not recommended, ClickHouse has limited support for multi-dimensional arrays. For example, cannot store multi-dimensional arrays in MergeTree tables.

Create Array

SELECT array(1, 2.0) AS x, toTypeName(x);
SELECT [1, 2] AS x, toTypeName(x);

When declaring fields, need to specify data type:

CREATE TABLE Array_test (
  `c1` Array(String)
) ENGINE = Memory;

Tuple

Tuple(T1, T2): Tuple, each element has its own type.

Create tuple example:

SELECT tuple(1, 'a') AS x, toTypeName(x);

Note: When defining table fields, also need to specify data types.

Date, DateTime

Date type, stored in two bytes, represents date value from 1970-01-01 (unsigned) to current date.

Boolean

No separate type to store boolean values, can use UInt8 type with value limited to 0 or 1.

Error Quick Reference

SymptomPossible Root CauseQuick Locating MethodHandling Suggestion
is_readonly=1 replica read-onlysystem.replicasCheck disk/permissions/config, restart after fixing config
Distributed table finds no dataWrote to local tableCheck FROM table name unified to events_allEnsure query and write table names are consistent
lag continuously > 0Replication backlogView system.mutationsWait or scale up/limit write speed
Create table failedMacro/path inconsistentCheck {shard}/{replica}Fix config.xml and restart service