Big Data 139 - ClickHouse MergeTree Best Practices: Replacing Deduplication, Summing Aggregation, Partition Design & Materialized View Alternatives

TL;DR

  • Scenario: solve two common needs of quasi-real-time detail tables: deduplication/updates and summing by key.
  • Conclusion: use ReplacingMergeTree to deduplicate by sort key (keeping the latest row by version when a version column is set), and SummingMergeTree to sum numeric columns by sort key. Both take effect only during the background merge phase; use SELECT … FINAL to see the final result immediately.
  • Output: a minimum runnable example (MRE), when to use OPTIMIZE/FINAL, a list of common pitfalls, and rollback/data-repair scripts.

ReplacingMergeTree

Introduction

This engine is based on MergeTree and adds handling for duplicate data. The difference from MergeTree is that, during merges, it deletes duplicate rows that share the same sort key.

Features

Replacing Deduplication Mechanism Details

  1. Sort key as unique key

    • The system uses the columns in the ORDER BY clause as the unique key for judging duplicates: rows whose combination of these column values is identical are considered duplicates.
    • Example: with ORDER BY (user_id, event_time), rows with the same user ID and event time are treated as duplicates.
  2. Deduplication triggered at merge time

    • Deduplication runs only when background merges combine data parts; it is not applied at insert time.
    • This design avoids the performance overhead of real-time deduplication and lets large batches be processed at once.
  3. Deduplication scope is partition level

    • Deduplication is performed per partition; each partition is processed independently.
    • Duplicates across partitions are not removed automatically; e.g., last month's partition and this month's partition may both hold the same row.
    • Application scenario: time-partitioned tables where historical partitions should stay intact.
  4. Depends on pre-sorted order

    • The engine relies on data being pre-sorted by the ORDER BY key to identify duplicates efficiently.
    • Implementation principle: a linear scan over sorted data finds consecutive duplicates with no extra computation.
  5. Handling strategy without a version column

    • If no version (ver) column is configured:
      • The last row among the duplicates (by physical storage order) is kept.
      • Example: of 3 duplicate records A, B, C, only C survives.
  6. Handling strategy with a version column

    • If a version column is set:
      • The row with the largest version number is kept.
      • Application scenario: optimistic locking, e.g., marking updates with a timestamp or version number.
      • Example: of 3 duplicate records (ver=1), (ver=3), (ver=2), the record with ver=3 survives.
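The version-column behavior described above can be sketched with a minimal example (table and column names here are illustrative, not from the cases below):

```sql
-- Illustrative: ReplacingMergeTree with an explicit version column.
CREATE TABLE user_state (
  user_id String,
  status String,
  ver UInt64            -- monotonically increasing version, e.g. an update counter
) ENGINE = ReplacingMergeTree(ver)
ORDER BY user_id;

INSERT INTO user_state VALUES ('U1', 'created', 1);
INSERT INTO user_state VALUES ('U1', 'paid',    3);
INSERT INTO user_state VALUES ('U1', 'shipped', 2);

-- After a background merge (or with FINAL), only the row with the largest
-- version survives, i.e. ('U1', 'paid', 3) -- not the last-inserted row.
SELECT * FROM user_state FINAL;
```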

Cases

Create New Table

CREATE TABLE replace_table (
  id String,
  code String,
  create_time DateTime
) ENGINE = ReplacingMergeTree()
PARTITION BY toYYYYMM(create_time)
ORDER BY (id, code)
PRIMARY KEY id;

Insert Data

INSERT INTO replace_table VALUES ('A001', 'C1', '2024-08-01 08:00:00');
INSERT INTO replace_table VALUES ('A001', 'C1', '2024-08-02 08:00:00');
INSERT INTO replace_table VALUES ('A001', 'C8', '2024-08-03 08:00:00');
INSERT INTO replace_table VALUES ('A001', 'C9', '2024-08-04 08:00:00');
INSERT INTO replace_table VALUES ('A002', 'C2', '2024-08-05 08:00:00');
INSERT INTO replace_table VALUES ('A003', 'C3', '2024-08-06 08:00:00');

Now insert one more row, which falls into a different partition (2024-01):

INSERT INTO replace_table VALUES ('A001', 'C1', '2024-01-01 08:00:00');

Querying the table shows that this row coexists with the 2024-08 rows: data in different partitions is never deduplicated.
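To see the effect immediately rather than waiting for a background merge, force a merge or read with FINAL (a sketch against the table above):

```sql
-- Force merges within each partition, then inspect the result.
OPTIMIZE TABLE replace_table FINAL;
SELECT * FROM replace_table ORDER BY id, code;

-- Or merge at query time without rewriting parts on disk:
SELECT * FROM replace_table FINAL ORDER BY id, code;
-- The January row still appears alongside the August rows,
-- because deduplication never crosses partitions.
```

FINAL is convenient for verification, but as noted in the FAQ below it merges parts online and is expensive on large tables.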

SummingMergeTree

Introduction

This engine derives from MergeTree. The difference is that when merging data parts of a SummingMergeTree table, ClickHouse replaces all rows sharing the same sort key with one row containing the sums of their numeric columns. When a single key corresponds to many rows, this significantly reduces storage and speeds up queries. For non-summable columns, the first value encountered is kept.

Features

  • Uses the ORDER BY sort key as the aggregation key
  • Aggregation is triggered when partitions merge
  • Data is aggregated per partition; rows in different partitions are never combined
  • If summing columns (non-key) are specified in the engine definition, only those fields are SUM-aggregated
  • If none are specified, all non-key numeric fields are aggregated
  • Rows with the same aggregation key are summed, relying on the ORDER BY sort order
  • For non-aggregated columns within a partition, the first row's value is kept
  • Nested structures are supported, but the column name must end with the Map suffix

Case 1

Create New Table

CREATE TABLE smt_table (
  date Date,
  name String,
  a UInt16,
  b UInt16
) ENGINE = SummingMergeTree(a)  -- only column a is summed
PARTITION BY toYYYYMM(date)
ORDER BY (date, name);

Insert Data

insert into smt_table (date, name, a, b) values ('2024-08-10', 'a', 1, 2);
insert into smt_table (date, name, a, b) values ('2024-08-10', 'b', 2, 1);
insert into smt_table (date, name, a, b) values ('2024-08-11', 'b', 3, 8);
insert into smt_table (date, name, a, b) values ('2024-08-11', 'b', 3, 8);
insert into smt_table (date, name, a, b) values ('2024-08-11', 'a', 3, 1);
insert into smt_table (date, name, a, b) values ('2024-08-12', 'c', 1, 3);

Optimize

OPTIMIZE TABLE smt_table FINAL;
SELECT * FROM smt_table;

Through observation, we find that on 2024-08-11 the two name = 'b' rows were merged: column a was summed (3 + 3 = 6), while column b kept 8, the first inserted value, because only column a was declared as a summing column.
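Because merges run at an unspecified time, a query that must always be exact should aggregate explicitly rather than rely on merge state. A sketch that mirrors the engine's rule (sum the summing column, take the first value of the rest):

```sql
-- Deterministic result regardless of merge progress.
SELECT date, name, sum(a) AS a, any(b) AS b
FROM smt_table
GROUP BY date, name
ORDER BY date, name;
```

This is generally cheaper than SELECT … FINAL on large tables, since it uses the normal aggregation pipeline instead of online part merging.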

Case 2

Create New Table

CREATE TABLE summing_table(
  id String,
  city String,
  v1 UInt32,
  v2 Float64,
  create_time DateTime
) ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(create_time)
ORDER BY (id, city);

Insert Data

INSERT INTO summing_table VALUES ('A000', 'beijing', 10, 20, '2024-08-20 08:00:00');
INSERT INTO summing_table VALUES ('A000', 'beijing', 20, 30, '2024-08-30 08:00:00');
INSERT INTO summing_table VALUES ('A000', 'shanghai', 10, 20, '2024-08-20 08:00:00');
INSERT INTO summing_table VALUES ('A000', 'beijing', 10, 20, '2024-06-20 08:00:00');
INSERT INTO summing_table VALUES ('A001', 'beijing', 50, 60, '2024-02-20 08:00:00');

Optimize

OPTIMIZE TABLE summing_table FINAL;
SELECT * FROM summing_table;

Through observation, rows were aggregated with the ORDER BY sort key (id, city) as the aggregation key; because no summing columns were specified at table creation, all non-key numeric columns (v1, v2) were SUM-aggregated. Note that the 2024-06 and 2024-02 rows stay separate: aggregation never crosses partitions.
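The per-partition result above can also be reproduced without waiting for a merge; a sketch that groups by the partition expression as well as the sort key:

```sql
-- Mirrors the merged table: one row per (partition, id, city).
SELECT id, city, sum(v1) AS v1, sum(v2) AS v2, any(create_time) AS create_time
FROM summing_table
GROUP BY toYYYYMM(create_time), id, city
ORDER BY id, city;
```

Dropping `toYYYYMM(create_time)` from the GROUP BY instead gives a true cross-partition total, which the engine itself can never produce.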

Case 3

SummingMergeTree supports nested-type fields, but the column name must end with the Map suffix.

Create New Table

CREATE TABLE summing_table_nested(
  id String,
  nestMap Nested(
    id UInt32,
    key UInt32,
    val UInt64
  ),
  create_time DateTime
) ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(create_time)
ORDER BY id;

Insert Data

INSERT INTO summing_table_nested VALUES ('1', [101, 102], [201, 202], [1001, 1002], '2024-08-01 10:00:00');
INSERT INTO summing_table_nested VALUES ('2', [103, 104], [203, 204], [1003, 1004], '2024-08-01 10:00:00');
INSERT INTO summing_table_nested VALUES ('1', [105, 106], [205, 206], [1005, 1006], '2024-08-01 10:00:00');
INSERT INTO summing_table_nested VALUES ('2', [107, 108], [207, 208], [1007, 1008], '2024-08-02 10:00:00');
INSERT INTO summing_table_nested VALUES ('3', [109, 110], [209, 210], [1009, 1010], '2024-08-02 10:00:00');
INSERT INTO summing_table_nested VALUES ('4', [111, 112], [211, 212], [1011, 1012], '2024-08-02 10:00:00');

After a merge (e.g., OPTIMIZE TABLE summing_table_nested FINAL), only 4 of the 6 inserted rows remain: the duplicate rows for id '1' and id '2' were merged, with their nested values SUM-combined by key.
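For a merge-independent view of the nested columns, the sumMap aggregate function can be used over the flattened arrays (a sketch; it assumes nestMap.id is treated as the map key):

```sql
-- Sum nested val by key across all rows with the same id,
-- regardless of whether the background merge has run yet.
SELECT id, sumMap(nestMap.id, nestMap.val) AS merged
FROM summing_table_nested
GROUP BY id
ORDER BY id;
```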

Error Quick Reference

| Symptom | Possible Cause | Locating Method | Fix Action |
| --- | --- | --- | --- |
| Deduplication/aggregation not effective | Query lacks FINAL and background merge has not run | Inspect active parts in system.parts | Wait for merge / OPTIMIZE … FINAL off-peak |
| Deduplication across months fails | Rows fall into different partitions | Check the PARTITION BY key | Adjust partition strategy / merge offline |
| Non-numeric column values look "random" | Summing keeps the first row's value | Review table structure and insert order | Move the column out or into a dimension table |
| OPTIMIZE very slow | Large partitions or high concurrency | Check I/O and the merge queue | Limit concurrency / refine partitions / optimize in batches |
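The "locating method" column can be made concrete with a query like the following (using the example table's name):

```sql
-- Check whether background merges have actually run:
-- many small active parts per partition means dedup/summing is still pending.
SELECT partition, name, rows, active
FROM system.parts
WHERE database = currentDatabase() AND table = 'replace_table'
ORDER BY partition, name;
```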

Rollback and Data Repair Script

-- 1) After a bad write, detach the affected partition (example)
ALTER TABLE replace_table DETACH PARTITION 202408;
-- Export and fix the data, then ATTACH PARTITION it back
-- 2) Full rewrite: load deduplicated details into a clean table, then switch atomically
CREATE TABLE replace_table_fixed AS replace_table;
INSERT INTO replace_table_fixed SELECT * FROM replace_table FINAL; -- deduplicated details
EXCHANGE TABLES replace_table AND replace_table_fixed; -- atomic swap (requires the Atomic database engine)

FAQ

  • Does PRIMARY KEY participate in deduplication? No. It only defines the sparse index; deduplication/aggregation follows the ORDER BY key.

  • Can deduplication/aggregation cross partitions? No; merge offline in another layer, or change the partition strategy.

  • Why is FINAL slow? It merges parts at query time; use it only for verification or small reports.

  • Is Summing safe for floating point? Plain addition can accumulate precision errors; use Decimal in financial scenarios.

  • How to choose a version column for Replacing? Use a monotonically increasing timestamp or version number; avoid non-monotonic fields.