Gleam Lab · Tag Archive

Tag: clickhouse

11 articles collected by topic for tutorials, cases, engineering practice, and research notes.

ClickHouse MergeTree Partition/TTL, Materialized View, ALTER

ClickHouse is a columnar database for OLAP (Online Analytical Processing), favored in big data analysis for its high-speed data processing.

9/21/2024

Big Data 141 - ClickHouse Replicas: ReplicatedMergeTree and ZooKeeper

ReplicatedMergeTree and ZooKeeper: ZooKeeper handles communication between multiple ClickHouse instances.

9/20/2024

ClickHouse Sharding × Replica × Distributed: ReplicatedMergeTree

A replica is a copy of the same data stored on a different physical node in a distributed system. The core idea is to improve system reliability through data redundancy.

9/20/2024

Big Data 139 - ClickHouse MergeTree Best Practices: Replacing Deduplication, Summing Aggregation, Partition Design & Materialized View Alternatives

Scenario: Solving two common requirements for "quasi-real-time detail tables": deduplication/update and key-based summing.

9/19/2024

Big Data 140 - ClickHouse CollapsingMergeTree & External Data Sources

ClickHouse external data source engine guide: DDL templates, key parameters and read/write pipelines for ENGINE=HDFS, ENGINE=MySQL, ENGINE=Kafka, and distributed table co...

9/19/2024

Big Data 137 - ClickHouse MergeTree Practical Guide

ClickHouse MergeTree key mechanisms: batch writes form parts, which are merged in the background; parts come in two formats, Compact and Wide.

9/18/2024

Big Data 138 - ClickHouse MergeTree Deep Dive: Partition Pruning × Sparse Primary Index × Marks × Compression

ClickHouse MergeTree storage and query path: column files (*.bin), sparse primary index (primary.idx), marker files (.mrk/.

9/18/2024

Big Data 135 - ClickHouse Cluster Connectivity Self-Check & Data Types Guide | Run ON CLUSTER in 10 Minutes

Using a three-node cluster (h121/122/123) as an example, first complete the cluster connectivity self-check: system.

9/14/2024

Big Data 136 - ClickHouse Table Engines: TinyLog/Log/StripeLog/Memory/Merge Selection Guide

Scenario: You need to trade off among small datasets, temporary tables, log landing, and multi-table combined reads, where using MergeTree is often "using a cannon to kill a mosquito".

9/14/2024

Big Data 133 - ClickHouse Concepts & Basics | Why Fast? Columnar + Vectorized + MergeTree Comparison

Scenario: You want high-concurrency, low-latency OLAP without adopting an entire Hadoop/lakehouse stack.

9/13/2024

Big Data 134 - ClickHouse Single Machine + Cluster Node Deployment Guide | Installation Configuration | systemd Management / config.d

Officially recommended keyring + signed-by installation of ClickHouse on Ubuntu, then start it with systemd and run a self-check.

9/13/2024