Big Data 136 - ClickHouse Table Engines: TinyLog/Log/StripeLog/Memory/Merge Selection Guide
TL;DR
- Scenario: You need to choose among small tables, temporary tables, log landing, and multi-table combined reads, where reaching for MergeTree is often overkill.
- Conclusion: With the selection table, MRE, and concurrency/file verification scripts below, you can run through the core differences of all 5 engines in about 10 minutes.
- Output: The full test walkthrough for every engine, including the SQL and file-level checks.
Version Matrix
| Item | Value | Note |
|---|---|---|
| Verified | ClickHouse Server 23.x/24.x | Single machine + local disk |
| Client | clickhouse-client | `--multiquery` demo |
90-Second Principles
- TinyLog: One compressed .bin file per column, append-only writes; essentially no concurrency handling, so reads are restricted while a write is in progress.
- Log: Stores marks (offsets) alongside the column files, enabling multi-threaded reads; a write on a table blocks other writes and reads on that table.
- StripeLog: Writes all columns into a single data.bin with marks kept centrally; fewer file descriptors, more predictable read parallelism.
- Memory: Pure in-memory storage, lost on restart; suited to small datasets, testing, and high-speed operator pipelines.
- Merge: A read-only "aggregation view" that stores no data; queries are pushed down to the underlying tables' engines.
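A quick way to compare the five engines side by side is to create one table per engine with the same schema. This is a minimal sketch; the table names are illustrative:

```sql
-- One table per engine, identical schema, for side-by-side testing.
CREATE TABLE demo_tinylog (id UInt16, msg String) ENGINE = TinyLog;
CREATE TABLE demo_log     (id UInt16, msg String) ENGINE = Log;
CREATE TABLE demo_stripe  (id UInt16, msg String) ENGINE = StripeLog;
CREATE TABLE demo_memory  (id UInt16, msg String) ENGINE = Memory;
-- Merge stores nothing itself; it reads from every table matching the regex
-- (a Merge table never selects from itself):
CREATE TABLE demo_all (id UInt16, msg String)
    ENGINE = Merge(currentDatabase(), '^demo_');
```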
Selection Decision
| Scenario | Why Recommended | What to Avoid |
|---|---|---|
| Small table, written once; config/dimension data (≤1M rows per table) | TinyLog is the lightest option; files are easy to inspect, reads and writes are simple | MergeTree (needlessly complex here) |
| Temporary logs, batch import then read-only, parallel reads wanted | Log keeps marks and supports concurrent reads; each INSERT forms a block | TinyLog (reads locked out, poor concurrency) |
| Many small tables (thousands), more stable read parallelism wanted | StripeLog uses a single data file plus marks; fewer file handles, more natural read parallelism | Log (more file fragmentation) |
| Very low-latency small datasets, pure in-memory operators | Memory engine delivers multi-GB/s scan throughput; good for testing and operator assembly | Anything requiring persistence |
| Unified "virtual view" reads across multiple tables | Merge aggregates tables by regex; parallel reads fall through to the underlying engines | Workloads needing writes or indexes |
Brief Introduction
A table engine (table type) is the core component that determines how a database system stores and processes data. It directly affects performance, feature set, and applicable scenarios. Specifically, the table engine determines several key aspects:
- Data Storage Mechanism:
- Physical storage format: Row storage (like InnoDB), column storage (like ClickHouse’s MergeTree) or memory storage (like Memory engine)
- Storage location: Can be stored on disk, SSD, memory or distributed file systems
- Read/write path: Some engines first write to memory buffer then persist to disk
- Query Support Capabilities:
- Supported query types: Full-text search, aggregation queries, time-series data processing, etc.
- Query optimization methods: Different engines use different query execution plans
- Special function support: Like geospatial data processing, JSON document processing
- Concurrency Control:
- Locking mechanisms: Row-level locking, table locking or optimistic concurrency control
- Transaction isolation levels: Read committed, repeatable read, etc.
- MVCC implementation
- Index Strategy:
- Index types: B-tree, LSM tree, inverted index, etc.
- Index maintenance: Some engines support automatic maintenance, some require manual
- Index selectivity: Different engines may have significantly different index usage efficiency
- Parallel Processing Capabilities:
- Multi-threaded query execution: Some engines can decompose single query into multiple parallel tasks
- Distributed query processing: Execute queries in parallel across nodes in cluster environment
- Resource isolation: Control concurrent query resource consumption
- Data High Availability:
- Replication mechanisms: Like primary-replica replication, multi-primary replication
- Fault recovery: Automatic fault transfer and recovery capabilities
- Data consistency guarantees: Strong consistency, eventual consistency, etc.
- Special Functions:
- Data TTL (Time To Live)
- Data compression algorithms
- Encrypted storage support
- Materialized view support
ClickHouse is a column-oriented database management system supporting multiple table engines, each with specific functions and uses.
MergeTree Series
- MergeTree: Most commonly used table engine, supports efficient partitioning, sorting, indexing, suitable for high-volume write and query scenarios. Supports primary key and indexes.
- ReplicatedMergeTree: Based on MergeTree with added replication, suitable for distributed cluster environments.
- ReplacingMergeTree: Allows newest records to overwrite old records, very suitable for scenarios needing deduplication by specific columns.
- SummingMergeTree: Supports aggregation of numeric columns, suitable for scenarios needing aggregation calculations.
- AggregatingMergeTree: Supports more complex aggregation operations, suitable for scenarios needing pre-computed summaries.
- CollapsingMergeTree: Designed for frequently changing data; during merges it collapses pairs of rows written with opposite Sign values (+1/-1) to reduce storage.
- VersionedCollapsingMergeTree: Adds version number on top of CollapsingMergeTree for better control of data merging.
Log Series
- Log: Simple table engine, doesn’t support indexes or partitioning, suitable for small data volume or log-style storage scenarios.
- TinyLog: The simplest engine; suitable for embedded scenarios or testing, not meant for large-scale data.
- StripeLog: Stores all columns of a table in a single data file plus a marks file; suited to many small tables and block-wise read/write patterns.
Special Engines
- Memory: Data lives only in memory; suited to fast read/write scenarios with no persistence requirement.
- Distributed: Used in distributed clusters, dispatches queries to multiple nodes, suitable for large-scale data and high-concurrency query scenarios.
- Merge: Treats multiple tables as a virtual table for queries, suitable for scenarios needing to union multiple tables for reading.
- Join: Pre-loads and stores Join tables, used to improve join operation efficiency.
View Series
- MaterializedView: Materialized view, allows pre-computation to speed up queries.
- View: Regular view, doesn’t store data, just query definition.
- Buffer: Temporarily stores data in memory, periodically batch writes to base table, suitable for scenarios needing write performance optimization.
- Null: Discards data when written, suitable for testing scenarios.
Log Series Details
TinyLog
Storage Structure and Mechanism
This table engine uses the most basic column-oriented storage, each column data stored as a separate compressed file (like .bin file), uses lightweight compression algorithms (like LZ4 or ZSTD) to reduce disk space. Write operations use append-only mode, all new data is added to the end of corresponding column files without modifying existing data.
Concurrency Limitations
- Read Concurrency:
- While a write is in progress, any concurrent read throws a "Table is locked for writing" exception
- Example: if a background task is importing data, user queries fail outright
- Write Concurrency:
- Multiple simultaneous writes can corrupt the data files
- Typical symptoms: incomplete column data, inconsistent row counts, garbled values
Applicable Scenario Analysis
- Best Use Pattern:
- One-time writes (Write-Once-Read-Many)
- Suitable for config tables, dimension tables that don’t change often
- Typical applications: Storing product categories, region codes and other basic data
- Capacity Suggestions:
- Official recommended upper limit: 1,000,000 rows
- Actual performance inflection point: About 50-100MB single table data volume
- Small table advantage: Single query usually only needs to open 2-3 column files
- Special Scenario Advantages:
- Excellent performance under small table cluster (1,000+ small tables) scenarios
- Can reduce file descriptor usage by over 50% compared to other engines
- Ideal choice for cold data archival storage
Feature Limitations
- Index Support:
- No indexes of any kind are supported
- Every query is a full table scan
- Workaround: consider materialized views for high-frequency query fields
- Performance Characteristics:
- Write speed: About 10-50MB/s (depends on hardware)
- Read latency: Linearly related to data volume
- Memory usage: Only needs most basic buffer (usually <10MB)
- Maintenance Notes:
- Doesn’t support ALTER TABLE to modify table structure
- Data deletion requires table rebuild
- No automatic compression organization mechanism
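The materialized-view workaround mentioned above can be sketched as follows. Table and column names are illustrative; the view is backed by MergeTree so that its ORDER BY key can prune data on lookups:

```sql
-- TinyLog base table: no indexes, every query scans the whole table.
CREATE TABLE region_codes (code UInt16, name String) ENGINE = TinyLog;

-- Materialized view on an indexed engine; new INSERTs into region_codes
-- are also written here, where the ORDER BY key serves point lookups.
CREATE MATERIALIZED VIEW region_codes_mv
ENGINE = MergeTree ORDER BY code
AS SELECT code, name FROM region_codes;

INSERT INTO region_codes VALUES (1, 'US'), (86, 'CN');
SELECT name FROM region_codes_mv WHERE code = 86;
```

Note that the view only captures rows inserted after it is created; pre-existing data would need a one-off backfill with INSERT ... SELECT.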
Test
Create a table with the TinyLog engine and insert one row:
CREATE TABLE t (a UInt16, b String) ENGINE = TinyLog;
INSERT INTO t (a, b) VALUES (1, 'abc');
Inspect the directory where the data is stored:
cd /var/lib/clickhouse/data/default/t
ls
File list explanation:
- a.bin and b.bin are compressed data for corresponding columns
- sizes.json records each bin file’s size
Log
Unlike TinyLog, Log stores marks in a small file alongside the column files. A mark is written for every data block and contains offsets that indicate where to start reading in order to skip a given number of rows. This makes multi-threaded reads possible: read operations can run concurrently, while a write blocks both reads and other writes. The Log engine does not support indexes. As with TinyLog, a failed write corrupts the table, and subsequent reads return an error. Log is suitable for temporary data, write-once tables, and testing or demonstration purposes.
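A minimal Log table can be created the same way as the TinyLog example above. On disk it adds a marks file next to the per-column files (exact file names can vary across ClickHouse versions):

```sql
CREATE TABLE log_demo (a UInt16, b String) ENGINE = Log;
INSERT INTO log_demo VALUES (1, 'abc');
-- Expected on-disk layout (path and file names may differ by version):
--   /var/lib/clickhouse/data/default/log_demo/
--     a.bin, b.bin    -- per-column compressed data, as in TinyLog
--     __marks.mrk     -- per-block offsets that enable parallel reads
--     sizes.json
```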
StripeLog
This engine belongs to the log engine family. Use it when you need to write many tables with small data volumes (less than 1 million rows each).
Write Data
The StripeLog engine stores all columns in a single file. For each INSERT request, ClickHouse appends a data block to the end of the table file, writing column by column.
ClickHouse writes following files for each table:
- data.bin data file
- index.mrk marks file; the marks hold the offset of each column within each inserted data block
The StripeLog engine does not support ALTER UPDATE or ALTER DELETE operations.
Read Data
Thanks to the marks file, ClickHouse can read the data in parallel. This means a SELECT returns rows in an unpredictable order; use an ORDER BY clause if you need sorted results.
Create Table
CREATE TABLE stripe_log_table (
timestamp DateTime,
message_type String,
message String
) ENGINE = StripeLog;
Insert Data
INSERT INTO stripe_log_table VALUES (now(), 'REGULAR', 'The first regular message');
INSERT INTO stripe_log_table VALUES
(now(), 'REGULAR', 'The second regular message'),
(now(), 'WARNING', 'The first warning message');
We used two INSERT requests, which creates two data blocks in the data.bin file.
Query Data
ClickHouse uses multiple threads when querying the data. Each thread reads a separate data block and returns its result rows independently as it finishes. As a result, in most cases the order of blocks in the output does not match the order of the same blocks in the input. For example:
SELECT * FROM stripe_log_table;
-- Sort the results (ascending by default)
SELECT * FROM stripe_log_table ORDER BY timestamp;
Memory
Memory engine: data is stored in memory in uncompressed, raw form and is lost on server restart. Reads and writes do not block each other, and indexes are not supported. Performance on simple queries is very high (over 10 GB/s scanned). It sees little production use beyond testing, or scenarios that need very high performance on datasets that stay small (upper limit around 100 million rows).
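A minimal sketch of the Memory engine in use; the table name is illustrative:

```sql
CREATE TABLE mem_demo (id UInt32, v Float64) ENGINE = Memory;
-- One million rows generated on the fly; everything stays in RAM,
-- nothing is written to disk.
INSERT INTO mem_demo SELECT number, number / 7 FROM numbers(1000000);
SELECT count(), max(v) FROM mem_demo;
-- DROP TABLE mem_demo;  -- or restart the server: the data is gone either way
```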
Merge
The Merge engine (not to be confused with MergeTree) stores no data itself, but can read from any number of other tables simultaneously. Reads are automatically parallelized; writes are not supported. When reading, the engine of each underlying table that actually holds data is used.
Merge parameters:
- Database name
- Table name regex pattern
Create Table
CREATE TABLE t1 (id UInt16, name String) ENGINE = TinyLog;
CREATE TABLE t2 (id UInt16, name String) ENGINE = TinyLog;
CREATE TABLE t3 (id UInt16, name String) ENGINE = TinyLog;
Insert Data
INSERT INTO t1 (id, name) VALUES (1, 'first');
INSERT INTO t2 (id, name) VALUES (2, 'second');
INSERT INTO t3 (id, name) VALUES (3, 'i am in t3');
Create the Merge Table
CREATE TABLE t (id UInt16, name String) ENGINE = Merge(currentDatabase(), '^t');
The regex '^t' matches t1, t2, and t3; a Merge table never selects from itself, so t is excluded automatically.
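Querying the Merge table reads from every matched underlying table in parallel. The virtual column _table reports which physical table each row came from:

```sql
SELECT * FROM t ORDER BY id;
-- 1  first
-- 2  second
-- 3  i am in t3

-- _table is a virtual column provided by the Merge engine:
SELECT _table, id, name FROM t ORDER BY id;
```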
Common Issues
| Symptom/Error | Likely Root Cause | Quick Fix |
|---|---|---|
| Table is locked for writing | TinyLog read hit an in-progress INSERT | SHOW PROCESSLIST; switch to Log/StripeLog or avoid concurrent access |
| SELECT rows come back out of order | Multi-threaded reads give no ordering guarantee | EXPLAIN PIPELINE; add an explicit ORDER BY |
| Merge finds no data | Regex does not match / wrong database name | SELECT name FROM system.tables; fix Merge(db, 'regex') |
| On-disk files do not match the docs | Version differences (.mrk/.mrk3) | Check system.build_options; adjust docs/verification script per version |
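When a Merge table comes back empty, a quick sanity check is to list the candidate tables and their engines, then compare the names against the regex passed to Merge(). The LIKE filter below assumes the t1/t2/t3 naming from the example above:

```sql
SELECT name, engine
FROM system.tables
WHERE database = currentDatabase() AND name LIKE 't%'
ORDER BY name;
```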