Blog

Kafka Consumer: Consumption Flow, Heartbeat & Parameter T...

A detailed explanation of the Kafka Consumer Group consumption model, partition assignment strategies, the heartbeat keep-alive mechanism, and tuning practices for key parameters such as session.timeout.ms, heart...

9/25/2024

big-data clickhouse mergetree

Apache Kudu Docker Quick Deployment: 3 Master/5 TServer P...

A quick Docker Compose deployment of Apache Kudu on an Ubuntu 22.04 cloud host, covering the Kudu Master and Tablet Server components,...

9/24/2024

big-data kudu docker

Java Access Apache Kudu: Table Creation to CRUD (Includin...

Uses the Java client (kudu-client 1.4.0) to connect to an Apache Kudu cluster with multiple Masters (example ports 7051/7151/7251) and walks through the full process of table creation, insert,...

9/24/2024

big-data kudu java

Apache Kudu: Real-time Write + OLAP Architecture, Perform...

Apache Kudu versions and ecosystem integration in 2025: the latest release, Kudu 1.18.0 (2025/07), brings a segmented LRU block cache and RocksDB-based metadata...

9/23/2024

big-data kudu olap

Apache Kudu Architecture & Practice: RowSet, Partition & ...

Covers Apache Kudu's Master/TabletServer architecture, the RowSet (MemRowSet/DiskRowSet) write/read path, MVCC, and the role of Raft consensus in replication and failover; provides...

9/23/2024

big-data kudu raft

ClickHouse MergeTree Partition/TTL, Materialized View, AL...

A ClickHouse beginner and operations walkthrough on a real cluster (h121/h122/h123), demonstrating the complete process from connection to database/table creation,...

9/21/2024

Kafka Producer Message Sending Flow & Core Parameters

An in-depth analysis of the complete Kafka Producer sending chain: initialization, message interception, serialization, partition routing, buffered batch sending, and ACK confirmation, with key parameter tuning ...

9/21/2024

Kafka Serialization & Partitioning: Custom Implementation

A deep dive into Kafka message serialization and partition routing, including complete code for a custom Serializer and Partitioner, for precise message routing and efficient transmission.

9/21/2024

big-data clickhouse zookeeper

ClickHouse Replica Deep Dive: ReplicatedMergeTree + ZooKe...

The full ClickHouse replica chain: ZK/Keeper preparation, macros configuration, consistent table creation with ON CLUSTER, write deduplication and the replication mechanism,...

9/20/2024

ClickHouse Sharding × Replica × Distributed: ReplicatedMe...

ClickHouse sharding × replica × Distributed architecture: built on ReplicatedMergeTree + Distributed, using ON CLUSTER for one-step table creation on a 3-shard ×...

9/20/2024

big-data clickhouse distributed

ClickHouse MergeTree Best Practices: Replacing Deduplicat...

ClickHouse's two lightweight aggregation engines, ReplacingMergeTree and SummingMergeTree, with minimal runnable examples (MRE) and comparative queries,...

9/19/2024

ClickHouse CollapsingMergeTree & External Data Sources: H...

ClickHouse external data source engine guide: DDL templates, key parameters and read/write pipelines for ENGINE=HDFS, ENGINE=MySQL, ENGINE=Kafka, and distributed table configurations.

9/19/2024

big-data clickhouse hadoop

ClickHouse MergeTree Practical Guide: Partition, Sparse I...

Key ClickHouse MergeTree mechanisms: batch writes form parts, background merges (with two part formats, Compact and Wide), ORDER BY serves as the sparse primary index,...

ClickHouse MergeTree Deep Dive: Partition Pruning × Spars...

ClickHouse MergeTree storage and query path: column files (*.bin), sparse primary index (primary.idx), mark files (.mrk/.mrk2) and index_granularity...

Kafka Operations: Shell Commands & Java Client Examples

Covers daily Kafka operations: daemon startup, Shell commands for topic management, and Java client programming (complete Producer/Consumer code) with key configuration parameters and ConsumerRebalance...

big-data kafka messaging java data-engineering

Spring Boot Integration with Kafka

A detailed guide to integrating Kafka into Spring Boot projects, including dependency configuration, sync/async message sending with KafkaTemplate, and complete @KafkaListener consumption practice.

big-data kafka spring-boot java messaging

Spark Distributed Environment Setup

A step-by-step setup of an Apache Spark distributed computing environment, covering download and extraction, environment variable configuration, core configuration changes in slaves/spark-env.sh, and complete multi...

big-data spark data-engineering

ClickHouse Cluster Connectivity Self-Check & Data Types G...

Using a three-node cluster (h121/122/123) as an example, first complete a cluster connectivity self-check: system.clusters validation → ON CLUSTER create...

9/14/2024

ClickHouse Table Engines: TinyLog/Log/StripeLog/Memory/Me...

A survey of ClickHouse table engines (TinyLog, Log, StripeLog, Memory, Merge): principles, applicable scenarios, and pitfalls, with reproducible minimum...

9/14/2024