Blog

Apache Kylin Cube Practice: From Modeling to Build and Qu...

Apache Kylin (3.x/4.x) Cube setup and optimization: complete flow from DataSource → Model → Cube, covering dimension modeling, measure design, Cuboid...

10/9/2024

big-data, spark, distributed-system, data-engineering, stream-processing

From MapReduce to Spark: Big Data Computing Evolution

Systematic overview of how big data processing engines evolved from MapReduce to Spark to Flink, analyzing Spark's in-memory computing model, unified ecosystem, and core components.

10/9/2024

Apache Kylin Comprehensive Guide: MOLAP Architecture, Hiv...

Background, evolution, and engineering practice of Apache Kylin, focusing on how its MOLAP solution is implemented for massive-scale data analysis. Core keywords: Apache...

10/8/2024

Apache Kylin 3.1.1 Deployment on Hadoop 2.9/Hive 2.3/HBas...

Complete deployment record of Apache Kylin 3.1.1 on Hadoop 2.9.2, Hive 2.3.9, HBase 1.3.1, Spark 2.4.5 (without-hadoop, Scala 2.12) and three-node...

10/8/2024

big-data, kafka, messaging, data-engineering

Kafka Storage Mechanism: Log Segmentation & Retention

Deep analysis of Kafka log storage architecture, including LogSegment design, sparse offset index and timestamp index principles, message lookup flow, and log retention and cleanup strategy configu...

10/5/2024

Kafka High Performance: Zero-Copy, mmap & Sequential Write

Deep dive into the three I/O techniques behind Kafka's high throughput: sendfile zero-copy, mmap memory mapping, and sequential writes through the page cache, revealing the kernel-level optimizations behind million ...

10/5/2024

Kafka Replica Mechanism: ISR & Leader Election

Deep dive into Kafka's replica mechanism, including maintenance of the ISR (in-sync replica) set, the Leader election process, and the consistency-versus-availability trade-offs of unclean leader election.

10/2/2024

Kafka Exactly-Once: Idempotence & Transactions

Systematic explanation of how Kafka achieves Exactly-Once semantics through idempotent producers and transactions, covering the PID/sequence-number principle, cross-partition transaction configuration ...

10/2/2024

Apache Druid Storage & Query Architecture: Segment/Chunk/...

Apache Druid's data storage and high-performance query path: from DataSource/Chunk/Segment layering to columnar storage, Roll-up pre-aggregation, Bitmap...

9/30/2024

Apache Druid + Kafka Real-time Analysis: JSON Flattening ...

A Scala Kafka Producer writes order/click data to a Kafka Topic (example topic: druid2), which Druid ingests continuously through the Kafka Indexing Service. Since...

9/30/2024

big-data, druid, kafka

Apache Druid Real-time Kafka Ingestion: Complete Practice...

Complete walkthrough of Apache Druid real-time Kafka ingestion, using network traffic JSON as an example: ingesting data through the Druid console's Streaming/Kafka wizard, parsing the time column, se...

9/29/2024

big-data, druid, kafka

Apache Druid Architecture & Component Responsibilities: C...

Apache Druid component responsibilities and deployment notes from 0.13.0 to the current release (2025): the Coordinator manages Historical node Segment...

9/29/2024

Apache Druid Cluster Deployment [Part 1]: MySQL Metadata ...

A deployable Apache Druid 30.0.0 setup covering MySQL metadata storage (mysql-connector-java 8.0.19), HDFS deep storage, and HDFS indexing-logs, plus Kafka...

Apache Druid Cluster Mode [Part 2]: Low-Memory Cluster Pr...

Low-memory cluster practice for Apache Druid 30.0.0 on three nodes: provides JVM parameters and key runtime.properties settings for Broker/Historical/Router, and explains off-heap memory and processing bu...

Kafka Topic, Partition & Consumer: Rebalance Optimization

Deep dive into the core mechanisms of Kafka Topics, Partitions, and Consumer Groups, covering custom deserialization, offset management, and rebalance optimization settings.

Kafka Topic Management: Commands & Java API

Comprehensive introduction to Kafka Topic operations, including kafka-topics.sh commands, the principles behind replica assignment strategies, and core usage of the KafkaAdminClient Java API.