Gleam Lab · Tag Archive

Tag: kafka

32 articles collected by topic for tutorials, cases, engineering practice, and research notes.

MQ Application: Cache Warm-up, Rate Limiting, Redis Lua and Traffic Peak Shaving

E-commerce seckill/ticket-grabbing scenarios with instantaneous traffic peaks, high read/write concurrency. Use pre-static + rate limiting queuing

12/15/2025

MQ Selection: RabbitMQ vs RocketMQ vs Kafka

Coexisting with traditional IBM MQ, need open source, operatable, scalable, consistency/reliability.

12/14/2025

Big Data 268 - Real-time Warehouse ODS Layer: Writing Kafka Dimension Tables into DIM

Kafka is a distributed streaming platform for high-throughput message passing. In ETL processes, Kafka serves as a data message queue or stream processing source.

1/3/2025

Big Data 269 - Real-time Warehouse DIM, DW and ADS: Scala Pipelines to HBase

Original MySQL area table to HBase: Convert area table to region ID, region name, city ID, city name, province ID, province name, and write to HBase.

1/3/2025

Big Data #266: Canal Integration with Kafka - Real-time Data Sync

This article introduces Alibaba's open-source Canal tool, which implements Change Data Capture (CDC) by parsing MySQL binlog.

1/2/2025

Big Data 267 - Real-Time Warehouse ODS: Lambda and Kappa Architecture

In internet companies, common ODS data includes business log data (Log) and business DB data.

1/2/2025

Big Data 261 - Real-Time Warehouse Business Table Structure

Realtime data warehouse is a data warehouse system that differs from traditional batch processing data warehouses by emphasizing low latency, high throughput.

12/28/2024

Big Data 89 - Spark Streaming with Kafka: Receiver vs Direct Mode

This is article 89 in the Big Data series, deeply comparing two core modes of Spark Streaming integration with Kafka, focusing on Direct mode production practices.

11/20/2024

Spark Streaming Data Sources: File Stream, Socket, RDD RDD Queue

Comprehensive explanation of three Spark Streaming basic data sources: file stream directory monitoring, Socket TCP ingestion, RDD queue stream for testing simulation.

11/13/2024

Big Data 189 - Nginx JSON Logs to ELK: ZK + Kafka + Elasticsearch 7.3.0 + Kibana 7.3.0

Configure Nginx logformat json to output structured accesslog (containing @timestamp, requesttime, status, requesturi, ua and other fields).

10/25/2024

Filebeat → Kafka → Logstash → Elasticsearch Practice

Filebeat collects Nginx access.log to Kafka, and Logstash consumes, parses embedded JSON by field conditions, enriches metadata, and writes structured logs to Elasticsear...

10/25/2024

Apache Kylin 1.6 Streaming Cubing Practice: Kafka to Minute-level OLAP

Kafka→Kylin real-time OLAP pipeline, providing minute-level aggregation queries for common 2025 business scenarios (e-commerce transactions, user behavior...

10/12/2024

Kafka Storage Mechanism: Log Segmentation & Retention

This is article 65 in the Big Data series, deeply analyzing Kafka's log storage mechanism.

10/5/2024

Kafka High Performance: Zero-Copy, mmap & Sequential Write

This is article 66 in the Big Data series, deeply analyzing Kafka's underlying I/O optimization technologies achieving extremely high throughput.

10/5/2024

Kafka Replica Mechanism: ISR & Leader Election

Deep dive into Kafka replica mechanism, including ISR sync node set maintenance, Leader election process, and unclean election trade-offs between consistency and availabi...

10/2/2024

Kafka Exactly-Once: Idempotence & Transactions

Systematic explanation of how Kafka achieves Exactly-Once semantics through idempotent producers and transactions, covering PID/sequence number principle...

10/2/2024

Big Data 156 - Apache Druid + Kafka Real-time Analysis: JSON Flattening, Ingestion & SQL Metrics

Scala Kafka Producer writes order/click data to Kafka Topic (example topic: druid2), continuous ingestion in Druid through Kafka Indexing Service.

9/30/2024

Big Data 153 - Apache Druid Real-time Kafka Ingestion: Complete Practice from Ingestion to Query

Complete practice of Apache Druid real-time Kafka ingestion, using network traffic JSON as example, completing data ingestion through Druid console's Streaming/Kafka wiza...

9/29/2024

Kafka Topic, Partition & Consumer: Rebalance Optimization

Deep dive into Kafka Topic, Partition, Consumer Group core mechanisms, covering custom deserialization, offset management and rebalance optimization configuration.

9/28/2024

Kafka Topic Management: Commands & Java API

Comprehensive introduction to Kafka Topic operations, including kafka-topics.sh commands, replica assignment strategy principles, and KafkaAdminClient Java API core usage.

9/28/2024

Kafka Producer Interceptor & Interceptor Chain

Introduction to Kafka 0.10 Producer interceptor mechanism, covering onSend and onAcknowledgement interception points, interceptor chain execution order and error isolatio...

9/25/2024

Big Data 60 - Kafka Consumer: Consumption Flow, Heartbeat and Parameter Tuning

Detailed explanation of Kafka Consumer Group consumption model, partition assignment strategy, heartbeat keep-alive mechanism, and tuning practices for key parameters lik...

9/25/2024

Kafka Producer Message Sending Flow & Core Parameters

Deep analysis of Kafka Producer initialization, message interception, serialization, partition routing, buffer batch sending, ACK confirmation and complete sending chain.

9/21/2024

Kafka Serialization & Partitioning: Custom Implementation

Deep dive into Kafka message serialization and partition routing, including complete code for custom Serializer and Partitioner, mastering precise message routing and eff...

9/21/2024

Kafka Operations: Shell Commands & Java Client Examples

Covers Kafka daily operations: daemon startup, Shell topic management commands, and Java client programming (complete Producer/Consumer code) with key configuration param...

9/18/2024

Spring Boot Integration with Kafka

Detailed guide on integrating Kafka in Spring Boot projects, including dependency configuration, KafkaTemplate sync/async message sending, and complete @KafkaListener con...

9/18/2024

Kafka Components: Producer, Broker, Consumer Full Flow

Deep dive into Kafka's three core components: Producer partitioning strategy and ACK mechanism, Broker Leader/Follower architecture, Consumer Group partition assignment a...

9/14/2024

Kafka Installation: From ZooKeeper to KRaft Evolution

Introduction to Kafka 2.x vs 3.x core differences, detailed cluster installation steps, ZooKeeper configuration, Broker parameter settings, and how KRaft mode replaces Zo...

9/14/2024

Kafka Architecture: High-Throughput Distributed Messaging

Systematic introduction to Kafka core architecture: Topic/Partition/Replica model, ISR mechanism, zero-copy optimization, message format and typical use cases.

9/11/2024

Spark Streaming Kafka Consumption: Offset Acquisition, Storage and Management

When Spark Streaming integrates with Kafka, Offset management is key to ensuring data processing continuity and consistency.

8/27/2024

Big Data 104 - Spark Streaming with Kafka: Offset Management Mechanisms & Best Practices

Offset is used to mark message position in Kafka partition. Proper management can achieve at-least-once or even exactly-once data processing semantics.

8/27/2024

Big Data 102 - Spark Streaming with Kafka: Receiver and Direct Approaches

This article introduces two Spark Streaming integration methods with Kafka: Receiver Approach and Direct Approach.

8/26/2024