Tag: Kafka

32 articles

MQ Application: Cache Warm-up, Rate Limiting, Redis Lua and Traffic Peak Shaving

E-commerce seckill/ticket-grabbing scenarios with instantaneous traffic peaks, high read/write concurrency. Use pre-static + rate limiting queuing

MQ Selection: RabbitMQ vs RocketMQ vs Kafka

Coexisting with traditional IBM MQ, need open source, operatable, scalable, consistency/reliability.

Big Data 268 - Real-time Warehouse ODS Layer: Writing Kafka Dimension Tables into DIM

Kafka is a distributed streaming platform for high-throughput message passing. In ETL processes, Kafka serves as a data message queue or stream processing source.

Big Data 269 - Real-time Warehouse DIM, DW and ADS: Scala Pipelines to HBase

Original MySQL area table to HBase: Convert area table to region ID, region name, city ID, city name, province ID, province name, and write to HBase.

Big Data #266: Canal Integration with Kafka - Real-time Data Sync

This article introduces Alibaba's open-source Canal tool, which implements Change Data Capture (CDC) by parsing MySQL binlog.

Big Data 267 - Real-Time Warehouse ODS: Lambda and Kappa Architecture

In internet companies, common ODS data includes business log data (Log) and business DB data.

Big Data 261 - Real-Time Warehouse Business Table Structure

Realtime data warehouse is a data warehouse system that differs from traditional batch processing data warehouses by emphasizing low latency, high throughput.

Big Data 89 - Spark Streaming with Kafka: Receiver vs Direct Mode

This is article 89 in the Big Data series, deeply comparing two core modes of Spark Streaming integration with Kafka, focusing on Direct mode production practices.

Spark Streaming Data Sources: File Stream, Socket, RDD RDD Queue

Comprehensive explanation of three Spark Streaming basic data sources: file stream directory monitoring, Socket TCP ingestion, RDD queue stream for testing simulation.

Big Data 189 - Nginx JSON Logs to ELK: ZK + Kafka + Elasticsearch 7.3.0 + Kibana 7.3.0

Configure Nginx logformat json to output structured accesslog (containing @timestamp, requesttime, status, requesturi, ua and other fields).

Filebeat → Kafka → Logstash → Elasticsearch Practice

Filebeat collects Nginx access.log to Kafka, and Logstash consumes, parses embedded JSON by field conditions, enriches metadata, and writes structured logs to Elasticsear...

Apache Kylin 1.6 Streaming Cubing Practice: Kafka to Minute-level OLAP

Kafka→Kylin real-time OLAP pipeline, providing minute-level aggregation queries for common 2025 business scenarios (e-commerce transactions, user behavior...

Kafka Storage Mechanism: Log Segmentation & Retention

This is article 65 in the Big Data series, deeply analyzing Kafka's log storage mechanism.

Kafka High Performance: Zero-Copy, mmap & Sequential Write

This is article 66 in the Big Data series, deeply analyzing Kafka's underlying I/O optimization technologies achieving extremely high throughput.

Kafka Replica Mechanism: ISR & Leader Election

Deep dive into Kafka replica mechanism, including ISR sync node set maintenance, Leader election process, and unclean election trade-offs between consistency and availabi...

Kafka Exactly-Once: Idempotence & Transactions

Systematic explanation of how Kafka achieves Exactly-Once semantics through idempotent producers and transactions, covering PID/sequence number principle...

Big Data 156 - Apache Druid + Kafka Real-time Analysis: JSON Flattening, Ingestion & SQL Metrics

Scala Kafka Producer writes order/click data to Kafka Topic (example topic: druid2), continuous ingestion in Druid through Kafka Indexing Service.

Big Data 153 - Apache Druid Real-time Kafka Ingestion: Complete Practice from Ingestion to Query

Complete practice of Apache Druid real-time Kafka ingestion, using network traffic JSON as example, completing data ingestion through Druid console's Streaming/Kafka wiza...

Kafka Topic, Partition & Consumer: Rebalance Optimization

Deep dive into Kafka Topic, Partition, Consumer Group core mechanisms, covering custom deserialization, offset management and rebalance optimization configuration.

Kafka Topic Management: Commands & Java API

Comprehensive introduction to Kafka Topic operations, including kafka-topics.sh commands, replica assignment strategy principles, and KafkaAdminClient Java API core usage.

Kafka Producer Interceptor & Interceptor Chain

Introduction to Kafka 0.10 Producer interceptor mechanism, covering onSend and onAcknowledgement interception points, interceptor chain execution order and error isolatio...

Big Data 60 - Kafka Consumer: Consumption Flow, Heartbeat and Parameter Tuning

Detailed explanation of Kafka Consumer Group consumption model, partition assignment strategy, heartbeat keep-alive mechanism, and tuning practices for key parameters lik...

Kafka Producer Message Sending Flow & Core Parameters

Deep analysis of Kafka Producer initialization, message interception, serialization, partition routing, buffer batch sending, ACK confirmation and complete sending chain.

Kafka Serialization & Partitioning: Custom Implementation

Deep dive into Kafka message serialization and partition routing, including complete code for custom Serializer and Partitioner, mastering precise message routing and eff...

Kafka Operations: Shell Commands & Java Client Examples

Covers Kafka daily operations: daemon startup, Shell topic management commands, and Java client programming (complete Producer/Consumer code) with key configuration param...

Spring Boot Integration with Kafka

Detailed guide on integrating Kafka in Spring Boot projects, including dependency configuration, KafkaTemplate sync/async message sending, and complete @KafkaListener con...

Kafka Components: Producer, Broker, Consumer Full Flow

Deep dive into Kafka's three core components: Producer partitioning strategy and ACK mechanism, Broker Leader/Follower architecture, Consumer Group partition assignment a...

Kafka Installation: From ZooKeeper to KRaft Evolution

Introduction to Kafka 2.x vs 3.x core differences, detailed cluster installation steps, ZooKeeper configuration, Broker parameter settings, and how KRaft mode replaces Zo...

Kafka Architecture: High-Throughput Distributed Messaging

Systematic introduction to Kafka core architecture: Topic/Partition/Replica model, ISR mechanism, zero-copy optimization, message format and typical use cases.

Spark Streaming Kafka Consumption: Offset Acquisition, Storage and Management

When Spark Streaming integrates with Kafka, Offset management is key to ensuring data processing continuity and consistency.

Big Data 104 - Spark Streaming with Kafka: Offset Management Mechanisms & Best Practices

Offset is used to mark message position in Kafka partition. Proper management can achieve at-least-once or even exactly-once data processing semantics.

Big Data 102 - Spark Streaming with Kafka: Receiver and Direct Approaches

This article introduces two Spark Streaming integration methods with Kafka: Receiver Approach and Direct Approach.