Tag: kafka

32 articles

MQ Application: Cache Warm-up, Rate Limiting, Redis Lua, ...

E-commerce flash-sale (seckill) and ticket-grabbing scenarios face instantaneous traffic peaks and high read/write concurrency. They are handled with pre-generated static pages plus rate-limited queuing; the write path uses Redis Lua atomic pre-deduction + ...
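The key property of Redis Lua pre-deduction is that "check stock, then decrement" runs as one atomic step, so the last unit can never be sold twice. Below is a minimal in-process Python sketch of that idea. It is an analogy, not the article's Redis code: a `threading.Lock` stands in for Redis executing a Lua script without interleaving other commands, and `StockCounter` and `flash_sale` are hypothetical names.

```python
import threading


class StockCounter:
    """Hypothetical in-process stand-in for an atomic Redis Lua check-and-decrement."""

    def __init__(self, stock: int) -> None:
        self._stock = stock
        self._lock = threading.Lock()

    def try_deduct(self, qty: int = 1) -> bool:
        # Check and decrement happen under one lock, so two buyers can never
        # both see the last unit as available (no overselling).
        with self._lock:
            if self._stock >= qty:
                self._stock -= qty
                return True
            return False

    @property
    def remaining(self) -> int:
        with self._lock:
            return self._stock


def flash_sale(stock: int, buyers: int) -> int:
    """Simulate many concurrent buyers; return how many purchases succeeded."""
    counter = StockCounter(stock)
    successes = []
    results_lock = threading.Lock()

    def buy() -> None:
        if counter.try_deduct():
            with results_lock:
                successes.append(1)

    threads = [threading.Thread(target=buy) for _ in range(buyers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(successes)
```

However many threads race, the number of successful purchases is capped at the initial stock. In the real setup, the same guarantee comes from keeping the comparison and the `DECRBY` inside a single Lua script.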

MQ Selection: RabbitMQ vs RocketMQ vs Kafka

The requirements: coexist with the traditional IBM MQ, and be open source, operable, scalable, and consistent/reliable. RabbitMQ suits 'reliability-first business decoupling', RocketMQ suits 'transaction/order/delay ...

Big Data #268: Real-time Warehouse ODS Layer - Writing Ka...

Writing dimension tables (DIM) from Kafka typically involves reading real-time or batch data from Kafka topics and updating dimension tables based on the data...

Big Data #269: Real-time Warehouse DIM, DW and ADS Layer ...

The DW (Data Warehouse) layer is built from DWD, DWS, and DIM layer data; it completes data architecture and integration, establishes consistent dimensions, and...

Big Data #266: Canal Integration with Kafka - Real-time D...

This article introduces Alibaba's open-source Canal tool, which implements Change Data Capture (CDC) by parsing the MySQL binlog, and demonstrates how to integrate...

Realtime Warehouse - ODS Lambda Architecture Kappa Archit...

In internet companies, common ODS data includes business log data (Log) and business DB data. For business DB data, collecting data from relational databases...

Realtime Warehouse - Business Database Table Structure: T...

A real-time data warehouse differs from a traditional batch-processing data warehouse by emphasizing low latency, high throughput,...

Spark Streaming Integration with Kafka: Receiver and Dire...

A detailed explanation of the two Spark Streaming integration modes with Kafka: the architectural differences between the Receiver-based high-level API and Direct mode, offset management, Exactly-Once semantics guarantees,...

Spark Streaming Data Sources: File Stream, Socket, RDD Qu...

A comprehensive explanation of three basic Spark Streaming data sources: file streams (directory monitoring), Socket (TCP ingestion), and RDD queue streams (for test simulation), with complete Scala code exam...

Nginx JSON Logs to ELK: ZK+Kafka+Elasticsearch 7.3.0+Kiba...

Configure the Nginx log_format json to output a structured access_log (with @timestamp, request_time, status, request_uri, ua, and other fields), start...

Filebeat → Kafka → Logstash → Elasticsearch Practice

Filebeat collects the Nginx access.log and writes it to Kafka; Logstash consumes from Kafka, parses the JSON embedded in message according to field (app/type) conditions, adds...

Apache Kylin 1.6 Streaming Cubing Practice: Kafka to Minu...

A Kafka→Kylin real-time OLAP pipeline that provides minute-level aggregation queries for common 2025 business scenarios (e-commerce transactions, user behavior, IoT monitoring).

Kafka Storage Mechanism: Log Segmentation & Retention

An in-depth analysis of Kafka's log storage architecture: LogSegment design, how the sparse offset index and timestamp index work, the message lookup flow, and log retention and cleanup policy configu...
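As a rough illustration of the sparse-index lookup flow, here is a minimal Python sketch. It assumes each segment's index is held in memory as a sorted list of (relative offset, byte position) pairs. Real Kafka memory-maps the .index file, which stores fixed-size entries. `Segment` and `find_start_position` are hypothetical names.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass, field


@dataclass
class Segment:
    base_offset: int  # first offset in this segment (Kafka also names the segment files after it)
    # Sparse index: sorted (relative_offset, byte_position) pairs, one entry per
    # chunk of log data rather than one per message.
    index: list[tuple[int, int]] = field(default_factory=list)


def find_start_position(segments: list[Segment], target: int) -> tuple[int, int]:
    """Return (segment base offset, byte position to start scanning from)."""
    # 1) Pick the segment with the largest base offset <= target.
    bases = [s.base_offset for s in segments]
    i = bisect.bisect_right(bases, target) - 1
    if i < 0:
        raise KeyError(f"offset {target} is older than the oldest segment")
    seg = segments[i]
    # 2) Binary-search the sparse index for the largest entry <= target.
    rel = target - seg.base_offset
    rel_offsets = [r for r, _ in seg.index]
    j = bisect.bisect_right(rel_offsets, rel) - 1
    # 3) No index entry at or below target: scan from the start of the segment file.
    pos = seg.index[j][1] if j >= 0 else 0
    return seg.base_offset, pos
```

The last step, scanning the .log file forward from that position until the exact offset is found, is omitted. Because the index is sparse, it stays small enough to keep in memory, and that forward scan covers at most one index interval of data.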

Kafka High Performance: Zero-Copy, mmap & Sequential Write

A deep dive into the three I/O techniques behind Kafka's high throughput: sendfile zero-copy, mmap memory mapping, and sequential writes through the page cache, revealing the kernel-level optimizations behind million ...

Kafka Replica Mechanism: ISR & Leader Election

A deep dive into Kafka's replica mechanism: maintenance of the ISR (in-sync replica) set, the Leader election process, and the consistency-versus-availability trade-off of unclean leader election.
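The trade-off can be made concrete with a simplified Python model of leader selection. Under normal conditions, the first replica in assignment order that is both alive and in the ISR wins. Only if `unclean.leader.election.enable` is on may an out-of-sync replica take over. This is a sketch of the idea, not the controller's actual code, and `elect_leader` is a hypothetical name.

```python
from typing import List, Optional, Set, Tuple


def elect_leader(assigned: List[int], isr: Set[int], live: Set[int],
                 unclean_enabled: bool = False) -> Tuple[Optional[int], bool]:
    """Return (new leader broker id or None, whether the election was unclean)."""
    # Clean election: first replica, in assignment order, that is alive and in the ISR.
    for broker in assigned:
        if broker in live and broker in isr:
            return broker, False
    # Unclean election: any live replica, even one that lagged behind the ISR.
    # Availability is restored, but messages the old leader acknowledged may be lost.
    if unclean_enabled:
        for broker in assigned:
            if broker in live:
                return broker, True
    # No eligible replica: the partition stays offline (unavailable).
    return None, False
```

With unclean election disabled, the partition waits for an ISR member to come back (consistency first). Enabling it picks a lagging replica immediately (availability first).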

Kafka Exactly-Once: Idempotence & Transactions

A systematic explanation of how Kafka achieves Exactly-Once semantics through idempotent producers and transactions, covering how PIDs and sequence numbers work, cross-partition transaction configuration ...
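To show the PID/sequence-number principle in miniature, here is a toy Python partition that deduplicates writes by (producer id, sequence number). It is a deliberately simplified sketch: real brokers track sequence numbers per record batch and keep metadata for several recent batches, while this model tracks single records and only the last sequence. `PartitionLog` is a hypothetical class.

```python
from typing import Dict, List


class PartitionLog:
    """Toy partition that deduplicates by (producer id, sequence number)."""

    def __init__(self) -> None:
        self.records: List[str] = []
        self.last_seq: Dict[int, int] = {}  # producer id -> last appended sequence

    def append(self, producer_id: int, seq: int, value: str) -> str:
        last = self.last_seq.get(producer_id, -1)
        if seq <= last:
            return "duplicate"  # a retry of something already written: drop it
        if seq != last + 1:
            return "out_of_order"  # a gap means an earlier send has not arrived
        self.records.append(value)
        self.last_seq[producer_id] = seq
        return "appended"
```

A producer retry after a lost ACK resends the same sequence number, so the broker can safely drop it instead of writing the message twice.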

Apache Druid + Kafka Real-time Analysis: JSON Flattening ...

A Scala Kafka Producer writes order/click data to a Kafka Topic (example topic: druid2), which Druid ingests continuously through the Kafka Indexing Service. Since...

Apache Druid Real-time Kafka Ingestion: Complete Practice...

A complete walkthrough of Apache Druid real-time Kafka ingestion, using network-traffic JSON as the example: ingesting data through the Druid console's Streaming/Kafka wizard, parsing the time column, se...

Kafka Topic, Partition & Consumer: Rebalance Optimization

A deep dive into the core mechanisms of Kafka Topics, Partitions, and Consumer Groups, covering custom deserialization, offset management, and rebalance optimization settings.

Kafka Topic Management: Commands & Java API

A comprehensive introduction to Kafka Topic operations: kafka-topics.sh commands, how the replica assignment strategy works, and core usage of the KafkaAdminClient Java API.

Kafka Producer Interceptor & Interceptor Chain

An introduction to the Kafka 0.10 Producer interceptor mechanism, covering the onSend and onAcknowledgement interception points, interceptor chain execution order, and error isolation, with complete custom int...

Kafka Consumer: Consumption Flow, Heartbeat & Parameter T...

A detailed explanation of the Kafka Consumer Group consumption model, partition assignment strategies, the heartbeat keep-alive mechanism, and tuning practices for key parameters such as session.timeout.ms, heart...

Kafka Producer Message Sending Flow & Core Parameters

An in-depth analysis of the complete Kafka Producer send chain: initialization, message interception, serialization, partition routing, buffered batch sending, and ACK confirmation, with key parameter tuning ...

Kafka Serialization & Partitioning: Custom Implementation

A deep dive into Kafka message serialization and partition routing, with complete code for a custom Serializer and Partitioner to achieve precise message routing and efficient transmission.
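Most custom Partitioners start from the same routing rule as the default one: hash the key bytes, then take the result modulo the partition count, so a given key always lands on the same partition and keeps its ordering. Here is a minimal Python sketch of that rule. It swaps in `zlib.crc32` for the murmur2 hash Kafka actually uses, so the partition numbers it produces will not match a real producer's. `choose_partition` is a hypothetical name.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition: same key -> same partition (per-key ordering)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # Stand-in hash: Kafka's default uses murmur2, so results differ from a real producer.
    # Masking to 31 bits mirrors how Kafka forces its hash result non-negative.
    h = zlib.crc32(key) & 0x7FFFFFFF
    return h % num_partitions
```

A custom Partitioner would typically add business rules before falling back to this hash, for example pinning certain keys to a dedicated partition.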

Kafka Operations: Shell Commands & Java Client Examples

Covers day-to-day Kafka operations: starting the daemon, shell commands for topic management, and Java client programming (complete Producer/Consumer code), with key configuration parameters and ConsumerRebalance...

Spring Boot Integration with Kafka

A detailed guide to integrating Kafka into Spring Boot projects, covering dependency configuration, synchronous and asynchronous sending with KafkaTemplate, and complete @KafkaListener consumption examples.

Kafka Components: Producer, Broker, Consumer Full Flow

A deep dive into Kafka's three core components: the Producer's partitioning strategy and ACK mechanism, the Broker's Leader/Follower architecture, and the Consumer Group's partition assignment and offset management.

Kafka Installation: From ZooKeeper to KRaft Evolution

An introduction to the core differences between Kafka 2.x and 3.x, detailed cluster installation steps, ZooKeeper configuration, Broker parameter settings, and how KRaft mode removes the ZooKeeper dependency.

Kafka Architecture: High-Throughput Distributed Messaging

A systematic introduction to Kafka's core architecture: the Topic/Partition/Replica model, the ISR mechanism, zero-copy optimization, message format, and typical use cases.

Spark Streaming Kafka Consumption: Offset Acquisition, St...

When Spark Streaming integrates with Kafka, Offset management is key to ensuring the continuity and consistency of data processing. The Offset marks a message's position in...

Spark Streaming Integration with Kafka: Offset Management...

The Offset marks a message's position within a Kafka partition. Managing it properly enables at-least-once or even exactly-once processing semantics. By persisting Offsets, an application can resume ...
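Why "process first, commit second" yields at-least-once semantics can be shown with a small Python simulation. A crash between processing a record and persisting its offset makes that record get processed again after restart. Assumptions: the "topic" is a plain list, the offset store is a dict, and `run_consumer` and `CrashBeforeCommit` are hypothetical names, not Spark or Kafka APIs. As in Kafka, the committed value is the offset of the next record to read.

```python
from typing import Callable, Dict, List, Optional


class CrashBeforeCommit(Exception):
    """Simulated failure after a record is processed but before its offset is saved."""


def run_consumer(log: List[str], offset_store: Dict[str, int], processed: List[str],
                 crash_at: Optional[int] = None) -> None:
    """Resume from the stored offset, processing first and committing second."""
    offset = offset_store.get("committed", 0)
    while offset < len(log):
        processed.append(log[offset])  # 1) process (the side effect happens here)
        if crash_at is not None and offset == crash_at:
            raise CrashBeforeCommit(offset)  # die before the commit below
        offset += 1
        offset_store["committed"] = offset  # 2) persist the next offset to read
```

On restart, the consumer re-reads from the last committed offset, so the record in flight at crash time is processed twice but never skipped. Moving the commit before processing would flip this to at-most-once.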

Spark Streaming Integration with Kafka: Receiver and Dire...

This article introduces the two Spark Streaming integration methods with Kafka: the Receiver Approach and the Direct Approach. The Receiver Approach uses a Receiver running on an Executor to...