Tag: Kafka
32 articles
MQ Application: Cache Warm-up, Rate Limiting, Redis Lua and Traffic Peak Shaving
E-commerce flash-sale (seckill) and ticket-grabbing scenarios bring instantaneous traffic peaks and high read/write concurrency. Handle them with pre-generated static pages plus rate limiting and queuing.
MQ Selection: RabbitMQ vs RocketMQ vs Kafka
The selection must coexist with a traditional IBM MQ deployment and be open source, operable, scalable, and consistent/reliable.
Big Data 268 - Real-time Warehouse ODS Layer: Writing Kafka Dimension Tables into DIM
Kafka is a distributed streaming platform for high-throughput message passing. In ETL processes, Kafka serves as a data message queue or stream processing source.
Big Data 269 - Real-time Warehouse DIM, DW and ADS: Scala Pipelines to HBase
Load the original MySQL area table into HBase: convert it into region ID, region name, city ID, city name, province ID, and province name, then write the result to HBase.
Big Data 266 - Canal Integration with Kafka: Real-time Data Sync
This article introduces Alibaba's open-source Canal tool, which implements Change Data Capture (CDC) by parsing MySQL binlog.
Big Data 267 - Real-Time Warehouse ODS: Lambda and Kappa Architecture
In internet companies, common ODS data includes business log data (Log) and business DB data.
Big Data 261 - Real-Time Warehouse Business Table Structure
A real-time data warehouse differs from a traditional batch-processing data warehouse by emphasizing low latency and high throughput.
Big Data 89 - Spark Streaming with Kafka: Receiver vs Direct Mode
This is article 89 in the Big Data series. It compares in depth the two core modes of integrating Spark Streaming with Kafka, focusing on production practices for Direct mode.
Spark Streaming Data Sources: File Stream, Socket, RDD Queue
A comprehensive explanation of the three basic Spark Streaming data sources: file streams (directory monitoring), socket (TCP) ingestion, and RDD queue streams for test simulation.
Big Data 189 - Nginx JSON Logs to ELK: ZK + Kafka + Elasticsearch 7.3.0 + Kibana 7.3.0
Configure an Nginx JSON log_format to output a structured access log (containing @timestamp, requesttime, status, requesturi, ua and other fields).
Filebeat → Kafka → Logstash → Elasticsearch Practice
Filebeat collects Nginx access.log to Kafka, and Logstash consumes, parses embedded JSON by field conditions, enriches metadata, and writes structured logs to Elasticsear...
Apache Kylin 1.6 Streaming Cubing Practice: Kafka to Minute-level OLAP
Kafka→Kylin real-time OLAP pipeline, providing minute-level aggregation queries for common 2025 business scenarios (e-commerce transactions, user behavior...
Kafka Storage Mechanism: Log Segmentation & Retention
This is article 65 in the Big Data series, an in-depth analysis of Kafka's log storage mechanism.
Kafka High Performance: Zero-Copy, mmap & Sequential Write
This is article 66 in the Big Data series, an in-depth analysis of the underlying I/O optimizations that give Kafka its very high throughput.
Kafka Replica Mechanism: ISR & Leader Election
A deep dive into Kafka's replica mechanism, including maintenance of the ISR (in-sync replica) set, the Leader election process, and unclean election trade-offs between consistency and availabi...
Kafka Exactly-Once: Idempotence & Transactions
Systematic explanation of how Kafka achieves Exactly-Once semantics through idempotent producers and transactions, covering PID/sequence number principle...
Big Data 156 - Apache Druid + Kafka Real-time Analysis: JSON Flattening, Ingestion & SQL Metrics
A Scala Kafka Producer writes order/click data to a Kafka topic (example topic: druid2), and Druid ingests it continuously through the Kafka Indexing Service.
Big Data 153 - Apache Druid Real-time Kafka Ingestion: Complete Practice from Ingestion to Query
Complete practice of Apache Druid real-time Kafka ingestion, using network traffic JSON as example, completing data ingestion through Druid console's Streaming/Kafka wiza...
Kafka Topic, Partition & Consumer: Rebalance Optimization
Deep dive into Kafka Topic, Partition, Consumer Group core mechanisms, covering custom deserialization, offset management and rebalance optimization configuration.
Kafka Topic Management: Commands & Java API
Comprehensive introduction to Kafka Topic operations, including kafka-topics.sh commands, replica assignment strategy principles, and KafkaAdminClient Java API core usage.
Kafka Producer Interceptor & Interceptor Chain
Introduction to Kafka 0.10 Producer interceptor mechanism, covering onSend and onAcknowledgement interception points, interceptor chain execution order and error isolatio...
Big Data 60 - Kafka Consumer: Consumption Flow, Heartbeat and Parameter Tuning
Detailed explanation of Kafka Consumer Group consumption model, partition assignment strategy, heartbeat keep-alive mechanism, and tuning practices for key parameters lik...
Kafka Producer Message Sending Flow & Core Parameters
An in-depth analysis of the complete Kafka Producer sending chain: initialization, message interception, serialization, partition routing, buffered batch sending, and ACK confirmation.
Kafka Serialization & Partitioning: Custom Implementation
Deep dive into Kafka message serialization and partition routing, including complete code for custom Serializer and Partitioner, mastering precise message routing and eff...
Kafka Operations: Shell Commands & Java Client Examples
Covers Kafka daily operations: daemon startup, Shell topic management commands, and Java client programming (complete Producer/Consumer code) with key configuration param...
Spring Boot Integration with Kafka
Detailed guide on integrating Kafka in Spring Boot projects, including dependency configuration, KafkaTemplate sync/async message sending, and complete @KafkaListener con...
Kafka Components: Producer, Broker, Consumer Full Flow
Deep dive into Kafka's three core components: Producer partitioning strategy and ACK mechanism, Broker Leader/Follower architecture, Consumer Group partition assignment a...
Kafka Installation: From ZooKeeper to KRaft Evolution
Introduction to Kafka 2.x vs 3.x core differences, detailed cluster installation steps, ZooKeeper configuration, Broker parameter settings, and how KRaft mode replaces Zo...
Kafka Architecture: High-Throughput Distributed Messaging
Systematic introduction to Kafka core architecture: Topic/Partition/Replica model, ISR mechanism, zero-copy optimization, message format and typical use cases.
Spark Streaming Kafka Consumption: Offset Acquisition, Storage and Management
When Spark Streaming integrates with Kafka, Offset management is key to ensuring data processing continuity and consistency.
Big Data 104 - Spark Streaming with Kafka: Offset Management Mechanisms & Best Practices
An offset marks a message's position within a Kafka partition. Managing offsets properly makes at-least-once, or even exactly-once, data processing semantics achievable.
Big Data 102 - Spark Streaming with Kafka: Receiver and Direct Approaches
This article introduces the two methods for integrating Spark Streaming with Kafka: the Receiver Approach and the Direct Approach.