This is article 52 in the Big Data series, introducing Kafka’s core architecture and high-throughput principles.
What is Kafka
Kafka is a distributed, partitioned, multi-replica publish-subscribe messaging system, originally developed at LinkedIn and later donated to Apache. It processes massive data volumes with millisecond latency while guaranteeing durable storage, making it one of the core pieces of message middleware in the big data ecosystem.
Core features:
- Partitioning: Messages distributed across nodes, naturally supports horizontal scaling
- Multi-replica strategy: ISR (In-Sync Replica) mechanism ensures reliability
- Persistent storage: segmented log with index files gives near-constant-time access, supports TB-level data
- Zero-copy optimization: Combined with batch sending, significantly improves throughput
Core Design Goals
Efficiency
Kafka uses a segmented log plus index file storage model, so locating any message takes near-constant time regardless of total data volume. Even with TB-level data accumulated in a Topic, latency remains at the millisecond level.
High Throughput
A single machine can achieve 100,000+ messages per second, relying on three key techniques:
| Technology | Principle |
|---|---|
| Batch Processing | Producer merges multiple messages into one batch, reducing network round trips |
| Sequential Write | Append-only log; sequential disk writes approach the throughput of random memory access |
| Zero-Copy | Uses sendfile system call, data flows directly from page cache to NIC, skipping user-space copy |
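As a sketch of how batching cuts network round trips, here is a toy producer buffer loosely modeled on the real producer settings batch.size and linger.ms (the setting names are Kafka's; the buffering logic is simplified for illustration):

```python
import time

class BatchingProducer:
    """Toy model of producer-side batching: messages accumulate until
    the batch is full (batch.size) or a timeout elapses (linger.ms)."""

    def __init__(self, batch_size=3, linger_ms=50):
        self.batch_size = batch_size
        self.linger_s = linger_ms / 1000.0
        self.buffer = []
        self.first_append = None
        self.sent_batches = []  # each entry models one network round trip

    def send(self, msg):
        if not self.buffer:
            self.first_append = time.monotonic()
        self.buffer.append(msg)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def maybe_linger_flush(self):
        # Called periodically: flush a partial batch once linger time passes
        if self.buffer and time.monotonic() - self.first_append >= self.linger_s:
            self.flush()

    def flush(self):
        if self.buffer:
            self.sent_batches.append(list(self.buffer))
            self.buffer.clear()

p = BatchingProducer(batch_size=3)
for i in range(7):
    p.send(f"m{i}")
p.flush()  # flush the final partial batch
print(len(p.sent_batches))  # 7 messages cost only 3 round trips
```

Tuning the real producer is the same trade: a larger batch.size or longer linger.ms means fewer round trips at the cost of slightly higher latency for individual messages.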
Partition Ordering
Messages within the same Partition are strictly ordered; order across Partitions is not guaranteed. Two partitioning strategies are supported:
- Hash Partitioning: Route by message Key hash, same Key always goes to same Partition
- Round-Robin Partitioning: Default when no Key, evenly distributes across all Partitions
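Both strategies can be sketched in a few lines. Note that real Kafka clients hash keys with murmur2; CRC32 stands in here only to keep the sketch deterministic:

```python
import zlib
from itertools import count
from typing import Optional

NUM_PARTITIONS = 6
_rr = count()  # round-robin counter for keyless messages

def partition_for(key: Optional[bytes]) -> int:
    """Simplified Kafka partitioner: hash the key when present,
    otherwise rotate round-robin across all partitions.
    (Real clients hash keys with murmur2; CRC32 is a stand-in.)"""
    if key is not None:
        return zlib.crc32(key) % NUM_PARTITIONS
    return next(_rr) % NUM_PARTITIONS

# Same key always lands on the same partition, preserving per-key order
assert partition_for(b"user-42") == partition_for(b"user-42")
# Keyless messages spread evenly
print([partition_for(None) for _ in range(6)])  # [0, 1, 2, 3, 4, 5]
```

(Recent client versions replace pure round-robin for keyless messages with a sticky partitioner, which fills one batch before switching partitions to get better batching.)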
Elastic Scaling
Nodes can be added online; the triggered Partition rebalancing redistributes load without interrupting service.
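The effect of adding a Broker can be sketched with a simplified round-robin layout planner (real reassignment also migrates replica data online; this only shows how the target layout shifts):

```python
from typing import Dict, List

def assign(partitions: int, brokers: List[str]) -> Dict[str, List[int]]:
    """Round-robin layout of a Topic's partitions across brokers,
    a simplified stand-in for Kafka's reassignment planning."""
    layout: Dict[str, List[int]] = {b: [] for b in brokers}
    for p in range(partitions):
        layout[brokers[p % len(brokers)]].append(p)
    return layout

before = assign(6, ["broker-1", "broker-2"])
after = assign(6, ["broker-1", "broker-2", "broker-3"])
print(before)  # {'broker-1': [0, 2, 4], 'broker-2': [1, 3, 5]}
print(after)   # {'broker-1': [0, 3], 'broker-2': [1, 4], 'broker-3': [2, 5]}
```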
Message Model
Kafka supports two consumption modes, abstracted through Consumer Group:
| Mode | Description | Use Cases |
|---|---|---|
| Queue (Point-to-Point) | Only one consumer in same Group processes each message | Task queues, load balancing |
| Topic (Publish-Subscribe) | Different Groups all receive the same message | Broadcast notifications, multi-system sync |
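The two modes can be illustrated with a toy dispatcher. Note that real Kafka gets queue semantics by assigning each Partition to exactly one member of a group; this sketch picks a member per message for brevity:

```python
from collections import defaultdict
from itertools import count
from typing import Dict, List

def deliver(message: str, groups: Dict[str, List[str]], counters) -> Dict[str, str]:
    """Every group receives its own copy of the message (pub-sub across
    groups); within a group only one member gets it (queue semantics).
    Member selection here is simple round-robin per group."""
    received = {}
    for group, members in groups.items():
        idx = next(counters[group]) % len(members)
        received[group] = members[idx]
    return received

groups = {"billing": ["b1", "b2"], "audit": ["a1"]}
counters = defaultdict(count)  # one rotation counter per group
print(deliver("order-created", groups, counters))  # {'billing': 'b1', 'audit': 'a1'}
print(deliver("order-paid", groups, counters))     # {'billing': 'b2', 'audit': 'a1'}
```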
Consumption uses a pull model: consumers fetch messages actively and control their own consumption rate, so they cannot be overwhelmed by a pushing broker; the cost is handling empty polls, which is mitigated by long polling via fetch.max.wait.ms.
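The pull-plus-long-poll behavior can be sketched with a blocking queue (fetch.max.wait.ms is the real consumer setting; everything else here is a simplified stand-in):

```python
import queue

fetch_max_wait_s = 0.05  # stand-in for fetch.max.wait.ms

def poll(records: "queue.Queue") -> list:
    """Simplified long poll: block up to fetch_max_wait_s for at least
    one record instead of spinning on empty fetches, then drain
    whatever else is already available."""
    try:
        first = records.get(timeout=fetch_max_wait_s)
    except queue.Empty:
        return []  # empty poll after the wait budget expires
    batch = [first]
    while True:
        try:
            batch.append(records.get_nowait())
        except queue.Empty:
            return batch

q = queue.Queue()
print(poll(q))  # [] after waiting ~50 ms, not a tight busy loop
for m in ("m1", "m2"):
    q.put(m)
print(poll(q))  # ['m1', 'm2']
```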
Message Structure
Each Kafka message contains:
| Field | Description |
|---|---|
| Key | Optional, used for partition routing and ordering |
| Value | Actual message body |
| Timestamp | Message creation or write time, used for sorting and expiration |
| Offset | Unique position identifier within Partition, monotonically increasing |
| Headers | Optional metadata key-value pairs |
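The table above maps naturally onto a small record container (an illustrative class, not the client library's actual record type):

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass(frozen=True)
class Record:
    """The message fields from the table above."""
    value: bytes
    key: Optional[bytes] = None
    timestamp_ms: int = 0
    offset: int = -1                      # assigned by the broker on append
    headers: Dict[str, bytes] = field(default_factory=dict)

r = Record(value=b'{"event": "click"}', key=b"user-42",
           timestamp_ms=1_700_000_000_000,
           headers={"trace-id": b"abc123"})
print(r.key, r.offset)  # b'user-42' -1
```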
Core APIs
- Producer API → publish message streams to Topics
- Consumer API → subscribe to Topics and process their records
- Streams API → transform input streams into output streams (real-time computation)
- Connector API → connect Kafka with external systems (databases, HDFS, etc.)
Architecture Components
Topic and Partition
- Topic: Logical classification of messages, like a database table
- Partition: Physical shard of Topic, each Partition is an ordered, immutable message log
- Multiple Partitions of a Topic spread across different Brokers, enabling parallel read/write
Broker and Controller
- Broker: Single service node in Kafka cluster, responsible for storing and forwarding messages
- Controller: Automatically elected from active Brokers, responsible for Partition allocation and Broker monitoring
- Each Partition has exactly one Leader Broker, handling all read/write requests; others are Followers, only for synchronization
ISR Mechanism
ISR (In-Sync Replica) is the set of replicas synchronized with the Leader:
Leader writes message
↓
Followers in ISR pull synchronously
↓
All ISR replicas acknowledge, message considered "committed" (when acks=all)
- A Follower lagging beyond replica.lag.time.max.ms is removed from the ISR
- When the Leader fails, only replicas in the ISR are eligible to be elected as the new Leader
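The ISR shrink rule can be sketched directly (replica.lag.time.max.ms is the real broker setting; the bookkeeping here is simplified):

```python
from typing import Dict, Set

replica_lag_time_max_ms = 30_000  # stand-in for replica.lag.time.max.ms

def shrink_isr(isr: Set[str], last_caught_up_ms: Dict[str, int],
               now_ms: int) -> Set[str]:
    """Drop any replica that has not caught up to the Leader's log end
    within replica.lag.time.max.ms (simplified bookkeeping)."""
    return {r for r in isr
            if now_ms - last_caught_up_ms[r] <= replica_lag_time_max_ms}

isr = {"leader", "f1", "f2"}
caught_up = {"leader": 100_000, "f1": 95_000, "f2": 40_000}  # f2 stalled 60 s ago
print(sorted(shrink_isr(isr, caught_up, now_ms=100_000)))  # ['f1', 'leader']
```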
Recommended Message Format: Apache Avro
Compared to JSON, Apache Avro encodes far more compactly; compared to Protobuf, its schema-evolution rules and Schema Registry integration make it a better fit for big data pipelines:
- Compact binary encoding, typically saving 50%+ storage space versus JSON
- Schema evolution: forward/backward compatible; adding or removing fields (with defaults) does not break consumers
- Schema Registry: centralized schema version management, with Producers and Consumers resolving schemas automatically
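As a hedged illustration of schema evolution, the Avro schema below adds an optional email field with a default, so readers using the older schema (without email) and newer readers decoding old records both keep working (the record and field names are invented for this example):

```json
{
  "type": "record",
  "name": "UserEvent",
  "namespace": "com.example.events",
  "fields": [
    {"name": "user_id", "type": "long"},
    {"name": "action", "type": "string"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}
```

The compatibility hinges on the default: Avro's schema resolution fills in null when a new reader meets an old record, and old readers simply ignore the extra field.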
Typical Use Cases
| Scenario | Description |
|---|---|
| Log Aggregation | Collect Nginx, application logs, write to HDFS/ES |
| Message Middleware | Replace traditional MQ, support higher throughput and longer retention |
| User Behavior Tracking | Real-time collection of tracking data, power recommendation systems |
| Operations Monitoring | Metrics data flow, integrate with Prometheus/Grafana |
| Stream Computing | Integrate with Spark Streaming / Flink, build real-time data pipelines |
Performance Metrics
- Single node supports thousands of concurrent client connections
- 7×24 continuous operation
- Millisecond-level end-to-end processing latency
- TB-level message storage without performance degradation
- Multi-replica zero data loss guarantee
Summary
Kafka’s high throughput comes from sequential writes, zero-copy, and batch processing; its high availability from multi-replica Partitions and the ISR mechanism; its elastic scaling from Partition-based horizontal sharding. Understanding these three dimensions is the foundation for deeper study of Kafka operations and tuning.