This is article 53 in the Big Data series, deeply dissecting the working principles and collaboration flow of Kafka’s three core components.
Overall Architecture Review
Kafka is a distributed publish-subscribe messaging system with three core roles:
Producer (message producer)
↓ Publish messages
Broker Cluster (message storage & routing)
↓ Consume messages
Consumer Group (consumer group)
The three are decoupled through Topic + Partition: the Producer only publishes, the Consumer only subscribes, and the Broker handles persistence and routing in between.
Producer: Message Producer
Basic Responsibilities
Producer is responsible for creating messages and publishing to specified Topics. Sending process:
- Serialize message (Key/Value)
- Select target Partition based on partitioning strategy
- Append the message to the local buffer (RecordAccumulator)
- A background Sender thread sends batches to the target Broker
Partitioning Strategy
| Strategy | Trigger Condition | Effect |
|---|---|---|
| Explicit Partition | Explicitly specify partition at send | Precise routing |
| Hash Partitioning | Message with Key | Same Key always writes to same Partition, guarantees local order |
| Round-Robin | No Key, default | Even distribution, maximizes throughput |
| Custom Partitioner | Implement Partitioner interface | Business custom routing logic |
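The key-hash strategy in the table above can be sketched as follows. Note this is a simplified illustration: the real DefaultPartitioner applies murmur2 to the serialized key bytes, whereas this sketch uses String.hashCode() as a stand-in; the class and method names are hypothetical.

```java
// Simplified sketch of key-based partition selection.
// The real DefaultPartitioner hashes the *serialized* key bytes with
// murmur2; String.hashCode() here is only an illustrative stand-in.
public class PartitionSketch {
    static int partitionForKey(String key, int numPartitions) {
        // Mask off the sign bit so the result is always non-negative
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int p1 = partitionForKey("order-1001", 3);
        int p2 = partitionForKey("order-1001", 3);
        // The same key always maps to the same partition,
        // which is what guarantees per-key local ordering
        System.out.println(p1 == p2); // true
    }
}
```

Because the hash depends on the partition count, changing the number of partitions re-shuffles keys, which is why Kafka's per-key ordering guarantee assumes a fixed partition count.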
ACK Confirmation Mechanism
The acks parameter controls the trade-off between message reliability and throughput:
| acks Value | Semantics | Use Cases |
|---|---|---|
| 0 | No waiting for any acknowledgment, fire-and-forget | Log collection, tolerates minor loss |
| 1 | Leader write succeeds | Default, balances reliability and performance |
| all (-1) | All ISR replicas have written | Finance, orders, and other core data |
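A typical high-reliability producer configuration combining the settings above might look like this. The property keys are Kafka's real producer config names; the bootstrap address and values are illustrative.

```java
import java.util.Properties;

// Illustrative producer reliability settings; bootstrap.servers is
// an assumed address, and the values are examples, not requirements.
public class AcksConfig {
    static Properties reliableProducerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed address
        props.put("acks", "all");                // wait for all ISR replicas
        props.put("retries", "3");               // retry transient failures
        props.put("enable.idempotence", "true"); // no duplicates on retry
        return props;
    }
}
```

Enabling idempotence together with acks=all gives exactly-once semantics per partition on the producer side; with acks=0 or 1, retries could otherwise introduce duplicates.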
Async Send Example
```java
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer",
        "org.apache.kafka.common.serialization.IntegerSerializer");
props.put("value.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");
KafkaProducer<Integer, String> producer = new KafkaProducer<>(props);

// Record targeting topic "my_topic", partition 0, key 42
ProducerRecord<Integer, String> record =
        new ProducerRecord<>("my_topic", 0, 42, "hello kafka");

// Async send; the callback runs once the broker acknowledges
producer.send(record, (metadata, exception) -> {
    if (exception == null) {
        System.out.println("Sent to partition " + metadata.partition()
                + " at offset " + metadata.offset());
    } else {
        exception.printStackTrace();
    }
});
producer.close();
```
Broker: Message Storage Node
Core Responsibilities
A Broker is the basic unit of a Kafka cluster. Each Broker is an independent JVM process responsible for:
- Receiving Producer written messages, persisting to local disk
- Responding to Consumer pull requests
- Maintaining Partition Leader/Follower roles
- Coordinating cluster metadata with ZooKeeper (or KRaft)
Physical Storage of Partition
Each Partition corresponds to a directory on disk, composed of multiple Segment files:
/var/kafka-logs/my_topic-0/
├── 00000000000000000000.log ← Message data file
├── 00000000000000000000.index ← Offset sparse index
└── 00000000000000000000.timeindex ← Timestamp index
- The filename is the 20-digit zero-padded starting Offset of that Segment
- Writes are append-only; reads binary-search the sparse index to locate an Offset
- A new Segment is rolled by default when the current one exceeds the size limit (log.segment.bytes, default 1 GB) or the age limit (log.roll.hours, default 168 hours, i.e. 7 days)
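The filename convention and the sparse-index lookup can be sketched in a few lines. The TreeMap here stands in for the .index file (offset → physical file position); class and method names are illustrative, not Kafka internals.

```java
import java.util.TreeMap;

// Sketch: how a segment's filename encodes its base offset, and how a
// sparse offset index is searched to locate a record. The TreeMap
// stands in for the .index file (offset -> physical file position).
public class SegmentSketch {
    static String segmentFileName(long baseOffset) {
        // 20-digit zero-padded starting offset, e.g. 00000000000000000000.log
        return String.format("%020d.log", baseOffset);
    }

    // floorEntry performs the "binary search for the nearest index
    // entry <= target offset"; the reader then scans forward in the .log
    static long filePositionFor(TreeMap<Long, Long> sparseIndex, long offset) {
        return sparseIndex.floorEntry(offset).getValue();
    }

    public static void main(String[] args) {
        TreeMap<Long, Long> index = new TreeMap<>();
        index.put(0L, 0L);       // offset 0 lives at file position 0
        index.put(100L, 4096L);  // offset 100 lives at file position 4096
        // Looking up offset 150 lands on the entry for 100, position 4096
        System.out.println(filePositionFor(index, 150L)); // 4096
    }
}
```

Because the index is sparse (one entry per few KB of log, not per message), it stays small enough to memory-map while still bounding the forward scan.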
Leader and Follower
Partition Leader (on one Broker)
← Receives Producer writes
→ Serves Consumer reads
→ Tracks Follower sync progress and advances the high watermark
Partition Follower (on other Brokers)
→ Pulls new messages from the Leader to catch up
→ Stays in the ISR list while caught up
→ Eligible for election from the ISR as the new Leader if the Leader fails
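ISR membership is essentially a lag check: a follower stays in sync while it has caught up to the leader within replica.lag.time.max.ms (30 seconds by default). A minimal sketch of that rule, with illustrative names:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of ISR membership: a follower remains in the ISR while its
// last full catch-up happened within replica.lag.time.max.ms.
// Variable and method names are illustrative, not Kafka internals.
public class IsrSketch {
    static List<String> inSyncReplicas(Map<String, Long> lastCaughtUpMs,
                                       long nowMs, long maxLagMs) {
        return lastCaughtUpMs.entrySet().stream()
                .filter(e -> nowMs - e.getValue() <= maxLagMs) // within lag bound
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }
}
```

A follower that falls out of this set is removed from the ISR; with acks=all, the leader then only waits on the remaining in-sync replicas, which is why a shrinking ISR trades durability for availability.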
Controller Role
Exactly one Broker in the cluster acts as the Controller, elected through ZooKeeper (or the Raft quorum in KRaft mode):
- Monitor Broker liveness
- Responsible for Partition Leader election
- Manage Partition ISR list changes
- Coordinate Partition reallocation when new Broker joins
Consumer: Message Consumer
Consumer Group Mechanism
Kafka implements two consumption semantics through Consumer Group:
Within the same Group (point-to-point semantics):
Partition 1 → Consumer A
Partition 2 → Consumer B
Partition 3 → Consumer C
Across different Groups (publish-subscribe semantics):
Group A → consumes the full Topic independently
Group B → consumes the full Topic independently
Key rules:
- Each Partition in same Consumer Group can only be consumed by one Consumer
- When Consumer count exceeds Partition count, excess Consumers idle
- Consumer exit/join triggers Rebalance, reassigning Partitions
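The exclusive-assignment rules above can be sketched as a simple partition-to-consumer mapping. This is a round-robin-style illustration, not Kafka's actual RangeAssignor/StickyAssignor implementations; the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of partition assignment inside one consumer group: each
// partition goes to exactly one consumer; when there are more
// consumers than partitions, the extra consumers receive nothing.
public class AssignSketch {
    static List<List<Integer>> assign(int numPartitions, int numConsumers) {
        List<List<Integer>> result = new ArrayList<>();
        for (int c = 0; c < numConsumers; c++) {
            result.add(new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            result.get(p % numConsumers).add(p); // round-robin distribution
        }
        return result;
    }

    public static void main(String[] args) {
        // 3 partitions, 4 consumers: the 4th consumer sits idle
        System.out.println(assign(3, 4)); // [[0], [1], [2], []]
    }
}
```

This also shows why the Partition count caps consumption parallelism: past numPartitions consumers, adding instances only adds idle members.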
Offset Management
Consumer tracks consumption progress through Offset:
| Method | Description |
|---|---|
| Auto commit | enable.auto.commit=true, auto commits every auto.commit.interval.ms |
| Manual sync commit | consumer.commitSync(), ensure commit after successful consumption, avoid message loss |
| Manual async commit | consumer.commitAsync(), higher performance, no retry on failure |
Offset is stored in Kafka internal Topic __consumer_offsets (Kafka 0.9+ no longer depends on ZooKeeper).
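Which __consumer_offsets partition holds a given group's offsets is determined by hashing the group id. A sketch of that mapping, assuming the default of 50 partitions (offsets.topic.num.partitions); the class and method names are illustrative:

```java
// Sketch: a consumer group's offsets always land in one fixed
// partition of __consumer_offsets, chosen by hashing the group id.
// 50 is the default value of offsets.topic.num.partitions.
public class OffsetsTopicSketch {
    static int offsetsPartitionFor(String groupId, int numOffsetsPartitions) {
        // Mask the sign bit so the modulo result is non-negative
        return (groupId.hashCode() & 0x7fffffff) % numOffsetsPartitions;
    }

    public static void main(String[] args) {
        // The same group id is always routed to the same partition,
        // so the broker leading that partition acts as the group coordinator
        System.out.println(offsetsPartitionFor("my-group", 50));
    }
}
```

Because the mapping is deterministic, the Broker leading that offsets partition serves as the group's coordinator, and all commits from the group go to one place.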
Consumption Position Reset
Control consumer behavior when first starting or Offset lost via auto.offset.reset:
- earliest: Start from the earliest message in the Partition
- latest (default): Only consume messages produced after startup
- none: Throw an exception when no committed Offset exists
Rebalance Triggers
Rebalance triggers:
1. Consumer joins Group (new instance comes online)
2. Consumer leaves Group (crash or normal shutdown)
3. Consumer timeout without heartbeat (session.timeout.ms)
4. Topic Partition count changes
All consumers pause during Rebalance. Production should properly configure session.timeout.ms and max.poll.interval.ms to reduce unnecessary Rebalances.
Full Message Flow
1. Producer serializes message, selects Partition
2. Producer batch sends to target Broker (Leader)
3. Leader Broker writes to local log (.log file)
4. Follower actively pulls new messages from Leader, writes to local replica
5. After all ISR replicas confirm write, message "committed"
6. Consumer initiates Poll request to Leader, carrying current Offset
7. Leader returns message batch starting from that Offset
8. Consumer commits new Offset after processing
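The eight steps above can be walked through with a small in-memory model: produce appends to the log, the high watermark advances once replicas confirm, and a consumer polls from its committed offset. Everything here is illustrative; real Kafka does this over the network against on-disk logs.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory walk-through of the full message flow. Names are
// illustrative, not Kafka classes.
public class FlowSketch {
    final List<String> log = new ArrayList<>(); // the partition's .log file
    long highWatermark = 0;                     // offset of first uncommitted record
    long consumerOffset = 0;                    // consumer's current position

    long append(String msg) {           // steps 1-3: produce + leader write
        log.add(msg);
        return log.size() - 1;          // offset assigned to the new record
    }

    void replicasConfirm() {            // steps 4-5: ISR catches up
        highWatermark = log.size();     // everything so far is now "committed"
    }

    List<String> poll() {               // steps 6-7: fetch from current offset
        List<String> batch = new ArrayList<>(
                log.subList((int) consumerOffset, (int) highWatermark));
        consumerOffset = highWatermark; // step 8: commit the new offset
        return batch;
    }

    static List<String> demo() {
        FlowSketch f = new FlowSketch();
        f.append("hello");
        f.append("kafka");
        f.replicasConfirm();
        return f.poll();                // only committed messages are visible
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [hello, kafka]
    }
}
```

Note that poll() reads only up to the high watermark: messages written to the Leader but not yet confirmed by the ISR are invisible to consumers, which is what prevents reading data that could be lost in a Leader failover.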
Summary
- Producer partitioning strategy and ACK mechanism determine message routing and reliability
- Broker Segment storage and ISR mechanism ensure high-performance persistence and high availability
- Consumer Group achieves parallel consumption through Partition exclusive assignment, Offset management controls consumption semantics
- The three collaborate to form Kafka’s complete closed loop of high throughput, low latency, and high availability