This is article 63 in the Big Data series, deeply analyzing Kafka replica mechanism, ISR principles and Leader election strategy. Understanding these mechanisms is key to building high-availability Kafka clusters.
Replica Mechanism Overview
Kafka achieves high availability and data persistence by storing multiple copies (Replicas) of partitions across multiple Brokers. Each partition has one Leader replica and several Follower replicas:
- Leader replica: Handles all client read/write requests and is the sole entry point for data
- Follower replica: Serves no client requests; it only replicates data from the Leader and becomes a Leader candidate when the Leader fails
Because replicas are spread across different Brokers, the failure of a single Broker does not cause data loss.
ISR (In-Sync Replicas) Sync Replica Set
ISR is the core concept of Kafka’s high-availability mechanism: the set of replicas that are in sync with the Leader.
ISR Maintenance Rules
A Follower replica must satisfy both of the following conditions to stay in the ISR:
- Session alive: the Follower’s Broker must maintain an active session with ZooKeeper (in Kafka 2.8+ KRaft mode, it communicates with the Controller instead)
- Sync not lagging: the Follower’s replication lag must not exceed the threshold set by replica.lag.time.max.ms (default 30 seconds)
```properties
# Max time a replica may lag behind the Leader (removed from the ISR if exceeded)
replica.lag.time.max.ms=30000
```
When a Follower’s sync speed cannot keep up with the Leader’s write rate, it is removed from the ISR and moved to the OSR (Out-of-Sync Replicas) set. Once the Follower catches back up with the Leader, it rejoins the ISR.
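The eviction-and-rejoin rule above can be sketched in a few lines of Python. This is a simplified model, not Kafka’s actual code: the function name `update_isr` and the per-replica catch-up timestamps are illustrative bookkeeping.

```python
REPLICA_LAG_TIME_MAX_MS = 30_000  # mirrors replica.lag.time.max.ms

def update_isr(isr, osr, last_caught_up_ms, now_ms):
    """Evict followers whose last full catch-up with the Leader is older
    than the lag threshold, and promote recovered followers back."""
    for replica in list(isr):
        if now_ms - last_caught_up_ms[replica] > REPLICA_LAG_TIME_MAX_MS:
            isr.remove(replica)   # lagging too long: ISR -> OSR
            osr.add(replica)
    for replica in list(osr):
        if now_ms - last_caught_up_ms[replica] <= REPLICA_LAG_TIME_MAX_MS:
            osr.remove(replica)   # caught back up: OSR -> ISR
            isr.add(replica)
    return isr, osr
```

For example, a follower that last caught up 40 seconds ago is evicted, while one that caught up 15 seconds ago stays in the ISR.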
Importance of ISR
Only after all replicas in the ISR have confirmed receipt of a message (when acks=all) is the message considered “committed”. This guarantees:
“As long as at least one synchronized replica is alive, committed messages will not be lost.”
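Under acks=all, the commit condition can be modeled as: a message at a given offset is committed once every ISR member’s LEO has advanced past it. This is a simplified sketch; `is_committed` is an illustrative name, not a Kafka API.

```python
def is_committed(offset, isr_leo):
    """A message at `offset` is committed once every ISR replica's LEO
    has advanced past it (LEO points to the next offset to be written)."""
    return all(leo > offset for leo in isr_leo.values())
```

With ISR LEOs of {leader: 100, follower: 98}, offset 97 is committed but offset 98 is not yet, since the follower has only written offsets up to 97.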
Failure Recovery Scenarios
Scenario 1: Partial Replica Failure
When a Follower replica’s Broker crashes and later recovers:
- The Follower restarts and finds its data lagging behind the Leader
- The Follower issues fetch requests to the Leader, resuming sync from its last checkpoint
- Once it catches up to the Leader’s HW (High Watermark), it rejoins the ISR
Scenario 2: Leader Replica Failure
When the Leader’s Broker crashes:
- The Controller (cluster controller) detects the Leader failure
- A new Leader is elected from the current ISR list
- The remaining Followers start syncing from the new Leader
- If the ISR is empty, whether a Leader can still be elected depends on the unclean leader election setting
Leader Election Mechanism
Normal Leader Election
Leader election is coordinated by the Kafka Controller, which preferentially elects from the ISR:

```
ISR = [Broker1, Broker2, Broker3]
When Broker1 (the Leader) crashes:
New Leader selected from {Broker2, Broker3} → prefer Broker2 (first in ISR)
```
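The ISR-preference rule can be sketched as follows. This is a simplified model of the Controller’s choice, not its actual implementation; the function name and inputs are illustrative.

```python
def elect_leader(assigned_replicas, isr, live_brokers):
    """Pick the first replica (in assignment order) that is both alive
    and a member of the ISR; return None if no such replica exists."""
    for replica in assigned_replicas:
        if replica in isr and replica in live_brokers:
            return replica
    return None  # ISR empty or all dead: unclean-election territory
```

When `None` is returned, the partition stays leaderless unless unclean leader election is enabled.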
The Controller itself is highly available: in ZooKeeper mode, multiple Brokers compete to register an ephemeral node in ZooKeeper, and the winner becomes the Controller.
Unclean Leader Election
When the ISR is empty (all in-sync replicas have crashed), Kafka faces a dilemma:
| Strategy | Configuration | Characteristics |
|---|---|---|
| Wait for ISR recovery | unclean.leader.election.enable=false (default) | Strong consistency, may be unavailable for long time |
| Elect from OSR | unclean.leader.election.enable=true | High availability, but may lose data |
Production recommendation: for scenarios requiring very high data consistency, such as finance or orders, keep the default false; for scenarios where availability matters more than consistency (such as log collection), it can be set to true.
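The recommendation above is a single broker-level setting (the same key also works as a per-topic override); a minimal config fragment:

```properties
# Consistency first (the default): with an empty ISR, the partition stays
# unavailable rather than electing a lagging OSR replica that may lose data
unclean.leader.election.enable=false
```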
ISR vs Majority Voting Comparison
Kafka’s ISR mechanism differs from the majority-voting (quorum) approach used by protocols such as Raft and ZAB:
| Feature | ISR Mechanism | Majority Voting |
|---|---|---|
| Replicas needed to tolerate f failures | f+1 | 2f+1 |
| Max failures tolerable with 3 replicas | 2 | 1 |
| Write latency | Wait for all ISR | Wait for majority |
| Performance | Higher (fewer replicas) | Relatively lower |
ISR advantage: the ISR mechanism needs fewer replicas to tolerate the same number of failures (f+1 vs 2f+1), keeping storage and replication overhead lower for similar durability guarantees.
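The f+1 vs 2f+1 arithmetic from the table can be checked directly (the helper names are illustrative):

```python
def min_replicas_isr(f):
    """ISR-style: f+1 replicas survive f failures, since any single
    surviving in-sync replica still holds all committed messages."""
    return f + 1

def min_replicas_majority(f):
    """Quorum-style (Raft/ZAB): 2f+1 replicas are needed so that a
    majority (f+1) remains after f failures."""
    return 2 * f + 1
```

With 3 replicas, ISR tolerates 2 failures while majority voting tolerates only 1, matching the comparison table.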
HW and LEO
Understanding Kafka replica sync requires mastering two key concepts:
- LEO (Log End Offset): the offset of the next message to be written on each replica, i.e. the last written offset + 1
- HW (High Watermark): the maximum offset visible to consumers, equal to the minimum LEO among all replicas in the ISR
```
LEO of replicas in the ISR:
  Leader:    LEO = 100
  Follower1: LEO = 98
  Follower2: LEO = 97

HW = min(100, 98, 97) = 97
```

Consumers can only consume up to offset = 96 (i.e. offsets strictly below the HW).
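The HW rule above is just a minimum over the ISR replicas’ LEOs (an illustrative sketch, not Kafka code):

```python
def high_watermark(isr_leo):
    """HW = min LEO across the ISR; consumers may read offsets < HW."""
    return min(isr_leo.values())

# The example from the text:
leos = {"leader": 100, "follower1": 98, "follower2": 97}
hw = high_watermark(leos)           # 97
max_readable_offset = hw - 1        # 96
```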
The HW mechanism ensures consumers can only read messages confirmed by all ISR replicas, guaranteeing that already-consumed data is not lost after a Leader switch.