This is article 63 in the Big Data series, a deep dive into Kafka's replica mechanism, ISR principles, and Leader election strategies. Understanding these mechanisms is key to building highly available Kafka clusters.

Replica Mechanism Overview

Kafka achieves high availability and data durability by storing multiple copies (replicas) of each partition across multiple Brokers. Each partition has one Leader replica and several Follower replicas:

  • Leader replica: handles all client read/write requests; the main entry point for data
  • Follower replica: does not serve client requests; it only syncs data from the Leader and stands for election when the Leader crashes

Because replicas are distributed across different Brokers, the failure of a single Broker does not cause data loss.
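As a concrete example, a topic created with a replication factor of 3 keeps one replica of each partition on three different Brokers (the topic name and server address below are illustrative):

```shell
# Create a topic whose 3 partitions each have 3 replicas (1 Leader + 2 Followers)
kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --topic orders \
  --partitions 3 \
  --replication-factor 3

# Inspect which Broker leads each partition and the current ISR
kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic orders
```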

The ISR (In-Sync Replicas) Set

The ISR is the core concept of Kafka's high-availability mechanism: the set of replicas that are in sync with the Leader (the Leader itself is always a member).

ISR Maintenance Rules

A Follower replica must satisfy both of the following conditions to stay in the ISR:

  1. Session alive: the Follower's Broker must maintain an active session with ZooKeeper (in KRaft mode, available since Kafka 2.8, it heartbeats to the Controller instead)
  2. Sync not lagging: the Follower's replication lag must not exceed the threshold set by replica.lag.time.max.ms (default 30 seconds)
# Max time replica allowed to lag behind Leader (removed from ISR if exceeded)
replica.lag.time.max.ms=30000

When a Follower cannot keep up with the Leader's write rate, it is removed from the ISR and placed in the OSR (Out-of-Sync Replicas) set. Once it catches back up to the Leader, it rejoins the ISR.
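The lag-based rule above can be sketched as a small simulation. This is illustrative only, not Kafka's actual implementation; the class and function names are made up:

```python
import time

# Mirrors replica.lag.time.max.ms (default 30 seconds)
REPLICA_LAG_TIME_MAX_MS = 30_000

def now_ms():
    return int(time.time() * 1000)

class Follower:
    """Hypothetical model of a follower replica's lag tracking."""
    def __init__(self, broker_id):
        self.broker_id = broker_id
        # Timestamp (ms) when this follower last fully caught up to the leader's LEO
        self.last_caught_up_ms = now_ms()

    def record_fetch(self, follower_leo, leader_leo):
        # A follower is "caught up" when its fetch reaches the leader's log end offset
        if follower_leo >= leader_leo:
            self.last_caught_up_ms = now_ms()

def in_isr(follower, current_ms):
    # Removed from the ISR once it has lagged longer than the threshold
    return current_ms - follower.last_caught_up_ms <= REPLICA_LAG_TIME_MAX_MS
```

Note that the threshold is time-based, not message-count-based: a follower that is briefly behind by many messages but fetching promptly stays in the ISR.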

Importance of ISR

A message is considered "committed" only after all replicas in the ISR have confirmed receiving it (when acks=all). This guarantees:

“As long as at least one synchronized replica is alive, committed messages will not be lost.”
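A typical configuration pairing for this guarantee (the values shown are illustrative):

```properties
# Producer side: wait for all ISR replicas to acknowledge before treating a send as successful
acks=all
# Broker/topic side, commonly paired with acks=all: reject writes if the ISR shrinks below this size
min.insync.replicas=2
```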

Failure Recovery Scenarios

Scenario 1: Partial Replica Failure

When a Follower replica's Broker crashes and later recovers:

  1. Follower restarts, discovers data lags behind Leader
  2. Follower initiates data pull request to Leader, syncs from checkpoint
  3. After catching up to Leader’s HW (High Watermark), rejoins ISR
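The recovery steps can be sketched as follows. This is a simplification: since KIP-101, real brokers truncate using leader epochs rather than the plain HW checkpoint, and the function below is a made-up illustration:

```python
def recover_follower(follower_log, follower_hw, leader_log):
    """Illustrative follower recovery (logs modeled as Python lists of records)."""
    # Step 1: truncate the local log back to the last checkpointed high watermark,
    # discarding records that were never confirmed as committed
    del follower_log[follower_hw:]
    # Step 2: fetch everything after the truncation point from the leader
    follower_log.extend(leader_log[len(follower_log):])
    # Caught up (eligible to rejoin the ISR) once follower LEO equals leader LEO
    return len(follower_log) == len(leader_log)
```

For example, a follower that crashed with an unconfirmed record "x" past its HW of 2 drops it and refetches the leader's committed records in its place.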

Scenario 2: Leader Replica Failure

When Leader’s Broker crashes:

  1. Controller (cluster controller) detects Leader crash
  2. Elects new Leader from current ISR list
  3. Other Followers start syncing from new Leader
  4. If ISR is empty, triggers unclean leader election
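The Controller's choice can be sketched as a simple preference order (an illustrative function, not Kafka's actual Controller code):

```python
def elect_leader(replicas, isr, alive, unclean_enabled=False):
    """Pick a new leader: prefer the first alive replica in the ISR."""
    for broker in isr:
        if broker in alive:
            return broker
    if unclean_enabled:
        # Unclean election: fall back to any alive replica, risking data loss
        for broker in replicas:
            if broker in alive:
                return broker
    return None  # partition stays offline until an ISR member returns
```

With the default (unclean election disabled), an empty or fully dead ISR leaves the partition unavailable rather than risking lost messages.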

Leader Election Mechanism

Normal Leader Election

Leader election coordinated by Kafka Controller, preferentially elects from ISR:

ISR = [Broker1, Broker2, Broker3]
When Broker1 (Leader) crashes:
New Leader selected from {Broker2, Broker3} → Prefer Broker2 (first in ISR)

The Controller itself is highly available: in ZooKeeper mode, multiple Brokers compete to create an ephemeral node in ZooKeeper, and the winner becomes the Controller.

Unclean Leader Election

When the ISR is empty (all synchronized replicas have crashed), Kafka faces a dilemma:

| Strategy | Configuration | Characteristics |
| --- | --- | --- |
| Wait for ISR recovery | unclean.leader.election.enable=false (default) | Strong consistency; may be unavailable for a long time |
| Elect from OSR | unclean.leader.election.enable=true | High availability, but may lose data |

Production recommendation: for scenarios requiring strict data consistency (finance, orders), keep the default value false; for scenarios where availability matters more than consistency (such as log collection), it can be set to true.
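Since unclean.leader.election.enable is also a topic-level config, the broker default can stay false while availability-first topics opt in individually (the topic name and server address below are illustrative):

```shell
# Per-topic override: allow unclean election only on an availability-first topic
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name app-logs \
  --alter --add-config unclean.leader.election.enable=true
```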

ISR vs Majority Voting Comparison

Kafka's ISR mechanism differs from the majority-voting (quorum) approach used by consensus protocols such as Raft and ZAB:

| Feature | ISR Mechanism | Majority Voting |
| --- | --- | --- |
| Replicas needed to tolerate f failures | f+1 | 2f+1 |
| Max failures tolerable with 3 replicas | 2 | 1 |
| Write latency | Wait for all ISR | Wait for majority |
| Performance | Higher (fewer replicas) | Relatively lower |

The ISR advantage: for the same replica count it tolerates more node failures than majority voting (equivalently, it needs fewer replicas for the same fault tolerance) while maintaining comparable throughput.
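The arithmetic behind the table: with the ISR approach, committed data lives on every ISR member, so a single survivor suffices and f+1 replicas tolerate f failures; majority voting needs a majority alive, so tolerating f failures requires 2f+1 replicas. A quick check:

```python
def isr_replicas_needed(f):
    # ISR: one surviving in-sync replica is enough to preserve committed data
    return f + 1

def quorum_replicas_needed(f):
    # Majority voting: a majority (f+1) of 2f+1 must survive
    return 2 * f + 1

def max_failures(n, quorum=False):
    # Invert the formulas above for a cluster of n replicas
    return (n - 1) // 2 if quorum else n - 1
```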

HW and LEO

Understanding Kafka replica sync requires mastering two key concepts:

  • LEO (Log End Offset): the offset of the next message to be written to a replica, i.e., its latest written offset + 1
  • HW (High Watermark): the largest offset exposed to consumers; equals the minimum LEO across all replicas in the ISR
LEO of replicas in ISR:
  Leader: LEO=100
  Follower1: LEO=98
  Follower2: LEO=97

HW = min(100, 98, 97) = 97
Consumers can only consume up to offset=96

HW mechanism ensures consumers can only read messages confirmed by all ISR replicas, guaranteeing no data loss after Leader switch.
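The HW calculation from the example above can be expressed directly (illustrative helpers, not Kafka's API):

```python
def high_watermark(leo_by_replica):
    """HW = minimum LEO across the ISR replicas."""
    return min(leo_by_replica.values())

def consumable_offsets(leo_by_replica):
    # Consumers may read offsets strictly below the HW
    return range(high_watermark(leo_by_replica))
```

With the ISR LEOs from the example ({leader: 100, follower1: 98, follower2: 97}), the HW is 97 and the highest consumable offset is 96.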