This is article 54 in the Big Data series. It introduces Kafka cluster installation and configuration, covering the architectural evolution from ZooKeeper dependence to the self-managed KRaft mode.

Kafka Version Evolution Overview

Kafka 2.x Main Features

| Feature | Description |
| --- | --- |
| Kafka Streams enhancements | Support for outer joins, global tables (GlobalKTable), and interactive queries |
| Write throughput improvement | Log management system rewrite, roughly 30% higher throughput |
| Zero-copy optimization | Further reduces CPU overhead during message transfer |
| Dynamic configuration | Parameters can be modified online via the AdminClient API, no restart needed |
| Security enhancements | Kerberos integration, fine-grained ACLs, cross-domain support |

Kafka 2.x still relies on ZooKeeper for cluster metadata management and Controller election.

Kafka 3.x Architecture Transformation

Kafka 3.x is a milestone release; its biggest change is the introduction of KRaft (Kafka Raft):

| Milestone | Version | Description |
| --- | --- | --- |
| KRaft preview | 3.0 | KRaft mode available, ZooKeeper still the default |
| ZooKeeper deprecated | 3.5.x | ZooKeeper mode marked as deprecated |
| ZooKeeper removed | 4.0 (planned) | Complete removal of the ZooKeeper dependency |

KRaft core improvements:

  • Uses the Raft consensus algorithm for cluster metadata management; Kafka itself takes on the Controller role
  • The metadata log is stored in the internal Kafka topic __cluster_metadata
  • Cluster startup is significantly faster (no longer waits for a ZooKeeper connection)
  • Raises the practical per-cluster partition limit from roughly 200K to about 2 million
  • Lower operational complexity: no separate ZooKeeper cluster to maintain
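Because the metadata lives in an ordinary Kafka log, it can be inspected directly on a running KRaft node. A sketch using the kafka-dump-log.sh tool shipped with Kafka 3.x (the path assumes the log.dirs used later in this article, and the exact segment file name will vary per cluster):

```shell
# Decode the records of the __cluster_metadata topic on a running KRaft node.
# Assumes log.dirs=/var/kafka-logs; the segment file name varies per cluster.
kafka-dump-log.sh --cluster-metadata-decoder \
  --files /var/kafka-logs/__cluster_metadata-0/00000000000000000000.log
```

The output shows broker registrations, topic creations, and other metadata records as Raft log entries, which is useful for understanding what previously lived in ZooKeeper znodes.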

Environment Preparation

Prerequisites

| Dependency | Version Requirement | Description |
| --- | --- | --- |
| Java | JDK 8+ | Kafka 2.x recommends JDK 8 or 11 |
| ZooKeeper | 3.6+ | Required for Kafka 2.x; not needed in Kafka 3.x KRaft mode |
| Servers | 3 nodes | Production minimum of 3 nodes for HA |

Node Planning Example

h121.wzk.icu  ← ZooKeeper + Kafka Broker (broker.id=1)
h122.wzk.icu  ← ZooKeeper + Kafka Broker (broker.id=2)
h123.wzk.icu  ← ZooKeeper + Kafka Broker (broker.id=3)

ZooKeeper Configuration Verification

Before deploying Kafka, confirm ZooKeeper cluster is running:

# Configure ZooKeeper environment variables
vim /etc/profile

export ZOOKEEPER_HOME=/opt/servers/apache-zookeeper-3.8.4-bin
export PATH=$PATH:$ZOOKEEPER_HOME/bin

source /etc/profile

# Verify each node status (should have 1 leader, others followers)
zkServer.sh status
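With passwordless SSH configured between the nodes, the status check can be run across the whole ensemble in one loop (hostnames follow the node plan above):

```shell
# Check ZooKeeper role on every node; expect exactly one "leader"
# and two "follower" entries across the three hosts.
for host in h121.wzk.icu h122.wzk.icu h123.wzk.icu; do
  echo "== $host =="
  ssh "$host" "zkServer.sh status"
done
```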

Kafka Installation Steps

1. Download and Extract

# Download precompiled binary (recommended, avoid building yourself)
# kafka_2.12-2.7.2 means Scala 2.12 compiled, Kafka version 2.7.2
tar -zxvf kafka_2.12-2.7.2.tgz
mv kafka_2.12-2.7.2 /opt/servers/

2. Configure Environment Variables

vim /etc/profile

# Kafka configuration
export KAFKA_HOME=/opt/servers/kafka_2.12-2.7.2
export PATH=$PATH:$KAFKA_HOME/bin

source /etc/profile

3. Configure Broker Parameters

Edit /opt/servers/kafka_2.12-2.7.2/config/server.properties:

# Unique Broker ID, must be different for each node (0, 1, 2...)
broker.id=1

# Broker listener address
listeners=PLAINTEXT://h121.wzk.icu:9092

# ZooKeeper connection address (same across all nodes)
zookeeper.connect=h121.wzk.icu:2181,h122.wzk.icu:2181,h123.wzk.icu:2181

# Message log storage directory
log.dirs=/var/kafka-logs

# Default Partition replica count
default.replication.factor=2

# Default Partition count
num.partitions=3

# Message retention duration in hours (168 hours = 7 days)
log.retention.hours=168

Create log directory on all nodes:

mkdir -p /var/kafka-logs

Repeat the configuration above on h122 and h123, changing broker.id (to 2 and 3 respectively) and the hostname in the listeners address.

4. Distribute Configuration to Other Nodes

# Distribute installation directory to other nodes
scp -r /opt/servers/kafka_2.12-2.7.2 h122.wzk.icu:/opt/servers/
scp -r /opt/servers/kafka_2.12-2.7.2 h123.wzk.icu:/opt/servers/

# Modify broker.id=2 on h122, broker.id=3 on h123
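Rather than editing each remote copy by hand, the two per-node values can be patched with sed. A minimal sketch (the helper function name is ours, not a Kafka tool; hostnames and IDs follow the node plan above, and passwordless SSH is assumed):

```shell
# patch_node_config FILE HOST ID
# Rewrites broker.id and the listener hostname in a copied server.properties.
patch_node_config() {
  file=$1; host=$2; id=$3
  sed -i "s/^broker.id=.*/broker.id=${id}/" "$file"
  sed -i "s|PLAINTEXT://[^:]*:9092|PLAINTEXT://${host}:9092|" "$file"
}

# Example: run the helper remotely on each node after the scp:
#   ssh h122.wzk.icu "$(declare -f patch_node_config);
#     patch_node_config /opt/servers/kafka_2.12-2.7.2/config/server.properties h122.wzk.icu 2"
```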

5. Start Kafka

# Foreground (for testing)
kafka-server-start.sh /opt/servers/kafka_2.12-2.7.2/config/server.properties

# Daemon mode (production)
kafka-server-start.sh -daemon /opt/servers/kafka_2.12-2.7.2/config/server.properties

6. Verify Startup

# Check Kafka process
jps
# Output should contain Kafka

# Check Brokers registered in ZooKeeper
zkCli.sh -server h121.wzk.icu:2181
ls /brokers/ids
# Output [1, 2, 3] means 3 Brokers registered successfully
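A quick end-to-end smoke test is to create a topic replicated across all three brokers and inspect its placement (the topic name here is just an illustration):

```shell
# Create a topic spanning all three brokers...
kafka-topics.sh --create --topic smoke-test \
  --bootstrap-server h121.wzk.icu:9092 \
  --partitions 3 --replication-factor 3

# ...then check the leader and replica assignment per partition.
kafka-topics.sh --describe --topic smoke-test \
  --bootstrap-server h121.wzk.icu:9092
```

If the describe output shows leaders spread across broker IDs 1, 2, and 3, the cluster is balancing partitions as expected.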

KRaft Mode Deployment (Kafka 3.x)

Kafka 3.x can run completely without ZooKeeper in KRaft mode:

Generate Cluster UUID

KAFKA_CLUSTER_ID="$(kafka-storage.sh random-uuid)"

Format Storage Directory

kafka-storage.sh format -t $KAFKA_CLUSTER_ID \
  -c /opt/servers/kafka_3.x/config/kraft/server.properties
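Every node in the quorum must be formatted with the same cluster UUID before first start. With passwordless SSH this can be done in one pass (paths and hostnames as in this article's node plan):

```shell
# Generate one cluster id and format the storage directory on all three nodes with it.
KAFKA_CLUSTER_ID="$(kafka-storage.sh random-uuid)"
for host in h121.wzk.icu h122.wzk.icu h123.wzk.icu; do
  ssh "$host" "kafka-storage.sh format -t $KAFKA_CLUSTER_ID \
    -c /opt/servers/kafka_3.x/config/kraft/server.properties"
done
```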

KRaft Config Key Points

# KRaft mode configuration (config/kraft/server.properties)
# This node acts as both Broker and Controller
process.roles=broker,controller
node.id=1
# Raft quorum voters, formatted as id@host:port for every controller node
controller.quorum.voters=1@h121.wzk.icu:9093,2@h122.wzk.icu:9093,3@h123.wzk.icu:9093
listeners=PLAINTEXT://h121.wzk.icu:9092,CONTROLLER://h121.wzk.icu:9093
# The listener used for controller traffic must be declared explicitly
controller.listener.names=CONTROLLER
log.dirs=/var/kafka-logs

Start

kafka-server-start.sh /opt/servers/kafka_3.x/config/kraft/server.properties
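With no ZooKeeper to query, verification also changes: Kafka 3.x ships kafka-metadata-quorum.sh, which replaces the zkCli.sh check used in ZooKeeper mode:

```shell
# Show the Raft quorum state: leader id, epoch, high watermark, and the voter set.
kafka-metadata-quorum.sh --bootstrap-server h121.wzk.icu:9092 describe --status
```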

ZooKeeper Mode vs KRaft Mode Comparison

| Dimension | ZooKeeper Mode | KRaft Mode |
| --- | --- | --- |
| External dependency | Requires a separate ZooKeeper cluster | No external dependency |
| Operational complexity | High (two systems) | Low (single system) |
| Max partitions | ~200K | ~2M |
| Metadata storage | ZooKeeper znodes | Kafka internal topic |
| Startup speed | Slower | Faster |
| Production maturity | Mature and stable | Production ready since 3.3 |

Summary

  • A Kafka 2.x installation requires ZooKeeper to be deployed first; configuration centers on broker.id, zookeeper.connect, and log.dirs
  • Kafka 3.x introduces KRaft and gradually phases out ZooKeeper, simplifying operations and raising scaling limits
  • New projects are advised to choose Kafka 3.3+ with KRaft mode enabled, which is stable for production use
  • In either mode, a 3-node deployment is the baseline requirement for HA