This is article 54 in the Big Data series. It covers Kafka cluster installation and configuration, along with the architectural evolution from ZooKeeper dependency to self-managed KRaft mode.
Kafka Version Evolution Overview
Kafka 2.x Main Features
| Feature | Description |
|---|---|
| Kafka Streams Enhancement | Support outer joins, Global Tables (GlobalKTable), interactive queries |
| Write Throughput Improvement | Log management system rewrite, ~30% throughput improvement |
| Zero-Copy Optimization | Further reduce CPU overhead in message transfer |
| Dynamic Configuration | Modify parameters online via AdminClient API, no restart needed |
| Security Enhancement | Kerberos integration, fine-grained ACL, cross-domain support |
Kafka 2.x still relies on ZooKeeper for cluster metadata management and Controller election.
Kafka 3.x Architecture Transformation
Kafka 3.x is a milestone release; its biggest change is the introduction of KRaft (Kafka Raft):
| Milestone | Version | Description |
|---|---|---|
| KRaft Preview | 3.0 | KRaft mode available, ZooKeeper still default |
| ZooKeeper Deprecated | 3.5.x | ZooKeeper marked Deprecated |
| ZooKeeper Removed | 4.0 | ZooKeeper dependency removed entirely; KRaft is the only supported mode |
KRaft core improvements:
- Uses the Raft consensus algorithm for cluster metadata management; Kafka nodes themselves take the Controller role
- Metadata log stored in the Kafka internal Topic __cluster_metadata
- Cluster startup significantly faster (no longer waits for a ZooKeeper connection)
- Raises the supported partition count of a single cluster from roughly 200K to roughly 2 million
- Reduced operational complexity: no separate ZooKeeper cluster to maintain
Environment Preparation
Prerequisites
| Dependency | Version Requirement | Description |
|---|---|---|
| Java | JDK 8+ | Kafka 2.x recommends JDK 8/11 |
| ZooKeeper | 3.6+ | Required for Kafka 2.x; not needed in Kafka 3.x KRaft mode |
| Servers | 3 nodes | At least 3 nodes for HA in production |
Node Planning Example
h121.wzk.icu ← ZooKeeper + Kafka Broker (broker.id=1)
h122.wzk.icu ← ZooKeeper + Kafka Broker (broker.id=2)
h123.wzk.icu ← ZooKeeper + Kafka Broker (broker.id=3)
ZooKeeper Configuration Verification
Before deploying Kafka, confirm that the ZooKeeper cluster is running:
# Add the ZooKeeper environment variables to /etc/profile
vim /etc/profile
export ZOOKEEPER_HOME=/opt/servers/apache-zookeeper-3.8.4-bin
export PATH=$PATH:$ZOOKEEPER_HOME/bin
# Reload the profile so the variables take effect
source /etc/profile
# Check the status on every node (expect exactly one leader; the rest are followers)
zkServer.sh status
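When the status check is repeated across several nodes, the "one leader, rest followers" rule can be verified mechanically. The following is a minimal sketch, not part of the original setup: the function name `zk_quorum_ok` is hypothetical, and it assumes each node's `zkServer.sh status` output contains a line starting with `Mode: leader` or `Mode: follower`.

```shell
# Summarize `zkServer.sh status` output collected from every node.
# Reads the combined output on stdin and succeeds only when exactly one
# node reports "Mode: leader" and all remaining nodes report "Mode: follower".
zk_quorum_ok() (
    expected_nodes=$1
    status_text=$(cat)
    # grep -c exits non-zero when the count is 0, so neutralize that status.
    leaders=$(printf '%s\n' "$status_text" | grep -c '^Mode: leader' || true)
    followers=$(printf '%s\n' "$status_text" | grep -c '^Mode: follower' || true)
    echo "leaders=$leaders followers=$followers"
    [ "$leaders" -eq 1 ] && [ "$followers" -eq $((expected_nodes - 1)) ]
)

# Example usage (assumes passwordless SSH to each host):
# for h in h121.wzk.icu h122.wzk.icu h123.wzk.icu; do
#     ssh "$h" /opt/servers/apache-zookeeper-3.8.4-bin/bin/zkServer.sh status 2>/dev/null
# done | zk_quorum_ok 3
```

The full path to `zkServer.sh` is used in the example because a non-interactive SSH session may not load `/etc/profile`.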
Kafka Installation Steps
1. Download and Extract
# Download the precompiled binary release (recommended over building from source)
# kafka_2.12-2.7.2 means Kafka version 2.7.2 built against Scala 2.12
tar -zxvf kafka_2.12-2.7.2.tgz
mv kafka_2.12-2.7.2 /opt/servers/
2. Configure Environment Variables
vim /etc/profile
# Kafka configuration
export KAFKA_HOME=/opt/servers/kafka_2.12-2.7.2
export PATH=$PATH:$KAFKA_HOME/bin
source /etc/profile
3. Configure Broker Parameters
Edit /opt/servers/kafka_2.12-2.7.2/config/server.properties:
# Unique Broker ID, must be different on every node (this guide uses 1, 2, 3)
broker.id=1
# Broker listener address
listeners=PLAINTEXT://h121.wzk.icu:9092
# ZooKeeper connection address (same across all nodes)
zookeeper.connect=h121.wzk.icu:2181,h122.wzk.icu:2181,h123.wzk.icu:2181
# Message log storage directory
log.dirs=/var/kafka-logs
# Default replication factor for automatically created Topics
default.replication.factor=2
# Default Partition count per Topic
num.partitions=3
# Message retention duration in hours (168 hours = 7 days)
log.retention.hours=168
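Before starting a Broker, it can help to confirm that the settings above are actually present and not commented out. The sketch below is an illustration only: `check_required_keys` is a hypothetical helper name, and the list of keys passed to it is an assumption based on the parameters shown in this step.

```shell
# Report which of the given keys are missing from a properties file.
# Prints one missing key per line; exit status is 0 only when none are missing.
check_required_keys() (
    config_file=$1
    shift
    missing=0
    for key in "$@"; do
        # Escape dots so "broker.id" is matched literally, not as a regex wildcard.
        pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
        # Match "key=" at the start of a line; commented-out lines do not match.
        if ! grep -q "^[[:space:]]*${pattern}[[:space:]]*=" "$config_file"; then
            echo "$key"
            missing=1
        fi
    done
    [ "$missing" -eq 0 ]
)

# Example usage:
# check_required_keys /opt/servers/kafka_2.12-2.7.2/config/server.properties \
#     broker.id listeners zookeeper.connect log.dirs
```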
Create log directory on all nodes:
mkdir -p /var/kafka-logs
Use the same configuration on h122 and h123, changing only broker.id (2 and 3 respectively) and the hostname in listeners.
4. Distribute Configuration to Other Nodes
# Distribute installation directory to other nodes
scp -r /opt/servers/kafka_2.12-2.7.2 h122.wzk.icu:/opt/servers/
scp -r /opt/servers/kafka_2.12-2.7.2 h123.wzk.icu:/opt/servers/
# Then, on each target node, set broker.id (2 on h122, 3 on h123) and update the hostname in listeners
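The per-node changes after distribution can be scripted instead of edited by hand. This is a minimal sketch under stated assumptions: `patch_broker_config` is a hypothetical helper, it assumes the file already contains uncommented `broker.id=` and `listeners=` lines, and it hardcodes the PLAINTEXT listener on port 9092 used in this guide.

```shell
# Rewrite broker.id and listeners in a copied server.properties.
# The result is written to a temporary file first, so a failed sed
# never truncates the original config.
patch_broker_config() (
    config_file=$1
    broker_id=$2
    host=$3
    tmp_file="${config_file}.tmp"
    sed -e "s|^broker\.id=.*|broker.id=${broker_id}|" \
        -e "s|^listeners=.*|listeners=PLAINTEXT://${host}:9092|" \
        "$config_file" > "$tmp_file" && mv "$tmp_file" "$config_file"
)

# Example usage, run on h122:
# patch_broker_config /opt/servers/kafka_2.12-2.7.2/config/server.properties 2 h122.wzk.icu
```

Writing to a temporary file and renaming it avoids relying on `sed -i`, whose syntax differs between GNU and BSD sed.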
5. Start Kafka
# Foreground (for testing); run on every node
kafka-server-start.sh /opt/servers/kafka_2.12-2.7.2/config/server.properties
# Daemon mode (production); run on every node
kafka-server-start.sh -daemon /opt/servers/kafka_2.12-2.7.2/config/server.properties
6. Verify Startup
# Check Kafka process
jps
# Output should contain Kafka
# Check the Brokers registered in ZooKeeper
zkCli.sh -server h121.wzk.icu:2181
# Inside the zkCli shell:
ls /brokers/ids
# Output [1, 2, 3] means all 3 Brokers registered successfully
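For scripted health checks, the list printed by `ls /brokers/ids` can be compared against the expected Broker IDs. The helper below is a sketch only: `brokers_registered` is a hypothetical name, and the usage example assumes that the ID list is the last line printed by Kafka's bundled `zookeeper-shell.sh`, which may vary with log settings.

```shell
# Check that every expected broker ID appears in the list printed by
# `ls /brokers/ids`, for example "[1, 2, 3]".
brokers_registered() (
    ids_line=$1
    shift
    # Strip brackets and commas so the IDs become space-separated words.
    registered=$(printf '%s' "$ids_line" | tr -d '[],')
    for expected in "$@"; do
        found=0
        for id in $registered; do
            if [ "$id" = "$expected" ]; then
                found=1
            fi
        done
        if [ "$found" -eq 0 ]; then
            echo "broker $expected is not registered"
            return 1
        fi
    done
)

# Example usage (assumes the ID list is the last line of output):
# ids=$(zookeeper-shell.sh h121.wzk.icu:2181 ls /brokers/ids 2>/dev/null | tail -n 1)
# brokers_registered "$ids" 1 2 3
```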
KRaft Mode Deployment (Kafka 3.x)
Kafka 3.x can run completely without ZooKeeper in KRaft mode:
Generate Cluster UUID
# Generate the ID once and reuse the same value on every node
KAFKA_CLUSTER_ID="$(kafka-storage.sh random-uuid)"
Format Storage Directory
# Run on every node, using the same cluster ID
kafka-storage.sh format -t "$KAFKA_CLUSTER_ID" \
  -c /opt/servers/kafka_3.x/config/kraft/server.properties
KRaft Config Key Points
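# KRaft mode configuration (config/kraft/server.properties)
# Node acts as both Broker and Controller
# (comments must be on their own line; a trailing "#" becomes part of the value)
process.roles=broker,controller
node.id=1
controller.quorum.voters=1@h121.wzk.icu:9093,2@h122.wzk.icu:9093,3@h123.wzk.icu:9093
listeners=PLAINTEXT://h121.wzk.icu:9092,CONTROLLER://h121.wzk.icu:9093
# Required in KRaft mode: names the listener used by the Controller quorum
controller.listener.names=CONTROLLER
listener.security.protocol.map=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
log.dirs=/var/kafka-logs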
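The controller.quorum.voters value follows the pattern `id@host:port`, joined by commas, and must be identical on every node. A small sketch of how it could be generated from a host list follows; `build_quorum_voters` is a hypothetical helper, and assigning node IDs 1, 2, 3 in argument order is an assumption that matches the node plan in this guide.

```shell
# Build the controller.quorum.voters value from a controller port and an
# ordered host list. Node IDs are assigned 1, 2, 3, ... in argument order.
build_quorum_voters() (
    port=$1
    shift
    node_id=1
    voters=""
    for host in "$@"; do
        # Prepend a comma only when the string already has an entry.
        voters="${voters:+${voters},}${node_id}@${host}:${port}"
        node_id=$((node_id + 1))
    done
    printf '%s\n' "$voters"
)

# Example usage:
# echo "controller.quorum.voters=$(build_quorum_voters 9093 h121.wzk.icu h122.wzk.icu h123.wzk.icu)"
```

Each node's own node.id must match the ID given to its host in this string.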
Start
kafka-server-start.sh /opt/servers/kafka_3.x/config/kraft/server.properties
ZooKeeper Mode vs KRaft Mode Comparison
| Dimension | ZooKeeper Mode | KRaft Mode |
|---|---|---|
| External Dependency | Requires separate ZooKeeper cluster | No external dependency |
| Operational Complexity | High (two systems) | Low (single system) |
| Max Partitions | ~200K | ~2M |
| Metadata Storage | ZooKeeper znodes | Kafka internal Topic |
| Startup Speed | Slower | Faster |
| Production Maturity | Mature and stable | Production ready in 3.3+ |
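On a Kafka 3.x node, the mode is determined by the configuration file: a node with process.roles set runs in KRaft mode, while a node without it and with zookeeper.connect set runs in ZooKeeper mode. The sketch below shows one way to tell the two apart when auditing nodes; `detect_kafka_mode` is a hypothetical helper name.

```shell
# Print "kraft", "zookeeper", or "unknown" depending on which of the
# mode-defining keys is set (uncommented) in the given properties file.
detect_kafka_mode() (
    config_file=$1
    if grep -q '^[[:space:]]*process\.roles[[:space:]]*=' "$config_file"; then
        echo "kraft"
    elif grep -q '^[[:space:]]*zookeeper\.connect[[:space:]]*=' "$config_file"; then
        echo "zookeeper"
    else
        echo "unknown"
    fi
)

# Example usage:
# detect_kafka_mode /opt/servers/kafka_3.x/config/kraft/server.properties
```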
Summary
- Kafka 2.x installation requires a pre-deployed ZooKeeper cluster; the key settings are broker.id, zookeeper.connect, and log.dirs
- Kafka 3.x introduces KRaft, gradually phasing out ZooKeeper, simplifying operations and raising scaling limits
- For new projects, Kafka 3.3+ with KRaft mode enabled is recommended, as it is stable for production
- Regardless of mode, a 3-node deployment is the basic requirement for HA