This is article 66 in the Big Data series, a deep dive into the underlying I/O optimizations that give Kafka its extremely high throughput.

Why Kafka is So Fast

Kafka achieves throughput of millions of messages per second on a disk-based architecture. This is not the result of stacking hardware, but of three key OS-level optimizations: zero-copy (sendfile), memory-mapped writes (mmap), and page cache combined with sequential writes. Together they drastically reduce CPU data copies and context switches.

Traditional I/O Bottleneck

Consider the typical scenario of "read a file and send it over the network". The traditional approach requires four data copies:

  1. Disk → kernel page cache (DMA copy)
  2. kernel page cache → user space application buffer (CPU copy)
  3. user space buffer → socket kernel buffer (CPU copy)
  4. socket buffer → network card (DMA copy)

Steps 2 and 3 involve two CPU-driven memory copies, plus the user-space/kernel-space context switches of the read and write system calls. In high-concurrency scenarios, this is a clear performance bottleneck.
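The traditional path corresponds to an ordinary read/write loop. A minimal sketch in Python (file names are illustrative; in the real scenario the destination would be a network socket):

```python
import os
import tempfile

# Hypothetical demo files standing in for a log segment and a socket.
fd, src_path = tempfile.mkstemp(suffix=".log")
with os.fdopen(fd, "wb") as f:
    f.write(b"hello kafka")

dst_path = src_path + ".out"

# Traditional path: each read() copies kernel page cache -> user buffer (copy 2),
# each write() copies the user buffer back into kernel space (copy 3).
with open(src_path, "rb") as fin, open(dst_path, "wb") as fout:
    while chunk := fin.read(8192):   # data lands in a user-space buffer
        fout.write(chunk)            # and is copied back into the kernel

copied = os.path.getsize(dst_path)
print(copied)  # 11
```

Every iteration crosses the user/kernel boundary twice, which is exactly the overhead the techniques below eliminate.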

sendfile: Zero-Copy on Consumer Side

When a Consumer pulls messages, the Kafka Broker uses Linux's sendfile system call to transfer log-file data directly from the kernel page cache to the network socket, bypassing user space entirely:

Disk → kernel page cache → socket buffer (with file descriptor) → network card

Only two DMA copies remain: no CPU copies and no user-space/kernel-space switches. Tests show that in the same scenario, CPU utilization can drop by roughly 60% and throughput can improve by 30%–200%, depending on file size.

Kafka's replica synchronization takes the same path: a Follower pulling the log from the Leader also goes through sendfile, so replication does not become a performance bottleneck.
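Python exposes the same system call as os.sendfile. A minimal sketch (Linux-specific; a regular file stands in for the consumer's socket, which Linux has allowed as the destination since 2.6.33 — file names are illustrative):

```python
import os
import tempfile

# Hypothetical demo "log segment".
fd, src_path = tempfile.mkstemp(suffix=".log")
with os.fdopen(fd, "wb") as f:
    f.write(b"message-batch")

dst_path = src_path + ".sent"

src_fd = os.open(src_path, os.O_RDONLY)
dst_fd = os.open(dst_path, os.O_WRONLY | os.O_CREAT, 0o600)
try:
    # sendfile moves data kernel-to-kernel: no user-space buffer ever sees it.
    sent = os.sendfile(dst_fd, src_fd, 0, os.path.getsize(src_path))
finally:
    os.close(src_fd)
    os.close(dst_fd)

print(sent)  # 13
```

On the JVM, Kafka reaches this path through FileChannel.transferTo, which the JDK implements with sendfile on Linux.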

mmap: Memory-Mapped Write on Producer Side

On the write side, Kafka uses mmap (memory-mapped files) to map log files into the process's virtual address space. When a Producer writes a message, the data actually lands in a memory region, and the OS is responsible for asynchronously flushing it to disk.

mmap's advantages:

  • Writes become pure memory operations, with extremely low latency
  • Random access is supported, which suits index files (.index, .timeindex)
  • The OS decides when to flush, reducing the frequency of fsync calls
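The mmap write path can be sketched with Python's mmap module (file name and sizes are illustrative):

```python
import mmap
import os
import tempfile

# Hypothetical demo index file; mmap requires the file to be pre-sized.
fd, path = tempfile.mkstemp(suffix=".index")
os.close(fd)

with open(path, "r+b") as f:
    f.truncate(4096)                        # map one page
    with mmap.mmap(f.fileno(), 4096) as mm:
        mm[0:5] = b"entry"                  # a plain memory write; the OS
                                            # flushes the dirty page later
        mm.flush()                          # optional explicit sync (msync)

with open(path, "rb") as f:
    data = f.read(5)
print(data)  # b'entry'
```

In Kafka's Java code the equivalent is a MappedByteBuffer obtained from FileChannel.map, which is how the .index and .timeindex files are accessed.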

Technology   Application Scenario               Path
sendfile     Broker → Consumer / replica sync   Read path (zero-copy send)
mmap         Producer → Broker log write        Write path (memory-mapped write)

Page Cache and Sequential Write

Kafka writes messages to the OS page cache rather than directly to disk; the OS batches dirty pages and flushes them asynchronously. This means:

  • Extremely low write latency: a Producer's write returns as soon as the page cache is updated, without waiting for physical disk I/O
  • High read hit rate: recently written messages are likely still in the page cache, so Consumers pull them directly from memory
  • Cache survives service restarts: the page cache belongs to the OS, so it persists across a Broker process restart

All Kafka log writes use sequential append (append-only), taking full advantage of the OS's read-ahead and write-behind optimizations for linear access. On mechanical disks, sequential I/O throughput can rival random memory access; SSD sequential writes are faster still.
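The append-only pattern can be sketched in a few lines (the length-prefixed record format here is a simplification, not Kafka's actual log format):

```python
import os
import tempfile

# Hypothetical demo log file: every record goes to the current end of the
# file, so the disk sees a purely sequential write pattern.
fd, log_path = tempfile.mkstemp(suffix=".log")
os.close(fd)

offsets = []
with open(log_path, "ab") as log:                        # append-only mode
    for record in [b"m1", b"m2", b"m3"]:
        offsets.append(log.tell())                       # byte offset of each record
        log.write(len(record).to_bytes(4, "big") + record)  # length-prefixed record

print(offsets)  # [0, 6, 12]
```

Because records are never rewritten in place, the byte offset of each record is stable, which is also what makes index lookups and zero-copy reads straightforward.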

Combined Effect

Three technologies work together in Kafka’s complete message pipeline:

Producer writes
  → mmap appends to page cache (sequential write)
  → OS async flush to disk

Consumer pulls
  → sendfile sends directly from page cache to network (zero-copy)
  → Follower replica sync takes the same path

This design lets Kafka comfortably sustain millions of message reads and writes per second on a single node while keeping CPU usage low. It is a classic example of high-performance messaging middleware design.