This is article 51 in the Big Data series, covering Redis high availability architecture: master-slave replication, Sentinel mode, and distributed lock design.

High Availability Basics

High Availability (HA) refers to a system’s ability to stay operational while minimizing unplanned downtime, typically measured in “nines”:

| SLA | Max Annual Downtime |
| --- | --- |
| 99.9% (3 nines) | ~8.76 hours |
| 99.99% (4 nines) | ~52.6 minutes |
| 99.999% (5 nines) | ~5.26 minutes |
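The figures in the table follow directly from the SLA percentage. A minimal sketch (ignoring leap years):

```python
HOURS_PER_YEAR = 365 * 24  # 8760; ignores leap years

def max_annual_downtime_hours(sla_percent: float) -> float:
    """Maximum allowed downtime per year, in hours, for a given SLA."""
    return HOURS_PER_YEAR * (1 - sla_percent / 100)

# 99.9% ("3 nines") allows roughly 8.76 hours of downtime per year
assert round(max_annual_downtime_hours(99.9), 2) == 8.76
```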

From a CAP-theorem perspective, Redis HA solutions prioritize availability (A) and partition tolerance (P), so brief data inconsistency is possible during extreme network partitions.

Four pillars of HA: redundancy, failure detection, automatic recovery, and load balancing.

Redis Master-Slave Replication

Master-slave replication is the foundation of Redis HA. Slave nodes sync data from the master to provide read scaling and data redundancy.

Configuration

Add to slave’s redis.conf:

replicaof 192.168.1.100 6379

Or dynamically at runtime:

REPLICAOF 192.168.1.100 6379

Three Sync Modes

1. Full Sync

Triggered on the first connection, or when the slave has fallen too far behind for the master's replication backlog to cover the gap:

  1. Slave sends PSYNC ? -1 requesting full sync
  2. Master executes BGSAVE to generate RDB snapshot
  3. Master transfers RDB file to slave
  4. Slave loads RDB, meanwhile master sends buffered write commands
  5. Enter incremental sync mode
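The master's choice between full and partial resynchronization can be sketched as pure logic. This is a simplified model with illustrative parameter names; real Redis also tracks a secondary replication ID from failover history:

```python
def psync_response(slave_replid: str, slave_offset: int,
                   master_replid: str, backlog_start: int, backlog_end: int) -> str:
    """Decide between full and partial resync for a (re)connecting slave."""
    # "?" / -1 is what a brand-new slave sends: always a full sync.
    if slave_replid == "?" or slave_offset == -1:
        return "FULLRESYNC"
    # Replication ID mismatch: the slave followed a different master history.
    if slave_replid != master_replid:
        return "FULLRESYNC"
    # The slave's offset must still be covered by the in-memory backlog.
    if backlog_start <= slave_offset <= backlog_end:
        return "CONTINUE"  # partial resync: replay only the missed commands
    return "FULLRESYNC"
```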

2. Incremental Sync

During normal operation, master sends each write command to all slaves in real-time.

3. Heartbeat

The slave sends REPLCONF ACK <offset> once per second, letting the master track each slave's health and replication offset.
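These ACKs also power Redis's min-replicas-to-write / min-replicas-max-lag safety options: the master can refuse writes when too few replicas have acknowledged recently. A minimal sketch of that check (the two threshold values below are illustrative, not defaults):

```python
MIN_REPLICAS_TO_WRITE = 1   # illustrative value for min-replicas-to-write
MIN_REPLICAS_MAX_LAG = 10   # illustrative value for min-replicas-max-lag, seconds

def writes_allowed(ack_lags: list[float]) -> bool:
    """ack_lags: seconds since each replica's last REPLCONF ACK."""
    healthy = sum(1 for lag in ack_lags if lag <= MIN_REPLICAS_MAX_LAG)
    return healthy >= MIN_REPLICAS_TO_WRITE
```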

Common Topologies

  • One Master, Multiple Slaves: most common; the master handles writes while slaves serve reads
  • Chain Replication: a slave has its own sub-slaves, offloading replication traffic from the master

Redis Sentinel Mode

Sentinel mode adds automatic failover capability on top of master-slave replication. Sentinels are independent processes that continuously monitor Redis instance health.

Sentinel Core Responsibilities

| Responsibility | Description |
| --- | --- |
| Monitoring | Continuously checks master/slave health via PING |
| Notification | Notifies admins or other programs via API when anomalies occur |
| Failover | Automatically elects a new master and reconfigures slaves when the master fails |
| Config Provider | Clients obtain the current master address through Sentinel |

Failure Detection & Failover Flow

  1. A single sentinel detects the master is unresponsive and marks it Subjectively Down (SDOWN): that sentinel alone suspects failure
  2. The sentinel asks its peers to vote; once at least quorum sentinels agree, the master is marked Objectively Down (ODOWN)
  3. A Raft-like election chooses a leader sentinel to run the failover
  4. The leader sentinel executes the failover:
     1. Select the best candidate among the slaves (replica priority, replication offset, run ID)
     2. Promote the candidate to master (REPLICAOF NO ONE)
     3. Reconfigure the remaining slaves to replicate from the new master
     4. Rewrite configuration files and publish the new master address
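The slave-selection rule in the failover's first step can be sketched as a sort. This is a simplified model with illustrative field names; real Sentinel first filters out disconnected or long-stale replicas:

```python
from dataclasses import dataclass

@dataclass
class Replica:
    replica_priority: int  # lower wins; 0 means "never promote"
    repl_offset: int       # higher wins: most up-to-date data
    run_id: str            # lexicographically smaller wins as the final tiebreak

def select_new_master(replicas: list[Replica]) -> Replica:
    """Pick the promotion candidate among eligible replicas."""
    candidates = [r for r in replicas if r.replica_priority > 0]
    return min(candidates,
               key=lambda r: (r.replica_priority, -r.repl_offset, r.run_id))
```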

Docker Deployment Example

docker-compose.yml for 1 master + 2 slaves + 3 sentinels:

version: '3'
services:
  redis-master:
    image: redis:7
    ports:
      - "6379:6379"

  redis-slave-1:
    image: redis:7
    ports:
      - "6380:6379"
    command: redis-server --replicaof redis-master 6379
    depends_on:
      - redis-master

  redis-slave-2:
    image: redis:7
    ports:
      - "6381:6379"
    command: redis-server --replicaof redis-master 6379
    depends_on:
      - redis-master

  # Each sentinel mounts its own copy of sentinel.conf: Sentinel rewrites
  # its config file at runtime, so the file cannot be shared or read-only.
  sentinel-1:
    image: redis:7
    ports:
      - "26379:26379"
    volumes:
      - ./sentinel-1.conf:/etc/sentinel.conf
    command: redis-sentinel /etc/sentinel.conf
    depends_on:
      - redis-master
      - redis-slave-1
      - redis-slave-2

  sentinel-2:
    image: redis:7
    ports:
      - "26380:26379"
    volumes:
      - ./sentinel-2.conf:/etc/sentinel.conf
    command: redis-sentinel /etc/sentinel.conf
    depends_on:
      - redis-master

  sentinel-3:
    image: redis:7
    ports:
      - "26381:26379"
    volumes:
      - ./sentinel-3.conf:/etc/sentinel.conf
    command: redis-sentinel /etc/sentinel.conf
    depends_on:
      - redis-master

Sentinel config sentinel.conf key parameters:

# Monitor the master named mymaster; quorum=2 means 2 sentinels must agree to declare ODOWN
sentinel monitor mymaster redis-master 6379 2
# Redis >= 6.2: allow hostnames (e.g. Docker service names) instead of IPs
sentinel resolve-hostnames yes
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
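Note that quorum only controls ODOWN detection; authorizing the actual failover always requires a strict majority of all sentinels. A quick arithmetic illustration:

```python
def failover_majority(total_sentinels: int) -> int:
    """Votes needed to authorize a failover: strict majority of all sentinels."""
    return total_sentinels // 2 + 1

# With 3 sentinels and quorum=2: 2 sentinels suffice to declare ODOWN,
# and 2 (the majority of 3) must also agree before failover proceeds.
assert failover_majority(3) == 2
assert failover_majority(5) == 3
```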

Distributed Lock Basics

Redis distributed lock core command:

SET lock_key unique_value NX PX 30000
  • NX: Set only if key doesn’t exist (atomic lock acquisition)
  • PX 30000: 30 second TTL (prevents deadlocks)
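To make these semantics concrete, here is a minimal in-memory simulation of SET key value NX PX plus the token-checked release. It is illustrative only: it models a single Redis node with a Python dict, not a real client:

```python
import time
import uuid

class FakeRedisLock:
    """In-memory model of `SET key value NX PX ttl` and safe release."""
    def __init__(self):
        self._store = {}  # key -> (token, expires_at)

    def acquire(self, key: str, ttl_ms: int):
        """Returns a unique token on success, None if the lock is held."""
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is None or entry[1] <= now:  # absent or expired: NX succeeds
            token = uuid.uuid4().hex
            self._store[key] = (token, now + ttl_ms / 1000)
            return token
        return None

    def release(self, key: str, token: str) -> bool:
        """Delete the key only if it still holds our token."""
        entry = self._store.get(key)
        if entry and entry[0] == token:
            del self._store[key]
            return True
        return False
```

In a real deployment the check-and-delete in release must run atomically on the server, via the Lua script below, because a client-side GET followed by DEL races with key expiry and other acquirers.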

Release requires verifying value belongs to current holder, typically using Lua script for atomicity:

if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end

In Sentinel mode there is a lock-loss risk during master failover: a write may succeed on the master but not yet be replicated to the slave that gets promoted. Production systems should evaluate the Redlock algorithm or a strongly consistent coordinator such as ZooKeeper.

Summary

  • Master-slave replication provides data redundancy and read scaling, foundation of HA
  • Sentinel mode adds automatic failover on top of master-slave, improving availability
  • Distributed locks rely on Redis's atomic operations, but edge cases during HA failover require careful handling
  • Recommended for production: at least 3 sentinel nodes, with quorum set to (sentinel_count / 2) + 1