Distributed Services: Paxos Algorithm - The Gold Standard

Introduction

The Paxos algorithm is a message-passing distributed consensus algorithm proposed by computer scientist Leslie Lamport. This groundbreaking work earned him the 2013 Turing Award. The algorithm solves the problem of how to reach consensus on a value (resolution) in distributed systems, even when some nodes fail or the network is unstable, while ensuring system consistency.

Development History

The Paxos algorithm was first publicly introduced by Lamport in 1998 in the paper “The Part-Time Parliament.” The paper used a unique narrative approach, using a small island in Greece called Paxos as a metaphor, describing in detail the process of passing parliamentary resolutions in Paxos and naming the algorithm accordingly.

In 2001, Lamport realized that academic peers had difficulty understanding his humorous way of expression, so he republished a more concise and direct algorithm description version called “Paxos Made Simple.”

Industry Impact and Applications

Since its introduction, Paxos has dominated the field of distributed consensus algorithms. The algorithm has been widely applied in industry:

Google System Applications: Chubby (distributed lock service), Megastore (high-availability storage system), Spanner (globally distributed database)
Open Source System Applications: ZooKeeper (distributed coordination service), MySQL 5.7’s Group Replication

Algorithm Characteristics

The Paxos algorithm, although powerful, is known for being difficult to understand and implement. Its main difficulties are:

The algorithm description is abstract, requiring deep understanding of distributed system principles
Implementation needs to handle various edge cases and exceptional situations
Performance optimization requires consideration of actual network environments

What Problems Does It Solve

Problem Background

In modern distributed systems, data is typically stored with multiple replicas to improve availability and fault tolerance. Typical scenarios include: distributed databases (such as Google Spanner), distributed storage systems (such as HDFS), distributed computing frameworks, etc.

Specific Challenges

Network Partitioning: Communication between nodes may be interrupted
Message Delay: Operation instructions arrive at different nodes at inconsistent times
Node Failure: Some nodes may fail temporarily or permanently
Concurrent Conflicts: Multiple clients simultaneously initiate modification requests

Solution Details

Paxos guarantees consistency through the following mechanisms:

Prepare Phase: Nodes propose values and collect majority responses
Accept Phase: Determine the final value and obtain majority confirmation
Learn Phase: Propagate the determined value to all replicas

Paxos can guarantee that the system presents a consistent state to the outside world as long as more than half of the nodes are working normally, even when facing network problems and node failures.

Proposal Mechanism

A proposal is the core concept of the Paxos algorithm. It is an information packet containing two key components:

Proposal ID: A globally unique identifier, usually composed of a proposal number (typically an incrementing number) and a unique identifier of the proposer (such as server ID). For example: <5, ServerA> represents proposal number 5 initiated by ServerA.
Value: This is the final data value that the distributed system is expected to reach consensus on.

Role Division in Paxos Algorithm

The Paxos algorithm defines four key roles, each with its own responsibilities:

1. Client

Is the initiator of external system requests
Sends operation requests (such as read/write requests) to the distributed system
Waits for system response and confirmation

2. Proposer

Core responsibilities: Receive client requests and convert them into proposals, promote the acceptance process of proposals, handle coordination work when proposal conflicts occur
Workflow:
1. Prepare phase: Send prepare requests to Acceptors
2. Propose phase: Send formal proposals after receiving responses from majority Acceptors

3. Acceptor

Key functions: Receive and process Proposer requests, decide whether to accept a proposal, guarantee safety constraints
Decision rules: Only accept proposals with numbers higher than the promised number, must remember the highest numbered proposal accepted
Important characteristic: Usually requires a quorum of Acceptors to accept before a proposal can be selected

4. Learner

Main tasks: Observe and learn the selected Value, propagate the final resolution to the entire system
Implementation: Can monitor Acceptors’ decisions, or be notified of selection results by the Proposer

These roles in actual systems are usually held by server nodes concurrently. For example, a node may simultaneously play the roles of both Proposer and Acceptor.

Derivation Process

Simplest Solution: Only One Acceptor

In the implementation of the Paxos algorithm, the simplest solution is to use only one Acceptor (there can be multiple Proposers). In this configuration, the system workflow is as follows:

Any Proposer can submit a proposal to the sole Acceptor
The Acceptor follows a “first-come-first-accepted” principle, accepting the first proposal it receives
Once the Acceptor accepts a proposal, that proposal is selected
The Value contained in that proposal becomes the final selected value for the entire system

The advantage of this solution is simple implementation, ensuring the system will eventually select a value. However, this solution has serious problems:

Single Point of Failure Risk: If this sole Acceptor crashes, the entire system will completely stop working
Performance Bottleneck: All requests must pass through a single Acceptor for processing
Lack of Fault Tolerance: Unable to handle exceptional situations like network partitioning

Therefore, in actual production environments, multiple Acceptors must be used to build a reliable distributed system.

Multiple Acceptors

How to ensure that a Value is selected with multiple Proposers and multiple Acceptors?

P1: An Acceptor must accept the first proposal it receives.

However, if each Proposer respectively proposes different Values and sends them to different Acceptors. According to the constraint just mentioned, Acceptors respectively accept the first proposal they receive, leading to different Values being selected, resulting in inconsistency.

Therefore, we need to add a rule: A proposal requires acceptance from more than half of the Acceptors to be selected.

Based on the above, although we now allow multiple proposals to be selected, we must ensure that all selected proposals have the same Value; otherwise, inconsistency will occur again.

P2: If a proposal with Value v is selected, then every selected proposal with a higher number must also have Value v.

A proposal can only be selected if it is accepted by an Acceptor, so we can rewrite the P2 constraint into a constraint on proposals accepted by Acceptors, P2a.

P2a: If a proposal with Value v is selected, then every proposal with a higher number accepted by an Acceptor must also have Value v.

As long as P2a is satisfied, P2 is satisfied.

However, consider the following situation: Suppose there are 5 Acceptors in total. The Proposer proposes a proposal [M1, V1]. Acceptors 2 through 5 (more than half) accept the proposal. Therefore, for Acceptors 2 through 5 and Proposer 2, they all believe V1 is selected. Acceptor 1 has just recovered from a crash (previously Acceptor 1 did not receive any proposals). At this point, Proposer 1 sends proposal [M2, V2] to Acceptor 1 (V2 is not equal to V1, and M2 is greater than M1). For Acceptor 1, this is the first proposal it has received. According to P1 (an Acceptor must accept the first proposal it receives), Acceptor 1 must accept this proposal, and Acceptor 1 believes V2 is selected.

This results in the following two problems:

Acceptor 1 believes V2 is selected, while Acceptors 2 through 5 and Proposer 2 believe V1 is selected, resulting in inconsistency
V1 is selected, but the proposal [M2, V2] with a higher number accepted by Acceptor 1 has Value V2, and V2 != V1. This contradicts P2a

Therefore, we need to strengthen the P2a constraint!

P2a is a constraint on proposals accepted by Acceptors, but proposals are actually proposed by Proposers. So we can impose a constraint on proposals proposed by Proposers to get P2b:

P2b: If a proposal with Value v is selected, then any proposal with a higher number proposed by any Proposer afterward must also have Value v.

From P2b, we can derive P2a, which then derives P2. So, how to ensure that after a proposal with Value v is selected, all proposals with higher numbers proposed by Proposers have Value v?

As long as P2c is satisfied:

P2c: For any Mn and Vn, if proposal [Mn, Vn] is proposed, then there must definitely exist a set S composed of more than half of the Acceptors that satisfies one of the following two conditions:

Either each Acceptor in S has not accepted any proposal with a number less than Mn
Or among all proposals with numbers less than Mn approved by all Acceptors in S, the proposal with the highest number has Value Vn

From the above, it can be seen that the process from P1 to P2c is actually a gradual strengthening of a series of conditions. If it is necessary to prove that these conditions can guarantee consistency, then reverse derivation can be performed: P2c => P2b => P2a => P2, and then consistency is guaranteed through P2 and P1.