Distributed Analysis

CAP Theorem Explained

The CAP theorem (also known as Brewer’s theorem) is a fundamental principle of distributed system design, proposed by computer scientist Eric Brewer in 2000. It states that a shared-data system in a distributed environment can simultaneously satisfy at most two of three properties: Consistency (C), Availability (A), and Partition tolerance (P).

Three Core Properties

C: Consistency

  • All nodes see exactly the same data at the same time
  • After an update completes, a read against any node returns the latest written value

A: Availability

  • Every user request receives a response within a reasonable time
  • Responses are not errors: every non-failing node answers
  • The system continues to provide service when some nodes fail

P: Partition Tolerance

  • The system continues to operate despite network partitions (messages between nodes lost or delayed)
  • After the partition heals, the system can reconcile data and return to a consistent state

CAP Trade-off Practices

  1. CA system (sacrificing P): Traditional relational databases deployed in a single data center (in a genuinely distributed deployment, partitions cannot be ruled out, so pure CA is rarely achievable in practice)
  2. AP system (sacrificing C): NoSQL databases like Cassandra, Dynamo
  3. CP system (sacrificing A): Coordination services like ZooKeeper, etcd
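The C-versus-A trade-off during a partition can be sketched with a toy pair of replicas (an assumed scenario, not any real database): a CP-style read refuses to answer while the partition lasts, while an AP-style read always answers, accepting possibly stale data.

```python
# Toy illustration of the C-vs-A choice during a network partition.
# The classes and functions here are hypothetical, for illustration only.

class Replica:
    def __init__(self):
        self.value = None

partitioned = True                  # the network is currently split
primary, secondary = Replica(), Replica()
primary.value = "v2"                # the write lands on the primary only
if not partitioned:
    secondary.value = primary.value # replication cannot happen while partitioned

def read_cp(replica):
    # CP choice: refuse to answer rather than risk returning stale data.
    if partitioned:
        raise RuntimeError("unavailable during partition")
    return replica.value

def read_ap(replica):
    # AP choice: always answer, even if the data may be stale.
    return replica.value

print(read_ap(secondary))  # None — stale but available
```

The AP read on the secondary returns the pre-partition value (here, nothing at all), while the CP read raises an error: the system chose availability or consistency, not both.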

2PC Pattern

Working Principle

Prepare Phase (Voting Phase)

  1. Coordinator sends Prepare request to all participants
  2. Each participant executes transaction operations (but doesn’t commit)
  3. Each participant responds to the coordinator: “yes” on success, “no” on failure

Commit Phase (Execution Phase)

  1. If all participants returned “yes”: the coordinator sends the Commit command
  2. If any participant returned “no” or timed out: the coordinator sends the Rollback command
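The two phases above can be sketched as follows (the class and method names are hypothetical, not from any real transaction library):

```python
# Minimal sketch of a two-phase commit coordinator, for illustration only.

class Participant:
    def __init__(self, name, will_succeed=True):
        self.name = name
        self.will_succeed = will_succeed
        self.state = "initial"

    def prepare(self):
        # Phase 1: execute the transaction work but hold the commit.
        if self.will_succeed:
            self.state = "prepared"
            return "yes"
        self.state = "aborted"
        return "no"

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def two_phase_commit(participants):
    # Phase 1 (voting): collect a yes/no vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2 (execution): commit only if every vote is "yes";
    # otherwise roll everyone back.
    if all(v == "yes" for v in votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.rollback()
    return "rolled_back"


nodes = [Participant("db1"), Participant("db2"), Participant("db3", will_succeed=False)]
print(two_phase_commit(nodes))  # one "no" vote forces a global rollback
```

Note how a single “no” vote dooms the whole transaction: this all-or-nothing decision is what gives 2PC its strong consistency, and also what makes every participant wait on the coordinator.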

Pros and Cons

Advantages:

  • Relatively simple implementation, guarantees strong consistency

Disadvantages:

  1. Synchronous blocking: All participants must wait for coordinator’s final instruction
  2. Coordinator single point of failure: If coordinator crashes, participants will be blocked indefinitely
  3. Data inconsistency risk: If the coordinator crashes after sending Commit to only some participants, those that never received the instruction diverge from those that committed

3PC Pattern

Main Differences

Three-phase commit protocol (3PC) is an improved version of two-phase commit protocol:

  1. Timeout mechanism: Introduced in both coordinator and participants
  2. Phase splitting: Splits the prepare phase into two phases, canCommit and preCommit

Phase 1: canCommit (Inquiry Phase)

  1. Coordinator sends canCommit request to all participants
  2. Participants check their status, return yes or no response

Phase 2: preCommit (Pre-commit Phase)

All participants return yes:

  1. Coordinator sends preCommit request
  2. Participants execute transaction operations but don’t commit
  3. Participants enter the prepared state

Any participant returns no or timeout:

  1. Coordinator sends abort request
  2. Participants execute transaction rollback

Phase 3: doCommit (Commit Phase)

  • If all participants acknowledged preCommit: the coordinator sends doCommit and participants complete the transaction commit
  • If any participant returns no or times out: the coordinator sends abort and participants roll back the transaction
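The three phases can be sketched as follows, assuming simple in-memory participants (all names are illustrative, not from a real library). The key addition over 2PC is the timeout handler: a participant stuck in the prepared state advances on its own instead of blocking forever.

```python
# Hedged sketch of the three-phase commit flow, for illustration only.

class Participant3PC:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.state = "initial"

    def can_commit(self):
        # Phase 1: only a status check — no transaction work is done yet.
        return "yes" if self.healthy else "no"

    def pre_commit(self):
        # Phase 2: execute the transaction but do not commit.
        self.state = "prepared"
        return "ack"

    def do_commit(self):
        # Phase 3: make the changes durable.
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

    def on_timeout(self):
        # 3PC's improvement: a prepared participant that loses contact with
        # the coordinator assumes the decision was commit and proceeds.
        if self.state == "prepared":
            self.do_commit()
        else:
            self.abort()


def three_phase_commit(participants):
    if any(p.can_commit() == "no" for p in participants):   # canCommit
        for p in participants:
            p.abort()
        return "aborted"
    acks = [p.pre_commit() for p in participants]           # preCommit
    if all(a == "ack" for a in acks):                       # doCommit
        for p in participants:
            p.do_commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"
```

Because canCommit does no real work, a “no” in phase 1 aborts cheaply; and because every prepared participant can time out into a commit, a crashed coordinator no longer blocks the cluster indefinitely.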

3PC Advantages

  1. Reduced blocking time
  2. Lower impact from coordinator single point of failure
  3. Participants can automatically advance process after timeout

XA Pattern

Basic Concepts

The XA (eXtended Architecture) specification is a distributed transaction standard published by X/Open in 1991; it is based on the two-phase commit protocol (2PC).

Core Roles

  1. Global Transaction Manager (TM): Responsible for coordinating the entire distributed transaction
  2. Local Resource Manager (RM): Manages local resource transaction operations, usually database systems

XA Transaction Workflow

  1. Prepare phase: Transaction manager sends prepare request to all participants
  2. Commit/Rollback phase: Decide global commit or rollback based on participants’ feedback
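The TM/RM interaction can be sketched roughly as follows. The method names echo the XA interface verbs (xa_prepare, xa_commit, xa_rollback), but these classes are hypothetical, not a real XA driver:

```python
# Sketch of the XA roles: a transaction manager (TM) driving resource
# managers (RMs) through prepare/commit, for illustration only.

class ResourceManager:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.state = "idle"

    def xa_prepare(self, xid):
        # Phase 1: do the local work for global transaction `xid`, hold the commit.
        if self.healthy:
            self.state = "prepared"
            return "ok"
        return "error"

    def xa_commit(self, xid):
        self.state = "committed"

    def xa_rollback(self, xid):
        self.state = "rolled_back"


class TransactionManager:
    def run(self, xid, resource_managers):
        # Phase 1: ask every RM to prepare under the global transaction id.
        if all(rm.xa_prepare(xid) == "ok" for rm in resource_managers):
            # Phase 2: global commit.
            for rm in resource_managers:
                rm.xa_commit(xid)
            return "committed"
        # Any failure: global rollback.
        for rm in resource_managers:
            rm.xa_rollback(xid)
        return "rolled_back"
```

The global transaction id (`xid`) is what ties the independent local transactions together: each RM only ever sees its own work, and the TM alone holds the global commit/rollback decision.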

Why Transaction Manager is Needed

According to the CAP theorem, in a distributed environment two or more machines cannot reach complete agreement on shared state purely through their own coordination. Therefore, a centralized transaction manager is introduced to coordinate the various resources.