Distributed Analysis

CAP Theorem Explained

The CAP theorem (also known as Brewer’s theorem) is a fundamental principle of distributed system design, proposed by computer scientist Eric Brewer in 2000. It states that a shared-data system in a distributed environment can simultaneously satisfy at most two of three properties: Consistency (C), Availability (A), and Partition tolerance (P).

Three Core Properties

C: Consistency

  • All nodes see exactly the same data at the same time
  • After an update completes, a read against any node returns the latest written value

A: Availability

  • Every user request receives a response within a reasonable time
  • Responses are not errors: every non-failing node answers
  • The system continues to provide service when some nodes fail

P: Partition Tolerance

  • The system continues to operate despite network partitions (messages between nodes lost or delayed)
  • After the partition heals, the system can reconcile data and return to a consistent state

CAP Trade-off Practices

  1. CA system (sacrificing P): Traditional relational databases deployed in a single data center (in a genuinely distributed deployment, partitions cannot be ruled out, so pure CA is rarely achievable in practice)
  2. AP system (sacrificing C): NoSQL databases like Cassandra, Dynamo
  3. CP system (sacrificing A): Coordination services like ZooKeeper, etcd
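The C-versus-A trade-off during a partition can be sketched with a toy pair of replicas (an assumed scenario, not any real database): a CP-style read refuses to answer while the partition lasts, while an AP-style read always answers, accepting possibly stale data.

```python
# Toy illustration of the C-vs-A choice during a network partition.
# The classes and functions here are hypothetical, for illustration only.

class Replica:
    def __init__(self):
        self.value = None

partitioned = True                  # the network is currently split
primary, secondary = Replica(), Replica()
primary.value = "v2"                # the write lands on the primary only
if not partitioned:
    secondary.value = primary.value # replication cannot happen while partitioned

def read_cp(replica):
    # CP choice: refuse to answer rather than risk returning stale data.
    if partitioned:
        raise RuntimeError("unavailable during partition")
    return replica.value

def read_ap(replica):
    # AP choice: always answer, even if the data may be stale.
    return replica.value

print(read_ap(secondary))  # None — stale but available
```

The AP read on the secondary returns the pre-partition value (here, nothing at all), while the CP read raises an error: the system chose availability or consistency, not both.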

2PC Pattern

Working Principle

Prepare Phase (Voting Phase)

  1. Coordinator sends Prepare request to all participants
  2. Each participant executes transaction operations (but doesn’t commit)
  3. Each participant responds to the coordinator: “yes” on success, “no” on failure

Commit Phase (Execution Phase)

  1. If all participants returned “yes”: the coordinator sends the Commit command
  2. If any participant returned “no” or timed out: the coordinator sends the Rollback command
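The two phases above can be sketched as follows (the class and method names are hypothetical, not from any real transaction library):

```python
# Minimal sketch of a two-phase commit coordinator, for illustration only.

class Participant:
    def __init__(self, name, will_succeed=True):
        self.name = name
        self.will_succeed = will_succeed
        self.state = "initial"

    def prepare(self):
        # Phase 1: execute the transaction work but hold the commit.
        if self.will_succeed:
            self.state = "prepared"
            return "yes"
        self.state = "aborted"
        return "no"

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def two_phase_commit(participants):
    # Phase 1 (voting): collect a yes/no vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2 (execution): commit only if every vote is "yes";
    # otherwise roll everyone back.
    if all(v == "yes" for v in votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.rollback()
    return "rolled_back"


nodes = [Participant("db1"), Participant("db2"), Participant("db3", will_succeed=False)]
print(two_phase_commit(nodes))  # one "no" vote forces a global rollback
```

Note how a single “no” vote dooms the whole transaction: this all-or-nothing decision is what gives 2PC its strong consistency, and also what makes every participant wait on the coordinator.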

Pros and Cons

Advantages:

  • Relatively simple implementation, guarantees strong consistency

Disadvantages:

  1. Synchronous blocking: All participants must wait for coordinator’s final instruction
  2. Coordinator single point of failure: If coordinator crashes, participants will be blocked indefinitely
  3. Data inconsistency risk: If the coordinator crashes after sending Commit to only some participants, those that never received the instruction diverge from those that committed

3PC Pattern

Main Differences

Three-phase commit protocol (3PC) is an improved version of two-phase commit protocol:

  1. Timeout mechanism: Introduced in both coordinator and participants
  2. Phase splitting: Splits the prepare phase into two phases, canCommit and preCommit

Phase 1: canCommit (Inquiry Phase)

  1. Coordinator sends canCommit request to all participants
  2. Participants check their status, return yes or no response

Phase 2: preCommit (Pre-commit Phase)

All participants return yes:

  1. Coordinator sends preCommit request
  2. Participants execute transaction operations but don’t commit
  3. Participants enter the prepared state

Any participant returns no or timeout:

  1. Coordinator sends abort request
  2. Participants execute transaction rollback

Phase 3: doCommit (Commit Phase)

  • If all participants acknowledged preCommit: the coordinator sends doCommit and participants complete the transaction commit
  • If any participant returns no or times out: the coordinator sends abort and participants roll back the transaction
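The three phases can be sketched as follows, assuming simple in-memory participants (all names are illustrative, not from a real library). The key addition over 2PC is the timeout handler: a participant stuck in the prepared state advances on its own instead of blocking forever.

```python
# Hedged sketch of the three-phase commit flow, for illustration only.

class Participant3PC:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.state = "initial"

    def can_commit(self):
        # Phase 1: only a status check — no transaction work is done yet.
        return "yes" if self.healthy else "no"

    def pre_commit(self):
        # Phase 2: execute the transaction but do not commit.
        self.state = "prepared"
        return "ack"

    def do_commit(self):
        # Phase 3: make the changes durable.
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

    def on_timeout(self):
        # 3PC's improvement: a prepared participant that loses contact with
        # the coordinator assumes the decision was commit and proceeds.
        if self.state == "prepared":
            self.do_commit()
        else:
            self.abort()


def three_phase_commit(participants):
    if any(p.can_commit() == "no" for p in participants):   # canCommit
        for p in participants:
            p.abort()
        return "aborted"
    acks = [p.pre_commit() for p in participants]           # preCommit
    if all(a == "ack" for a in acks):                       # doCommit
        for p in participants:
            p.do_commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"
```

Because canCommit does no real work, a “no” in phase 1 aborts cheaply; and because every prepared participant can time out into a commit, a crashed coordinator no longer blocks the cluster indefinitely.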

3PC Advantages

  1. Reduced blocking time
  2. Lower impact from coordinator single point of failure
  3. Participants can automatically advance process after timeout

XA Pattern

Basic Concepts

The XA (eXtended Architecture) specification is a distributed transaction standard published by X/Open in 1991; it is based on the two-phase commit protocol (2PC).

Core Roles

  1. Global Transaction Manager (TM): Responsible for coordinating the entire distributed transaction
  2. Local Resource Manager (RM): Manages local resource transaction operations, usually database systems

XA Transaction Workflow

  1. Prepare phase: Transaction manager sends prepare request to all participants
  2. Commit/Rollback phase: Decide global commit or rollback based on participants’ feedback
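The TM/RM interaction can be sketched roughly as follows. The method names echo the XA interface verbs (xa_prepare, xa_commit, xa_rollback), but these classes are hypothetical, not a real XA driver:

```python
# Sketch of the XA roles: a transaction manager (TM) driving resource
# managers (RMs) through prepare/commit, for illustration only.

class ResourceManager:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.state = "idle"

    def xa_prepare(self, xid):
        # Phase 1: do the local work for global transaction `xid`, hold the commit.
        if self.healthy:
            self.state = "prepared"
            return "ok"
        return "error"

    def xa_commit(self, xid):
        self.state = "committed"

    def xa_rollback(self, xid):
        self.state = "rolled_back"


class TransactionManager:
    def run(self, xid, resource_managers):
        # Phase 1: ask every RM to prepare under the global transaction id.
        if all(rm.xa_prepare(xid) == "ok" for rm in resource_managers):
            # Phase 2: global commit.
            for rm in resource_managers:
                rm.xa_commit(xid)
            return "committed"
        # Any failure: global rollback.
        for rm in resource_managers:
            rm.xa_rollback(xid)
        return "rolled_back"
```

The global transaction id (`xid`) is what ties the independent local transactions together: each RM only ever sees its own work, and the TM alone holds the global commit/rollback decision.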

Why Transaction Manager is Needed

According to the CAP theorem, in a distributed environment two or more machines cannot reach complete agreement on shared state purely through their own coordination. Therefore, a centralized transaction manager is introduced to coordinate the various resources.