Distributed Analysis
CAP Theorem Explained
The CAP theorem (also known as Brewer's theorem) is a fundamental principle of distributed system design, proposed by computer scientist Eric Brewer in 2000. It states that a shared-data system in a distributed environment can simultaneously guarantee at most two of three properties: Consistency (C), Availability (A), and Partition tolerance (P).
Three Core Properties
C: Consistency
- All nodes see exactly the same data at the same time
- After an update operation completes, every node that is queried should return the most recently written data
A: Availability
- For every user request, the system can return a response within a reasonable time
- The system returns a non-error response to every request, even if the data returned is not the most recent
- The system can continue to provide services when some nodes fail
P: Partition Tolerance
- The system continues to operate despite network partitions (messages between nodes lost or delayed)
- After the partition heals, the system can reconcile divergent data and return to a consistent state
CAP Trade-off Practices
- CA system (sacrificing P): Traditional relational databases deployed in a single data center
- AP system (sacrificing C): NoSQL databases like Cassandra, Dynamo
- CP system (sacrificing A): Coordination services like ZooKeeper, etcd
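The CP/AP trade-off can be made concrete with a toy replicated store. All names here (`Replica`, `CPStore`, `APStore`) are illustrative, not from any real database; a CP store here refuses writes during a partition to stay consistent, while an AP store accepts them and lets replicas diverge.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.reachable = True  # set to False to simulate a network partition

class CPStore:
    """Consistency over availability: reject writes unless ALL replicas
    are reachable (a simplified 100% write quorum)."""
    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        if not all(r.reachable for r in self.replicas):
            raise RuntimeError("partition: rejecting write to stay consistent")
        for r in self.replicas:
            r.data[key] = value

class APStore:
    """Availability over consistency: accept writes on reachable replicas
    and let unreachable ones hold stale data until the partition heals."""
    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        for r in self.replicas:
            if r.reachable:
                r.data[key] = value  # partitioned replicas now lag behind

replicas = [Replica("a"), Replica("b")]
replicas[1].reachable = False       # simulate a partition

ap = APStore(replicas)
ap.write("x", 1)                    # succeeds, but replica "b" is now stale

cp = CPStore(replicas)
try:
    cp.write("x", 2)                # rejected: availability is sacrificed
except RuntimeError as e:
    print(e)
```

Real systems sit on a spectrum (tunable quorums, eventual consistency), but the forced choice during a partition is exactly this one.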
2PC Pattern
Working Principle
Prepare Phase (Voting Phase)
- Coordinator sends Prepare request to all participants
- Each participant executes transaction operations (but doesn’t commit)
- Each participant replies to the coordinator: "yes" on success, "no" on failure
Commit Phase (Execution Phase)
- All participants return “yes”: Send Commit command
- Any participant returns “no” or timeout: Send Rollback command
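The two phases above can be sketched as follows. The names (`Participant`, `two_phase_commit`) are illustrative; a real implementation also needs durable logging and timeouts, which this sketch omits.

```python
class Participant:
    def __init__(self, name, will_vote_yes=True):
        self.name = name
        self.will_vote_yes = will_vote_yes
        self.state = "initial"

    def prepare(self):
        # Phase 1: execute the transaction work without committing, then vote.
        if self.will_vote_yes:
            self.state = "prepared"
            return "yes"
        self.state = "aborted"
        return "no"

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"

def two_phase_commit(participants):
    # Phase 1 (voting): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2 (execution): commit only on a unanimous "yes".
    if all(v == "yes" for v in votes):
        for p in participants:
            p.commit()
        return "committed"
    # Otherwise roll back everyone who had prepared.
    for p in participants:
        if p.state == "prepared":
            p.rollback()
    return "rolled_back"
```

Note that between `prepare()` and the coordinator's final instruction, a prepared participant can do nothing but wait, which is exactly the blocking problem listed below.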
Pros and Cons
Advantages:
- Relatively simple implementation, guarantees strong consistency
Disadvantages:
- Synchronous blocking: All participants must wait for coordinator’s final instruction
- Coordinator single point of failure: If the coordinator crashes, prepared participants remain blocked, holding locks, until it recovers
- Data inconsistency risk: If the coordinator crashes after sending Commit to only some participants, those that never receive the instruction end up inconsistent with those that committed
3PC Pattern
Main Differences
The three-phase commit protocol (3PC) improves on the two-phase commit protocol (2PC) in two ways:
- Timeout mechanism: Introduced in both coordinator and participants
- Phase splitting: The prepare phase is split into two phases, canCommit and preCommit
Phase 1: canCommit (Inquiry Phase)
- Coordinator sends canCommit request to all participants
- Participants check their status, return yes or no response
Phase 2: preCommit (Pre-commit Phase)
All participants return yes:
- Coordinator sends preCommit request
- Participants execute transaction operations but don’t commit
- Participants enter prepared completion state
Any participant returns no or timeout:
- Coordinator sends abort request
- Participants execute transaction rollback
Phase 3: doCommit (Commit Phase)
- All participants return ack: Execute doCommit, complete transaction commit
- Coordinator receives a "no" or times out waiting for an ack: Execute abort, and participants roll back the transaction
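The three phases can be compressed into a sketch like the one below. All names are illustrative, and the timeout machinery on both sides (the key difference from 2PC) is reduced to comments; the point is the extra canCommit round that checks feasibility before any work is done.

```python
class Participant3PC:
    def __init__(self, name, can_commit=True):
        self.name = name
        self._can_commit = can_commit
        self.state = "initial"

    def on_can_commit(self):
        # Phase 1: only check whether commit is feasible; no work yet.
        return "yes" if self._can_commit else "no"

    def on_pre_commit(self):
        # Phase 2: execute the transaction but do not commit.
        # (A real participant that times out here would abort on its own.)
        self.state = "prepared"
        return "ack"

    def on_do_commit(self):
        # Phase 3: make the changes durable.
        # (A prepared participant that times out here commits by default.)
        self.state = "committed"

    def on_abort(self):
        self.state = "aborted"

def three_phase_commit(participants):
    # Phase 1: canCommit inquiry.
    if not all(p.on_can_commit() == "yes" for p in participants):
        for p in participants:
            p.on_abort()
        return "aborted"
    # Phase 2: preCommit; collect acks.
    if not all(p.on_pre_commit() == "ack" for p in participants):
        for p in participants:
            p.on_abort()
        return "aborted"
    # Phase 3: doCommit.
    for p in participants:
        p.on_do_commit()
    return "committed"
```

Because a participant that reaches the prepared state knows every other participant voted yes twice, it can safely advance on timeout instead of blocking, which is what the advantages below rely on.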
3PC Advantages
- Reduced blocking time
- Lower impact from coordinator single point of failure
- Participants can automatically advance process after timeout
XA Pattern
Basic Concepts
The XA (eXtended Architecture) specification is a distributed transaction standard published by X/Open in 1991, built on the two-phase commit protocol (2PC).
Core Roles
- Global Transaction Manager (TM): Responsible for coordinating the entire distributed transaction
- Local Resource Manager (RM): Manages local resource transaction operations, usually database systems
XA Transaction Workflow
- Prepare phase: Transaction manager sends prepare request to all participants
- Commit/Rollback phase: Decide global commit or rollback based on participants’ feedback
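A minimal sketch of the TM/RM split might look like the following. The method names are loosely modeled on the XA verbs (prepare/commit/rollback keyed by a transaction id), but this is an illustration, not a binding of the real X/Open C interface, which works through a function-switch structure and XID records.

```python
class ResourceManager:
    """Stands in for a local database that can prepare and then
    commit or roll back its branch of a global transaction."""
    def __init__(self, name):
        self.name = name
        self.status = "active"

    def prepare(self, xid):
        self.status = "prepared"
        return "ok"

    def commit(self, xid):
        self.status = "committed"

    def rollback(self, xid):
        self.status = "rolled_back"

class TransactionManager:
    """Coordinates one global transaction across all resource managers."""
    def __init__(self, rms):
        self.rms = rms

    def run(self, xid):
        # Prepare phase: ask every RM to prepare its branch.
        if all(rm.prepare(xid) == "ok" for rm in self.rms):
            # Commit phase: unanimous success, so commit globally.
            for rm in self.rms:
                rm.commit(xid)
            return "committed"
        # Any failure: roll back every branch.
        for rm in self.rms:
            rm.rollback(xid)
        return "rolled_back"
```

In practice the RM side is implemented by the database itself (e.g. via a driver's XA support), and the TM only records decisions and drives the two phases.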
Why Transaction Manager is Needed
As the CAP theorem suggests, in a distributed environment two or more machines cannot guarantee a fully consistent transaction outcome through pairwise coordination alone. A centralized transaction manager is therefore introduced as a neutral third party that collects the participants' votes and issues the global commit or rollback decision.