Design Philosophy
When building a distributed cluster architecture, the design should be considered systematically along three key dimensions:
1. Availability
Availability is the system’s ability to provide normal service over a specified period. It is usually expressed as an SLA (Service Level Agreement) target, such as 99.99% annual availability.
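To make an SLA figure concrete, the sketch below converts an annual availability target into a yearly downtime budget. It assumes a 365-day year and ignores leap years; at 99.99% the budget is roughly 52.6 minutes per year.

```python
# Convert an annual availability target into an allowed-downtime budget.
# Assumption: a 365-day year (leap years are ignored).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_budget_minutes(availability: float) -> float:
    """Return the maximum yearly downtime, in minutes, allowed by an availability target."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return MINUTES_PER_YEAR * (1.0 - availability)


if __name__ == "__main__":
    for target in (0.999, 0.9999, 0.99999):
        # Prints 525.6, 52.6 and 5.3 minutes respectively
        print(f"{target:.3%} -> {downtime_budget_minutes(target):.1f} min/year")
```

Each extra "9" cuts the budget by a factor of ten, which is why higher SLA targets usually require automated rather than manual recovery.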
Common strategies for achieving high availability include:
- Redundancy design: Deploy multiple instances to avoid a single point of failure
- Failover: Automatically detect faulty nodes and switch traffic to healthy ones
- Graceful degradation: Keep providing basic services when some components fail
- Disaster recovery design: Cross-datacenter/cross-region deployment
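Failover can be sketched as "pick the highest-priority node that passes its health check." In the minimal sketch below, the `FailoverPool` class and the `is_healthy` callback are hypothetical; in practice the callback would be a real probe such as a TCP connect or an HTTP health endpoint.

```python
from typing import Callable, List


class FailoverPool:
    """Route traffic to the highest-priority healthy node (hypothetical helper).

    `nodes` is ordered by priority: the first entry is the preferred primary and
    the rest are standbys. `is_healthy` stands in for a real health check.
    """

    def __init__(self, nodes: List[str], is_healthy: Callable[[str], bool]) -> None:
        if not nodes:
            raise ValueError("at least one node is required")
        self.nodes = list(nodes)
        self.is_healthy = is_healthy
        self.active = self.nodes[0]

    def current(self) -> str:
        """Return the active node, failing over if it no longer passes its health check."""
        if self.is_healthy(self.active):
            return self.active  # stay on the current node while it is healthy (no failback)
        for node in self.nodes:
            if self.is_healthy(node):
                self.active = node  # fail over to the first healthy node in priority order
                return node
        raise RuntimeError("no healthy node available")


if __name__ == "__main__":
    down = {"db-a"}  # hypothetical: pretend the primary has crashed
    pool = FailoverPool(["db-a", "db-b", "db-c"], lambda node: node not in down)
    print(pool.current())  # db-b
```

The sketch deliberately does not fail back when the original primary recovers; switching back automatically can cause flapping if a node keeps crashing and restarting.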
2. Scalability
Scalability refers to the system’s ability to handle growing load. It comes in two forms:
- Vertical scaling (Scale Up): Improve the performance of a single machine
- Horizontal scaling (Scale Out): Increase the number of servers
Design principles:
- Stateless design: Any instance can serve any request, which makes horizontal scaling straightforward
- Data sharding: Split data across nodes, for example by database and table sharding
- Read-write separation: Reduce the load on the master database
- Microservices architecture: Scale each functional module independently
3. Consistency
Consistency refers to the degree to which data stays synchronized across the nodes of a distributed system. Depending on business requirements, you can choose among these consistency models:
- Strong consistency (CP in CAP terms): Once a write completes, every subsequent read on any node returns the new value
- Weak consistency (typical of AP systems): Availability takes priority, and reads may briefly return stale data
- Eventual consistency: A form of weak consistency in which all nodes converge to the same data after a period of time without new writes
Detailed Analysis
Availability
1. Site High Availability (Redundant Sites)
- Adopt a multi-datacenter deployment strategy, with mirror sites in different geographic locations (such as East China, North China, and South China)
- Distribute traffic through DNS round-robin or Global Server Load Balancing (GSLB)
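DNS round-robin works by rotating the order of the address records returned for a name, so clients that use the first address are spread across sites. The toy model below only simulates that rotation; the class name and the site addresses (taken from the 203.0.113.0/24 documentation range) are hypothetical. A real GSLB additionally weighs site health and client location.

```python
from collections import deque
from typing import List


class RoundRobinResolver:
    """Toy model of DNS round-robin: each answer rotates the record order by one."""

    def __init__(self, records: List[str]) -> None:
        self._records = deque(records)

    def resolve(self) -> List[str]:
        answer = list(self._records)
        self._records.rotate(-1)  # the next query starts with the next site
        return answer


if __name__ == "__main__":
    # Hypothetical addresses for the East, North and South China sites
    resolver = RoundRobinResolver(["203.0.113.10", "203.0.113.20", "203.0.113.30"])
    for _ in range(3):
        print(resolver.resolve()[0])  # .10, then .20, then .30
```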
2. Service High Availability (Redundant Services)
- Use a microservices architecture and deploy multiple instances of each service
- Use a container orchestration tool such as Kubernetes for automatic scaling
- Key components: Service registry (such as Eureka), API gateway (such as Zuul), load balancer (such as Nginx)
- Fault tolerance mechanisms: Circuit breaking, degradation, rate limiting, and timeouts with retries
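Of the fault tolerance mechanisms listed, the circuit breaker is the one most often misunderstood, so here is a minimal single-threaded sketch. The `CircuitBreaker` class, its thresholds, and the injectable `clock` are assumptions for illustration, not the API of any particular library. After a run of consecutive failures it stops calling the downstream service and returns a degraded fallback, then lets one trial call through once a cool-down has passed.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """CLOSED -> OPEN after `failure_threshold` consecutive failures;
    after `reset_timeout` seconds a trial call is allowed (HALF_OPEN)."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "CLOSED"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "HALF_OPEN"
        return "OPEN"

    def call(self, func: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.state == "OPEN":
            return fallback()  # fail fast and degrade instead of calling the failing service
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip, or re-trip after a failed trial call
            return fallback()
        self.failures = 0
        self.opened_at = None  # a successful call closes the circuit
        return result
```

Failing fast protects the caller's threads and connections from piling up behind a dead dependency, which is what lets the rest of the system degrade gracefully instead of cascading.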
3. Data High Availability (Redundant Data)
- Database cluster solutions: MySQL master-slave replication + MHA, Redis Sentinel mode, MongoDB replica sets
- Data synchronization strategies: Synchronous replication (strong consistency), asynchronous replication (eventual consistency)
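The difference between the two synchronization strategies shows up in when a write is acknowledged. The toy in-memory model below (class names hypothetical, not how any real database implements replication) acknowledges a synchronous write only after every replica has applied it, while an asynchronous write returns at once and leaves replicas temporarily stale.

```python
from typing import Dict, List, Tuple


class Replica:
    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.pending: List[Tuple[str, str]] = []  # changes received but not yet applied

    def apply_pending(self) -> None:
        for key, value in self.pending:  # apply in the order the primary shipped them
            self.data[key] = value
        self.pending.clear()


class Primary:
    """Toy model contrasting synchronous and asynchronous replication."""

    def __init__(self, replicas: List[Replica]) -> None:
        self.data: Dict[str, str] = {}
        self.replicas = replicas

    def write(self, key: str, value: str, synchronous: bool) -> None:
        self.data[key] = value
        for replica in self.replicas:
            replica.pending.append((key, value))
            if synchronous:
                replica.apply_pending()  # not acknowledged until every replica has the change
        # Asynchronous: acknowledge now; replicas catch up later (replication lag).


if __name__ == "__main__":
    replicas = [Replica(), Replica()]
    primary = Primary(replicas)
    primary.write("x", "1", synchronous=True)
    primary.write("x", "2", synchronous=False)
    print([r.data["x"] for r in replicas])  # ['1', '1']: stale until replicas catch up
```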
Architecture Patterns
Master-Slave Pattern:
- Architecture features: One master node handles all write operations; multiple slave nodes replicate the master’s data and serve read requests
- Advantages: Simple structure and flexible deployment; read-write separation effectively improves system throughput
- Disadvantages: If the master fails, switching to a new master requires manual intervention (unless a failover tool such as MHA is deployed), and the single write point is a risk
- Applicable scenarios: Read-heavy, write-light workloads, such as e-commerce product display and news portals
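Read-write separation in this pattern amounts to a routing rule: writes go to the master, reads are spread over the slaves. The sketch below is a hypothetical router that classifies statements by their leading keyword, which is a simplification; locking reads such as `SELECT ... FOR UPDATE` must still go to the master, and so should reads inside a write transaction.

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Send writes to the master and spread plain reads over slaves round-robin.

    Classifying statements by their leading keyword is an assumption made for
    this sketch; real middleware also tracks transactions and session state.
    """

    def __init__(self, master: str, slaves: List[str]) -> None:
        self.master = master
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql: str) -> str:
        stmt = sql.lstrip().upper()
        is_plain_read = stmt.startswith("SELECT") and "FOR UPDATE" not in stmt
        if is_plain_read and self._slaves is not None:
            return next(self._slaves)
        return self.master  # writes, locking reads, or no slaves configured


if __name__ == "__main__":
    router = ReadWriteRouter("master-1", ["slave-1", "slave-2"])  # hypothetical hosts
    print(router.route("SELECT * FROM product WHERE id = 1"))  # slave-1
    print(router.route("UPDATE product SET stock = stock - 1 WHERE id = 1"))  # master-1
```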
Master-Master Pattern:
- Architecture features: Two nodes act as masters for each other, replicating data in both directions
- Advantages: Eliminates the single point of failure for writes and improves system availability
- Disadvantages: Higher architectural complexity; because both nodes accept writes, conflicting updates are possible and maintaining data consistency is costly
- Applicable scenarios: Systems with high write availability requirements, such as financial trading systems, real-time order systems
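One common source of conflicts in a MySQL master-master pair is auto-increment primary keys generated on both nodes. MySQL's `auto_increment_increment` and `auto_increment_offset` settings address this: with `auto_increment_increment = 2` on both masters, `auto_increment_offset = 1` on one and `2` on the other, the nodes generate interleaved, non-overlapping IDs. The sketch below only simulates the resulting sequences; the function name is hypothetical.

```python
import itertools
from typing import Iterator


def auto_increment_ids(offset: int, increment: int) -> Iterator[int]:
    """Simulate the IDs a node generates with the given offset and increment,
    starting from an empty table (hypothetical helper)."""
    if not 1 <= offset <= increment:
        raise ValueError("offset must be between 1 and increment")
    return itertools.count(offset, increment)


if __name__ == "__main__":
    node_a = auto_increment_ids(offset=1, increment=2)
    node_b = auto_increment_ids(offset=2, increment=2)
    print(list(itertools.islice(node_a, 4)))  # [1, 3, 5, 7]
    print(list(itertools.islice(node_b, 4)))  # [2, 4, 6, 8]
```

This only prevents primary-key collisions; two masters updating the same existing row can still conflict, which is part of the consistency cost noted above.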
Scalability Design
1. Adding Replicas
Implementation method:
- Set up database replication to synchronize the master database’s data to one or more replicas
- Distribute read requests to replicas through load balancing
Notes:
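- Keep the number of replicas modest (3 to 5 is generally recommended), because every replica adds replication load on the master
- Replicas lag behind the master (replication delay), so a read from a replica may return stale data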
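Replication delay can be taken into account when load-balancing reads: a replica whose measured lag exceeds a threshold is temporarily skipped. In this minimal sketch, the function names, the lag dictionary, and the use of `None` for "replication stopped" are all assumptions; the lag values would come from a monitoring system.

```python
import random
from typing import Dict, List, Optional


def eligible_replicas(lag_seconds: Dict[str, Optional[float]], max_lag: float) -> List[str]:
    """Replicas whose measured lag is known and within `max_lag` seconds.
    A lag of None models a replica whose replication has stopped."""
    return sorted(name for name, lag in lag_seconds.items()
                  if lag is not None and lag <= max_lag)


def pick_read_target(master: str, lag_seconds: Dict[str, Optional[float]],
                     max_lag: float, rng: Optional[random.Random] = None) -> str:
    """Pick a random sufficiently fresh replica for a read, or fall back to the master."""
    candidates = eligible_replicas(lag_seconds, max_lag)
    if not candidates:
        return master  # no replica is fresh enough, so read from the master
    return (rng or random).choice(candidates)


if __name__ == "__main__":
    # Hypothetical lag measurements, in seconds
    lags = {"replica-1": 0.4, "replica-2": 9.0, "replica-3": None}
    print(pick_read_target("master", lags, max_lag=2.0))  # replica-1
```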
2. Database Sharding
Vertical sharding:
- Split by business function (different tables placed in different databases) or by table columns (a wide table split into narrower tables)
- Advantages: Reduces single table width, improves query efficiency, reduces I/O pressure
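Splitting by columns can be pictured as dividing each wide row into two narrower rows that share the primary key. The table names `user_base` and `user_profile` and the column choice in this sketch are hypothetical.

```python
from typing import Any, Dict, Tuple

# Hypothetical split of a wide `user` table: small, frequently read columns stay in
# `user_base`; large or rarely read columns move to `user_profile`.
BASE_COLUMNS = frozenset({"user_id", "name", "status"})


def split_user_row(row: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split one wide row into its user_base part and its user_profile part."""
    base = {col: val for col, val in row.items() if col in BASE_COLUMNS}
    profile = {col: val for col, val in row.items() if col not in BASE_COLUMNS}
    profile["user_id"] = row["user_id"]  # both tables keep the key so rows can be joined back
    return base, profile


if __name__ == "__main__":
    wide = {"user_id": 7, "name": "alice", "status": 1, "bio": "long text", "settings": "{}"}
    base, profile = split_user_row(wide)
    print(base)     # {'user_id': 7, 'name': 'alice', 'status': 1}
    print(profile)  # {'bio': 'long text', 'settings': '{}', 'user_id': 7}
```

Queries that only need the hot columns now read narrower rows, which is where the reduced I/O comes from.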
Horizontal sharding:
- Split rows by the range or hash value of a shard key (such as user ID)
- Distribute data across multiple tables or multiple databases
- Advantages: Capacity can in principle keep growing by adding shards; each table holds far less data, which improves query performance
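The two ways of choosing a shard can be sketched as small functions. Both are illustrative assumptions rather than a specific middleware's algorithm: hash sharding uses `zlib.crc32` because it is stable across processes, and range sharding uses a sorted list of exclusive upper bounds.

```python
import bisect
import zlib
from typing import List


def hash_shard(user_id: int, shard_count: int) -> int:
    """Hash sharding: a stable hash of the key modulo the number of shards."""
    # zlib.crc32 gives the same result in every process, unlike Python's
    # randomized hash() for str and bytes keys.
    return zlib.crc32(str(user_id).encode("utf-8")) % shard_count


def range_shard(user_id: int, upper_bounds: List[int]) -> int:
    """Range sharding: shard i holds ids from upper_bounds[i-1] up to, but not including, upper_bounds[i]."""
    index = bisect.bisect_right(upper_bounds, user_id)
    if index == len(upper_bounds):
        raise ValueError(f"user_id {user_id} is beyond the last range")
    return index


if __name__ == "__main__":
    print(hash_shard(10086, shard_count=4))  # a shard index in 0..3
    print(range_shard(1_500_000, [1_000_000, 2_000_000, 3_000_000]))  # 1
```

Hash sharding spreads keys evenly, but changing `shard_count` remaps most keys; range sharding makes adding a new range easy, but sequential IDs concentrate new writes on the last shard.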
Consistency Design
1. No-Replica Scheme
- Why it is consistent: Every read goes to the master database, so reads always see the latest write
- Applicable scenarios: Systems with modest read-performance requirements that can accept lower read throughput
- Potential problems: All read and write operations are concentrated on the master database, which easily leads to CPU and I/O bottlenecks
2. Access Routing Layer Scheme
- Implementation steps:
  - A monitoring system measures the maximum master-slave replication delay, t
  - For a window of t after a piece of data is modified, route queries for that data to the master database
  - Once t has elapsed, those queries can be routed to replicas again
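The routing rule above can be sketched as a small class that remembers when each key was last written. `RecentWriteRouter`, the string keys, and the `max_lag` parameter (standing in for the measured delay t) are hypothetical; in a real deployment the write timestamps would live in a shared store so that every routing-layer instance sees them.

```python
import time
from typing import Callable, Dict


class RecentWriteRouter:
    """Route reads for a key to the master for `max_lag` seconds after that key
    was written, and to a replica afterwards. `max_lag` plays the role of t."""

    def __init__(self, max_lag: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.max_lag = max_lag
        self.clock = clock
        self._last_write: Dict[str, float] = {}

    def record_write(self, key: str) -> None:
        self._last_write[key] = self.clock()

    def target_for_read(self, key: str) -> str:
        written_at = self._last_write.get(key)
        if written_at is not None and self.clock() - written_at < self.max_lag:
            return "master"  # still inside the replication-delay window
        self._last_write.pop(key, None)  # drop the expired entry lazily
        return "replica"
```

Usage: call `record_write("user:42")` after updating that user, then route each read of `user:42` with `target_for_read("user:42")`. If t is underestimated, stale reads can still slip through, so t is usually set with some margin above the measured maximum.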
Master-Slave Pattern Overview
The Master-Slave pattern is a common distributed system architecture pattern that divides nodes in the system into master nodes (Master) and slave nodes (Slave).
Core features:
- Clear role division: The master node receives client requests and performs data processing and decision-making; slave nodes replicate the master’s data and mainly handle read requests
- Data synchronization mechanism: The master propagates changes to the slaves through logs (such as the MySQL binlog) or message queues
Typical application scenarios:
- Database replication: MySQL master-slave replication, Redis master-slave architecture
- Load balancing: Web service clusters
- Distributed computing: Hadoop MapReduce, Spark clusters
Configuration example (MySQL master-slave replication):
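# Master node configuration (my.cnf on the master)
[mysqld]
# Unique ID for each server in the replication topology
server-id = 1
# Enable the binary log that slaves replicate from
log_bin = mysql-bin
# Row-based logging records the actual row changes
binlog_format = ROW
# Slave node configuration (my.cnf on the slave)
[mysqld]
server-id = 2
# Relay log holds events received from the master until they are applied
relay_log = mysql-relay-bin
# Reject writes from ordinary clients (privileged accounts can still write)
read_only = 1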
Performance optimization suggestions:
- Network optimization: Ensure low network latency between master and slave nodes (recommended <1ms)
- Batch operations: Group small writes into batches to avoid frequent synchronization of tiny changes
- Parallel replication: MySQL 5.7+ supports parallel replication based on group commit
- Monitoring metrics: Focus on key metrics such as Seconds_Behind_Master (reported as Seconds_Behind_Source by SHOW REPLICA STATUS in MySQL 8.0.22 and later)
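Lag monitoring can start from the vertical (`\G`) text output of the replica status statement. The sketch below parses that text and raises an alert when lag exceeds a threshold or when the field is NULL, which MySQL reports when replication is not running. The function names and the threshold are hypothetical, and the sample output is abbreviated.

```python
import re
from typing import Optional

LAG_PATTERN = re.compile(r"^[ \t]*Seconds_Behind_(?:Master|Source):[ \t]*(\S+)", re.MULTILINE)


def seconds_behind_master(status_text: str) -> Optional[int]:
    """Extract the lag from `SHOW SLAVE STATUS\\G` (or SHOW REPLICA STATUS) output.
    Returns None when the field is NULL, e.g. when the SQL thread is not running."""
    match = LAG_PATTERN.search(status_text)
    if match is None:
        raise ValueError("Seconds_Behind_Master not found in status output")
    value = match.group(1)
    return None if value == "NULL" else int(value)


def lag_alert(status_text: str, threshold_seconds: int) -> bool:
    """True if the replica lags beyond the threshold or replication has stopped."""
    lag = seconds_behind_master(status_text)
    return lag is None or lag > threshold_seconds


if __name__ == "__main__":
    sample = """*************************** 1. row ***************************
               Slave_IO_Running: Yes
              Slave_SQL_Running: Yes
          Seconds_Behind_Master: 7
"""
    print(seconds_behind_master(sample), lag_alert(sample, threshold_seconds=5))  # 7 True
```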
“When building distributed systems, design should be considered from three aspects: availability, scalability, and consistency. Availability is achieved through redundant deployment, multi-site disaster recovery, service circuit breakers, and automatic failover; scalability relies on stateless services, read-write separation, database sharding, and similar means to handle highly concurrent reads and writes; consistency design must trade off strong consistency against eventual consistency, combining consensus algorithms and synchronization strategies to balance data correctness against user experience.”
Methods to Avoid Deadlocks
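- Maintain a consistent, explicitly controlled locking order: every transaction acquires locks on shared resources in the same order (for example, by ascending primary key)
- Keep transactions short so that locks are held as briefly as possible
- Choose an appropriate transaction isolation level (for example, InnoDB takes far fewer gap locks under READ COMMITTED than under REPEATABLE READ)
- Use suitable indexes so that statements scan, and therefore lock, fewer rows
- Split large transactions into smaller ones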
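The locking-order rule is easiest to see with two transfers running in opposite directions. If each one locked its source account first, thread 1 could hold A while waiting for B as thread 2 holds B while waiting for A. The sketch below uses Python thread locks as an in-process analogy for database row locks; the `Account` type and `transfer` function are hypothetical. Always locking the lower `account_id` first makes that circular wait impossible.

```python
import threading


class Account:
    """A bank account guarded by its own lock (hypothetical example type)."""

    def __init__(self, account_id: int, balance: int) -> None:
        self.account_id = account_id
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src: Account, dst: Account, amount: int) -> None:
    """Move `amount` from src to dst, always locking the lower account_id first.

    Because every caller acquires the two locks in the same global order,
    a cycle of threads each waiting on a lock held by the next cannot form.
    """
    if src is dst:
        return
    first, second = sorted((src, dst), key=lambda acct: acct.account_id)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount


def run_transfers(src: Account, dst: Account, times: int) -> None:
    for _ in range(times):
        transfer(src, dst, 1)


if __name__ == "__main__":
    a, b = Account(1, 1000), Account(2, 1000)
    workers = [
        threading.Thread(target=run_transfers, args=(a, b, 10_000)),
        threading.Thread(target=run_transfers, args=(b, a, 10_000)),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(a.balance, b.balance)  # 1000 1000
```

In SQL terms, the same idea means locking rows in a fixed order, for example sorting the primary keys before issuing the locking reads.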