Design Philosophy

When building a distributed cluster architecture, three key dimensions need to be considered systematically:

1. Availability

Availability refers to the system’s ability to provide normal service over a specified period. It is usually stated as a target in an SLA (Service Level Agreement), such as 99.99% annual availability.
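
To make an availability target concrete, the permitted downtime can be computed directly. The short Python sketch below (the function name is illustrative, and leap years are ignored) shows that 99.99% annual availability allows only about 52.6 minutes of downtime per year.

```python
# Minutes in a 365-day year (leap years ignored for simplicity).
MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the maximum yearly downtime permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {allowed_downtime_minutes(target):.1f} minutes/year")
```

Each additional "nine" cuts the downtime budget by a factor of ten, which is why higher SLA targets require progressively more redundancy.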

Common strategies for achieving high availability include:

  • Redundancy design: Deploy multiple instances to avoid a single point of failure
  • Failover: Automatically detect faulty nodes and switch traffic to healthy ones
  • Graceful degradation: Continue to provide basic services when some components fail
  • Disaster recovery design: Deploy across datacenters or regions
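
The redundancy and failover strategies above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration: `FailoverGroup`, the node names, and the `is_healthy` callable are assumptions standing in for real instances and a real health probe (TCP connect, HTTP health endpoint, and so on).

```python
from typing import Callable, Sequence


class FailoverGroup:
    """Routes requests to a healthy node, failing over in priority order."""

    def __init__(self, nodes: Sequence[str], is_healthy: Callable[[str], bool]) -> None:
        if not nodes:
            raise ValueError("at least one node is required")
        self.nodes = list(nodes)      # redundancy: several interchangeable instances
        self.is_healthy = is_healthy  # hypothetical health probe
        self.active = self.nodes[0]

    def current(self) -> str:
        """Return the active node, switching to another one if it is unhealthy."""
        if self.is_healthy(self.active):
            return self.active
        for node in self.nodes:
            # Failure detected: pick the first other node that passes the probe.
            if node != self.active and self.is_healthy(node):
                self.active = node
                return node
        raise RuntimeError("no healthy node available")


# Usage: node-a is down, so traffic fails over to node-b.
failed_nodes = {"node-a"}
group = FailoverGroup(["node-a", "node-b", "node-c"], lambda n: n not in failed_nodes)
print(group.current())  # node-b
```

Once switched, the group stays on the new node even after the old one recovers, which avoids flapping between nodes.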

2. Scalability

Scalability refers to the system’s ability to handle growing load. There are two approaches:

  • Vertical scaling (Scale Up): Increase the capacity of a single machine (CPU, memory, storage)
  • Horizontal scaling (Scale Out): Add more servers

Design principles:

  • Stateless design: Makes horizontal scaling easier, since any instance can serve any request
  • Data sharding: Split data across nodes, for example through database sharding
  • Read-write separation: Reduce load on the master database
  • Microservices architecture: Scale each functional module independently

3. Consistency

Consistency refers to the degree to which data remains synchronized across the nodes of a distributed system. Depending on business requirements, one of the following consistency models can be chosen:

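  • Strong consistency (CP in CAP terms): Every read returns the latest write, so all nodes present the same data
  • Weak consistency (AP in CAP terms): Availability takes priority, and brief inconsistency between nodes is allowed
  • Eventual consistency: A form of weak consistency in which all replicas converge to the same state after a period without new writes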

Detailed Analysis

Availability

1. Site High Availability (Redundant Sites)

  • Use a multi-datacenter deployment strategy, with mirror sites in different geographic locations (such as East China, North China, and South China)
  • Distribute traffic through DNS round-robin or Global Server Load Balancing (GSLB)
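
DNS round-robin works by rotating the order of the address records returned for a domain, so successive clients tend to connect to different sites. A minimal sketch, assuming a hypothetical `RoundRobinResolver` class and documentation-range IP addresses; a real GSLB additionally takes site health and client location into account, which is not modeled here.

```python
import itertools
from typing import List, Sequence


class RoundRobinResolver:
    """Hypothetical model of DNS round-robin for a set of mirror sites."""

    def __init__(self, addresses: Sequence[str]) -> None:
        if not addresses:
            raise ValueError("at least one address is required")
        self.addresses = list(addresses)
        self._lookups = itertools.count()  # number of lookups served so far

    def resolve(self) -> List[str]:
        """Return all addresses, rotated by one position per lookup."""
        i = next(self._lookups) % len(self.addresses)
        return self.addresses[i:] + self.addresses[:i]


# Documentation-range addresses standing in for three regional sites.
sites = RoundRobinResolver(["192.0.2.10", "198.51.100.10", "203.0.113.10"])
for _ in range(4):
    # Clients typically connect to the first address in the answer.
    print(sites.resolve()[0])
```

Because plain round-robin knows nothing about site health, a failed site keeps receiving its share of clients until its record is removed; this is the gap GSLB is meant to close.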

2. Service High Availability (Redundant Services)

  • Adopt a microservices architecture and deploy multiple instances of each service
  • Use container orchestration tools like Kubernetes for automatic scaling
  • Key components: Service registry (such as Eureka), API gateway (such as Zuul), load balancer (such as Nginx)
  • Fault tolerance mechanisms: Circuit breaking, degradation, rate limiting, timeouts with retries
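
Of the fault tolerance mechanisms listed, circuit breaking is the least obvious. A minimal sketch of the closed/open/half-open state machine follows; the class name, thresholds, and injectable clock are assumptions for illustration, and the `fallback` argument is where graceful degradation plugs in.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Minimal circuit breaker: closed -> open -> half-open -> closed."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold  # illustrative default
        self.reset_timeout = reset_timeout          # seconds to stay open
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None      # None means closed

    def call(self, func: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()  # open: fail fast without calling downstream
            # Timeout elapsed: half-open, let this one trial call through.
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open after a failed trial)
            return fallback()
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result


def flaky_service() -> str:
    raise ConnectionError("downstream unavailable")


breaker = CircuitBreaker(failure_threshold=2, reset_timeout=30.0)
for _ in range(3):
    # The third call returns the fallback without touching flaky_service.
    print(breaker.call(flaky_service, fallback=lambda: "cached response"))
```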

3. Data High Availability (Redundant Data)

  • Database cluster solutions: MySQL master-slave replication + MHA, Redis Sentinel mode, MongoDB replica sets
  • Data synchronization strategies: Synchronous replication (strong consistency), asynchronous replication (eventual consistency)
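
The difference between the two synchronization strategies is when a write is acknowledged. The toy model below is a hypothetical sketch (in-process dictionaries instead of networked databases; `Primary` and `Replica` are illustrative names): with synchronous replication a write returns only after every replica has applied it; with asynchronous replication it returns immediately, and a replica read can be stale until replication catches up.

```python
from typing import Dict, List, Optional, Tuple


class Replica:
    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def read(self, key: str) -> Optional[str]:
        return self.data.get(key)


class Primary:
    def __init__(self, replicas: List[Replica], synchronous: bool) -> None:
        self.data: Dict[str, str] = {}
        self.replicas = replicas
        self.synchronous = synchronous
        self.pending: List[Tuple[str, str]] = []  # changes not yet shipped (async only)

    def write(self, key: str, value: str) -> None:
        self.data[key] = value
        if self.synchronous:
            # Synchronous: acknowledge only after every replica has applied the change.
            for replica in self.replicas:
                replica.data[key] = value
        else:
            # Asynchronous: acknowledge now; replicas catch up later.
            self.pending.append((key, value))

    def replicate_pending(self) -> None:
        """Ship queued changes in order (models background replication)."""
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()


replica = Replica()
primary = Primary([replica], synchronous=False)
primary.write("stock:42", "9")
print(replica.read("stock:42"))  # None: the change has not reached the replica yet
primary.replicate_pending()
print(replica.read("stock:42"))  # 9: the replica has converged
```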

Architecture Patterns

Master-Slave Pattern:

  • Architecture features: One master node (Master) handles all write operations; multiple slave nodes (Slave) replicate the master's data and serve read requests
  • Advantages: Simple structure and flexible deployment; read-write separation effectively improves system throughput
  • Disadvantages: If the master fails, switching to a slave requires manual intervention unless an automated failover tool (such as MHA) is added; the single write point is a risk
  • Applicable scenarios: Read-heavy, write-light workloads, such as e-commerce product display and news portals

Master-Master Pattern:

  • Architecture features: Two nodes act as masters for each other, with bidirectional data synchronization
  • Advantages: Eliminates single point of failure, improves system availability
  • Disadvantages: High architectural complexity; maintaining data consistency is costly (for example, concurrent writes to the same row on both nodes can conflict)
  • Applicable scenarios: Systems with high write availability requirements, such as financial trading systems and real-time order systems

Scalability Design

1. Adding Replicas

Implementation method:

  • Set up database replication to copy the master database's data to one or more replicas
  • Distribute read requests across the replicas through load balancing
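
Read-write splitting is often done in a middleware or driver layer. A minimal sketch, assuming a hypothetical `ReadWriteRouter` that classifies statements naively by their leading keyword and treats endpoints as plain strings rather than real connection pools:

```python
import itertools
from typing import Sequence


class ReadWriteRouter:
    """Hypothetical read-write splitting router."""

    def __init__(self, master: str, replicas: Sequence[str]) -> None:
        self.master = master
        # Round-robin iterator over replicas; None when there are no replicas.
        self._replicas = itertools.cycle(list(replicas)) if replicas else None

    def route(self, sql: str) -> str:
        statement = sql.lstrip().upper()
        # Naive classification: plain SELECTs are reads; SELECT ... FOR UPDATE
        # takes locks, so it must go to the master like any write.
        is_read = statement.startswith("SELECT") and "FOR UPDATE" not in statement
        if is_read and self._replicas is not None:
            return next(self._replicas)  # load-balance reads across replicas
        return self.master


router = ReadWriteRouter("master-db", ["replica-1", "replica-2"])
print(router.route("INSERT INTO orders VALUES (1)"))  # master-db
print(router.route("SELECT * FROM products"))         # replica-1
print(router.route("SELECT * FROM products"))         # replica-2
```

Because replicas lag behind the master, a read issued right after a write may still be stale under this scheme.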

Notes:

  • Do not add too many replicas (3-5 is a common recommendation), because each replica adds replication load on the master
  • Replicas lag behind the master (replication delay), so a read from a replica may return stale data

2. Database Sharding

Vertical sharding:

  • Split by business function (different tables in different databases) or by table columns (a wide table into narrower tables)
  • Advantages: Narrower tables, more efficient queries, and lower I/O pressure

Horizontal sharding:

  • Split according to the range or hash value of a field (such as user ID)
  • Distribute data across multiple tables or multiple databases
  • Advantages: In principle, capacity grows by adding shards; each table holds far less data, which improves query performance
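
Both horizontal sharding rules can be expressed as a function from the shard key to a shard number. A minimal sketch, assuming `user_id` is the shard key, with hypothetical range boundaries and a hypothetical `user_N` table naming scheme:

```python
import bisect
import zlib

# Hypothetical exclusive upper bounds for each shard's user_id range:
# shard 0 holds [0, 1_000_000), shard 1 holds [1_000_000, 2_000_000), ...
RANGE_UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]


def range_shard(user_id: int) -> int:
    """Range sharding: find the first range whose upper bound exceeds user_id."""
    shard = bisect.bisect_right(RANGE_UPPER_BOUNDS, user_id)
    if shard >= len(RANGE_UPPER_BOUNDS):
        raise ValueError(f"user_id {user_id} is outside all configured ranges")
    return shard


def hash_shard(user_id: int, num_shards: int) -> int:
    """Hash sharding: a stable hash of the key, modulo the number of shards."""
    return zlib.crc32(str(user_id).encode("utf-8")) % num_shards


for uid in (42, 1_500_000):
    print(uid, f"range -> user_{range_shard(uid)}", f"hash -> user_{hash_shard(uid, 4)}")
```

`zlib.crc32` is used instead of Python's built-in `hash()` because string hashing is randomized per process, while the shard mapping must be stable everywhere. With modulo hashing, changing `num_shards` remaps most keys, so adding shards later requires data migration; range sharding can add a new range without moving existing rows, at the risk of uneven load.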

Consistency Design

1. No Replicas

  • Applicable scenarios: Systems with low read-performance requirements, or systems that can accept lower read throughput
  • Consistency: All reads are served by the master, so they always see the latest writes and there is no replication delay
  • Potential problems: All read and write operations are concentrated on the master database, which easily leads to CPU and I/O bottlenecks

2. Adding an Access Routing Layer

  • Implementation steps:
    1. Use the monitoring system to measure the maximum master-slave replication delay, t
    2. For a period t after a piece of data is modified, route queries for that data to the master database
    3. Once t has elapsed, route those queries to replicas again
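
The routing rule in these steps can be sketched as follows. This is a hypothetical illustration: `LagAwareRouter` and its method names are assumptions, `max_lag_seconds` plays the role of t, and a real deployment would keep the last-write timestamps in storage shared by all application instances rather than in process memory.

```python
import time
from typing import Callable, Dict


class LagAwareRouter:
    """Routes reads of recently modified keys to the master for a window t."""

    def __init__(self, max_lag_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.t = max_lag_seconds  # maximum replication delay from monitoring
        self.clock = clock
        self._last_write: Dict[str, float] = {}

    def record_write(self, key: str) -> None:
        """Call after modifying the data identified by key."""
        self._last_write[key] = self.clock()

    def route_read(self, key: str) -> str:
        written_at = self._last_write.get(key)
        if written_at is not None and self.clock() - written_at < self.t:
            return "master"  # within window t: replicas may still be stale
        self._last_write.pop(key, None)  # window passed; forget the key
        return "replica"


now = [0.0]  # fake clock for the example
router = LagAwareRouter(max_lag_seconds=2.0, clock=lambda: now[0])
router.record_write("order:1001")
print(router.route_read("order:1001"))  # master: still inside the window
now[0] = 2.5
print(router.route_read("order:1001"))  # replica: the window has passed
```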

Master-Slave Pattern Overview

The Master-Slave pattern is a common distributed system architecture pattern that divides nodes in the system into master nodes (Master) and slave nodes (Slave).

Core features:

  1. Clear role division: The master node receives client requests (in particular writes), processes the data and makes decisions; slave nodes replicate the master's data and mainly handle read requests
  2. Data synchronization mechanism: The master node propagates changes to slave nodes through logs (such as MySQL's binlog) or message queues

Typical application scenarios:

  • Database replication: MySQL master-slave replication, Redis master-slave architecture
  • Load balancing: Web service clusters
  • Distributed computing: Hadoop MapReduce, Spark clusters

Configuration example (MySQL master-slave replication, set in each server's option file, such as my.cnf):

# Master node configuration
[mysqld]
server-id = 1
log_bin = mysql-bin
binlog_format = ROW

# Slave node configuration (server-id must differ from the master's)
[mysqld]
server-id = 2
relay_log = mysql-relay-bin
read_only = 1

Performance optimization suggestions:

  1. Network optimization: Keep network latency between master and slave nodes low (under 1 ms is recommended)
  2. Batch operations: Batch writes to avoid frequent synchronization of small amounts of data
  3. Parallel replication: MySQL 5.7+ supports parallel replication based on group commit
  4. Monitoring metrics: Track key metrics such as Seconds_Behind_Master
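
Seconds_Behind_Master is a column of MySQL's SHOW SLAVE STATUS output (MySQL 8.0.22 and later also provide SHOW REPLICA STATUS, where the corresponding column is Seconds_Behind_Source). A minimal alerting sketch follows; it assumes the status row has already been fetched as a dictionary by some MySQL client library, and the function name and 5-second threshold are illustrative assumptions.

```python
from typing import Any, Mapping


def replica_health(status: Mapping[str, Any], max_lag_seconds: int = 5) -> str:
    """Classify a replica from one row of SHOW SLAVE STATUS.

    Assumes the row was already fetched as a dict by a MySQL client library.
    The 5-second default threshold is illustrative, not a recommendation.
    """
    # Both replication threads must report "Yes" for replication to progress.
    if status.get("Slave_IO_Running") != "Yes" or status.get("Slave_SQL_Running") != "Yes":
        return "broken"
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        # NULL lag means the delay cannot be computed; treat it as a fault.
        return "broken"
    return "lagging" if int(lag) > max_lag_seconds else "ok"


print(replica_health({"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes",
                      "Seconds_Behind_Master": 0}))   # ok
print(replica_health({"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes",
                      "Seconds_Behind_Master": 30}))  # lagging
```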

“When building distributed systems, design should be considered from three aspects: availability, scalability, and consistency. Availability is achieved through redundant deployment, multi-site disaster recovery, service circuit breakers, and automatic failover; scalability relies on stateless services, read-write separation, database sharding, and other means to handle high-concurrency reads and writes; consistency design must trade off strong consistency against eventual consistency, combining consensus algorithms and synchronization strategies to balance correctness and user experience.”

Methods to Avoid Deadlocks

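  1. Acquire locks in a consistent order across transactions
  2. Keep transactions as short as possible
  3. Choose an appropriate transaction isolation level
  4. Use suitable indexes, so that statements scan and lock fewer rows
  5. Break large transactions into smaller ones
  6. Explicitly control the locking order (for example, lock rows in ascending primary-key order)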
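
The first rule, a consistent locking order, is easiest to see with in-process locks. In this Python sketch, `threading.Lock` stands in for database row locks, and `Account` and `transfer` are hypothetical names: every transfer locks the account with the lower id first, so two opposite transfers can never wait on each other in a cycle.

```python
import threading


class Account:
    def __init__(self, account_id: int, balance: int) -> None:
        self.id = account_id
        self.balance = balance
        self.lock = threading.Lock()  # stands in for a row lock


def transfer(src: Account, dst: Account, amount: int) -> None:
    """Move money between accounts, always locking the lower id first."""
    if src.id == dst.id:
        return
    # Global lock order by id: A->B and B->A both lock the same account first.
    first, second = sorted((src, dst), key=lambda acct: acct.id)
    with first.lock:
        with second.lock:
            if src.balance < amount:
                raise ValueError("insufficient funds")
            src.balance -= amount
            dst.balance += amount


def repeat_transfers(src: Account, dst: Account, times: int) -> None:
    for _ in range(times):
        transfer(src, dst, 1)


a, b = Account(1, 1_000), Account(2, 1_000)
workers = [threading.Thread(target=repeat_transfers, args=(a, b, 500)),
           threading.Thread(target=repeat_transfers, args=(b, a, 500))]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(a.balance + b.balance)  # 2000: opposite transfers finish and money is conserved
```

If `transfer` instead locked `src` before `dst`, the two threads could each hold one lock while waiting for the other, which is exactly the deadlock the ordering rule prevents.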