Scaling Triggers
- Capacity indicator: Disk usage exceeds 80% and is projected to reach capacity within 3 months
- Performance indicator: Query response time consistently exceeds the SLA threshold (e.g., >500ms)
- Concurrency indicator: Active connections consistently above 70% of the configured maximum
- Monitoring alerts: Recurring CPU/IO saturation alerts
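The trigger conditions above can be expressed as a simple automated check. This is a minimal sketch: the `Metrics` structure and the growth-projection logic are illustrative assumptions, not part of any specific monitoring tool.

```python
# Sketch of an automated scaling-trigger check based on the indicators above.
# The Metrics fields and the 3-month projection are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Metrics:
    disk_usage_pct: float        # current disk usage, 0-100
    monthly_growth_pct: float    # disk growth per month, in percentage points
    p95_query_ms: float          # 95th-percentile query latency
    active_conn_ratio: float     # active connections / max_connections

def scaling_triggers(m: Metrics) -> list[str]:
    """Return the list of trigger conditions that currently fire."""
    fired = []
    # Capacity: over 80% now AND projected to hit 100% within 3 months
    if m.disk_usage_pct > 80 and m.disk_usage_pct + 3 * m.monthly_growth_pct >= 100:
        fired.append("capacity")
    if m.p95_query_ms > 500:          # SLA threshold from the text
        fired.append("performance")
    if m.active_conn_ratio > 0.70:    # 70% of max connections
        fired.append("concurrency")
    return fired
```

In practice these thresholds would come from a monitoring system (e.g., Prometheus queries) rather than a hand-built struct; the point is that each trigger is a cheap, periodic comparison.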
Horizontal Scaling Implementation Steps
1. Evaluation and Planning Phase
- Conduct capacity assessment, calculate current data growth curve
- Determine scaling ratio (e.g., increase node count by 50%)
- Select scaling strategy: consistent hash scaling or range scaling
2. Data Migration Plan
Plan A: Online Migration (Recommended)
- Deploy new nodes and add to cluster
- Configure data synchronization mechanism (e.g., MySQL GTID replication)
- Migrate hot data in batches
- Switch traffic and verify
Plan B: Downtime Migration
- Stop write services
- Full backup of existing data
- Redistribute data to new and old nodes
- Restore services
3. Sharding Strategy Adjustment
- Refactor sharding key algorithm
- Update routing configuration (e.g., MyCat/ShardingSphere configuration)
- Verify data distribution balance across shards
4. Application Layer Adaptation
- Update data source configuration
- Adjust connection pool parameters
- Modify potentially affected SQL statements
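The sharding-strategy adjustment in step 3 often uses consistent hashing so that adding nodes remaps only a fraction of keys. A minimal ring sketch follows; the node names and virtual-node count are assumptions for illustration, not taken from MyCat or ShardingSphere.

```python
# Minimal consistent-hash ring illustrating the routing change in step 3.
# Node names and vnode count are illustrative assumptions.
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def route(self, sharding_key: str) -> str:
        idx = bisect.bisect(self._keys, self._hash(sharding_key)) % len(self._ring)
        return self._ring[idx][1]

# Scaling from 8 to 12 nodes: only keys landing on the new nodes' ring
# segments change their routing (expected roughly 4/12 of all keys).
old = HashRing([f"db{i}" for i in range(8)])
new = HashRing([f"db{i}" for i in range(12)])
moved = sum(1 for k in range(10_000) if old.route(str(k)) != new.route(str(k)))
```

With naive modulo sharding, the same 8-to-12 change would instead remap most keys, which is why consistent hashing is the usual choice for incremental (non-doubling) scale-outs.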
Common Challenges and Solutions
- Data skew issue:
- Case: An e-commerce platform’s user table sharded by ID hash caused some shards to have 3x more data than others
- Solution: Use composite sharding keys (e.g., ID + registration time)
- Cross-shard transactions:
- Introduce distributed transaction framework (e.g., Seata)
- Or adopt eventual consistency model
- Scaling cost control:
- Use hybrid deployment strategy (SSD+HDD hybrid storage)
- Implement hot-cold data separation
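The composite-sharding-key fix for data skew can be sketched as follows. The field names (`user_id`, `reg_month`) and the use of MD5 are hypothetical choices for illustration.

```python
# Sketch of a composite sharding key for the data-skew case above: hashing
# the user ID together with a second attribute (here, registration month)
# spreads hot ID ranges across more shards. Field names are hypothetical.
import hashlib

def shard_for(user_id: int, reg_month: str, num_shards: int) -> int:
    """Route by a composite key instead of user_id alone."""
    composite = f"{user_id}:{reg_month}"
    digest = hashlib.md5(composite.encode()).hexdigest()
    return int(digest, 16) % num_shards
```

The trade-off: queries that know only the `user_id` can no longer compute the target shard directly and must either carry the second attribute or fan out across shards, so the composite key should use attributes that are always available at query time.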
Best Practices Case Study
Scaling experience from a social platform when daily active users exceeded 10 million:
- Scaled from 8 shards to 12 shards
- Used online migration method, took 72 hours
- QPS degradation during migration was held within 15%
- After scaling, TP99 latency reduced from 800ms to 300ms
Downtime Scaling
Overview
Downtime scaling is a common approach in early database architecture evolution, suitable for scenarios where database size is relatively small and brief service interruptions are acceptable.
Detailed Implementation Steps
- Service announcement phase: Publish a maintenance notice 3-5 days before scaling
- Service stop phase: Close the load balancer traffic entry and stop all application service processes
- Data migration phase: Add new database instances and run migration scripts to apply the new sharding rules
- Configuration update phase: Update database connection pool configuration and adjust sharding routing logic
- Service recovery phase: Start database services first, then application services
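The data migration phase above can be sketched as a simple modulo re-routing pass, assuming writes are already stopped. The row format and shard counts are illustrative assumptions.

```python
# Sketch of the redistribution step in a downtime migration: with writes
# stopped, each row is re-routed from the old modulo rule to the new one.
# Row format and shard counts are illustrative assumptions.

def redistribute(rows, old_shards: int, new_shards: int):
    """Return (row_id, src_shard, dst_shard) for every row that must move."""
    moves = []
    for row_id, _payload in rows:
        src = row_id % old_shards
        dst = row_id % new_shards
        if src != dst:
            moves.append((row_id, src, dst))
    return moves

# Example: going from 2 to 4 shards with ids 0..7,
# ids 2, 3, 6, 7 change shards while ids 0, 1, 4, 5 stay put.
rows = [(i, f"row-{i}") for i in range(8)]
```

Because the rule change touches a fixed fraction of all rows, migration time grows linearly with data volume, which is exactly the limitation noted below.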
Pros and Cons
Advantages:
- Simple and direct implementation, low technical difficulty
- No complex data synchronization mechanism needed
- Architecture adjustment is completed in a single pass
Limitations:
- Requires downtime, interrupting business continuity
- Migration time grows significantly with data volume
- Unsuitable for businesses requiring 24/7 high availability
Applicable Scenarios
- Early database expansion for startups
- Internal management system upgrades
- ToB services that can accept scheduled maintenance
- Migrations under TB-level data volume
Smooth Scaling
Overview
The core idea of smooth scaling is to adopt a gradual doubling strategy, incrementally increasing the number of databases through staged operations while keeping services uninterrupted.
Plan Characteristics
- Scaling ratio: Adopt 2x scaling strategy (e.g., expand from 2 DB nodes to 4 nodes)
- Technical requirements: Rely on dual-master replication mechanism for data synchronization
- Implementation phases: Divided into two main phases: test verification and production deployment
Detailed Implementation Steps
- Infrastructure preparation: Provision new nodes, ensuring they are in the same VPC as the existing cluster
- Test environment verification: Build test cluster, verify data consistency
- Production environment deployment: Rolling deployment, configure middleware to gradually migrate traffic
- Ongoing maintenance: Deploy monitoring, perform regular maintenance
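The appeal of the doubling strategy can be illustrated with modulo sharding: when the shard count doubles, every row either stays on its current node or moves to exactly one new node, so each old node needs only a single dual-master synchronization target. The shard counts here are illustrative.

```python
# Sketch of why 2x doubling keeps migration simple under modulo sharding:
# doubling the shard count (id % N -> id % 2N) means each row either stays
# put or moves to exactly one new node. Shard counts are illustrative.

def new_shard(row_id: int, old_count: int) -> int:
    return row_id % (2 * old_count)

def stays_put(row_id: int, old_count: int) -> bool:
    # A row stays iff its new shard equals its old shard.
    return new_shard(row_id, old_count) == row_id % old_count

# Going from 2 to 4 nodes: rows on node 0 split between nodes 0 and 2,
# rows on node 1 split between nodes 1 and 3. Exactly half the rows stay,
# and each old node has exactly one migration target.
stay = sum(stays_put(i, 2) for i in range(1000))
```

By contrast, a non-doubling change (e.g., 2 to 3 nodes) scatters each old node's rows across several targets, which is why the gradual-doubling approach pairs so naturally with the dual-master replication links described above.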
Advantages
- Business continuity: Services remain available throughout the scaling process
- Reduced team pressure: A more relaxed time window allows phased execution
- Risk control: Real-time monitoring lets issues be handled immediately
- Performance gains: Reducing per-database data volume yields significant performance improvement
Disadvantages
- High implementation complexity: Requires configuring dual-master replication and dual-master dual-write
- High scaling cost: Operations cost rises sharply, and the number of synchronization links grows quadratically with node count
Applicable Scenarios
- Large websites
- Services with high availability requirements