Primary Key Strategies
In many small projects, the database’s auto-increment feature is a simple way to generate primary key IDs. In a sharded database environment, however, each shard's auto-increment counter runs independently, so relying on it directly would produce duplicate primary keys across the different databases and tables.
UUID (Universally Unique Identifier)
Basic Concepts
UUID is a 128-bit (16-byte) numeric identifier used to uniquely identify information in distributed systems. Its standard format consists of 32 hexadecimal digits, separated by hyphens into 5 groups in the form 8-4-4-4-12, with a total length of 36 characters (e.g., 550e8400-e29b-41d4-a716-446655440000).
Generation Mechanism
Depending on the version, UUID generation combines one or more of the following inputs to ensure uniqueness:
- Network hardware information (e.g., MAC address)
- High-precision timestamp (Version 1 counts 100-nanosecond intervals)
- Hardware chip ID
- Random number generator
- Namespace (in specific versions)
Common versions include:
- Version 1: Based on timestamp and MAC address
- Version 4: Based on random number generation
- Version 5: Based on a namespace and a name, hashed with SHA-1
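As a quick illustration of the versions listed above, the following sketch uses Python's standard `uuid` module. The name `example.com` is only a placeholder chosen for this example.

```python
import uuid

# Version 1: timestamp plus node ID (the MAC address, or a random
# node ID when no MAC address can be determined).
u1 = uuid.uuid1()

# Version 4: generated from random numbers.
u4 = uuid.uuid4()

# Version 5: SHA-1 hash of a namespace and a name. The same inputs
# always yield the same UUID.
u5 = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")

for u in (u1, u4, u5):
    # Text form is 36 characters (8-4-4-4-12), binary form is 16 bytes.
    print(u.version, str(u), len(str(u)), len(u.bytes))
```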
Database Application Characteristics
Advantages
- Distributed generation: No central server coordination needed, each node can generate independently
- Uniqueness guarantee: Duplication is extremely unlikely in theory; the 128-bit space holds about 3.4 × 10^38 values, and a Version 4 UUID draws on 122 random bits (about 5.3 × 10^36 values)
- No network overhead: Local generation without network requests
- Security: Does not expose business information (compared to auto-increment ID)
Disadvantages
- Storage overhead: Occupies 16 bytes in binary form, twice that of BIGINT (8 bytes); stored as a 36-character string it takes even more
- Index efficiency: In the InnoDB engine, efficiency suffers because:
  - Unordered values cause frequent B+ tree page splits and reorganization
  - Secondary indexes need to store complete primary key values
  - Both effects increase memory usage and IO operations
- Poor readability: Difficult for humans to intuitively remember and identify
Special Impact on InnoDB Engine
- Clustered index issues: InnoDB uses the primary key as the clustered index; unordered UUID insertion causes:
  - Increased page split frequency
  - Data storage fragmentation
  - Decreased cache hit rate
- Secondary index bloat: Each secondary index contains the primary key value; a 16-byte UUID causes:
  - Increased index file size
  - Fewer indexes cached in the memory buffer pool
  - More data pages to load during range queries
Optimization Solutions
For scenarios requiring UUID:
- Use ordered UUID variants (e.g., COMB UUID)
- Store UUID as binary(16)
- Establish auto-increment ID as clustered index, use UUID as business key
- Consider short hash versions of UUID (need to evaluate collision probability)
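To make the binary(16) suggestion concrete, here is a minimal sketch with Python's standard `uuid` module showing the size difference between the text and binary forms, and that the conversion is lossless. How the 16 bytes are bound to a database column depends on the driver and is not shown.

```python
import uuid

u = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")

as_text = str(u)      # what a CHAR(36) column would hold
as_binary = u.bytes   # what a BINARY(16) column would hold

# Converting back from the stored bytes recovers the same UUID.
restored = uuid.UUID(bytes=as_binary)

print(len(as_text), len(as_binary), restored == u)
```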
Recommended Application Scenarios
Suitable scenarios for UUID:
- Distributed systems requiring offline generation
- Businesses needing to know primary key values in advance
- Scenarios requiring hidden data scale
- Multi-system data merge scenarios
COMB (UUID Variant) Details
Basic Concepts
COMB (short for "combined") is an improved GUID/UUID design from the database field. It combines a traditional GUID/UUID with system time information, which significantly improves database index and retrieval performance.
Technical Background
Standard databases do not have a native COMB data type; this concept was first proposed and elaborated by Jimmy Nilsson in his technical article “The Cost of GUIDs as Primary Keys”.
Design Principles
COMB design is based on the following technical considerations:
- A traditional GUID/UUID (Version 4 in particular) is an essentially random 128-bit identifier
- This randomness causes serious index fragmentation in databases
- Random distribution during data insertion leads to low index efficiency, affecting overall system performance
Specific Implementation
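COMB reconstructs a GUID by combining two segments:
- Retained portion: Keep the first 10 bytes (80 bits) of the GUID unchanged to preserve uniqueness
- Time portion: Use the last 6 bytes (48 bits) to store the timestamp of GUID generation (DateTime)
  - Precise to millisecond-level time information
  - Timestamp stored in big-endian order
  - The timestamp goes last because SQL Server, where COMB originated, gives the last 6 bytes of a uniqueidentifier the highest weight when sorting; a database that compares bytes from left to right (e.g., a MySQL BINARY(16) column) would need the timestamp first to obtain the same ordering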
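The layout described above can be sketched in Python as follows. This is an illustration under assumptions, not Nilsson's original implementation: the function name `new_comb` is hypothetical, and the timestamp is taken as Unix-epoch milliseconds rather than a SQL Server DateTime value.

```python
import time
import uuid


def new_comb(now_ms=None):
    """Build a COMB-style UUID: 10 random bytes followed by a 6-byte timestamp."""
    if now_ms is None:
        # Assumption: milliseconds since the Unix epoch.
        now_ms = time.time_ns() // 1_000_000

    # Retained portion: first 10 bytes of a random (Version 4) UUID.
    random_part = uuid.uuid4().bytes[:10]

    # Time portion: low 48 bits of the timestamp, big-endian.
    time_part = (now_ms & ((1 << 48) - 1)).to_bytes(6, "big")

    return uuid.UUID(bytes=random_part + time_part)


earlier = new_comb(now_ms=1_000)
later = new_comb(now_ms=2_000)

# The trailing 6 bytes grow with time, so they sort in generation order.
print(earlier.bytes[10:] < later.bytes[10:])
```

Passing `now_ms` explicitly is only there to make the ordering easy to demonstrate; in normal use the argument would be omitted.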
Performance Advantages
This combination brings significant advantages:
- Uniqueness preserved: The retained first 10 bytes keep the probability of collision extremely low
- Added ordering: Timestamp makes newly generated IDs show an increasing trend
- Index optimization:
  - Reduced index fragmentation
  - Improved range query efficiency
  - Optimized data page fill rate
Typical Application Scenarios
- Distributed database primary key design
- High-concurrency order systems
- Log systems requiring frequent insertion
- Large e-commerce platform product ID generation
SnowFlake Distributed ID Generation Algorithm
In distributed systems, we often need globally unique IDs that are also roughly ordered by time. SnowFlake, a distributed ID generation algorithm open-sourced by Twitter, addresses exactly this need. The IDs it generates are 64-bit integers.
Data Structure Analysis
SnowFlake’s 64-bit ID consists of:
- Sign bit (1 bit): Always 0, ensuring generated IDs are positive
- Timestamp section (41 bits):
  - Records the timestamp when the ID was generated (millisecond level)
  - 41 bits can represent a time span of approximately 69 years (2^41/1000/60/60/24/365)
  - Usually stored as the offset from a fixed custom epoch such as the system launch time, e.g., 2020-01-01 00:00:00
- Worker machine ID (10 bits):
  - High 5 bits represent the datacenter ID (supports up to 32 datacenters)
  - Low 5 bits represent the machine ID (supports up to 32 machines per datacenter)
  - This design can support up to 1024 machines (32*32)
- Sequence number (12 bits):
  - Distinguishes different IDs generated within the same millisecond
  - 12 bits support 4096 IDs per node per millisecond (2^12)
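The capacity figures above follow directly from the bit widths. The short calculation below reproduces them; the constant names are chosen for this example only.

```python
SIGN_BITS = 1
TIMESTAMP_BITS = 41
DATACENTER_BITS = 5
MACHINE_BITS = 5
SEQUENCE_BITS = 12

total_bits = (SIGN_BITS + TIMESTAMP_BITS + DATACENTER_BITS
              + MACHINE_BITS + SEQUENCE_BITS)

# Time span covered by a 41-bit millisecond counter, in 365-day years.
ms_per_year = 1000 * 60 * 60 * 24 * 365
years = 2 ** TIMESTAMP_BITS / ms_per_year

# Number of distinct worker nodes and IDs per node.
nodes = 2 ** (DATACENTER_BITS + MACHINE_BITS)
ids_per_ms = 2 ** SEQUENCE_BITS
ids_per_second = ids_per_ms * 1000

print(total_bits, round(years, 1), nodes, ids_per_ms, ids_per_second)
```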
Workflow
- When receiving an ID generation request, first get the current timestamp
- If the current timestamp is less than the timestamp of the last generated ID, a system clock rollback is detected and an exception is thrown
- If the request falls within the same millisecond as the previous one, increment the sequence number; otherwise reset the sequence number to 0
- If the sequence number overflows, wait until the next millisecond
- Finally, combine each part’s value through bitwise operations to form the final 64-bit ID
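The workflow above can be sketched as a small Python class. This is a simplified illustration rather than Twitter's implementation: the class name, the injectable `clock` parameter, and the choice of 2020-01-01 00:00:00 UTC as the epoch are assumptions made for this example.

```python
import threading
import time


class SnowflakeGenerator:
    # Assumed custom epoch: 2020-01-01 00:00:00 UTC, in milliseconds.
    EPOCH_MS = 1577836800000

    def __init__(self, datacenter_id, machine_id, clock=None):
        if not (0 <= datacenter_id < 32 and 0 <= machine_id < 32):
            raise ValueError("datacenter_id and machine_id must be in 0..31")
        self.datacenter_id = datacenter_id
        self.machine_id = machine_id
        # clock returns the current time in milliseconds; injectable for tests.
        self._clock = clock or (lambda: time.time_ns() // 1_000_000)
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Clock rollback detected: refuse to issue an ID.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                # Same millisecond: advance the 12-bit sequence.
                self._sequence = (self._sequence + 1) & 0xFFF
                if self._sequence == 0:
                    # Sequence overflow: spin until the next millisecond.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                # New millisecond: restart the sequence.
                self._sequence = 0
            self._last_ms = now
            # 41-bit timestamp | 5-bit datacenter | 5-bit machine | 12-bit sequence
            return (((now - self.EPOCH_MS) << 22)
                    | (self.datacenter_id << 17)
                    | (self.machine_id << 12)
                    | self._sequence)


generator = SnowflakeGenerator(datacenter_id=1, machine_id=2)
print(generator.next_id())
```

The lock keeps the timestamp and sequence consistent when several threads share one generator; each process or machine still needs its own distinct pair of datacenter and machine IDs.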
Application Scenarios
- Distributed systems: As globally unique transaction IDs
- Database primary keys: Replace auto-increment IDs to avoid ID conflicts during sharding
- Message queues: As unique message identifiers
- Log tracking: As request chain tracking IDs
Advantages and Limitations
Advantages:
- IDs trend upward over time, which benefits database indexes
- No dependency on third-party services, local generation
- High performance: in theory up to 4,096,000 IDs per second per node (4096 per millisecond)
Limitations:
- Depends on the system clock; a clock rollback can cause duplicate IDs unless the generator detects it and refuses to issue IDs
- Worker machine IDs need pre-configuration, which is not conducive to dynamic expansion
Database ID Table (Distributed ID Generation)
Core Principle
Maintain an independent database table dedicated to generating globally unique IDs, using MySQL’s auto-increment ID feature to implement ID generation in a distributed environment.
Implementation Details
- Independent ID database construction
  - Create a separate MySQL database instance (e.g., named id_generator_db)
  - Create a table dedicated to ID generation in this database (e.g., global_id_table)
- ID generation process
  - When the business system needs an ID, it inserts a row into global_id_table and reads back the auto-increment value MySQL assigned
  - After obtaining the ID, it can be used for inserting into business tables
- Sharded table scenario application
  - Taking Table A sharding as an example:
    - First get a global ID from global_id_table
    - Based on sharding rules (e.g., ID modulo), decide whether to insert into table A1 or A2
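The flow above can be tried out with the sketch below. To stay runnable without a MySQL server it uses Python's built-in `sqlite3` as a stand-in, so the SQL dialect differs from what MySQL would need. The `stub` column and the helper names `next_global_id` and `target_table` are hypothetical.

```python
import sqlite3

# Stand-in for the dedicated ID database (assumption: SQLite instead of MySQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE global_id_table ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, stub TEXT)"
)


def next_global_id(connection):
    """Insert a placeholder row and return the auto-increment ID it received."""
    cursor = connection.execute("INSERT INTO global_id_table (stub) VALUES ('x')")
    connection.commit()
    return cursor.lastrowid


def target_table(global_id, shard_count=2):
    """Pick the shard table by ID modulo, as in the A1/A2 example."""
    return f"A{global_id % shard_count + 1}"


for _ in range(4):
    new_id = next_global_id(conn)
    print(new_id, target_table(new_id))
```

Because every shard draws its IDs from the same table, no two rows in A1 and A2 can end up with the same primary key.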
Optimization and Precautions
- Performance optimization
  - Use a connection pool to manage ID database connections
  - Batch ID acquisition: Allocate ID segments in batches by setting the auto_increment_increment parameter
- High availability solution
  - Deploy a master-slave architecture to avoid a single point of failure
  - Consider multi-datacenter deployment of the ID generation service
- Usage limitations
  - Single database throughput is limited (about 10k-20k QPS)
  - Cross-datacenter calls may cause network latency
  - Pay attention to auto-increment ID overflow (use the bigint type)
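The batch acquisition idea can be illustrated with a client-side allocator that asks the ID database for a new segment only when the current one is used up. This is a sketch under assumptions: the class `SegmentAllocator` is hypothetical, and `itertools.count(1, 1000)` merely imitates a database whose auto-increment step is 1000 (returning 1, 1001, 2001, ...).

```python
import itertools


class SegmentAllocator:
    """Hands out IDs locally from a reserved segment of `step` consecutive IDs."""

    def __init__(self, reserve_segment, step):
        # reserve_segment() returns the first ID of a freshly reserved segment.
        self._reserve_segment = reserve_segment
        self._step = step
        self._next = 0
        self._end = 0  # exclusive upper bound of the current segment
        self.reservations = 0

    def next_id(self):
        if self._next >= self._end:
            # Current segment exhausted: one round trip reserves the next one.
            start = self._reserve_segment()
            self._next, self._end = start, start + self._step
            self.reservations += 1
        value = self._next
        self._next += 1
        return value


# Stand-in for the ID database with an auto-increment step of 1000.
fake_database = itertools.count(1, 1000)
allocator = SegmentAllocator(lambda: next(fake_database), step=1000)

issued = [allocator.next_id() for _ in range(2500)]
print(issued[0], issued[-1], allocator.reservations)
```

The trade-off is that IDs left in a segment are lost if the client restarts, so the sequence can contain gaps.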
Redis ID Generation
Background and Requirements
In distributed systems, generating globally unique IDs is a common requirement. Traditional database auto-increment IDs may face performance bottlenecks under high concurrency.
Advantages of Redis Solution
Redis, as an in-memory database, has the following characteristics that make it suitable for ID generation:
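- Single-threaded command execution makes each command (such as INCR) atomic
- High performance (100k+ QPS)
- Supports persistence (RDB snapshots and AOF); depending on the configuration, the most recent increments can still be lost after a crash, so durability settings need attention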
Implementation Methods
1. Basic INCR Command
INCR id_counter
- Each execution automatically increments the key value by 1
- Returns the new integer value as ID
2. Batch ID Generation (INCRBY)
INCRBY id_counter 1000
- Acquire a range of IDs at once
- Suitable for batch operation scenarios
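Since INCRBY returns the counter value after the increment, the reserved range can be derived from that single return value. The sketch below shows the arithmetic; `FakeCounterStore` is an in-memory stand-in written for this example (not a Redis client), and `reserve_ids` is a hypothetical helper.

```python
class FakeCounterStore:
    """In-memory stand-in that mimics only the return value of INCRBY."""

    def __init__(self):
        self._values = {}

    def incrby(self, key, amount):
        self._values[key] = self._values.get(key, 0) + amount
        return self._values[key]


def reserve_ids(store, key, count):
    """Reserve `count` consecutive IDs and return them as a range."""
    last = store.incrby(key, count)  # value after incrementing
    return range(last - count + 1, last + 1)


store = FakeCounterStore()
first_batch = reserve_ids(store, "id_counter", 1000)
second_batch = reserve_ids(store, "id_counter", 1000)

print(first_batch[0], first_batch[-1])    # 1 1000
print(second_batch[0], second_batch[-1])  # 1001 2000
```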
3. Timestamp Combination Mode
INCR daily_counter
- Generate format: date (8 digits) + auto-increment sequence (6 digits)
- For example: 20230515-000001
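- Counter restarts daily; INCR never resets a key on its own, so this is usually achieved by including the date in the key name (e.g., daily_counter:20230515) and letting old keys expire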
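The date-plus-sequence format can be sketched as follows. The example assumes one counter key per day, which is one common way to get a daily restart. `FakeCounterStore` is an in-memory stand-in for Redis and `next_daily_id` is a hypothetical helper.

```python
from datetime import date


class FakeCounterStore:
    """In-memory stand-in that mimics only the return value of INCR."""

    def __init__(self):
        self._values = {}

    def incr(self, key):
        self._values[key] = self._values.get(key, 0) + 1
        return self._values[key]


def next_daily_id(store, day):
    """Return an ID of the form YYYYMMDD-NNNNNN for the given day."""
    # Assumption: a separate key per day, so each day starts again at 1.
    key = f"daily_counter:{day:%Y%m%d}"
    sequence = store.incr(key)
    return f"{day:%Y%m%d}-{sequence:06d}"


store = FakeCounterStore()
print(next_daily_id(store, date(2023, 5, 15)))  # 20230515-000001
print(next_daily_id(store, date(2023, 5, 15)))  # 20230515-000002
print(next_daily_id(store, date(2023, 5, 16)))  # 20230516-000001
```

With six digits the format covers up to 999,999 IDs per day; beyond that the sequence part simply grows wider.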
Application Scenarios
- Order system: Generate unique order numbers
- Log system: Mark each log with unique ID
- Distributed locks: Implement lock mechanism based on ID
- Message queues: Unique message identifier
Performance Comparison
| Solution | Approximate QPS | Pros | Cons |
|---|---|---|---|
| Database auto-increment ID | 1k-5k | Simple and reliable | Limited performance |
| Redis INCR | 50k+ | High performance, atomic | Requires Redis service maintenance |
| UUID | 100k+ | No central node needed | Long ID, unordered |