TL;DR
- Scenario: a Java service integrating with Memcached needs to understand Spymemcached's thread model, sharding routing, and serialization details
- Conclusion: Spymemcached implements async IO via NIO plus callbacks and uses ketama consistent hashing for sharding
- Output: a complete architecture breakdown, an explanation of the thread and sharding mechanisms, and an error quick reference
Spymemcached Introduction
Spymemcached is a memcached client implemented using NIO.
Protocol Support:
- Text protocol: plain-text commands, easy to debug and read
- Binary protocol: more compact on the wire, higher transmission efficiency
Async Communication Mechanism: uses NIO (non-blocking I/O) for efficient network communication and handles responses through a callback mechanism.
Cluster and Sharding: supports sharding, uses a consistent hashing algorithm to allocate keys, and supports dynamic node addition and removal.
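The callback-on-completion style that spymemcached's async API exposes can be illustrated with only the JDK; the sketch below simulates the cache lookup with `CompletableFuture` (the `asyncGet` here is a stand-in, not the real client API, and the returned value is fabricated for the demo):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncGetSketch {
    // Simulated async cache lookup: the real client would hand the request to
    // its selector thread and complete the future from the IO callback.
    static CompletableFuture<String> asyncGet(String key, ExecutorService io) {
        return CompletableFuture.supplyAsync(() -> "value-for-" + key, io);
    }

    public static void main(String[] args) {
        ExecutorService io = Executors.newSingleThreadExecutor();
        asyncGet("user:42", io)
            .thenAccept(v -> System.out.println("callback got: " + v))
            .join();  // block only for the demo; real callers keep working
        io.shutdown();
    }
}
```

The business thread never blocks on the network: it registers a callback and moves on, which is the core of the NIO-plus-callback design described above.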
Overall Design Architecture
- API Interface Design: Provides synchronous and asynchronous calling methods
- Task Encapsulation Mechanism: Encapsulates requests as independent Task objects
- Routing Partition Strategy: Uses consistent hashing algorithm to determine target node
- Task Queue Management: each connection maintains an independent task queue
- Async IO Processing Flow: a Selector monitors IO events for all connections
- Response Processing Mechanism: matches the original Task by its opaque field, then executes the callback
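The opaque-based matching in the last step can be sketched as a map from request id to pending callback; the class and method names below are illustrative, not spymemcached's internal types:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class OpaqueDispatcher {
    private final AtomicInteger nextOpaque = new AtomicInteger();
    private final Map<Integer, Consumer<byte[]>> pending = new ConcurrentHashMap<>();

    // Business thread: register the callback and get an opaque id
    // to stamp on the outgoing wire request.
    public int register(Consumer<byte[]> callback) {
        int opaque = nextOpaque.incrementAndGet();
        pending.put(opaque, callback);
        return opaque;
    }

    // Selector thread: a response arrived; look up the original
    // task by its opaque id and fire its callback.
    public void onResponse(int opaque, byte[] body) {
        Consumer<byte[]> cb = pending.remove(opaque);
        if (cb != null) cb.accept(body);
    }
}
```

Because responses carry the opaque id back, the client can pipeline many requests on one connection and still pair each response with the right Task.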
Thread Design
Spymemcached has two types of threads: business threads and Selector threads.
Business threads are responsible for:
- Request encapsulation
- Object serialization
- Protocol encapsulation
- Task distribution
Selector threads are responsible for:
- Send processing: Queue polling, data sending, flow control
- Receive processing: Response reading, result notification
- Connection management: Fault detection, automatic recovery, load balancing
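The handoff between the two thread types can be sketched with a `BlockingQueue` standing in for a connection's task queue; this only shows the producer/consumer split, not spymemcached's actual Selector loop:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

public class ThreadSplitSketch {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<byte[]> taskQueue = new LinkedBlockingQueue<>();
        CountDownLatch done = new CountDownLatch(1);

        // "Selector" thread: drains the queue and would write to the socket
        // (the actual send is simulated here with a print).
        Thread ioThread = new Thread(() -> {
            try {
                byte[] frame = taskQueue.take();
                System.out.println("IO thread sent " + frame.length + " bytes");
                done.countDown();
            } catch (InterruptedException ignored) { }
        });
        ioThread.start();

        // Business thread: serializes the request and hands it off
        // to the queue without ever blocking on network IO.
        byte[] frame = "get user:42\r\n".getBytes();
        taskQueue.put(frame);
        done.await();
    }
}
```

The key property is that serialization and protocol encoding happen on the caller's thread, while all socket reads and writes stay on the IO side.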
Sharding Mechanism
Routing Mechanism
Two hashing algorithms:
- arrayMod (array modulo hashing)
  - Calculation: target node index = hash(key) % node count
  - Drawback: adding or removing a node remaps most keys, causing large-scale data migration
- ketama (consistent hashing)
  - Generates 160 virtual nodes for each physical node
  - Builds a hash ring and distributes the virtual nodes evenly across it
  - Advantage: on average only 1/N of the data migrates when a node is added or removed
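Both routing styles fit in a short sketch; the ketama half below uses a `TreeMap` hash ring with MD5-derived points and the 160 virtual nodes per server mentioned above. This is a simplified illustration, not spymemcached's `KetamaNodeLocator`:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class KetamaSketch {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    public KetamaSketch(List<String> nodes) throws Exception {
        for (String node : nodes)
            for (int i = 0; i < 160; i++)  // 160 virtual points per physical node
                ring.put(hash(node + "#" + i), node);
    }

    // arrayMod routing: simple, but remaps most keys when the node count changes.
    static String arrayMod(String key, List<String> nodes) throws Exception {
        return nodes.get((int) Math.floorMod(hash(key), (long) nodes.size()));
    }

    // ketama routing: walk clockwise to the first virtual point at or after hash(key).
    public String route(String key) throws Exception {
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    private static long hash(String s) throws Exception {
        byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
        long h = 0;
        for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xffL);  // fold first 8 digest bytes
        return h;
    }
}
```

Removing one node from the ring only reassigns the keys whose clockwise-next virtual point belonged to that node, which is where the 1/N migration figure comes from.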
Fault Tolerance
Three failover strategies:
- Redistribute (recommended): route the request to the next available node, chosen round-robin or randomly
- Retry: retry up to 3 times, with a linearly increasing retry interval
- Cancel: log the error and throw a ConnectionException
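The Redistribute idea, skipping a dead node and scanning for the next live one, can be sketched as follows (illustrative only, not spymemcached's `FailureMode` implementation):

```java
import java.util.List;
import java.util.Set;

public class RedistributeSketch {
    // Given the preferred node index, return the first node not marked down,
    // scanning round-robin from that index; null if every node is down.
    static String pickAvailable(List<String> nodes, int preferred, Set<String> down) {
        for (int i = 0; i < nodes.size(); i++) {
            String candidate = nodes.get((preferred + i) % nodes.size());
            if (!down.contains(candidate)) return candidate;
        }
        return null;
    }
}
```

Redistribute trades temporary cache misses on the redirected keys for continued availability, which is why it is the recommended default.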
Serialization
Objects are stored in Memcached in binary form. If the serialized length exceeds the 16384-byte compression threshold, GZip compression is applied.
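That store path can be sketched with JDK serialization and GZIP, using the 16384-byte threshold from the text; the flag handling and exact encoding are simplified relative to spymemcached's real transcoders:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.zip.GZIPOutputStream;

public class TranscodeSketch {
    static final int COMPRESSION_THRESHOLD = 16384;  // bytes, per the text

    // Serialize the object; GZip-compress only when the payload exceeds the threshold.
    static byte[] encode(Serializable value) throws IOException {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(raw)) {
            oos.writeObject(value);
        }
        byte[] bytes = raw.toByteArray();
        if (bytes.length <= COMPRESSION_THRESHOLD) return bytes;

        ByteArrayOutputStream zipped = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(zipped)) {
            gz.write(bytes);
        }
        return zipped.toByteArray();
    }
}
```

The threshold check is also why the last row of the error table below matters: every oversized write pays both serialization and compression CPU cost on the business thread.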
Error Quick Reference
| Symptom | Root Cause | Fix |
|---|---|---|
| Many get timeouts under high concurrency | Single node pressure too high, Selector thread can’t keep up | Reduce single node pressure, increase timeout config |
| Cache hit rate drops to 0 after node add/remove | Using arrayMod sharding | Switch to ketama consistent hashing |
| Many connection exceptions after a node fails | Failover strategy misconfigured | Limit max retry count |
| Client process memory continuously grows after QPS increases | Task queue, Future or callback holds large object references | Set reasonable upper limit for queue |
| Some keys occasionally return NOT_FOUND | Inconsistent sharding routing | Fix node list order and weights |
| CPU spikes when writing large objects | Serialization + GZip compression overhead | Adjust compression threshold, avoid super large objects in cache |