TL;DR
Scenario: E-commerce, finance, and log-distribution systems that need decoupling, peak shaving, reliable delivery, and observable consumption.
Conclusion: RocketMQ combines a routing center (NameServer), store-and-forward nodes (Broker), and client-side cached routing to balance high throughput against availability. The key points are routing, flush/replication, consumption offsets, and retry semantics.
Output: An engineering document covering the development history, role responsibilities, deployment topology, and the end-to-end flow for sending and consuming messages.
RocketMQ
Basic Introduction
RocketMQ’s development is inseparable from Alibaba’s technical evolution. Its predecessor, MetaQ, was originally named Metamorphosis as a tribute to Franz Kafka’s novella The Metamorphosis. The name also hinted that this message middleware would transform from a Kafka derivative into an independently developed product.
Within Alibaba, MetaQ went through three important stages:
- Initial stage (MetaQ 1.x): Reimplemented Kafka’s core concepts in Java
- Maturity stage (MetaQ 2.x): Deeply optimized for e-commerce scenarios
- Open source stage (MetaQ 3.x/RocketMQ): Officially open-sourced, later becoming an Apache top-level project
Technology Selection Considerations:
- Technology stack unification: Alibaba’s technology stack is mainly Java (over 80%)
- Performance requirements: E-commerce scenarios need higher throughput (peaks at the million-TPS level)
- Feature extensions: Transactional messages, ordered messages, and other financial-grade features must be supported
Use Scenarios
Application Decoupling
The higher a system’s coupling, the lower its fault tolerance. Take e-commerce as an example: creating a user order is a complex business process that involves multiple subsystems:
- Order system: Records basic order information
- Inventory system: Deducts product inventory
- Logistics system: Generates distribution orders
- Payment system: Processes payment flow
- Marketing system: Calculates discounts and points
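Introducing a message queue decouples these systems. The order system publishes an event, and each subsystem processes it independently. The result is eventual consistency, and a failure in one downstream system does not interrupt the core order flow.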
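A minimal in-memory sketch of this decoupling follows. It uses a hypothetical InMemoryBus in place of RocketMQ; this is not the RocketMQ client API. The order service only publishes an OrderCreated event and returns. Each subsystem consumes from its own pending queue. A handler that fails keeps its message for a later retry instead of failing the order.

```python
from collections import defaultdict, deque


class InMemoryBus:
    """Hypothetical stand-in for a message queue: one pending queue per subscriber."""

    def __init__(self):
        self.pending = defaultdict(deque)  # subscriber name -> undelivered messages
        self.handlers = {}                 # subscriber name -> handler function

    def subscribe(self, name, handler):
        self.handlers[name] = handler

    def publish(self, message):
        # The producer only enqueues; it never calls downstream systems directly.
        for name in self.handlers:
            self.pending[name].append(message)

    def deliver(self):
        # Each subscriber drains its own queue; a failure only stalls that subscriber.
        for name, handler in self.handlers.items():
            queue = self.pending[name]
            while queue:
                try:
                    handler(queue[0])
                except Exception:
                    break          # keep the message for a later retry
                queue.popleft()    # acknowledge only after successful handling


processed = defaultdict(list)
logistics_up = False  # simulated outage of the logistics system


def record(system):
    def handle(msg):
        processed[system].append(msg["order_id"])
    return handle


def logistics(msg):
    if not logistics_up:
        raise RuntimeError("logistics system unavailable")
    processed["logistics"].append(msg["order_id"])


bus = InMemoryBus()
bus.subscribe("inventory", record("inventory"))
bus.subscribe("logistics", logistics)
bus.subscribe("marketing", record("marketing"))


def create_order(order_id):
    # Core flow: persist the order (omitted), publish the event, return at once.
    bus.publish({"type": "OrderCreated", "order_id": order_id})
    return "created"


status = create_order(1001)  # succeeds even though logistics is down
bus.deliver()                # inventory and marketing done; logistics message stays pending
logistics_up = True
bus.deliver()                # the retry succeeds and the systems converge
```

Acknowledging only after the handler succeeds makes delivery at-least-once. A handler that fails after partially doing its work will see the same message again.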
Traffic Peak Shaving
Application systems are often hit by sudden traffic bursts. A message queue shaves these peaks as follows:
- All requests first enter the message queue buffer
- Backend systems consume messages at their own processing pace
- Excess requests wait in the buffer instead of hitting the backend directly
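The buffering effect can be checked with a small discrete-time simulation. The burst sizes and the backend capacity of 100 messages per tick are made-up numbers chosen only for illustration.

```python
def simulate(arrivals, capacity_per_tick):
    """Buffer bursty arrivals and let the backend drain at most capacity_per_tick per tick.

    Returns two lists: messages processed per tick, and backlog left after each tick.
    """
    queued = 0
    processed, backlog = [], []
    tick = 0
    while tick < len(arrivals) or queued:
        if tick < len(arrivals):
            queued += arrivals[tick]           # every request enters the buffer first
        done = min(capacity_per_tick, queued)  # backend works at its own pace
        queued -= done
        processed.append(done)
        backlog.append(queued)
        tick += 1
    return processed, backlog


processed, backlog = simulate([50, 400, 300, 20, 0], capacity_per_tick=100)
print(processed)  # -> [50, 100, 100, 100, 100, 100, 100, 100, 20]
print(backlog)    # -> [0, 300, 500, 420, 320, 220, 120, 20, 0]
```

The backend never processes more than 100 messages per tick. The burst turns into a backlog that drains over the following ticks. The trade-off is extra latency for the excess requests instead of an overloaded backend.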
Data Distribution
In modern system architectures, data often needs to flow between multiple systems. A message queue provides a publish/subscribe model that lets each system subscribe to the data it needs.
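In publish/subscribe terms, each subscribing system is its own consumer group and keeps its own read position (offset) into the same Topic. Every group therefore sees every message without affecting the others. The TopicLog class below is a hypothetical in-memory model of this, simplified to a single queue. In RocketMQ, offsets are tracked per queue.

```python
class TopicLog:
    """Hypothetical single-queue Topic: an append-only log plus one offset per consumer group."""

    def __init__(self):
        self.messages = []
        self.offsets = {}  # consumer group -> next offset to read

    def append(self, msg):
        self.messages.append(msg)
        return len(self.messages) - 1  # offset assigned to the stored message

    def poll(self, group, max_count=10):
        start = self.offsets.get(group, 0)
        batch = self.messages[start:start + max_count]
        self.offsets[group] = start + len(batch)  # only this group's position moves
        return batch


log = TopicLog()
for event in ["price-updated", "stock-updated", "order-paid"]:
    log.append(event)

search_batch = log.poll("search-index")             # gets all three events
risk_first = log.poll("risk-control", max_count=1)  # reads at its own pace
risk_rest = log.poll("risk-control")                # continues where it left off
```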
Deployment Architecture
Role Introduction
Producer
The Producer is the client role that creates and sends messages. It wraps data from business systems in the message format and sends it to the message queue.
Consumer
The Consumer is the client role that receives and processes messages. It fetches messages from the message queue and runs the business logic. Consumers usually work as members of a consumer group.
Broker
The Broker is the core component of the message queue system, responsible for storing and forwarding messages. Its main functions are:
- Message receiving: Receives messages sent by Producers
- Message storage: Persists messages to disk
- Message delivery: Delivers messages to Consumers, either push-style or in response to Consumer pull requests
NameServer
NameServer is a lightweight service discovery component. Its main functions are:
- Broker management: Maintains registration info and health status for all Brokers
- Routing management: Provides Topic-to-Broker routing info to Producers and Consumers
- Stateless design: NameServer instances do not communicate with each other
Topic
A Topic is a logical category of messages, with the following characteristics:
- A Topic contains multiple Message Queues (partitions)
- A Producer must specify the Topic when sending messages
- A Consumer must specify the Topic when subscribing to messages
Message Queue
A Message Queue is a partition of a Topic, with the following characteristics:
- A Topic can be divided into multiple Message Queues, enabling parallel processing
- Each Message Queue guarantees FIFO (first-in, first-out) order
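Ordering is guaranteed only within a single Message Queue. Messages that must stay in order, such as all status changes of one order, are therefore routed to the same queue by hashing a business key. Unrelated keys spread across queues for parallelism. In the Java client, this choice is made by a MessageQueueSelector passed to send.

The sketch below is a Python stand-in. It assumes a Topic with 4 queues and uses a hypothetical select_queue helper.

```python
import zlib
from collections import defaultdict

NUM_QUEUES = 4  # assumed number of Message Queues in the Topic


def select_queue(key, num_queues=NUM_QUEUES):
    # crc32 gives a stable hash, so one key always maps to the same queue.
    return zlib.crc32(key.encode("utf-8")) % num_queues


queues = defaultdict(list)  # queue id -> messages in FIFO (append) order
events = [("order-1", "created"), ("order-2", "created"),
          ("order-1", "paid"), ("order-1", "shipped"), ("order-2", "paid")]
for key, status in events:
    queues[select_queue(key)].append((key, status))


def statuses_for(key):
    # Reading a key's queue front to back yields that key's events in send order.
    return [s for k, s in queues[select_queue(key)] if k == key]
```

Two keys may land in the same queue. That is harmless for ordering, because each key's events still keep their relative order.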
Connection Mechanism of Each Role
NameServer:
- One of RocketMQ’s core components, acting as a lightweight registry
- Mainly responsible for managing Broker routing information
- Stateless design: nodes are independent of each other and do not synchronize data
- Deploying 3-5 NameServer nodes is recommended for high availability
Broker:
- Core node for message storage and forwarding
- Deployed in master-slave mode
- The Master handles write requests and, by default, read requests
- The Slave mainly serves as a data backup for disaster recovery, and can serve reads when the Master is unavailable or overloaded
- After startup, each Broker establishes a long-lived connection to every node in the NameServer cluster and periodically registers its routing information with them
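The registration side can be modeled as a route table that each Broker refreshes with periodic heartbeats. The NameServer prunes a Broker from that table when it stays silent for too long.

- Timing: the 30-second heartbeat and 120-second expiry are assumed values for this sketch, in line with defaults commonly cited for RocketMQ 4.x. Check your version's configuration.
- Naming: NameServerSketch is hypothetical.
- Determinism: time is passed in explicitly so the example runs the same way every time.

```python
HEARTBEAT_INTERVAL = 30  # seconds between Broker registrations (assumed)
EXPIRY = 120             # seconds of silence before a Broker is dropped (assumed)


class NameServerSketch:
    """Hypothetical route table: Topic -> Broker addresses, plus last heartbeat per Broker."""

    def __init__(self):
        self.last_seen = {}     # broker address -> time of last registration
        self.topic_routes = {}  # topic -> set of broker addresses

    def register(self, broker_addr, topics, now):
        self.last_seen[broker_addr] = now
        for topic in topics:
            self.topic_routes.setdefault(topic, set()).add(broker_addr)

    def scan_inactive(self, now):
        # Periodic sweep: drop Brokers whose last heartbeat is older than EXPIRY.
        dead = [b for b, t in self.last_seen.items() if now - t > EXPIRY]
        for broker in dead:
            del self.last_seen[broker]
            for brokers in self.topic_routes.values():
                brokers.discard(broker)
        return dead

    def route(self, topic):
        return sorted(self.topic_routes.get(topic, ()))


# Two independent NameServers; each Broker registers with every one of them.
name_servers = [NameServerSketch(), NameServerSketch()]
for ns in name_servers:
    ns.register("broker-a:10911", ["OrderTopic"], now=0)
    ns.register("broker-b:10911", ["OrderTopic"], now=0)
    # Only broker-a keeps sending heartbeats; broker-b has crashed.
    for t in range(HEARTBEAT_INTERVAL, 151, HEARTBEAT_INTERVAL):
        ns.register("broker-a:10911", ["OrderTopic"], now=t)

dropped = name_servers[0].scan_inactive(now=150)  # only the first NameServer has swept so far
```

Because the NameServers never talk to each other, the second instance still lists broker-b until its own sweep runs. Clients therefore tolerate briefly stale routes and rely on send retries.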
Producer:
- First establishes a long-lived connection to one randomly selected NameServer node
- Periodically (every 30 seconds by default) fetches Topic routing info from NameServer
- The Producer is completely stateless, so it scales horizontally with ease
Consumer:
- The connection mechanism is similar to the Producer’s: it first connects to one randomly selected NameServer node to get routing info
- Unlike the Producer, the Consumer establishes connections to both Master and Slave Brokers
- Uses pull mode to fetch messages
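Long polling is how a pull-based Consumer gets low latency without busy polling. When a pull request finds no new messages, the Broker holds it until a message arrives or a timeout expires.

Below is a minimal thread-based sketch built on Python's threading.Condition. The LongPollQueue class and the timeout values are illustrative assumptions, not RocketMQ's implementation.

```python
import threading
import time
from collections import deque


class LongPollQueue:
    """Hypothetical broker-side queue that parks pull requests while it is empty."""

    def __init__(self):
        self._messages = deque()
        self._cond = threading.Condition()

    def put(self, msg):
        with self._cond:
            self._messages.append(msg)
            self._cond.notify_all()  # wake any parked pull request

    def pull(self, max_count=32, timeout=15.0):
        with self._cond:
            # Hold the request until a message arrives instead of answering "empty" at once.
            self._cond.wait_for(lambda: len(self._messages) > 0, timeout=timeout)
            batch = []
            while self._messages and len(batch) < max_count:
                batch.append(self._messages.popleft())
            return batch  # an empty list means the hold timed out


q = LongPollQueue()
threading.Timer(0.2, q.put, args=("order-1001",)).start()  # the message arrives later
start = time.monotonic()
batch = q.pull(timeout=5.0)  # returns about 0.2 s later, not after 5 s
elapsed = time.monotonic() - start
empty = q.pull(timeout=0.1)  # nothing arrives, so it returns [] after the timeout
```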
Producer Message Sending Flow
Detailed Workflow:
- Fetches Topic routing info from NameServer (on the first send to a Topic, then refreshed periodically)
- Maintains two local caches:
  - topicPublishInfoTable: Topic publish info (the Topic’s Message Queues)
  - brokerAddrTable: Broker address table (Broker name to Master/Slave addresses)
- When sending a message:
  - Selects a MessageQueue (round-robin, hash, or other strategies)
  - Looks up the Master address by Broker name
  - Establishes a network connection and sends the message
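The two caches and the send path can be sketched as plain dictionaries. The shapes below are simplified assumptions, not the client's actual classes:

- topicPublishInfoTable maps a Topic to its list of (brokerName, queueId) pairs.
- brokerAddrTable maps a brokerName to addresses keyed by brokerId, where brokerId 0 denotes the Master (RocketMQ's convention).
- Queue selection is plain round-robin.

```python
import itertools

# Simplified local caches (hypothetical shapes for illustration).
topic_publish_info_table = {
    "OrderTopic": [("broker-a", 0), ("broker-a", 1), ("broker-b", 0), ("broker-b", 1)],
}
broker_addr_table = {
    "broker-a": {0: "10.0.0.1:10911", 1: "10.0.0.2:10911"},  # 0 = Master, 1 = Slave
    "broker-b": {0: "10.0.0.3:10911", 1: "10.0.0.4:10911"},
}
MASTER_ID = 0
_send_counter = itertools.count()


def select_queue_round_robin(topic):
    queues = topic_publish_info_table[topic]
    return queues[next(_send_counter) % len(queues)]


def resolve_send_target(topic):
    broker_name, queue_id = select_queue_round_robin(topic)
    # Messages are always written to the Master of the chosen Broker.
    master_addr = broker_addr_table[broker_name][MASTER_ID]
    return master_addr, broker_name, queue_id


targets = [resolve_send_target("OrderTopic") for _ in range(5)]
```

Round-robin spreads load evenly across queues, and therefore across Brokers. Key-based hashing is the alternative when per-key ordering matters.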
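Fault Tolerance Mechanism:
- Retries 2 times by default for synchronous sends (configurable), i.e. up to 3 attempts in total
- Failover strategy: a retry prefers a MessageQueue on a different Broker from the one that just failed; messages are always written to Masters, never to Slaves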
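The retry rule can be sketched as a loop that makes one initial attempt plus the configured number of retries. On each retry, it skips queues on the Broker that just failed. The sketch assumes 2 retries. SendError and flaky_send are hypothetical stand-ins for the network call and its failure.

```python
class SendError(Exception):
    """Raised by the hypothetical transport when a send attempt fails."""


def send_with_retry(queues, send_to_broker, retries=2):
    """Try up to 1 + retries times; after a failure, skip queues on the failed Broker.

    queues: list of (broker_name, queue_id). Returns the queue that accepted the message.
    """
    last_failed_broker = None
    for attempt in range(1 + retries):
        candidates = [q for q in queues if q[0] != last_failed_broker] or queues
        queue = candidates[attempt % len(candidates)]
        try:
            send_to_broker(queue)
            return queue
        except SendError:
            last_failed_broker = queue[0]
    raise SendError(f"send failed after {1 + retries} attempts")


calls = []


def flaky_send(queue):
    # Simulated network call: every Broker except broker-a accepts the message.
    calls.append(queue)
    if queue[0] == "broker-a":
        raise SendError("broker-a timed out")


queues = [("broker-a", 0), ("broker-a", 1), ("broker-b", 0)]
used = send_with_retry(queues, flaky_send)
```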
Consumer Message Consumption Flow
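Detailed Workflow:
- Fetches Topic routing info from NameServer at startup
- Uses long polling (Pull mode): if a pull request finds no new messages, the Broker holds it until messages arrive or a timeout expires
- Consumption progress management: maintains offsets locally and periodically persists them to the Broker (in cluster mode; in broadcast mode, offsets are kept in a local file)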
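A small model of concurrent consumption shows how offsets and duplicates are linked. Messages finish out of order, but the offset that is safe to persist is the smallest offset still being processed. Everything before it is known to be done.

To our understanding, RocketMQ's concurrent consumers follow the same idea. The OffsetTracker class is a hypothetical illustration.

```python
class OffsetTracker:
    """Hypothetical per-queue tracker of in-flight offsets for a concurrent consumer."""

    def __init__(self):
        self.in_flight = set()
        self.max_seen = -1

    def fetched(self, offsets):
        self.in_flight.update(offsets)
        self.max_seen = max([self.max_seen, *offsets])

    def completed(self, offset):
        self.in_flight.discard(offset)

    def committable(self):
        # Resume point after a restart: the oldest unfinished offset, or one past
        # the newest fetched offset when everything fetched has been processed.
        return min(self.in_flight) if self.in_flight else self.max_seen + 1


tracker = OffsetTracker()
tracker.fetched([100, 101, 102, 103])
tracker.completed(101)                # messages finish out of order
tracker.completed(103)
checkpoint_1 = tracker.committable()  # still 100: offset 100 is unfinished
tracker.completed(100)
tracker.completed(102)
checkpoint_2 = tracker.committable()  # 104: everything fetched is done
```

Suppose the consumer crashes right after the first checkpoint. It resumes from offset 100 and receives 101 and 103 again, even though they were already processed. This is one reason consumers must be idempotent.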
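Load Balancing:
- Reallocates queues among the group’s consumers every 20 seconds by default (configurable), and also when consumers join or leave
- Supports multiple allocation strategies, e.g. AllocateMessageQueueAveragely (the default), AllocateMessageQueueAveragelyByCircle, and AllocateMessageQueueConsistentHash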
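The averaging idea can be sketched as follows. Every consumer in the group sorts the same queue list and the same consumer-ID list, then takes a contiguous block. All members thus reach the same assignment without coordinating. This is a simplified rendition of the averaging strategy, not a copy of the client's AllocateMessageQueueAveragely class.

```python
def allocate_averagely(queues, consumer_ids, me):
    """Return the queues consumer `me` owns: one contiguous block per consumer.

    Every consumer sorts the same inputs, so all of them compute the same split
    without talking to each other. The first `extra` consumers get one more queue.
    """
    queues = sorted(queues)
    consumer_ids = sorted(consumer_ids)
    index = consumer_ids.index(me)
    base, extra = divmod(len(queues), len(consumer_ids))
    size = base + (1 if index < extra else 0)
    start = index * base + min(index, extra)
    return queues[start:start + size]


queue_ids = [0, 1, 2, 3]
group = ["consumer-1", "consumer-2", "consumer-3"]
assignment = {c: allocate_averagely(queue_ids, group, c) for c in group}
idle = allocate_averagely([0, 1], group, "consumer-3")  # more consumers than queues
```

With two queues and three consumers, the third consumer gets an empty list and sits idle. Adding consumer instances beyond the queue count therefore does not raise throughput.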
Consumption Modes:
- Cluster mode (CLUSTERING): each message is consumed by only one consumer in the group
- Broadcast mode (BROADCASTING): each message is consumed by every consumer in the group
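The two modes differ only in who receives a message within one consumer group. The hypothetical deliver function below makes this explicit. In clustering mode, it picks the receiver as the queue's owner, computed as queue id modulo group size. This is a stand-in for the real rebalance result.

```python
def deliver(messages, consumers, mode):
    """Map each consumer to the messages it receives. messages: list of (queue_id, body)."""
    received = {c: [] for c in consumers}
    for queue_id, body in messages:
        if mode == "BROADCASTING":
            for c in consumers:  # every consumer gets its own copy
                received[c].append(body)
        elif mode == "CLUSTERING":
            owner = consumers[queue_id % len(consumers)]  # stand-in for queue ownership
            received[owner].append(body)
        else:
            raise ValueError(f"unknown mode: {mode}")
    return received


msgs = [(0, "m1"), (1, "m2"), (2, "m3"), (3, "m4")]
clustered = deliver(msgs, ["c1", "c2"], "CLUSTERING")
broadcast = deliver(msgs, ["c1", "c2"], "BROADCASTING")
```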
Error Quick Reference
| Symptom | Root Cause | Diagnosis | Fix |
|---|---|---|---|
| Send error NO_ROUTE_INFO_OF_THIS_TOPIC | Topic not created, or not registered with NameServer | Query the Topic’s routing with mqadmin topicRoute | Create the Topic explicitly in production (e.g. mqadmin updateTopic) |
| Producer send timeout / RT spike | High Broker load, slow disk flush, network jitter | Check Broker disk I/O and thread-pool backlog | Optimize the flush/replication strategy and disks |
| Consumption backlog keeps growing | Insufficient consumption capacity, blocked consumer threads | Check consumption TPS, backlog size, and consumption time | Increase concurrency (threads, or instances up to the queue count) or consume in batches |
| Duplicate consumption / idempotency issues | At-least-once delivery semantics plus retries | Consumption logs show the same key processed multiple times | Make business-side processing idempotent |
| Broker starts but NameServer has no routing for it | Broker could not connect to all NameServers | Check Broker registration logs | Verify the NameServer address list (namesrvAddr) and network connectivity |
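Since delivery is at-least-once, the consumer side has to tolerate seeing a message twice. Below is a minimal idempotency sketch. It assumes each message carries a unique business key (here an order ID). It uses an in-memory set where production code would use a durable store, such as a database table with a unique constraint. The handler and field names are hypothetical.

```python
class IdempotentConsumer:
    """Hypothetical wrapper that applies each business key at most once."""

    def __init__(self, handler):
        self.handler = handler
        self.processed_keys = set()  # stand-in for a durable dedup table

    def on_message(self, msg):
        key = msg["key"]
        if key in self.processed_keys:
            return "SKIPPED_DUPLICATE"  # already applied: acknowledge without side effects
        self.handler(msg)
        self.processed_keys.add(key)    # record only after the side effect succeeded
        return "CONSUMED"


balance = {"user-1": 0}


def credit_points(msg):
    balance[msg["user"]] += msg["points"]


consumer = IdempotentConsumer(credit_points)
msg = {"key": "order-1001", "user": "user-1", "points": 10}
results = [consumer.on_message(msg), consumer.on_message(msg)]  # redelivered once
```

Recording the key after the side effect leaves a small window: a crash between the two steps still replays the message. In practice, the side effect and the key record are usually written in one transaction.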