TL;DR

Scenario: E-commerce, finance and log-distribution systems need decoupling, peak cutting, reliable delivery and observable consumption.

Conclusion: RocketMQ combines a routing center (NameServer), store-and-forward nodes (Broker) and client-side cached routing to balance high throughput against availability. The key concerns are routing, disk flush and replication, consumption offsets, and retry semantics.

Output: An engineering document covering the development history, role responsibilities, deployment topology and the end-to-end flow for sending and consuming messages.


RocketMQ

Basic Introduction

RocketMQ’s development is inseparable from Alibaba’s technical evolution. Its predecessor, MetaQ, was originally named Metamorphosis: a tribute to Franz Kafka’s novella of the same name (and thus to Apache Kafka), and also a hint that this message middleware would transform from a Kafka derivative into an independently developed system.

Within Alibaba’s technology stack, MetaQ went through three major stages:

  1. Initial stage (MetaQ 1.x): Reimplemented Kafka’s core concepts in Java
  2. Maturity stage (MetaQ 2.x): Deeply optimized for e-commerce scenarios
  3. Open-source stage (MetaQ 3.x/RocketMQ): Open-sourced as RocketMQ, later donated to the Apache Software Foundation, where it became a top-level project

Technology Selection Considerations:

  • Technology stack unification: Alibaba’s technology stack is predominantly Java (over 80%)
  • Performance requirements: E-commerce scenarios need higher throughput (peaks above one million TPS)
  • Feature extensions: Financial-grade features such as transactional messages and ordered messages are required

Use Scenarios

Application Decoupling

The more tightly coupled a system is, the lower its fault tolerance. Take an e-commerce application as an example: creating a user order is a complex business process involving multiple subsystems:

  1. Order system: Records basic order information
  2. Inventory system: Deducts product inventory
  3. Logistics system: Generates distribution orders
  4. Payment system: Processes payment flow
  5. Marketing system: Calculates discounts and points

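Introducing a message queue decouples these systems: the order system publishes an event and the other subsystems consume it asynchronously. This achieves eventual consistency and ensures that a failure in one downstream system does not interrupt the core order flow.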
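A minimal in-memory sketch of the decoupling idea, assuming a hypothetical `OrderEventQueue` class and made-up handler names (none of these are RocketMQ APIs). The order system only publishes an `ORDER_CREATED` event; a failure in one downstream handler is recorded for retry instead of blocking order creation.

```python
from collections import deque


class OrderEventQueue:
    """Hypothetical in-memory stand-in for a topic; not a RocketMQ API."""

    def __init__(self):
        self.messages = deque()

    def publish(self, event):
        self.messages.append(event)


def create_order(queue, order_id):
    # The order system records the order and publishes one event.
    # It never calls inventory, logistics, payment or marketing directly.
    order = {"order_id": order_id, "status": "CREATED"}
    queue.publish({"type": "ORDER_CREATED", "order_id": order_id})
    return order


def deliver(queue, handlers):
    """Hand each event to every downstream handler. A failing handler is
    recorded for later retry instead of failing the order itself."""
    pending_retries = []
    while queue.messages:
        event = queue.messages.popleft()
        for name, handler in handlers.items():
            try:
                handler(event)
            except Exception:
                pending_retries.append((name, event["order_id"]))
    return pending_retries


# Usage: logistics is down, yet the order is created and inventory still runs.
handled = []

def inventory(event):
    handled.append(("inventory", event["order_id"]))

def logistics(event):
    raise RuntimeError("logistics unavailable")

queue = OrderEventQueue()
order = create_order(queue, 1001)
retries = deliver(queue, {"inventory": inventory, "logistics": logistics})
```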

Traffic Peak Cutting

Application systems are often hit by sudden bursts of traffic. A message queue’s peak-cutting capability works as follows:

  1. All requests first enter the message queue buffer
  2. Backend systems consume messages at their own pace
  3. Excess requests wait in the buffer instead of hitting the backend directly
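The three steps above can be simulated with a toy model. The tick-based clock, the arrival pattern and the fixed consumer capacity are assumptions for illustration, not RocketMQ behavior.

```python
def simulate_peak_cutting(arrivals, capacity_per_tick):
    """arrivals[t] is the number of requests arriving at tick t.
    The backend consumes at most capacity_per_tick per tick.
    Returns (processed_per_tick, backlog_per_tick)."""
    buffer = 0
    processed, backlog = [], []
    for arriving in arrivals:
        buffer += arriving                      # 1. every request enters the queue first
        done = min(buffer, capacity_per_tick)   # 2. backend consumes at its own pace
        buffer -= done                          # 3. the excess stays queued
        processed.append(done)
        backlog.append(buffer)
    return processed, backlog


# Usage: a burst of 1000 requests against a backend that handles 300 per tick.
processed, backlog = simulate_peak_cutting([1000, 0, 0, 0], capacity_per_tick=300)
```

The backend never exceeds its capacity of 300 per tick, while the backlog drains as 700, 400, 100 and finally 0.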

Data Distribution

In modern system architectures, data often needs to flow between multiple systems. A message queue provides a publish/subscribe model that lets each system subscribe to the data it needs.
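One way to picture publish/subscribe distribution is a single append-only log per topic, where each subscribing system keeps its own read offset. The sketch below is a hypothetical in-memory model (`TopicLog` is not a RocketMQ class) that shows why a new subscriber does not affect existing ones; starting new subscribers at offset 0 is an assumption.

```python
class TopicLog:
    """Hypothetical append-only log for one topic. Each subscribing system
    (consumer group) keeps an independent offset, so every group sees
    every message."""

    def __init__(self):
        self.entries = []
        self.offsets = {}  # consumer group -> next offset to read

    def publish(self, msg):
        self.entries.append(msg)

    def subscribe(self, group):
        self.offsets.setdefault(group, 0)  # assumption: start from the oldest message

    def poll(self, group, max_msgs=10):
        start = self.offsets[group]
        batch = self.entries[start:start + max_msgs]
        self.offsets[group] = start + len(batch)  # advance only this group's offset
        return batch


# Usage: two systems consume the same data at different speeds.
log = TopicLog()
log.subscribe("search-index")
log.subscribe("analytics")
log.publish("user-1 updated")
log.publish("user-2 updated")
search_batch = log.poll("search-index")
analytics_first = log.poll("analytics", max_msgs=1)
```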


Deployment Architecture

Role Introduction

Producer

Producer is the client role that creates and sends messages. It wraps data from business systems in the required message format and sends it to the message queue.

Consumer

Consumer is the client role that receives and processes messages. It fetches messages from the message queue and runs business logic on them. Consumers usually work as members of a consumer group.

Broker

Broker is the core component of the message queue system, responsible for storing and forwarding messages. Its main functions are:

  1. Message receiving: Receives messages sent by Producers
  2. Message storage: Persists messages to disk
  3. Message delivery: Serves messages to Consumers, either pushed to them or pulled by them (RocketMQ’s push consumer is itself built on long-polling pulls)

NameServer

NameServer is a lightweight service-discovery component. Its main functions are:

  • Broker management: Maintains registration info and health status for all Brokers
  • Routing management: Provides Topic-to-Broker routing info to Producers and Consumers
  • Stateless design: NameServer instances do not communicate with each other
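The route table a NameServer keeps can be sketched as below. This is a hypothetical simplification, not RocketMQ’s `RouteInfoManager`; the addresses are made up, and the 120-second expiry is an assumption meant to mirror RocketMQ’s default of dropping Brokers that have not refreshed their registration for two minutes.

```python
BROKER_EXPIRE_SECONDS = 120  # assumption: mirrors RocketMQ's default broker expiry


class NameServerSketch:
    """Hypothetical in-memory route table; not the real RouteInfoManager."""

    def __init__(self):
        self.topic_routes = {}    # topic -> set of broker names
        self.broker_addrs = {}    # broker name -> master address
        self.last_heartbeat = {}  # broker name -> last registration time (seconds)

    def register_broker(self, name, addr, topics, now):
        # Brokers re-register periodically; each call refreshes the timestamp.
        self.broker_addrs[name] = addr
        self.last_heartbeat[name] = now
        for topic in topics:
            self.topic_routes.setdefault(topic, set()).add(name)

    def scan_inactive(self, now):
        # Remove Brokers whose registration is too old, along with their routes.
        expired = [b for b, t in self.last_heartbeat.items()
                   if now - t > BROKER_EXPIRE_SECONDS]
        for b in expired:
            del self.last_heartbeat[b]
            del self.broker_addrs[b]
            for brokers in self.topic_routes.values():
                brokers.discard(b)
        return expired

    def get_route(self, topic):
        return {b: self.broker_addrs[b] for b in self.topic_routes.get(topic, ())}


# Usage: broker-b stops re-registering and is dropped from the route.
ns = NameServerSketch()
ns.register_broker("broker-a", "10.0.0.1:10911", ["orders"], now=0)
ns.register_broker("broker-b", "10.0.0.2:10911", ["orders"], now=0)
ns.register_broker("broker-a", "10.0.0.1:10911", ["orders"], now=100)
expired = ns.scan_inactive(now=150)
```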

Topic

A Topic is a logical category of messages with the following characteristics:

  • A Topic contains multiple Message Queues (partitions)
  • Producers specify the Topic when sending messages
  • Consumers specify the Topic when subscribing to messages

Message Queue

A Message Queue is a partition of a Topic with the following characteristics:

  • A Topic can be divided into multiple Message Queues, enabling parallel processing
  • Each Message Queue preserves FIFO (first-in-first-out) order; there is no ordering guarantee across different queues of the same Topic
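A minimal sketch of how a Topic splits into queues and why ordering holds only within one queue. `TopicSketch` is hypothetical, and hashing a key with `zlib.crc32` to pick a queue is an illustrative assumption, similar in spirit to what an application might do in a custom `MessageQueueSelector` when ordering by key matters.

```python
import zlib


class TopicSketch:
    """Hypothetical Topic split into queue_count FIFO queues."""

    def __init__(self, name, queue_count):
        self.name = name
        self.queues = [[] for _ in range(queue_count)]

    def queue_for_key(self, key):
        # Deterministic hash: the same key always maps to the same queue.
        return zlib.crc32(key.encode("utf-8")) % len(self.queues)

    def send(self, key, body):
        qid = self.queue_for_key(key)
        self.queues[qid].append((key, body))  # appended in arrival order: FIFO per queue
        return qid


# Usage: all events of one order share a key, so they stay in order.
topic = TopicSketch("orders", 4)
for step in ["created", "paid", "shipped"]:
    topic.send("order-42", step)
same_queue = topic.queues[topic.queue_for_key("order-42")]
steps = [body for key, body in same_queue if key == "order-42"]
```

Messages with different keys may land in different queues and be consumed in parallel, so no order is implied between them.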

How the Roles Connect

NameServer:

  • One of RocketMQ’s core components, acting as a lightweight registry
  • Mainly responsible for managing Broker routing information
  • Stateless design: nodes are independent of each other and do not synchronize data
  • Deploying 3-5 NameServer nodes is recommended for high availability

Broker:

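  • Core node for storing and forwarding messages
  • Deployed in master-slave mode; a Master and its Slaves share the same Broker name and are distinguished by brokerId (0 = Master)
  • The Master handles all write requests and normally serves reads
  • Slaves mainly provide data backup and disaster recovery, and can serve reads when the Master is unavailable
  • After startup, each Broker establishes a long-lived connection to every node in the NameServer cluster and periodically re-registers its routing information (every 30 seconds by default)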
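Because NameServer nodes do not synchronize with each other, each Broker has to register with every node itself. The sketch below models that with hypothetical classes and made-up addresses; only the convention of `broker_id` 0 for the Master follows RocketMQ. It also shows how an unreachable NameServer ends up missing a Broker’s routes.

```python
class NameServerNode:
    """Hypothetical NameServer node. Nodes never talk to each other."""

    def __init__(self, addr):
        self.addr = addr
        self.brokers = {}  # (broker_name, broker_id) -> broker address


class BrokerSketch:
    """Hypothetical Broker; broker_id 0 is the Master, > 0 a Slave."""

    def __init__(self, name, broker_id, addr):
        self.name, self.broker_id, self.addr = name, broker_id, addr

    def register_all(self, nameservers, unreachable=()):
        # NameServers do not replicate routes, so the Broker must register
        # with every node. An unreachable node simply misses this Broker.
        registered = []
        for ns in nameservers:
            if ns.addr in unreachable:
                continue
            ns.brokers[(self.name, self.broker_id)] = self.addr
            registered.append(ns.addr)
        return registered


# Usage: the Slave cannot reach ns2, so ns2 has no entry for it.
nameservers = [NameServerNode(f"ns{i}:9876") for i in range(3)]
master = BrokerSketch("broker-a", 0, "10.0.0.1:10911")
slave = BrokerSketch("broker-a", 1, "10.0.0.2:10911")
master.register_all(nameservers)
slave_registered = slave.register_all(nameservers, unreachable={"ns2:9876"})
```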

Producer:

  • First randomly selects one NameServer node and establishes a long-lived connection to it
  • Periodically (every 30 seconds by default) fetches Topic routing info from the NameServer
  • Producers are completely stateless, so they scale out horizontally with ease
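Client-side route caching with a periodic refresh can be sketched as follows. `ProducerRouteCache` and the dict-based NameServer view are hypothetical; the 30-second interval matches the default stated above. The usage shows the consequence: between refreshes, the producer keeps using its cached and possibly stale routes.

```python
import random

POLL_INTERVAL_SECONDS = 30  # default refresh interval stated in the text


class ProducerRouteCache:
    """Hypothetical client-side route cache; not a RocketMQ class."""

    def __init__(self, nameserver_views, rng=None):
        # nameserver_views: one dict (topic -> broker names) per NameServer node
        rng = rng if rng is not None else random.Random()
        self.nameserver = rng.choice(nameserver_views)  # talk to one node
        self.routes = {}
        self.last_refresh = None

    def maybe_refresh(self, now):
        """Re-fetch routes only when the poll interval has elapsed."""
        due = (self.last_refresh is None
               or now - self.last_refresh >= POLL_INTERVAL_SECONDS)
        if due:
            # Copy the lists so later NameServer changes do not leak in.
            self.routes = {t: list(b) for t, b in self.nameserver.items()}
            self.last_refresh = now
        return due


# Usage: broker-b registers after the first refresh; the cache lags behind.
view = {"orders": ["broker-a"]}
cache = ProducerRouteCache([view])
cache.maybe_refresh(0)
view["orders"].append("broker-b")
```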

Consumer:

  • Connects like a Producer: first randomly connects to one NameServer node to get routing info
  • Unlike a Producer, a Consumer connects to both the Master and the Slave Brokers
  • Fetches messages in pull mode (the push consumer is implemented on top of long-polling pulls)
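A hypothetical helper for choosing which replica a Consumer pulls from. It assumes RocketMQ’s brokerId convention (0 = Master) and a simple rule: prefer the Master, and fall back to a Slave when the Master is unavailable. Real clients weigh more inputs (for example, a Broker suggesting a read from a Slave), so treat this as a simplification.

```python
def choose_pull_target(broker_name, broker_addrs, master_alive):
    """broker_addrs maps brokerId -> address for one Broker name
    (0 = Master, > 0 = Slave). Hypothetical selection rule."""
    if master_alive and 0 in broker_addrs:
        return broker_addrs[0]  # normal case: read from the Master
    slaves = sorted(b for b in broker_addrs if b != 0)
    if slaves:
        # Master down: already-stored messages stay readable from a Slave.
        return broker_addrs[slaves[0]]
    raise LookupError(f"no readable replica for {broker_name}")


# Usage
addrs = {0: "10.0.0.1:10911", 1: "10.0.0.2:10911"}
normal = choose_pull_target("broker-a", addrs, master_alive=True)
failover = choose_pull_target("broker-a", addrs, master_alive=False)
```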

Producer Message Sending Flow

  1. Detailed Workflow:

    • Gets Topic routing info from NameServer at initialization
    • Maintains two local caches:
      • topicPublishInfoTable: Topic publish info
      • brokerAddrTable: Broker address table
    • When sending messages:
      1. Select a MessageQueue (round-robin by default; hash-based and other strategies are also available)
      2. Look up the Master address by Broker name
      3. Send the message to that Master over the network connection
  2. Fault Tolerance Mechanism:

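    • Retries 2 times by default (retryTimesWhenSendFailed, configurable)
    • Failover strategy: on retry, the Producer avoids the Broker that just failed and selects a queue on another Broker; messages are only ever written to Master nodes, never to Slaves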
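The send loop with retries can be sketched as below. It is loosely modeled on RocketMQ’s approach of skipping the previously failed Broker when picking the next queue, but `send_with_retry`, `select_queue`, `SendFailed` and the `send_fn` callback are hypothetical, not client APIs.

```python
import itertools


class SendFailed(Exception):
    """Hypothetical error raised by send_fn on a timeout or rejection."""


def select_queue(queues, counter, avoid_broker):
    """Round-robin over the topic's queues, skipping queues on avoid_broker
    when another Broker is available."""
    for _ in range(len(queues)):
        mq = queues[next(counter) % len(queues)]
        if mq[0] != avoid_broker:
            return mq
    return queues[next(counter) % len(queues)]  # only one Broker left: reuse it


def send_with_retry(queues, send_fn, counter, retry_times=2):
    """queues: list of (broker_name, queue_id). Makes 1 + retry_times
    attempts; each send goes to the Master of the chosen Broker."""
    last_failed = None
    attempts = []
    for _ in range(1 + retry_times):
        mq = select_queue(queues, counter, last_failed)
        attempts.append(mq)
        try:
            send_fn(*mq)
            return mq, attempts
        except SendFailed:
            last_failed = mq[0]
    raise SendFailed(f"all {len(attempts)} attempts failed: {attempts}")


# Usage: broker-a is timing out, so the retry lands on broker-b.
def flaky_send(broker_name, queue_id):
    if broker_name == "broker-a":
        raise SendFailed("broker-a timeout")

queues = [("broker-a", 0), ("broker-a", 1), ("broker-b", 0), ("broker-b", 1)]
chosen, attempts = send_with_retry(queues, flaky_send, itertools.count())
```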

Consumer Message Consumption Flow

  1. Detailed Workflow:

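    • Gets Topic routing info from NameServer at startup
    • Uses long polling (Pull mode)
    • Consumption progress: in clustering mode the offset is tracked locally and periodically committed to the Broker (every 5 seconds by default); in broadcasting mode it is stored on the consumer’s local disk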
  2. Load Balancing:

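    • Rebalances queue assignments every 20 seconds by default (configurable), and also when consumers join or leave the group
    • Supports multiple allocation strategies, e.g. AllocateMessageQueueAveragely (default) and AllocateMessageQueueAveragelyByCircle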
  3. Consumption Modes:

    • Clustering mode (CLUSTERING): Each message is consumed by only one consumer in the group
    • Broadcasting mode (BROADCASTING): Each message is consumed by every consumer in the group
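The default averaging allocation can be sketched as follows. This function is an assumed re-implementation of the idea behind `AllocateMessageQueueAveragely` (sort queues and consumer IDs, give each consumer a contiguous block, with the first consumers taking one extra queue when the split is uneven); `allocate_averagely` and `assign_queues` are hypothetical. In broadcasting mode every consumer simply gets every queue.

```python
def allocate_averagely(queue_ids, consumer_ids, current_id):
    """Assumed re-implementation of averaging allocation: each consumer
    gets a contiguous block of the sorted queues; the first
    (len(queues) % len(consumers)) consumers get one extra queue."""
    queues = sorted(queue_ids)
    consumers = sorted(consumer_ids)
    n, c = len(queues), len(consumers)
    index = consumers.index(current_id)
    mod = n % c
    if n <= c:
        size = 1
    elif mod > 0 and index < mod:
        size = n // c + 1
    else:
        size = n // c
    start = index * size if (mod > 0 and index < mod) else index * size + mod
    count = min(size, n - start)  # zero or negative: this consumer gets nothing
    return [queues[(start + i) % n] for i in range(count)]


def assign_queues(mode, queue_ids, consumer_ids, current_id):
    if mode == "BROADCASTING":
        return sorted(queue_ids)  # every consumer reads every queue
    return allocate_averagely(queue_ids, consumer_ids, current_id)


# Usage: 5 queues shared by 2 consumers in clustering mode.
group = ["c1", "c2"]
plan = {c: assign_queues("CLUSTERING", range(5), group, c) for c in group}
```

With 5 queues and 2 consumers the split is 3 + 2. With more consumers than queues, the extra consumers receive nothing and sit idle, which is why adding consumers beyond the queue count does not raise throughput.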

Error Quick Reference

| Symptom | Root cause | Diagnosis | Fix |
|---|---|---|---|
| Send fails with NO_ROUTE_INFO_OF_THIS_TOPIC | Topic not created, or not registered with NameServer | Query the Topic route (`mqadmin topicRoute`) | In production, create Topics explicitly (e.g. `mqadmin updateTopic`) |
| Producer send timeouts / RT spikes | High Broker load, slow disk flush, network jitter | Check Broker disk I/O and thread-pool backlog | Tune the flush/replication strategy; use faster disks |
| Consumption backlog keeps growing | Insufficient consumer capacity, or consumer threads blocked | Check consumption TPS, backlog size and processing time (e.g. `mqadmin consumerProgress`) | Increase concurrency; use batch consumption |
| Duplicate consumption / idempotency problems | At-least-once delivery semantics plus retries | Consumer logs show the same key processed more than once | Make business-side processing idempotent |
| Broker starts but NameServer has no routes | Broker cannot reach one or more NameServers | Check Broker registration logs | Verify the NameServer address list (namesrvAddr) |
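The business-side idempotency fix can be sketched as a consumer that remembers processed business keys. `IdempotentConsumer` is hypothetical, and the in-memory set is an assumption for brevity. In production the processed keys would live in durable storage (for example, a database unique constraint) updated in the same transaction as the business effect; otherwise a crash between the two steps still lets a duplicate through.

```python
class IdempotentConsumer:
    """Hypothetical consumer that deduplicates by business key, because
    delivery is at-least-once and the same message may arrive twice."""

    def __init__(self, handler):
        self.handler = handler
        self.processed_keys = set()  # assumption: a durable store in production

    def on_message(self, key, body):
        if key in self.processed_keys:
            return False  # duplicate: acknowledge without repeating side effects
        self.handler(body)            # if this raises, the key is NOT recorded,
        self.processed_keys.add(key)  # so a redelivery runs the handler again
        return True


# Usage: the same payment message is delivered twice.
charges = []
consumer = IdempotentConsumer(charges.append)
first = consumer.on_message("order-1001", "charge 99.00")
second = consumer.on_message("order-1001", "charge 99.00")
```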