TL;DR

  • Scenario: RocketMQ carries distributed order, payment and inventory flows, where consistency, reliability, recoverability and behavior under load matter most
  • Conclusion: The key capabilities are ordered consumption, Broker-side filtering, transaction messages, delayed delivery, retry/dead-letter handling and flow-control limits
  • Output: A structured explanation that can go straight into a “RocketMQ Features” chapter, plus default values, version-sensitive points and an error quick reference

RocketMQ Features

Publish and Subscribe

Publishing means a producer sends messages to a Topic. Subscribing means a consumer registers interest in a Topic, optionally narrowed to specific Tags.

Message Ordering

Message ordering means that messages of one kind are consumed in the order they were sent. For example, an order produces three messages: order created, order paid, order completed. They are only meaningful when consumed in that order, while different orders can still be consumed in parallel. RocketMQ guarantees strict ordering within a single queue, so the producer must route all messages for one order to the same queue and the consumer must use ordered consumption. Ordering across a whole Topic requires a Topic with a single queue.
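The routing idea can be sketched without a broker. The snippet below is a minimal illustration, not the RocketMQ client API: `select_queue`, the queue count of 4 and the sample events are all assumptions made for the example. It shows that hashing the order ID keeps each order's steps in one queue, in send order, even when orders are interleaved.

```python
import zlib


def select_queue(order_id: str, queue_count: int) -> int:
    """Pick a queue index from a stable hash of the business key."""
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    return zlib.crc32(order_id.encode("utf-8")) % queue_count


# Hypothetical event stream: two orders interleaved by the producer.
events = [
    ("order-1", "created"), ("order-2", "created"),
    ("order-1", "paid"), ("order-2", "paid"),
    ("order-1", "completed"), ("order-2", "completed"),
]

QUEUE_COUNT = 4  # assumed number of queues in the Topic
queues = {index: [] for index in range(QUEUE_COUNT)}
for order_id, step in events:
    queues[select_queue(order_id, QUEUE_COUNT)].append((order_id, step))


def steps_for(order_id: str) -> list:
    """Read one order's steps back from the single queue it was routed to."""
    queue = queues[select_queue(order_id, QUEUE_COUNT)]
    return [step for oid, step in queue if oid == order_id]
```

Changing the queue count remaps keys to different queues, so ordering can be disturbed for messages in flight during such a change.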

Message Filtering

RocketMQ consumers can filter messages by Tag, and can also filter on custom message properties (SQL92-style expressions). Filtering is performed on the Broker side. Advantage: messages the Consumer does not need are never sent over the network. Disadvantage: the Broker carries extra load and the implementation is more complex.
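As a rough illustration of Tag filtering, the sketch below evaluates a subscription expression of the form `TagA || TagB` (with `*` meaning all tags). The function names are hypothetical and the matching is a simplification of what a Broker does, meant only to show why a mismatch between the producer's Tag and the subscription expression silently drops messages.

```python
def parse_subscription(expression):
    """Split a tag expression such as 'TagA || TagB' into a set of tags."""
    if expression is None or expression.strip() in ("", "*"):
        return set()  # an empty set means "subscribe to every tag"
    return {tag.strip() for tag in expression.split("||") if tag.strip()}


def matches(message_tag, subscribed_tags) -> bool:
    """Return True if a message with this tag should reach the consumer."""
    if not subscribed_tags:
        return True
    return message_tag in subscribed_tags


subscription = parse_subscription("paid || refunded")
```

Tags are compared as exact strings here, so `Paid` and `paid` would be treated as different tags.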

Message Reliability

RocketMQ supports highly reliable messaging. The following situations can affect message reliability:

  1. Broker abnormal shutdown
  2. Broker abnormal crash
  3. OS abnormal crash
  4. Machine power failure, but can immediately restore power
  5. Machine cannot boot
  6. Disk device damage

In situations 1-4 the hardware can be brought back immediately. RocketMQ guarantees that no messages are lost, or that only a small amount of data is lost, depending on whether disk flushing is synchronous or asynchronous. Situations 5-6 are single points of failure: the node cannot be recovered, and messages stored only on that node are lost.

For situations 5-6, asynchronous replication lets RocketMQ keep 99% of messages, but a very small number can still be lost. Synchronous double write removes the single point of failure at the cost of some performance, which suits scenarios with strict reliability requirements such as payments.

At Least Once

At Least Once means every message is delivered at least one time. The Consumer first pulls a message to local memory and returns an ACK to the server only after it has finished processing. A message that has not been consumed is never ACK’d, so it will be delivered again. The trade-off is that a message can arrive more than once, so consumers should be idempotent.
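The ACK-after-processing rule can be illustrated with a toy in-memory queue. This is a sketch under stated assumptions, not RocketMQ code: `ToyQueue`, `drain` and `flaky_handler` are invented for the example, and the handler is made to fail once so the redelivery is visible.

```python
from collections import deque


class ToyQueue:
    """In-memory stand-in for one broker queue (not the RocketMQ client API)."""

    def __init__(self, messages):
        self.pending = deque(messages)
        self.deliveries = []  # every delivery attempt, in order

    def drain(self, handler, max_deliveries=100):
        """Deliver until the queue is empty; remove a message only after success."""
        while self.pending and len(self.deliveries) < max_deliveries:
            message = self.pending[0]  # pull, but keep it on the queue
            self.deliveries.append(message)
            try:
                handler(message)
            except Exception:
                continue  # no ACK was sent, so the same message is delivered again
            self.pending.popleft()  # ACK: processing finished successfully


failed_once = set()
processed = []


def flaky_handler(message):
    """Hypothetical business handler that fails the first time it sees 'm2'."""
    if message == "m2" and message not in failed_once:
        failed_once.add(message)
        raise RuntimeError("simulated downstream outage")
    processed.append(message)


queue = ToyQueue(["m1", "m2", "m3"])
queue.drain(flaky_handler)
```

In this run `m2` is delivered twice but processed once, which is the behavior "at least once" describes.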

Traceback Consumption

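Traceback consumption means re-consuming messages that a Consumer has already consumed successfully. Because business requirements sometimes call for this, the Broker retains messages after delivery and provides a replay mechanism, typically rewinding by time (for example, re-consuming everything from the last hour).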
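Conceptually, rewinding by time means mapping a timestamp to an offset in the retained log and consuming again from there. The sketch below assumes a toy log of `(store_timestamp_ms, body)` pairs sorted by time; the function names and data are illustrative and do not reflect how a Broker stores or indexes messages.

```python
from bisect import bisect_left

# Toy retained log: (store_timestamp_ms, body), ordered by timestamp.
retained_log = [(1000, "a"), (2000, "b"), (3000, "c"), (4000, "d")]


def offset_for_timestamp(log, timestamp_ms: int) -> int:
    """Return the offset of the first message stored at or after timestamp_ms."""
    timestamps = [stored_at for stored_at, _ in log]
    return bisect_left(timestamps, timestamp_ms)


def replay_from(log, timestamp_ms: int) -> list:
    """Re-read every message body from the rewound offset to the end of the log."""
    return [body for _, body in log[offset_for_timestamp(log, timestamp_ms):]]
```

Replay re-delivers messages that were already processed, so the consumer's handling needs to be idempotent.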

Transaction Message

A RocketMQ transaction message (Transactional Message) binds an application’s local transaction and the sending of a message into one unit, so that the two operations either both take effect or both do not. The result is eventual consistency between the local transaction and message delivery.

Working Principle

  1. Half Message Stage:

    • Producer sends “half message” (PREPARED state) to Broker
    • Broker stores message in special queue, not visible to consumer at this time
  2. Local Transaction Execution:

    • Producer executes local transaction
    • Commit or rollback transaction message based on local transaction result
  3. Transaction State Confirmation:

    • Success: Broker transforms message to consumable state (COMMIT_MESSAGE)
    • Failure: Broker discards the message (ROLLBACK_MESSAGE)
  4. State Callback Mechanism:

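    • If the producer does not confirm the state in time (or reports an unknown state), the Broker actively queries the producer for the transaction state
    • By default the Broker checks back at most 15 times (transactionCheckMax), after which the half message is discarded; the check interval is configurable (transactionCheckInterval)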
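The four stages can be walked through with a toy state machine. Everything here (`ToyBroker`, `TxState`, `send_in_transaction`) is a hypothetical in-memory model rather than the RocketMQ API, and it assumes that a half message exceeding the check limit is simply dropped.

```python
from enum import Enum


class TxState(Enum):
    COMMIT = "commit"
    ROLLBACK = "rollback"
    UNKNOWN = "unknown"


class ToyBroker:
    """In-memory model of half-message handling (illustrative only)."""

    def __init__(self, max_checks=15):
        self.half = {}      # msg_id -> body, invisible to consumers
        self.visible = []   # bodies that consumers can read
        self.checks = {}    # msg_id -> number of check-backs so far
        self.max_checks = max_checks

    def send_half(self, msg_id, body):
        self.half[msg_id] = body
        self.checks[msg_id] = 0

    def end_transaction(self, msg_id, state):
        if state is TxState.UNKNOWN or msg_id not in self.half:
            return  # stays half; a later check-back has to resolve it
        body = self.half.pop(msg_id)
        if state is TxState.COMMIT:
            self.visible.append(body)
        # ROLLBACK: the body is dropped and never becomes visible

    def check_back(self, checker):
        """Ask the producer-side checker about every unresolved half message."""
        for msg_id in list(self.half):
            self.checks[msg_id] += 1
            if self.checks[msg_id] > self.max_checks:
                self.half.pop(msg_id)  # assumption: give up and drop
                continue
            self.end_transaction(msg_id, checker(msg_id))


def send_in_transaction(broker, msg_id, body, local_transaction):
    broker.send_half(msg_id, body)          # stage 1: half message
    try:
        state = local_transaction()         # stage 2: local transaction
    except Exception:
        state = TxState.ROLLBACK
    broker.end_transaction(msg_id, state)   # stage 3: commit or rollback


broker = ToyBroker()
send_in_transaction(broker, "tx-1", "order paid", lambda: TxState.COMMIT)
send_in_transaction(broker, "tx-2", "order cancelled", lambda: TxState.ROLLBACK)
send_in_transaction(broker, "tx-3", "order pending", lambda: TxState.UNKNOWN)
visible_before_check = list(broker.visible)
broker.check_back(lambda msg_id: TxState.COMMIT)  # stage 4: state check-back
```

The checker passed to `check_back` stands in for the producer looking up the real outcome of its local transaction, which is why that outcome must be recorded somewhere it can be queried later.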

Scheduled Message

A scheduled message (delayed message) is not consumed immediately after it reaches the Broker. It is held until its delivery time and only then delivered to the real Topic.

The Broker has a configuration item, messageDelayLevel, whose default value is:

  • 1s, 5s, 10s, 30s
  • 1m, 2m, 3m, 4m, 5m, 6m, 7m, 8m, 9m, 10m
  • 20m, 30m, 1h, 2h

That is 18 levels in total. A message selects a delay by level number: level 1 is 1s, level 2 is 5s, and so on.
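To make the level-to-delay mapping concrete, the sketch below parses a delay-level string into seconds. It assumes the default setting is the space-separated string shown in `DEFAULT_DELAY_LEVELS` (18 entries, including 4m); the parser and its names are illustrative, not Broker code.

```python
# Assumed default value of the Broker's delay-level setting (18 entries).
DEFAULT_DELAY_LEVELS = "1s 5s 10s 30s 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m 20m 30m 1h 2h"

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_delay_levels(spec: str) -> dict:
    """Map each 1-based delay level to its delay in seconds."""
    table = {}
    for level, token in enumerate(spec.split(), start=1):
        amount, unit = int(token[:-1]), token[-1]
        table[level] = amount * UNIT_SECONDS[unit]
    return table


delay_seconds = parse_delay_levels(DEFAULT_DELAY_LEVELS)
```

With this table, a retry that should wait about ten minutes would use level 14.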

Message Retry

When a Consumer fails to consume a message, a retry mechanism is needed so the message can be consumed again. Consumption failures usually fall into two categories:

  1. The message itself is the problem, e.g., deserialization fails. Retrying immediately will almost certainly fail again.
  2. A downstream dependency is unavailable, e.g., the database connection is down. Retrying after a delay usually succeeds once the dependency recovers.
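The two failure categories call for different reactions, which a handler can encode explicitly. In the sketch below, `RECONSUME_LATER` and `CONSUME_SUCCESS` mirror the names of RocketMQ's concurrent consume statuses, while `PARK` is a hypothetical outcome for messages that should be set aside instead of retried; `consume`, `db_ok` and `db_down` are invented for the example.

```python
import json


def consume(body: str, process) -> str:
    """Classify the outcome of handling one message body."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # Category 1: the message itself is bad, retrying cannot help.
        return "PARK"
    try:
        process(payload)
    except ConnectionError:
        # Category 2: a dependency is down, a delayed retry may succeed.
        return "RECONSUME_LATER"
    return "CONSUME_SUCCESS"


def db_ok(payload):
    """Hypothetical downstream call that succeeds."""
    return None


def db_down(payload):
    """Hypothetical downstream call that fails like a lost DB connection."""
    raise ConnectionError("db connection unavailable")
```

Errors other than `ConnectionError` propagate out of `consume` in this sketch, so a real handler would need its own policy for them.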

Message Redelivery

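When a producer sends a message:

  • Synchronous sends are retried on failure (retryTimesWhenSendFailed, default 2)
  • Asynchronous sends are also retried (retryTimesWhenSendAsyncFailed, default 2)
  • Oneway sends carry no delivery guarantee

Redelivery makes it as likely as possible that a message is sent successfully and not lost, but it can produce duplicate messages, so consumers must tolerate duplicates.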
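The link between redelivery and duplication is easiest to see when the Broker stores a message but its acknowledgement is lost. The sketch below is a toy model under that assumption: `FlakyBroker` and `send_with_retry` are hypothetical, and the default of 2 retries is chosen only to match the figure commonly cited for synchronous sends.

```python
def send_with_retry(send, message, retry_times=2):
    """Try once, then up to retry_times more; return the attempt that succeeded."""
    last_error = None
    for attempt in range(1, retry_times + 2):
        try:
            send(message)
            return attempt
        except ConnectionError as error:
            last_error = error  # remember the failure and try again
    raise last_error


class FlakyBroker:
    """Stores every message but loses the first acknowledgement."""

    def __init__(self):
        self.stored = []
        self.calls = 0

    def send(self, message):
        self.calls += 1
        self.stored.append(message)
        if self.calls == 1:
            raise ConnectionError("ack lost after the broker stored the message")


broker = FlakyBroker()
attempts = send_with_retry(broker.send, "order-1:paid")

# Consumer-side idempotency: keep the first occurrence of each business key.
deduplicated = list(dict.fromkeys(broker.stored))
```

Here the message body doubles as the business key; in practice the deduplication key would be something like an order ID plus event type, persisted where all consumer instances can see it.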

Flow Control

  • Producer flow control: triggered when Broker processing capacity becomes the bottleneck
  • Consumer flow control: triggered when consumer processing capacity becomes the bottleneck

Producer Flow Control

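  • Flow control is triggered when the commitLog file has been locked for longer than osPageCacheBusyTimeOutMills (default 1000 ms)
  • The Broker applies flow control by rejecting send requests; the producer does not automatically redeliver a rejected send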

Consumer Flow Control

  • When the number of messages cached locally for a queue exceeds pullThresholdForQueue (default 1000)
  • When the total size of messages cached locally for a queue exceeds pullThresholdSizeForQueue (default 100MB)
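The two thresholds amount to a simple check before each pull. The function below is an illustrative sketch, not client code: the name `should_delay_pull` and the byte-based size comparison are assumptions, with defaults taken from the values listed above.

```python
MIB = 1024 * 1024


def should_delay_pull(cached_count: int, cached_bytes: int,
                      pull_threshold_for_queue: int = 1000,
                      pull_threshold_size_mib: int = 100) -> bool:
    """Return True when the local cache for one queue is over either limit."""
    over_count = cached_count > pull_threshold_for_queue
    over_size = cached_bytes > pull_threshold_size_mib * MIB
    return over_count or over_size
```

When the check returns True the consumer would postpone the next pull for that queue, which lowers the pull rate until processing catches up.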

Dead Letter Queue

The dead letter queue is the mechanism for handling messages that cannot be consumed normally.

  1. When a consumer fails to process a message for the first time, RocketMQ marks the message for retry
  2. RocketMQ redelivers the message according to the configured retry policy
  3. If consumption still fails after the maximum retry count (16 by default), the message is treated as a dead-letter message and moved to the dead letter queue
  4. In RocketMQ the dead letter queue is named %DLQ% + consumer group name
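The routing decision in steps 2-4 can be sketched as a single function. This is a simplified model: `route_after_failure` is a hypothetical name, the limit of 16 is an assumed default for the maximum retry count, and the `%RETRY%` prefix for the per-group retry topic is included for contrast with `%DLQ%`.

```python
DLQ_PREFIX = "%DLQ%"
RETRY_PREFIX = "%RETRY%"


def dlq_topic(consumer_group: str) -> str:
    """Build the dead letter queue name for a consumer group."""
    return DLQ_PREFIX + consumer_group


def route_after_failure(consumer_group: str, reconsume_times: int,
                        max_reconsume_times: int = 16) -> str:
    """Decide where a failed message goes next, based on retries so far."""
    if reconsume_times >= max_reconsume_times:
        return dlq_topic(consumer_group)  # retries exhausted: dead letter
    return RETRY_PREFIX + consumer_group  # schedule another retry
```

Because the dead letter queue is keyed by consumer group, a job that inspects or replays dead letters needs one subscription per group.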

Error Quick Reference

Symptom | Root Cause | Diagnosis | Fix
--- | --- | --- | ---
Order messages consumed out of order | Same business key not routed to the same queue, or ordered consumption not enabled | Check producer routing and the consumer’s ordered mode | Always route one business key to the same queue and use ordered consumption
Tag filtering has no effect | Subscription expression or Tag written incorrectly | Compare the Producer’s Tag with the subscription expression | Standardize Tags and keep Topic semantics consistent
Retry storm after consumption failure | Unrecoverable failure type handled with fast retry | Check the failure stack type and the retry interval | Stop retrying unrecoverable errors: record and discard them
Duplicate consumption corrupts data | Producer redelivery or network jitter causes duplicate delivery | Count duplicates per business key | Make the business side idempotent
DLQ backlog growing rapidly | Messages still failing after the maximum retry count | Check DLQ message volume and failure types | Set up a job that processes the DLQ
Transaction message stuck for a long time | Local transaction not correctly committed or rolled back | Check the transaction listener’s return value | Make sure the local transaction outcome can always be determined