TL;DR
- Scenario: RocketMQ carries distributed workflows such as orders, payments, and inventory, where consistency, reliability, recoverability, and resilience under load matter most.
- Conclusion: The key capabilities are ordered consumption, Broker-side filtering, transactional messages, delayed delivery, retry and dead-letter handling, and flow-control limits.
- Output: A structured explanation that can go directly into a “RocketMQ Features” chapter, plus default values, version-sensitive points, and an error quick reference.
RocketMQ Features
Publish and Subscribe
Publishing means a producer sends messages to a Topic. Subscribing means a consumer follows a Topic, optionally narrowed to specific Tags.
Message Ordering
Message ordering means that messages of one kind are consumed in the order in which they were sent. For example, an order produces three messages: order created, order paid, and order completed. Consuming them is only meaningful in that order, while different orders can still be consumed in parallel. RocketMQ strictly guarantees this ordering within a single queue, so the messages of one order must be sent to the same queue and consumed in ordered mode; ordering across different queues is not guaranteed.
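A minimal sketch of the routing idea, written in plain Python rather than against the RocketMQ client: every message that shares an order ID is mapped to the same queue index, so one queue sees that order's events in send order. The `select_queue` and `route` helpers, the queue count of 4, and the order IDs are illustrative assumptions, not RocketMQ APIs.

```python
import zlib
from collections import defaultdict

QUEUE_COUNT = 4  # assumed number of queues in the topic


def select_queue(order_id: str, queue_count: int = QUEUE_COUNT) -> int:
    """Map a business key to a stable queue index (hypothetical helper)."""
    # crc32 is stable across processes, unlike Python's salted hash().
    return zlib.crc32(order_id.encode("utf-8")) % queue_count


def route(messages, queue_count: int = QUEUE_COUNT):
    """Append each (order_id, event) pair to the queue chosen for its order_id."""
    queues = defaultdict(list)
    for order_id, event in messages:
        queues[select_queue(order_id, queue_count)].append((order_id, event))
    return queues


# Two orders whose events are interleaved at send time.
sent = [
    ("order-1", "created"), ("order-2", "created"),
    ("order-1", "paid"), ("order-2", "paid"),
    ("order-1", "completed"), ("order-2", "completed"),
]
queues = route(sent)


def events_for(order_id: str):
    """Events of one order, read from the single queue that holds them."""
    queue = queues[select_queue(order_id)]
    return [event for oid, event in queue if oid == order_id]
```

Whatever queue an order lands on, its three events stay in send order relative to each other, while the two orders may sit on different queues and be processed in parallel.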
Message Filtering
A RocketMQ consumer can filter messages by Tag, and RocketMQ also supports filtering on custom message properties through SQL92-style expressions. Filtering is currently performed on the Broker side. The advantage is that the Consumer no longer receives useless messages over the network; the disadvantages are extra load on the Broker and a more complex implementation.
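To make the Tag matching semantics concrete, here is a small Python sketch of how a subscription expression such as `TagA || TagB` or `*` selects messages. It models only the matching rule; `parse_subscription` and `matches` are hypothetical helpers, and the real Broker-side implementation is more involved.

```python
def parse_subscription(expression: str):
    """Return None for 'match everything', else the set of accepted tags."""
    expression = expression.strip()
    if expression in ("", "*"):
        return None
    # Tags are separated by "||"; surrounding whitespace is ignored.
    return {tag.strip() for tag in expression.split("||") if tag.strip()}


def matches(expression: str, message_tag: str) -> bool:
    """True if a message carrying message_tag passes the subscription."""
    accepted = parse_subscription(expression)
    return accepted is None or message_tag in accepted
```

Matching is exact and case-sensitive in this sketch, which is why a Tag written as `taga` on one side and `TagA` on the other never matches.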
Message Reliability
RocketMQ supports high message reliability. The following situations affect message reliability:
1. Broker abnormal shutdown
2. Broker abnormal crash
3. OS abnormal crash
4. Machine power failure, where power can be restored immediately
5. Machine cannot boot
6. Disk device damage
In situations 1-4 the hardware resources can be recovered immediately. RocketMQ loses no messages in these four situations, or only a small number, depending on whether disk flushing is synchronous or asynchronous. Situations 5-6 are single points of failure: once they occur, the messages stored on that node cannot be recovered from it.
For situations 5-6, asynchronous replication lets RocketMQ keep about 99% of messages, although a very small number may still be lost. Synchronous dual-write removes the single point entirely, at the cost of some send performance, which makes it the better fit for scenarios with very high reliability requirements.
At Least Once
At Least Once means every message is delivered at least one time. The Consumer first pulls the message to the local process and returns an ACK to the server only after processing is complete. A message that has not been consumed is never ACKed and will be delivered again, so RocketMQ supports this guarantee well, at the price of possible duplicates.
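The ACK-after-processing rule can be simulated in a few lines of Python. This is a toy model, not the RocketMQ client: `deliver_until_acked` stands in for the server, and `flaky_handler` is a hypothetical consumer that applies its side effect and then crashes before acknowledging on the first attempt.

```python
def deliver_until_acked(message, handler, max_attempts=5):
    """Redeliver `message` until `handler` returns True (the ACK).

    Returns the number of delivery attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            if handler(message):
                return attempt
        except Exception:
            # A crash before the ACK looks like "no ACK" to the server.
            pass
    raise RuntimeError("message was never acknowledged")


applied = []          # side effects performed by the consumer
calls = {"count": 0}  # how many times the handler was invoked


def flaky_handler(message):
    calls["count"] += 1
    applied.append(message)  # the business side effect happens first
    if calls["count"] == 1:
        raise ConnectionError("crashed before sending ACK")
    return True


attempts = deliver_until_acked("order-1:paid", flaky_handler)
```

The message is delivered twice and its side effect is applied twice, which is exactly why at-least-once delivery needs idempotent consumers.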
Traceback Consumption
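Traceback consumption means re-consuming messages that a Consumer has already consumed successfully, because the business requires it. To support this, the Broker keeps messages after they have been delivered and provides a replay mechanism, usually by time, for example re-consuming everything from one hour ago after a Consumer failure.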
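Replaying by time boils down to finding the first offset whose store time is at or after the target timestamp. The sketch below shows that lookup with a binary search over an in-memory list; `offset_for_timestamp` and the sample timestamps are assumptions for illustration, not the Broker's actual index structure.

```python
import bisect


def offset_for_timestamp(store_timestamps, target_ms):
    """Index of the first message stored at or after target_ms.

    store_timestamps must be sorted ascending, one entry per queue offset.
    """
    return bisect.bisect_left(store_timestamps, target_ms)


# Store times (ms) of the messages at offsets 0..3 in one queue.
timestamps = [1000, 2000, 3000, 4000]
replay_from = offset_for_timestamp(timestamps, 2500)
```

Resetting the consumer's offset to `replay_from` would make it consume the messages at offsets 2 and 3 again.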
Transaction Message
A RocketMQ transactional message is a special message mechanism that binds the application's local transaction and the message send into one logical transaction, so that the two either both take effect or both do not. The result is eventual consistency between the local data and the message.
Working Principle
- Half message stage:
  - The producer sends a “half message” (PREPARED state) to the Broker.
  - The Broker stores the message in an internal queue; it is not visible to consumers at this point.
- Local transaction execution:
  - The producer executes its local transaction.
  - It then commits or rolls back the transactional message according to the local transaction result.
- Transaction state confirmation:
  - Success (COMMIT_MESSAGE): the Broker makes the message consumable.
  - Failure (ROLLBACK_MESSAGE): the Broker discards the message.
- State check-back mechanism:
  - If the producer does not confirm the state in time, the Broker actively queries the producer for the transaction state.
  - By default a message is checked at most 15 times; the check interval is configurable.
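The four stages above can be traced with a toy in-memory broker. This is a simplified sketch under stated assumptions: `ToyBroker`, its method names, and the `local_db` lookup are hypothetical, and only the state names mirror the ones used in the text.

```python
from enum import Enum


class TxState(Enum):
    COMMIT = "COMMIT_MESSAGE"
    ROLLBACK = "ROLLBACK_MESSAGE"
    UNKNOWN = "UNKNOWN"  # producer has not reported a result yet


class ToyBroker:
    def __init__(self, check_max=15):
        self.half = {}     # msg_id -> [body, checks_done]; hidden from consumers
        self.visible = []  # committed messages that consumers can read
        self.check_max = check_max

    def send_half(self, msg_id, body):
        """Stage 1: store the half message without exposing it."""
        self.half[msg_id] = [body, 0]

    def end_transaction(self, msg_id, state):
        """Stage 3: commit or roll back once the local result is known."""
        if state is TxState.UNKNOWN or msg_id not in self.half:
            return
        body, _ = self.half.pop(msg_id)
        if state is TxState.COMMIT:
            self.visible.append(body)
        # ROLLBACK: the half message is simply dropped.

    def check_pending(self, check_local_state):
        """Stage 4: one check-back round over unresolved half messages."""
        for msg_id in list(self.half):
            entry = self.half[msg_id]
            entry[1] += 1
            if entry[1] > self.check_max:
                del self.half[msg_id]  # give up after check_max checks
                continue
            self.end_transaction(msg_id, check_local_state(msg_id))


broker = ToyBroker()
broker.send_half("tx-1", "order-1 paid")
broker.send_half("tx-2", "order-2 paid")
broker.end_transaction("tx-1", TxState.COMMIT)   # local transaction succeeded
broker.end_transaction("tx-2", TxState.UNKNOWN)  # result lost, stays half

# The producer's own record of what happened to each local transaction.
local_db = {"tx-2": TxState.ROLLBACK}
broker.check_pending(lambda msg_id: local_db[msg_id])
```

The check-back only works if the producer can always answer from durable local state, which is why the listener should look up the transaction outcome rather than guess.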
Scheduled Message
A scheduled message (delay queue) is not delivered for consumption as soon as it reaches the Broker. The Broker holds it and delivers it to the real Topic only after the specified delay has elapsed.
The Broker has a configuration item messageDelayLevel, with the default value:
- 1s, 5s, 10s, 30s
- 1m, 2m, 3m, 4m, 5m, 6m, 7m, 8m, 9m, 10m
- 20m, 30m, 1h, 2h
That is 18 levels in total. Levels are numbered from 1, so level 1 is 1s and level 18 is 2h.
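Because delays are chosen by level rather than by arbitrary duration, it helps to see how a desired delay maps to a level. The sketch below parses the default level string and picks the smallest level that is long enough; `parse_levels` and `level_for` are hypothetical helpers, not part of RocketMQ.

```python
DEFAULT_DELAY_LEVELS = "1s 5s 10s 30s 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m 20m 30m 1h 2h"
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_levels(spec: str = DEFAULT_DELAY_LEVELS):
    """Convert a level string into a list of delays in seconds."""
    return [int(token[:-1]) * UNIT_SECONDS[token[-1]] for token in spec.split()]


def level_for(delay_seconds: int, levels=None):
    """Smallest 1-based level whose delay is >= delay_seconds, or None."""
    levels = levels or parse_levels()
    for level, seconds in enumerate(levels, start=1):
        if seconds >= delay_seconds:
            return level
    return None  # longer than the largest configured level
```

A requested delay that falls between two levels is rounded up here (90 seconds becomes the 2m level), which is a design choice of this sketch; an application might prefer to round down or reject such requests.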
Message Retry
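When a Consumer fails to consume a message, a retry mechanism is needed so that the message can be consumed again. Consumption failures usually fall into two categories:
- The message itself is the cause, e.g. deserialization failure. An immediate retry will almost certainly fail again, so the message should be skipped or retried only after a delay.
- A downstream service is unavailable, e.g. the db connection cannot be established. Other messages will fail the same way, so the consumer should back off before consuming further.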
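The two failure categories call for different return values from the consumer. The following Python sketch shows one way to separate them; the `consume` function, the `parked` list, and the `save_order` callback are hypothetical, and the two status strings simply mirror the "success" and "retry later" outcomes a consumer listener can report.

```python
import json

CONSUME_SUCCESS = "CONSUME_SUCCESS"
RECONSUME_LATER = "RECONSUME_LATER"


def consume(raw_body: str, save_order, parked: list) -> str:
    """Handle one message and tell the broker whether to redeliver it."""
    try:
        order = json.loads(raw_body)
    except json.JSONDecodeError:
        # The payload is malformed: retrying cannot fix it, so park it
        # for inspection and acknowledge to stop further retries.
        parked.append(raw_body)
        return CONSUME_SUCCESS
    try:
        save_order(order)
    except ConnectionError:
        # The downstream is unavailable: a later retry may succeed.
        return RECONSUME_LATER
    return CONSUME_SUCCESS


parked, saved = [], []


def database_down(order):
    raise ConnectionError("db connection unavailable")


bad_payload_status = consume("{not json", saved.append, parked)
outage_status = consume('{"id": 1}', database_down, parked)
normal_status = consume('{"id": 1}', saved.append, parked)
```

Parking the malformed message keeps it available for diagnosis without letting it consume retry capacity.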
Message Redelivery
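When a producer sends a message:
- Synchronous send: a failed send is redelivered (retryTimesWhenSendFailed, default 2).
- Asynchronous send: a failed send is retried (retryTimesWhenSendAsyncFailed, default 2).
- Oneway send: no guarantee at all.
Redelivery makes it as likely as possible that a message is sent successfully and not lost, but it can produce duplicate messages, so consumers must tolerate duplicates.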
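Since redelivery can duplicate messages, the consuming side usually deduplicates on a business key. Here is a minimal in-memory sketch of that idea; `IdempotentConsumer` and the key format are assumptions, and a production version would keep the seen-key record in durable storage, ideally updated in the same transaction as the business change.

```python
class IdempotentConsumer:
    """Applies each business key at most once (in-memory illustration)."""

    def __init__(self):
        self.seen = set()  # business keys that were already applied
        self.balance = 0

    def on_message(self, business_key: str, amount: int) -> bool:
        """Apply the payment once; return False for a duplicate."""
        if business_key in self.seen:
            return False
        self.seen.add(business_key)
        self.balance += amount
        return True


consumer = IdempotentConsumer()
first = consumer.on_message("pay-1001", 100)
duplicate = consumer.on_message("pay-1001", 100)  # redelivered copy
second = consumer.on_message("pay-1002", 50)
```

The duplicate is acknowledged but has no effect, so the balance reflects each payment exactly once.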
Flow Control
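- Producer flow control: triggered when the Broker's processing capacity reaches a bottleneck.
- Consumer flow control: triggered when the consumer's processing capacity reaches a bottleneck.
Producer Flow Control
- Flow control is triggered when the commitLog file has been locked for longer than osPageCacheBusyTimeOutMills (default 1000 ms).
- The Broker applies flow control by rejecting send requests; a send rejected this way is not redelivered automatically.
Consumer Flow Control
- Triggered when the number of messages cached locally for a queue exceeds pullThresholdForQueue (default 1000).
- Triggered when the size of the messages cached locally for a queue exceeds pullThresholdSizeForQueue (default 100 MB).
- In both cases the consumer lowers its pull frequency until the local backlog drains.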
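The consumer-side thresholds amount to a simple check before each pull. The sketch below expresses it in Python using the default values quoted above; `should_pause_pull` is a hypothetical helper, and the real client's size accounting may round differently.

```python
PULL_THRESHOLD_FOR_QUEUE = 1000          # max cached messages per queue
PULL_THRESHOLD_SIZE_FOR_QUEUE_MIB = 100  # max cached size per queue, in MiB


def should_pause_pull(cached_count: int, cached_bytes: int) -> bool:
    """True if the next pull for this queue should be delayed."""
    over_count = cached_count > PULL_THRESHOLD_FOR_QUEUE
    over_size = cached_bytes > PULL_THRESHOLD_SIZE_FOR_QUEUE_MIB * 1024 * 1024
    return over_count or over_size
```

Either limit alone is enough to pause pulling, so a few very large messages can trigger flow control even when the message count is low.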
Dead Letter Queue
The dead-letter queue is an important mechanism in a message queue system for handling messages that cannot be consumed normally.
- When a consumer fails to process a message for the first time, the message enters the retry flow.
- The message is redelivered several times according to the configured retry policy.
- If it still fails after the preset maximum retry count (16 by default for concurrent consumption), the system classifies it as a dead-letter message and stops delivering it to the consumer.
- In RocketMQ the dead-letter queue is named %DLQ% + consumer group name, for example %DLQ%order_consumer for a hypothetical group order_consumer.
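The retry-then-dead-letter decision and the naming rule can be sketched together. In this Python illustration, the `%RETRY%` and `%DLQ%` prefixes and the default of 16 retries follow RocketMQ conventions, while `next_destination` and the group name are hypothetical.

```python
MAX_RECONSUME_TIMES = 16  # default limit for concurrent consumption


def retry_topic(consumer_group: str) -> str:
    return "%RETRY%" + consumer_group


def dlq_topic(consumer_group: str) -> str:
    return "%DLQ%" + consumer_group


def next_destination(consumer_group: str, reconsume_times: int,
                     max_times: int = MAX_RECONSUME_TIMES) -> str:
    """Topic a failed message goes to, given how often it was already retried."""
    if reconsume_times >= max_times:
        return dlq_topic(consumer_group)
    return retry_topic(consumer_group)
```

Because the queue name is derived from the consumer group, each group gets its own dead-letter queue, and a monitoring job can derive the name to watch from the group alone.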
Error Quick Reference
| Symptom | Root Cause | Diagnosis | Fix |
|---|---|---|---|
| Order messages consumed out of order | Messages with the same business key are not routed to the same queue, or ordered consumption is not enabled | Check the producer's queue routing and the consumer's ordered mode | Route each business key to a fixed queue and use ordered consumption |
| Tag filtering has no effect | Wrong subscription expression or misspelled Tag | Compare the Tag set by the Producer with the subscription expression | Standardize Tag names and keep Topic semantics consistent |
| Retry storm after consumption failure | Unrecoverable failures are retried at short intervals | Check the exception type in the failure stack and the retry interval | Do not retry unrecoverable errors: record the message and discard it |
| Duplicate consumption corrupts data | Producer redelivery or network jitter causes duplicate delivery | Count how often the same business key is delivered | Make the business logic idempotent |
| DLQ backlog grows rapidly | Messages still fail after the maximum retry count is exhausted | Check DLQ message volume and failure types | Set up a job that processes DLQ messages |
| Transactional message stuck for a long time | The local transaction is not committed or rolled back correctly | Check the transaction listener's return values | Make sure the local transaction outcome can always be determined and reported |