Setting Up Dead Letter Queue for Error Handling
When a consumer cannot process a message, that message should not simply disappear. A Dead Letter Queue (DLQ) is a queue where undeliverable messages are automatically placed: messages that were rejected, expired by TTL, or exceeded the delivery limit.
Without a DLQ, failed messages are lost without a trace. With one, you can analyze the error, fix it, and return the message for reprocessing.
DLQ Mechanism in RabbitMQ
A message is moved to Dead Letter Exchange under three conditions:
- The consumer called basic.nack or basic.reject with requeue=false
- The message TTL expired (x-message-ttl on the queue or expiration in message properties)
- The queue is full (x-max-length or x-max-length-bytes)
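When a message is dead-lettered, RabbitMQ records why in the x-death header (a list of entries, most recent first, with fields such as reason, queue, and count). A minimal sketch of classifying that reason — Python here for brevity, and the helper name and decoded-dict shape are illustrative:

```python
def dead_letter_reason(headers: dict) -> str:
    """Return the most recent dead-lettering reason from x-death, or 'unknown'.

    Documented reason values include "rejected", "expired", "maxlen",
    and "delivery_limit" (the last one for quorum queues).
    """
    deaths = headers.get("x-death") or []
    if not deaths:
        return "unknown"
    return deaths[0].get("reason", "unknown")


headers = {"x-death": [{"reason": "rejected", "queue": "order-processing", "count": 3}]}
print(dead_letter_reason(headers))  # rejected
```

A consumer draining the DLQ can branch on this value, e.g. re-validate expired messages but route rejected ones to manual review.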
# 1. Create Dead Letter Exchange
rabbitmqadmin declare exchange \
name=dlx \
type=direct \
durable=true
# 2. Create DLQ
rabbitmqadmin declare queue \
name=order-processing-dlq \
durable=true \
arguments='{"x-queue-type":"quorum","x-message-ttl":2592000000}'
# 30 days retention for analysis
# 3. Bind DLQ to DLX
rabbitmqadmin declare binding \
source=dlx \
destination=order-processing-dlq \
routing_key=order-processing.failed
# 4. Main queue with DLX reference
rabbitmqadmin declare queue \
name=order-processing \
durable=true \
arguments='{
"x-queue-type": "quorum",
"x-dead-letter-exchange": "dlx",
"x-dead-letter-routing-key": "order-processing.failed",
"x-delivery-limit": 3
}'
# x-delivery-limit: after 3 delivery attempts the message goes to the DLQ (quorum queues only)
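Magic numbers like 2592000000 are easy to get wrong in these arguments. A small sketch (Python, with hypothetical helper names) that derives them from readable units instead:

```python
# Build the queue-argument dicts used in the declarations above
# from readable units instead of raw millisecond values.
DAY_MS = 24 * 60 * 60 * 1000  # milliseconds in a day


def dlq_arguments(retention_days: int = 30) -> dict:
    """Arguments for the DLQ itself: quorum queue, 30-day retention."""
    return {
        "x-queue-type": "quorum",
        "x-message-ttl": retention_days * DAY_MS,
    }


def main_queue_arguments(dlx: str, routing_key: str, delivery_limit: int = 3) -> dict:
    """Arguments for the main queue: DLX reference plus delivery limit."""
    return {
        "x-queue-type": "quorum",
        "x-dead-letter-exchange": dlx,
        "x-dead-letter-routing-key": routing_key,
        "x-delivery-limit": delivery_limit,
    }


print(dlq_arguments()["x-message-ttl"])  # 2592000000
```

The computed TTL matches the 2592000000 ms (30 days) passed to rabbitmqadmin above.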
Retry with Exponential Backoff
A simple nack immediately returns the message to the queue: the worker picks it up again and fails again, in a tight loop. The correct approach is a delayed retry through a chain of queues.
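The chain can be expressed as data. A sketch (Python, hypothetical names) of mapping the attempt number to one of the delay queues declared below:

```python
# Retry chain as data: queue name and its x-message-ttl in milliseconds.
RETRY_QUEUES = [
    ("order-processing-retry-1m", 60_000),      # 1 minute
    ("order-processing-retry-10m", 600_000),    # 10 minutes
    ("order-processing-retry-1h", 3_600_000),   # 1 hour
]


def retry_queue_for(attempt: int) -> str:
    """attempt is 1-based; attempts beyond the chain reuse the last (longest) delay."""
    index = min(attempt, len(RETRY_QUEUES)) - 1
    return RETRY_QUEUES[index][0]


print(retry_queue_for(1))  # order-processing-retry-1m
print(retry_queue_for(2))  # order-processing-retry-10m
print(retry_queue_for(5))  # order-processing-retry-1h
```

Keeping the schedule in one table avoids the queue names and TTLs drifting apart between the declarations and the consumer's retry logic.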
# Queue with 1 minute delay
rabbitmqadmin declare queue \
name=order-processing-retry-1m \
durable=true \
arguments='{
"x-message-ttl": 60000,
"x-dead-letter-exchange": "",
"x-dead-letter-routing-key": "order-processing",
"x-queue-type": "classic"
}'
# Message expires after 1 minute → automatically goes to main queue
# Queue with 10 minute delay
rabbitmqadmin declare queue \
name=order-processing-retry-10m \
durable=true \
arguments='{
"x-message-ttl": 600000,
"x-dead-letter-exchange": "",
"x-dead-letter-routing-key": "order-processing",
"x-queue-type": "classic"
}'
# Queue with 1 hour delay
rabbitmqadmin declare queue \
name=order-processing-retry-1h \
durable=true \
arguments='{
"x-message-ttl": 3600000,
"x-dead-letter-exchange": "",
"x-dead-letter-routing-key": "order-processing",
"x-queue-type": "classic"
}'
Consumer logic (php-amqplib):

function handleMessage(AMQPMessage $message, AMQPChannel $channel): void
{
    $headers = $message->get('application_headers');
    $retryCount = $headers ? (int)($headers->getNativeData()['x-retry-count'] ?? 0) : 0;
    $payload = json_decode($message->body, true);

    try {
        processOrder($payload);
        $message->ack();
    } catch (TemporaryException $e) {
        // Temporary error, retryable
        $retryCount++;
        if ($retryCount >= 3) {
            // Attempts exhausted: dead-letter to the DLQ
            $message->nack(false); // requeue=false routes the message to the DLX
            return;
        }

        // Send to a retry queue with the appropriate delay
        $retryQueue = match ($retryCount) {
            1 => 'order-processing-retry-1m',
            2 => 'order-processing-retry-10m',
            default => 'order-processing-retry-1h',
        };

        $retryMessage = new AMQPMessage(
            $message->body,
            [
                'delivery_mode' => AMQPMessage::DELIVERY_MODE_PERSISTENT,
                'application_headers' => new AMQPTable(array_merge(
                    $headers ? $headers->getNativeData() : [],
                    [
                        'x-retry-count' => $retryCount,
                        'x-original-queue' => 'order-processing',
                        'x-last-error' => $e->getMessage(),
                        'x-retry-at' => date('Y-m-d H:i:s'),
                    ]
                )),
            ]
        );
        $channel->basic_publish($retryMessage, '', $retryQueue);
        $message->ack(); // ack the original to prevent duplicates
    } catch (PermanentException $e) {
        // Permanent error: straight to the DLQ
        $message->nack(false); // requeue=false routes the message to the DLX
        Log::error('Permanent failure, message sent to DLQ', [
            'order_id' => $payload['order_id'] ?? null,
            'error' => $e->getMessage(),
        ]);
    }
}
Kafka DLQ
Kafka has no built-in DLQ mechanism — it is implemented in consumer code:
@Component
public class OrderEventConsumer {

    private final KafkaTemplate<String, String> kafkaTemplate;
    private static final String DLQ_TOPIC = "order-events-dlq";

    public OrderEventConsumer(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    @KafkaListener(topics = "order-events", groupId = "order-processor")
    public void consume(ConsumerRecord<String, String> record, Acknowledgment ack) {
        try {
            processOrder(record.value());
            ack.acknowledge();
        } catch (RetriableException e) {
            // Spring Kafka retries automatically with backoff
            throw e; // don't ack — the configured DefaultErrorHandler takes control
        } catch (Exception e) {
            // Non-retriable — send to DLQ
            sendToDlq(record, e);
            ack.acknowledge(); // ack the original so the partition does not get stuck
        }
    }

    private void sendToDlq(ConsumerRecord<String, String> original, Exception error) {
        Headers headers = new RecordHeaders(original.headers().toArray());
        headers.add("x-original-topic", original.topic().getBytes(StandardCharsets.UTF_8));
        headers.add("x-original-partition", String.valueOf(original.partition()).getBytes(StandardCharsets.UTF_8));
        headers.add("x-original-offset", String.valueOf(original.offset()).getBytes(StandardCharsets.UTF_8));
        // getMessage() may be null — String.valueOf avoids an NPE here
        headers.add("x-error-message", String.valueOf(error.getMessage()).getBytes(StandardCharsets.UTF_8));
        headers.add("x-failed-at", Instant.now().toString().getBytes(StandardCharsets.UTF_8));

        ProducerRecord<String, String> dlqRecord = new ProducerRecord<>(
            DLQ_TOPIC,
            null, // let the partitioner choose the partition
            original.key(),
            original.value(),
            headers
        );
        kafkaTemplate.send(dlqRecord);

        log.error("Sent to DLQ: topic={} partition={} offset={} error={}",
            original.topic(), original.partition(), original.offset(), error.getMessage());
    }
}
Spring Kafka configuration with automatic retry:
@Bean
public DefaultErrorHandler errorHandler(KafkaOperations<?, ?> template) {
    // Exponential backoff: 1s, 2s, 4s, 8s, 16s
    ExponentialBackOffWithMaxRetries backOff = new ExponentialBackOffWithMaxRetries(5);
    backOff.setInitialInterval(1000L);
    backOff.setMultiplier(2.0);
    backOff.setMaxInterval(16000L);

    DeadLetterPublishingRecoverer recoverer = new DeadLetterPublishingRecoverer(template,
        // route to "<topic>-dlq"; "% 3" assumes the DLQ topic has 3 partitions
        (record, ex) -> new TopicPartition(record.topic() + "-dlq", record.partition() % 3)
    );

    DefaultErrorHandler handler = new DefaultErrorHandler(recoverer, backOff);
    handler.addNotRetryableExceptions(
        JsonProcessingException.class,
        IllegalArgumentException.class
    );
    return handler;
}
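As a sanity check on the settings above, the delay sequence this backoff produces can be computed directly (Python sketch, hypothetical helper name):

```python
def backoff_schedule(retries: int, initial_ms: int, multiplier: float, max_ms: int) -> list:
    """Delays (ms) produced by an exponential backoff capped at max_ms."""
    delays, delay = [], initial_ms
    for _ in range(retries):
        delays.append(min(delay, max_ms))
        delay = int(delay * multiplier)
    return delays


# Same parameters as the ExponentialBackOffWithMaxRetries configuration:
print(backoff_schedule(5, 1000, 2.0, 16000))  # [1000, 2000, 4000, 8000, 16000]
```

With these parameters a message is retried for roughly 31 seconds in total before being published to the -dlq topic.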
DLQ Monitoring
Alert on DLQ growth; it is usually the first sign of a problem in the system:
# Prometheus alert rules
- alert: DLQGrowth
  expr: rabbitmq_queue_messages{queue=~".*dlq.*"} > 100
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "DLQ {{ $labels.queue }} has {{ $value }} messages"

- alert: DLQSpike
  expr: rate(rabbitmq_queue_messages_published_total{queue=~".*dlq.*"}[5m]) > 10
  for: 2m
  labels:
    severity: critical
Tool for Reprocessing Messages from DLQ
#!/bin/bash
# reprocess-dlq.sh — move messages from the DLQ back to the main queue

DLQ="order-processing-dlq"
TARGET_QUEUE="order-processing"
BATCH=100

for i in $(seq 1 $BATCH); do
    # raw_json output so the payload can be extracted with python below
    MESSAGE=$(rabbitmqadmin -f raw_json get queue="$DLQ" ackmode=ack_requeue_false count=1 2>/dev/null)
    BODY=$(echo "$MESSAGE" | python3 -c "import sys,json; msgs=json.load(sys.stdin); print(msgs[0]['payload'] if msgs else '')" 2>/dev/null)
    if [ -z "$BODY" ]; then
        echo "DLQ is empty"
        break
    fi

    # Publish back to the main queue via the default exchange.
    # Note: ack_requeue_false above removes the message from the DLQ
    # even if this publish fails, so watch the script's output.
    rabbitmqadmin publish routing_key="$TARGET_QUEUE" payload="$BODY"
done
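Blindly replaying the whole DLQ can re-trigger the exact same failures. A sketch (Python; the helper and the set of "fixed" error markers are hypothetical) of filtering replays by the x-last-error header the consumer recorded:

```python
def should_replay(headers: dict, fixed_errors: set) -> bool:
    """Replay only messages whose recorded error matches a known, now-fixed cause.

    headers is the decoded message-header dict; fixed_errors is a set of
    substrings identifying error messages whose root cause has been resolved.
    """
    last_error = headers.get("x-last-error", "")
    return any(marker in last_error for marker in fixed_errors)


msgs = [
    {"x-last-error": "Connection timed out"},
    {"x-last-error": "Invalid order state"},
]
fixed = {"timed out"}
print([should_replay(h, fixed) for h in msgs])  # [True, False]
```

The first message is replayed (a transient network issue that has been fixed); the second stays in the DLQ for manual review.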
Timeline
Day 1 — design the DLQ schema: a DLX, a DLQ, and retry queues with TTL for each working queue, created via the CLI or the Management API.
Day 2 — integrate retry logic into consumers; save error metadata in headers (original topic, timestamp, error message).
Day 3 — set up alerts on DLQ growth and the reprocessing tool; document the DLQ handling procedure for the support team.