Setting Up a Webhook System with Delivery Guarantees (Retry/Backoff)
A webhook without retry is just an HTTP request you send and forget. Real systems fail: the recipient's endpoint is unavailable, timeouts occur, 500 errors happen. A delivery guarantee means the event still reaches the recipient after a temporary outage, as long as the endpoint recovers within the retry window.
Principles of Reliable Delivery
At-least-once delivery: a webhook may be delivered more than once, for example when the recipient processes the event but its 200 response is lost and the sender retries. The recipient must therefore be idempotent: reprocessing the same event must not duplicate its effect.
Queue as a buffer: webhook sending doesn't happen directly from the event handler. The event is written to a queue, and a worker reads it and sends it. If sending fails, the event goes back to the queue to be retried later.
Exponential backoff: the interval between attempts grows exponentially, so retries don't pile onto a recipient that is already overloaded.
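The three principles can be seen working together in a small in-memory simulation. This is only a sketch: `FlakyEndpoint`, `run_worker`, and the simulated clock are hypothetical names invented for illustration, jitter is left out to keep the timings readable, and a real system would use a durable queue and real HTTP calls.

```python
import heapq


class FlakyEndpoint:
    """Hypothetical recipient that fails the first `failures` calls, then succeeds."""

    def __init__(self, failures: int):
        self.failures = failures
        self.received = []

    def post(self, event: dict) -> bool:
        if self.failures > 0:
            self.failures -= 1
            return False  # simulate a 500 response or a timeout
        self.received.append(event["id"])
        return True


def run_worker(events, endpoint, base_delay=30.0, max_attempts=8):
    """Drain a delayed queue, requeueing failed sends with exponential backoff.

    Returns a log of (simulated_time, event_id, attempt, ok) tuples.
    """
    # Heap entries: (due_time, sequence, attempt, event); sequence breaks ties.
    queue = [(0.0, seq, 1, event) for seq, event in enumerate(events)]
    heapq.heapify(queue)
    seq = len(queue)
    log = []
    while queue:
        due, _, attempt, event = heapq.heappop(queue)
        ok = endpoint.post(event)
        log.append((due, event["id"], attempt, ok))
        if ok or attempt >= max_attempts:
            continue  # delivered, or given up after max_attempts
        delay = base_delay * 2 ** (attempt - 1)  # 30s, 60s, 120s, ...
        heapq.heappush(queue, (due + delay, seq, attempt + 1, event))
        seq += 1
    return log


endpoint = FlakyEndpoint(failures=2)
log = run_worker([{"id": "evt_1"}], endpoint)
```

Here the event fails twice and is delivered on the third attempt, 90 simulated seconds after it was queued (30 s + 60 s of backoff).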
Data Schema
CREATE TABLE webhook_subscriptions (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    consumer_id UUID NOT NULL REFERENCES consumers(id),
    endpoint_url TEXT NOT NULL,
    secret TEXT NOT NULL,
    events TEXT[] NOT NULL, -- ['order.created', 'order.paid']
    is_active BOOLEAN DEFAULT true,
    created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE TABLE webhook_deliveries (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    subscription_id UUID NOT NULL REFERENCES webhook_subscriptions(id),
    event_type TEXT NOT NULL,
    payload JSONB NOT NULL,
    -- NOT NULL so comparisons such as attempt_count >= max_attempts never see NULL
    attempt_count INTEGER NOT NULL DEFAULT 0,
    max_attempts INTEGER NOT NULL DEFAULT 8,
    status TEXT NOT NULL DEFAULT 'pending'
        CHECK (status IN ('pending', 'delivered', 'failed', 'cancelled')),
    next_attempt_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    last_response_code INTEGER,
    last_response_body TEXT,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    delivered_at TIMESTAMPTZ
);

-- Partial index: workers only scan rows that are still pending
CREATE INDEX idx_deliveries_pending ON webhook_deliveries(next_attempt_at)
    WHERE status = 'pending';
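A worker needs to find the deliveries that are due. The sketch below shows the polling query against a cut-down copy of `webhook_deliveries`, using Python's built-in `sqlite3` as a stand-in so that it runs anywhere. Several things here are assumptions for illustration: the reduced column set, integer timestamps instead of `TIMESTAMPTZ`, and the `fetch_due` helper. On PostgreSQL, concurrent workers would typically add `FOR UPDATE SKIP LOCKED` to the same query so that two workers do not claim the same row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE webhook_deliveries (
        id TEXT PRIMARY KEY,
        status TEXT NOT NULL DEFAULT 'pending',
        next_attempt_at INTEGER NOT NULL  -- Unix seconds, simplified
    );
    CREATE INDEX idx_deliveries_pending ON webhook_deliveries(next_attempt_at)
        WHERE status = 'pending';
""")
conn.executemany(
    "INSERT INTO webhook_deliveries (id, status, next_attempt_at) VALUES (?, ?, ?)",
    [
        ("d1", "pending", 100),    # due
        ("d2", "pending", 500),    # scheduled for later
        ("d3", "delivered", 50),   # already done
        ("d4", "pending", 90),     # due, and older than d1
    ],
)


def fetch_due(conn, now: int, limit: int = 100) -> list:
    """Return ids of pending deliveries whose next attempt is due, oldest first."""
    rows = conn.execute(
        """
        SELECT id FROM webhook_deliveries
        WHERE status = 'pending' AND next_attempt_at <= ?
        ORDER BY next_attempt_at
        LIMIT ?
        """,
        (now, limit),
    )
    return [row[0] for row in rows]


due = fetch_due(conn, now=200)
```

With `now=200`, only `d4` and `d1` are returned, in that order: `d2` is not due yet and `d3` is no longer pending.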
Retry Algorithm with Backoff
Exponential backoff with jitter prevents synchronized retry storms—situations where all workers simultaneously hammer one endpoint:
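import random


def next_attempt_delay(attempt: int, base_delay: float = 30.0, max_delay: float = 3600.0) -> float:
    """
    Seconds to wait after failed attempt number `attempt` (1-based).

    Upper bound of the delay, before jitter:
      after attempt 1: 30s
      after attempt 2: 60s
      after attempt 3: 120s
      after attempt 4: 240s
      after attempt 5: 480s (8 min)
      after attempt 6: 960s (16 min)
      after attempt 7: 1920s (32 min)
    Attempt 8 is the final one, so no delay follows it.
    With full jitter the actual delay is uniform in [0, upper bound].
    """
    exponential = base_delay * (2 ** (attempt - 1))
    # Cap at 1 hour before applying jitter, so capped delays stay spread out
    ceiling = min(exponential, max_delay)
    # Full jitter: random value in range [0, ceiling]
    return random.uniform(0, ceiling)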
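To see how long the retry window actually is, the upper bounds of this schedule can be tabulated. This is a sketch under stated assumptions: the delay ceiling after failed attempt `n` is taken to be `base_delay * 2 ** (n - 1)` (30 s after the first failure, as in the docstring), the cap is 1 hour, and there are 8 attempts in total, so 7 waits. `backoff_ceiling` is a hypothetical helper, not part of the function above.

```python
def backoff_ceiling(attempt: int, base_delay: float = 30.0, cap: float = 3600.0) -> float:
    """Largest possible wait after the given failed attempt (1-based), before jitter."""
    return min(base_delay * 2 ** (attempt - 1), cap)


max_attempts = 8
# Nothing is scheduled after the final attempt, so there are 7 waits in total.
ceilings = [backoff_ceiling(a) for a in range(1, max_attempts)]
worst_case_window = sum(ceilings)        # seconds, if every wait hits its ceiling
expected_window = worst_case_window / 2  # full jitter draws uniformly from [0, ceiling]

print(ceilings)
print(worst_case_window / 60, "minutes at most")
```

Under these assumptions the worst-case window is 3810 seconds, about 63.5 minutes, and roughly half of that on average with full jitter. Covering an outage of several hours would need a higher `max_attempts`, a larger base delay, or both.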
PHP/Laravel worker implementation:
use App\Models\WebhookDelivery; // assumed model location
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Http\Client\ConnectionException;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Http;

class ProcessWebhookDelivery implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public int $tries = 1; // Retry logic is ours, not Laravel's

    // The job receives its model through the constructor; handle() arguments
    // are resolved from the service container, not from dispatch().
    public function __construct(public WebhookDelivery $delivery)
    {
    }

    public function handle(): void
    {
        $delivery = $this->delivery;
        $subscription = $delivery->subscription;
        // Encode once so that the signed bytes are exactly the bytes sent
        $payload = json_encode($delivery->payload);
        $signature = hash_hmac('sha256', $payload, $subscription->secret);

        try {
            $response = Http::timeout(10)
                ->withHeaders([
                    'X-Webhook-ID'        => $delivery->id,
                    'X-Webhook-Event'     => $delivery->event_type,
                    'X-Webhook-Timestamp' => now()->timestamp,
                    'X-Webhook-Signature' => 'sha256=' . $signature,
                ])
                ->withBody($payload, 'application/json')
                ->post($subscription->endpoint_url);

            if ($response->successful()) {
                $delivery->update([
                    'status'             => 'delivered',
                    'last_response_code' => $response->status(),
                    'delivered_at'       => now(),
                ]);
                return;
            }

            $this->scheduleRetry($delivery, $response->status(), $response->body());
        } catch (ConnectionException $e) {
            // Laravel reports both connection failures and timeouts as ConnectionException
            $this->scheduleRetry($delivery, null, $e->getMessage());
        }
    }

    private function scheduleRetry(WebhookDelivery $delivery, ?int $code, string $body): void
    {
        $delivery->increment('attempt_count');
        $delivery->update([
            'last_response_code' => $code,
            'last_response_body' => substr($body, 0, 1000),
        ]);

        if ($delivery->attempt_count >= $delivery->max_attempts) {
            $delivery->update(['status' => 'failed']);
            // Notify subscription owner
            event(new WebhookDeliveryFailed($delivery));
            return;
        }

        $delay = $this->calculateDelay($delivery->attempt_count);
        $delivery->update(['next_attempt_at' => now()->addSeconds($delay)]);
        // Requeue
        static::dispatch($delivery)->delay(now()->addSeconds($delay));
    }

    private function calculateDelay(int $attempt): int
    {
        // $attempt is the number of failed attempts so far (>= 1): 30s, 60s, 120s, ...
        $base = 30 * (2 ** ($attempt - 1));
        // Jitter of 50%..150% around the exponential value, capped at 1 hour
        return min((int)($base * random_int(50, 150) / 100), 3600);
    }
}
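The decision made in `scheduleRetry` can be isolated as a pure function, which makes it easy to unit test. Below is a Python sketch of that logic, assuming the statuses and the `max_attempts` default from the schema and a first retry of roughly 30 seconds. `RetryDecision` and `decide_retry` are hypothetical names, and the injectable `rng` parameter is an addition made for testability.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetryDecision:
    status: str                   # 'pending' or 'failed'
    delay_seconds: Optional[int]  # None when no further attempt is scheduled


def decide_retry(attempt_count: int, max_attempts: int = 8,
                 base: int = 30, cap: int = 3600,
                 rng: Optional[random.Random] = None) -> RetryDecision:
    """Decide what happens after `attempt_count` failed attempts (>= 1)."""
    if attempt_count >= max_attempts:
        return RetryDecision(status="failed", delay_seconds=None)
    rng = rng or random.Random()
    exponential = base * 2 ** (attempt_count - 1)
    # Jitter of 50%..150% around the exponential value, then capped
    jittered = exponential * rng.randint(50, 150) // 100
    return RetryDecision(status="pending", delay_seconds=min(jittered, cap))


first = decide_retry(attempt_count=1)  # delay somewhere between 15 and 45 seconds
last = decide_retry(attempt_count=8)   # attempts exhausted
```

Keeping the decision free of database and queue calls means the worker only has to persist the returned status and delay.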
Idempotency on the Recipient Side
The webhook recipient must handle retries. Minimal protection:
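# Django example
import json

from django.db import IntegrityError, transaction
from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt


@csrf_exempt  # webhook senders cannot supply a CSRF token
def handle_webhook(request):
    webhook_id = request.headers.get('X-Webhook-ID')
    if not webhook_id:
        return JsonResponse({'status': 'missing_webhook_id'}, status=400)
    # One transaction: if process_event raises, the marker row is rolled back
    # and the sender's next retry is processed normally.
    with transaction.atomic():
        try:
            with transaction.atomic():  # savepoint around the insert
                # Unique key on webhook_id: a duplicate insert will fail
                ProcessedWebhook.objects.create(webhook_id=webhook_id)
        except IntegrityError:
            # Already processed: return 200, do nothing
            return JsonResponse({'status': 'already_processed'})
        # Event processing (HttpRequest has no .json(); parse the raw body)
        process_event(json.loads(request.body))
    return JsonResponse({'status': 'ok'})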
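The same deduplication idea can be tried without a Django project. The sketch below uses `sqlite3` from the standard library; the `processed_webhooks` table, the `credits` list standing in for a real side effect, and `handle_once` are hypothetical names for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_webhooks (webhook_id TEXT PRIMARY KEY)")
credits = []  # stands in for a real side effect such as crediting an account


def handle_once(conn, webhook_id: str, amount: int) -> str:
    """Apply the event's effect at most once per webhook_id."""
    try:
        # `with conn` commits on success and rolls back if anything inside raises,
        # so a failed side effect does not leave the id marked as processed.
        with conn:
            conn.execute(
                "INSERT INTO processed_webhooks (webhook_id) VALUES (?)",
                (webhook_id,),
            )
            credits.append(amount)
    except sqlite3.IntegrityError:
        # The primary key rejected the duplicate before the effect ran
        return "already_processed"
    return "ok"


results = [handle_once(conn, "wh_1", 100), handle_once(conn, "wh_1", 100)]
```

The second call hits the primary-key constraint, so the amount is credited only once even though the webhook arrived twice.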
Signature Verification
Without verification, anyone can send a fake webhook:
public function verifySignature(Request $request): bool
{
    $signature = $request->header('X-Webhook-Signature');
    $payload = $request->getContent();
    $secret = config('webhooks.secret');
    $expected = 'sha256=' . hash_hmac('sha256', $payload, $secret);
    // Use hash_equals to protect against timing attacks
    return hash_equals($expected, $signature ?? '');
}
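For recipients written in Python, the same check can be sketched with the standard library. The `sign` and `verify` helpers and the example secret are hypothetical; the header format (`sha256=` followed by the hex digest of the raw body) follows the PHP code above.

```python
import hashlib
import hmac
from typing import Optional


def sign(payload: bytes, secret: str) -> str:
    """Build the header value in the 'sha256=<hex digest>' form."""
    digest = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def verify(payload: bytes, header: Optional[str], secret: str) -> bool:
    """Constant-time comparison, the Python counterpart of PHP's hash_equals."""
    return hmac.compare_digest(sign(payload, secret), header or "")


body = b'{"event":"order.paid","order_id":42}'
header = sign(body, "s3cret")  # hypothetical shared secret

is_valid = verify(body, header, "s3cret")
is_tampered_valid = verify(body + b" ", header, "s3cret")
```

The signature must be checked against the raw request bytes. Parsing the JSON and serializing it again can change whitespace or key order and invalidate an otherwise correct signature.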
Monitoring
Key metrics:
- Delivery rate: successful deliveries as a percentage of total delivery attempts
- p95 delivery time: time from event creation to successful delivery
- Failed deliveries: count of deliveries that exhausted all attempts (status 'failed'); these require manual attention
- Queue depth: if it keeps growing, add workers or check whether one endpoint is failing repeatedly
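These metrics can be derived directly from delivery records. The sketch below is one possible way to compute them, not a prescribed one: the dictionary keys mirror columns of `webhook_deliveries`, `summarize` and `p95` are hypothetical helpers, and "delivery rate" is taken here to mean delivered out of finished deliveries (a per-attempt rate would need the attempt counts as well).

```python
import math
from datetime import datetime, timedelta


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def summarize(deliveries):
    finished = [d for d in deliveries if d["status"] in ("delivered", "failed")]
    delivered = [d for d in finished if d["status"] == "delivered"]
    latencies = [
        (d["delivered_at"] - d["created_at"]).total_seconds() for d in delivered
    ]
    return {
        "delivery_rate": len(delivered) / len(finished) if finished else None,
        "p95_delivery_seconds": p95(latencies) if latencies else None,
        "failed": len(finished) - len(delivered),
        "queue_depth": sum(1 for d in deliveries if d["status"] == "pending"),
    }


t0 = datetime(2024, 1, 1, 12, 0, 0)  # made-up sample data
sample = [
    {"status": "delivered", "created_at": t0, "delivered_at": t0 + timedelta(seconds=1)},
    {"status": "delivered", "created_at": t0, "delivered_at": t0 + timedelta(seconds=2)},
    {"status": "delivered", "created_at": t0, "delivered_at": t0 + timedelta(seconds=90)},
    {"status": "failed", "created_at": t0, "delivered_at": None},
    {"status": "pending", "created_at": t0, "delivered_at": None},
    {"status": "pending", "created_at": t0, "delivered_at": None},
]
metrics = summarize(sample)
```

For this sample the delivery rate is 0.75, the p95 delivery time is 90 seconds (dominated by the one delivery that needed retries), one delivery has failed, and two are still queued.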
Timelines
Basic system with retry/backoff: 3–5 days. With monitoring, delivery dashboard, and failure notifications: 1–1.5 weeks.







