Webhook system with delivery guarantee retry and backoff

Setting Up a Webhook System with Delivery Guarantee (Retry/Backoff)

A webhook without retries is a fire-and-forget HTTP request. Real systems fail: the recipient's endpoint goes down, requests time out, servers return 5xx errors. A delivery guarantee means the event still reaches the recipient after the endpoint has been unavailable for several hours, provided the retry schedule is long enough to cover the outage.

Principles of Reliable Delivery

At-least-once delivery: a webhook may be delivered more than once. The recipient must be idempotent: processing the same event again must not duplicate its effect.

Queue as a buffer: the event handler does not send the webhook itself. It writes the event to a queue; a worker reads the event and sends it. If sending fails, the event goes back to the queue with a delay.

Exponential backoff: the interval between attempts doubles after each failure, so retries do not pile onto a recipient that is already overloaded.
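
The queue-as-buffer and requeue-on-failure principles can be sketched in a few lines of Python. This is a toy in-memory model, not a production queue: `DeliveryQueue`, `run_worker_once`, and the `send` callback are hypothetical names, and a real system would use Redis, SQS, or a database table instead of a heap. Jitter is left out here to keep the timing deterministic.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(order=True)
class QueuedDelivery:
    due_at: float                      # when the next attempt may run
    seq: int                           # tie-breaker for equal due times
    event: dict = field(compare=False)
    attempts: int = field(default=0, compare=False)


class DeliveryQueue:
    """Toy in-memory queue ordered by due time (assumption: single process)."""

    def __init__(self) -> None:
        self._heap: List[QueuedDelivery] = []
        self._seq = itertools.count()

    def enqueue(self, event: dict, due_at: float, attempts: int = 0) -> None:
        item = QueuedDelivery(due_at, next(self._seq), event, attempts)
        heapq.heappush(self._heap, item)

    def pop_due(self, now: float) -> Optional[QueuedDelivery]:
        if self._heap and self._heap[0].due_at <= now:
            return heapq.heappop(self._heap)
        return None

    def __len__(self) -> int:
        return len(self._heap)


def run_worker_once(
    queue: DeliveryQueue,
    send: Callable[[dict], bool],
    now: float,
    base_delay: float = 30.0,
    max_attempts: int = 8,
) -> List[dict]:
    """Send everything due at `now`; return the events that were given up on."""
    dead: List[dict] = []
    while (item := queue.pop_due(now)) is not None:
        attempts = item.attempts + 1
        if send(item.event):
            continue                    # delivered, nothing to requeue
        if attempts >= max_attempts:
            dead.append(item.event)     # out of attempts
            continue
        # Failed: put the event back with an exponentially growing delay
        delay = min(base_delay * 2 ** (attempts - 1), 3600)
        queue.enqueue(item.event, now + delay, attempts)
    return dead
```

The event handler only calls `enqueue`; all HTTP work and all retry decisions live in the worker, so a slow or failing recipient never blocks the code that produced the event.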

Data Schema

CREATE TABLE webhook_subscriptions (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  consumer_id UUID NOT NULL REFERENCES consumers(id),
  endpoint_url TEXT NOT NULL,
  secret TEXT NOT NULL,
  events TEXT[] NOT NULL,          -- ['order.created', 'order.paid']
  is_active BOOLEAN DEFAULT true,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE webhook_deliveries (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  subscription_id UUID NOT NULL REFERENCES webhook_subscriptions(id),
  event_type TEXT NOT NULL,
  payload JSONB NOT NULL,
  attempt_count INTEGER DEFAULT 0,
  max_attempts INTEGER DEFAULT 8,
  status TEXT DEFAULT 'pending',   -- pending | delivered | failed | cancelled
  next_attempt_at TIMESTAMPTZ DEFAULT NOW(),
  last_response_code INTEGER,
  last_response_body TEXT,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  delivered_at TIMESTAMPTZ
);

CREATE INDEX idx_deliveries_pending ON webhook_deliveries(next_attempt_at)
  WHERE status = 'pending';
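
The partial index on `next_attempt_at` exists to make one query cheap: "which pending deliveries are due now?". The sketch below shows that query against a reduced copy of the table. It uses SQLite from the Python standard library only so that it runs without a server; the column subset, the numeric timestamps, and the `due_deliveries` helper are assumptions made for illustration.

```python
import sqlite3

# Reduced copy of webhook_deliveries: only the columns the polling query needs.
SCHEMA = """
CREATE TABLE webhook_deliveries (
    id TEXT PRIMARY KEY,
    status TEXT NOT NULL DEFAULT 'pending',
    next_attempt_at REAL NOT NULL          -- Unix timestamp in this sketch
);
CREATE INDEX idx_deliveries_pending ON webhook_deliveries(next_attempt_at)
    WHERE status = 'pending';
"""


def open_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def due_deliveries(conn: sqlite3.Connection, now: float, limit: int = 100) -> list:
    """Return ids of pending deliveries whose next attempt time has passed."""
    rows = conn.execute(
        "SELECT id FROM webhook_deliveries "
        "WHERE status = 'pending' AND next_attempt_at <= ? "
        "ORDER BY next_attempt_at LIMIT ?",
        (now, limit),
    ).fetchall()
    return [row[0] for row in rows]
```

In PostgreSQL with several workers polling the same table, the equivalent `SELECT` is usually combined with `FOR UPDATE SKIP LOCKED` so that two workers do not claim the same row.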

Retry Algorithm with Backoff

Exponential backoff with jitter prevents synchronized retry storms, where many deliveries that failed at the same moment all retry against one endpoint at the same instant:

import random

def next_attempt_delay(attempt: int, base_delay: float = 30.0, cap: float = 3600.0) -> float:
    """
    Return the delay in seconds before retry number `attempt` (1-based).

    Upper bound per attempt (the actual delay is random in [0, bound]):
    attempt 1: 30s
    attempt 2: 60s
    attempt 3: 120s
    attempt 4: 240s
    attempt 5: 480s  (8 min)
    attempt 6: 960s  (16 min)
    attempt 7: 1920s (32 min)
    attempt 8: 3600s (3840s capped at 1 hour), final attempt
    """
    # 2 ** (attempt - 1) so that the first retry is bounded by base_delay itself
    exponential = min(base_delay * (2 ** (attempt - 1)), cap)
    # Full jitter: random value in range [0, exponential]
    return random.uniform(0, exponential)
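
It is worth checking how long an outage a given schedule can actually bridge. The sketch below sums the per-attempt upper bounds, assuming the first attempt is immediate and retry n waits at most `min(base_delay * 2 ** (n - 1), cap)` seconds; `delay_upper_bound` and `max_retry_window` are hypothetical helper names.

```python
def delay_upper_bound(attempt: int, base_delay: float = 30.0, cap: float = 3600.0) -> float:
    """Largest possible wait before retry number `attempt` (1-based)."""
    return min(base_delay * 2 ** (attempt - 1), cap)


def max_retry_window(max_attempts: int = 8, base_delay: float = 30.0, cap: float = 3600.0) -> float:
    """Longest possible time between the first and the last attempt.

    With max_attempts attempts in total there are max_attempts - 1 waits.
    """
    return sum(delay_upper_bound(n, base_delay, cap) for n in range(1, max_attempts))


if __name__ == "__main__":
    for attempts in (8, 10, 13):
        window = max_retry_window(attempts)
        print(f"{attempts} attempts: at most {window / 60:.1f} min")
```

Under these assumptions, 8 attempts span at most 3810 seconds (about 63 minutes), and with full jitter the expected span is roughly half of that. Bridging an outage of several hours therefore needs a higher `max_attempts`, a larger base delay, or both.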

PHP/Laravel worker implementation:

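use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Http\Client\ConnectionException;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Http;

// WebhookDelivery (Eloquent model) and WebhookDeliveryFailed (event) are application classes.
class ProcessWebhookDelivery implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public int $tries = 1; // Retry logic is ours, not Laravel's

    // dispatch($delivery) passes the model to the constructor;
    // arguments of handle() would be resolved from the container instead.
    public function __construct(public WebhookDelivery $delivery)
    {
    }

    public function handle(): void
    {
        $delivery = $this->delivery;

        // Skip deliveries that reached a final state while the job was queued
        if (in_array($delivery->status, ['delivered', 'failed', 'cancelled'], true)) {
            return;
        }

        $subscription = $delivery->subscription;
        // Encode once and send exactly these bytes, so the signature matches the body
        $payload = json_encode($delivery->payload);
        $signature = hash_hmac('sha256', $payload, $subscription->secret);

        try {
            $response = Http::timeout(10)
                ->withHeaders([
                    'X-Webhook-ID'        => $delivery->id,
                    'X-Webhook-Event'     => $delivery->event_type,
                    'X-Webhook-Timestamp' => (string) now()->timestamp,
                    'X-Webhook-Signature' => 'sha256=' . $signature,
                ])
                ->withBody($payload, 'application/json') // also sets Content-Type
                ->post($subscription->endpoint_url);

            if ($response->successful()) {
                $delivery->update([
                    'status'             => 'delivered',
                    'last_response_code' => $response->status(),
                    'delivered_at'       => now(),
                ]);
                return;
            }

            $this->scheduleRetry($delivery, $response->status(), $response->body());

        } catch (ConnectionException $e) {
            // Laravel's HTTP client reports connection failures and timeouts as ConnectionException
            $this->scheduleRetry($delivery, null, $e->getMessage());
        }
    }

    private function scheduleRetry(WebhookDelivery $delivery, ?int $code, string $body): void
    {
        $delivery->increment('attempt_count');
        $delivery->update([
            'last_response_code' => $code,
            'last_response_body' => substr($body, 0, 1000),
        ]);

        if ($delivery->attempt_count >= $delivery->max_attempts) {
            $delivery->update(['status' => 'failed']);
            // Notify subscription owner
            event(new WebhookDeliveryFailed($delivery));
            return;
        }

        $delay = $this->calculateDelay($delivery->attempt_count);
        $delivery->update(['next_attempt_at' => now()->addSeconds($delay)]);

        // Requeue
        static::dispatch($delivery)->delay(now()->addSeconds($delay));
    }

    private function calculateDelay(int $attempt): int
    {
        // First retry is based on 30s, then doubles: 30, 60, 120, ...
        $base = 30 * (2 ** ($attempt - 1));
        // Jitter of ±50% around the base, capped at 1 hour
        return min((int)($base * random_int(50, 150) / 100), 3600);
    }
}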

Idempotency on the Recipient Side

The webhook recipient must handle retries. Minimal protection:

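# Django example
import json

from django.db import IntegrityError, transaction
from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt
from django.views.decorators.http import require_POST

@csrf_exempt  # external senders cannot supply a CSRF token
@require_POST
def handle_webhook(request):
    webhook_id = request.headers.get('X-Webhook-ID')
    if not webhook_id:
        return JsonResponse({'status': 'missing_webhook_id'}, status=400)

    try:
        # Marker and processing share one transaction: if processing fails,
        # the marker is rolled back and the sender's retry is processed again.
        with transaction.atomic():
            # Unique key on webhook_id: a duplicate insert raises IntegrityError
            ProcessedWebhook.objects.create(webhook_id=webhook_id)
            process_event(json.loads(request.body))
    except IntegrityError:
        # Already processed: return 200, do nothing
        return JsonResponse({'status': 'already_processed'})

    return JsonResponse({'status': 'ok'})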
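
The same unique-key pattern can be tried without a web framework. The sketch below uses SQLite from the standard library; `make_store`, `handle_once`, and the `processed_webhooks` table are hypothetical names. It also shows one design choice explicitly: the marker row and the event processing share a transaction, so a processing failure does not leave the event marked as done.

```python
import sqlite3
from typing import Callable


def make_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # PRIMARY KEY gives the uniqueness guarantee the deduplication relies on
    conn.execute("CREATE TABLE processed_webhooks (webhook_id TEXT PRIMARY KEY)")
    return conn


def handle_once(conn: sqlite3.Connection, webhook_id: str, process: Callable[[], None]) -> str:
    """Run `process` at most once per webhook_id."""
    try:
        with conn:  # commits on success, rolls back if anything raises
            conn.execute(
                "INSERT INTO processed_webhooks (webhook_id) VALUES (?)",
                (webhook_id,),
            )
            process()
    except sqlite3.IntegrityError:
        return "already_processed"   # duplicate delivery, effect not repeated
    return "ok"
```

A second call with the same `webhook_id` returns `"already_processed"` without running `process` again, while a call whose `process` raised can be retried by the sender and will be processed normally.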

Signature Verification

Without verification, anyone can send a fake webhook:

public function verifySignature(Request $request): bool
{
    $signature = $request->header('X-Webhook-Signature');
    $payload   = $request->getContent();
    $secret    = config('webhooks.secret');

    $expected = 'sha256=' . hash_hmac('sha256', $payload, $secret);

    // Use hash_equals to protect against timing attacks
    return hash_equals($expected, $signature ?? '');
}
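
For recipients that are not written in PHP, the same check might look like this in Python. It assumes the sender's scheme shown above: HMAC-SHA256 over the raw request body, hex-encoded and prefixed with `sha256=`. The function names `sign` and `verify` are illustrative.

```python
import hashlib
import hmac
from typing import Optional


def sign(payload: bytes, secret: str) -> str:
    """Compute the header value for a raw request body."""
    digest = hmac.new(secret.encode("utf-8"), payload, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def verify(payload: bytes, header: Optional[str], secret: str) -> bool:
    """Check an X-Webhook-Signature header against the raw body."""
    if not header:
        return False
    expected = sign(payload, secret)
    # compare_digest runs in constant time, like hash_equals in PHP
    return hmac.compare_digest(expected.encode("utf-8"), header.encode("utf-8"))
```

The signature must be computed over the raw bytes of the body as received. Parsing the JSON and serializing it again can change whitespace or key order and break the comparison.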

Monitoring

Key metrics:

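  • Delivery rate: successful deliveries as a percentage of total delivery attempts
  • p95 delivery time: 95th percentile of the time from event creation to successful delivery
  • Failed deliveries: number of deliveries that exhausted all attempts; these need manual attention
  • Queue depth: number of deliveries waiting to be sent; steady growth means the workers cannot keep up and more are needed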
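
These metrics can be derived from the delivery records themselves. The sketch below is one possible calculation, not a fixed definition: `DeliveryRecord` is a hypothetical in-memory view of a `webhook_deliveries` row, `attempts` is assumed to hold the total number of HTTP attempts made so far, and p95 uses the nearest-rank method.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class DeliveryRecord:
    status: str                          # pending | delivered | failed | cancelled
    attempts: int                        # HTTP attempts made so far (assumption)
    created_at: datetime
    delivered_at: Optional[datetime] = None


def delivery_metrics(records: List[DeliveryRecord]) -> dict:
    delivered = [r for r in records if r.status == "delivered"]
    total_attempts = sum(r.attempts for r in records)

    # Time from event creation to successful delivery, in seconds
    latencies = sorted(
        (r.delivered_at - r.created_at).total_seconds() for r in delivered
    )
    # Nearest-rank percentile: smallest value with at least 95% of samples at or below it
    p95 = latencies[math.ceil(0.95 * len(latencies)) - 1] if latencies else None

    return {
        "delivery_rate": len(delivered) / total_attempts if total_attempts else None,
        "p95_delivery_seconds": p95,
        "failed": sum(1 for r in records if r.status == "failed"),
        "queue_depth": sum(1 for r in records if r.status == "pending"),
    }
```

In production the same numbers would normally come from SQL aggregates or a metrics system rather than from loading rows into memory.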

Timelines

Basic system with retry/backoff: 3–5 days. With monitoring, delivery dashboard, and failure notifications: 1–1.5 weeks.