Setting Up Distributed Background Jobs (Multiple Workers)
A single worker is a single point of failure with limited throughput. Multiple workers across multiple servers give horizontal scaling of processing and resilience to individual node failures. The implementation requires a centralized broker, correct configuration, and an understanding of the problems that parallel processing introduces.
Architecture
[App Server 1] [App Server 2] [App Server 3]
↓ ↓ ↓
dispatch dispatch dispatch
↓ ↓ ↓
┌─────────────────────────────┐
│ Redis / RabbitMQ │ ← centralized broker
└─────────────────────────────┘
↓ ↓ ↓
[Worker 1] [Worker 2] [Worker 3] ← can be on different servers
The broker is the only component that must be reachable from every server; the other nodes never communicate with each other directly.
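In practice this means every application and worker server shares the same queue settings. A minimal .env sketch (the host and port are illustrative):

```
# Identical on every app server and worker server
QUEUE_CONNECTION=redis
REDIS_HOST=10.0.0.5   # the shared broker, not localhost
REDIS_PORT=6379
```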
Broker Requirements
Redis — the standard choice for Laravel. Requires the phpredis extension or the predis package. For high availability — Redis Sentinel or Redis Cluster.
RabbitMQ — suits complex routing scenarios (fanout, topic exchanges). Laravel supports it via the vladimir-yuldashev/laravel-queue-rabbitmq package.
Amazon SQS — a managed service with no maintenance burden. A natural fit for AWS infrastructure.
Minimal Redis production configuration: a dedicated server (not shared with the main DB or cache), persistence enabled (appendonly yes), and an explicit maxmemory-policy (noeviction for queue data, so jobs are never silently dropped).
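A sketch of the corresponding redis.conf fragment for a dedicated queue instance (the maxmemory value is an assumption — size it for your workload):

```
# redis.conf — dedicated queue instance
appendonly yes               # persist jobs across restarts
appendfsync everysec         # fsync once per second: small loss window, good throughput
maxmemory 2gb                # illustrative value
maxmemory-policy noeviction  # never silently evict queued jobs
```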
Laravel Configuration for Distributed Workers
// config/queue.php
'connections' => [
    'redis' => [
        'driver' => 'redis',
        'connection' => 'queue', // separate Redis connection for queues
        'queue' => env('REDIS_QUEUE', 'default'),
        'retry_after' => 90,     // seconds before a stuck job is retried
        'block_for' => 5,        // blocking BLPOP instead of polling
        'after_commit' => true,  // dispatch only after the DB transaction commits
    ],
],
retry_after is the key parameter for distributed workers: if a worker crashes mid-job, the job becomes visible to other workers again after retry_after seconds. It must be greater than the job's timeout — otherwise a still-running job can be picked up by a second worker.
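The 'queue' connection referenced above lives in config/database.php. A minimal sketch (the REDIS_QUEUE_HOST and REDIS_QUEUE_DB variable names are assumptions, not Laravel defaults):

```php
// config/database.php
'redis' => [
    // ... the 'default' connection for cache/sessions ...
    'queue' => [
        'host' => env('REDIS_QUEUE_HOST', env('REDIS_HOST', '127.0.0.1')),
        'password' => env('REDIS_PASSWORD'),
        'port' => env('REDIS_PORT', 6379),
        'database' => env('REDIS_QUEUE_DB', 1), // separate DB from the cache
    ],
],
```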
Horizontal Scaling via Horizon
Horizon supports running on multiple servers: each server runs its own Horizon instance, and they don't coordinate directly — Redis acts as the shared registry.
The same Supervisor config runs on each server:
[program:horizon]
command=php /var/www/artisan horizon
autostart=true
autorestart=true
user=www-data
stdout_logfile=/var/log/horizon.log
stopwaitsecs=3600
Horizon balances workers automatically within a single server. Balancing across servers is manual: tune the process count in each server's config according to its capacity.
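One common pattern for per-server tuning is to read the process count from each server's environment. A hedged config/horizon.php sketch (HORIZON_MAX_PROCESSES is an assumed variable name, not a Laravel default):

```php
// config/horizon.php
'environments' => [
    'production' => [
        'supervisor-1' => [
            'connection' => 'redis',
            'queue' => ['critical', 'default'],
            'balance' => 'auto',
            // Each server sets its own value in .env according to its capacity
            'maxProcesses' => (int) env('HORIZON_MAX_PROCESSES', 10),
            'tries' => 3,
        ],
    ],
],
```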
Concurrent Access and Deduplication
With multiple workers, the same job can end up running twice if a worker hangs without releasing it. The Redis pop (LPOP) is atomic, so each job is handed to exactly one worker; but "invisible" jobs (reserved but never completed) return to the queue after retry_after.
If a job must run exactly once, enforce idempotency explicitly:
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Cache;

class ProcessPaymentJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function __construct(private string $paymentId) {}

    public function handle(): void
    {
        // Distributed lock via Redis — only one worker processes this payment
        $lock = Cache::lock("payment:{$this->paymentId}", 120);

        if (!$lock->get()) {
            // Another worker is already processing it
            $this->release(10); // put back on the queue, retry in 10 seconds
            return;
        }

        try {
            $payment = Payment::find($this->paymentId);

            // Idempotency check
            if ($payment?->status !== 'pending') {
                return; // already processed
            }

            $this->processPayment($payment);
        } finally {
            $lock->release();
        }
    }
}
Cache::lock() uses Redis SET NX PX — an atomic operation that guarantees exactly one worker acquires the lock.
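What this looks like at the Redis level (an illustration — the key name and token are simplified; Laravel actually prefixes lock keys with the cache prefix):

```
# First worker acquires the lock (NX = only if key doesn't exist, PX = TTL in ms)
SET payment:abc123 "worker-token" NX PX 120000   → OK
# Second worker, while the lock is held:
SET payment:abc123 "other-token" NX PX 120000    → (nil)
```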
Worker Pool Separation by Load Type
Different servers can run workers for different queues if tasks require specific resources:
[Server: API-1, API-2] → workers for 'critical', 'default'
[Server: Media-1] → workers for 'transcoding', 'media'
[Server: Worker-1] → workers for 'batch', 'reports', 'low'
The media server has a GPU or a powerful CPU for FFmpeg; the API servers run fast workers with short timeouts.
Supervisor on Media server:
[program:media-worker]
process_name=%(program_name)s_%(process_num)02d  ; required when numprocs > 1
command=php /var/www/artisan queue:work --queue=transcoding,media --timeout=3600 --max-jobs=1
numprocs=2
autostart=true
autorestart=true
user=www-data
--max-jobs=1 — the worker processes one job and exits, so Supervisor restarts it with fresh memory after each heavy operation.
Graceful Shutdown
On deploy, let workers finish their current jobs instead of killing them abruptly:
php artisan queue:restart
The command sets a flag in the cache (Redis here) — each worker finishes its current job and exits, and Supervisor restarts it with the new code.
In Supervisor, stopwaitsecs should be at least the maximum job timeout:
stopwaitsecs=3600 # for transcoding server
stopwaitsecs=120 # for standard workers
Monitoring Distributed State
Horizon aggregates metrics from all servers in one dashboard. Key indicators:
- Throughput — jobs per minute, per queue
- Wait time — average time a job waits in the queue
- Runtime — average execution time
- Failed jobs — count of failed jobs
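These same signals can drive custom alerts. A hedged sketch using the Queue facade (the threshold and the idea of running it from the scheduler are assumptions):

```php
use Illuminate\Support\Facades\Log;
use Illuminate\Support\Facades\Queue;

// e.g. inside a scheduled command, run once a minute
$depth = Queue::size('default');
if ($depth > 1000) {
    Log::warning("Queue backlog: {$depth} jobs waiting in 'default'");
}
```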
Automatic worker scaling (if the infrastructure runs on Kubernetes):
# HPA: scale worker pods on a queue-depth metric
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-workers
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: External
      external:
        metric:
          name: redis_queue_depth
          selector:
            matchLabels:
              queue: default
        target:
          type: AverageValue
          averageValue: "50" # scale up above ~50 jobs per worker
The custom metric redis_queue_depth is exported via the Prometheus Redis Exporter and exposed to the HPA through a metrics adapter (e.g. prometheus-adapter).
RabbitMQ as Alternative
When complex routing is needed (different event types routed to different queues, fanout broadcast), RabbitMQ provides more flexibility:
// config/queue.php
'rabbitmq' => [
    'driver' => 'rabbitmq',
    'dsn' => env('RABBITMQ_DSN', 'amqp://user:pass@localhost:5672/'),
    'queue' => env('RABBITMQ_QUEUE', 'default'),
    'options' => [
        'exchange' => [
            'name' => 'app-exchange',
            'type' => 'direct',
        ],
        'queue' => [
            'durable' => true,
            'exclusive' => false,
            'auto_delete' => false,
        ],
    ],
],
RabbitMQ Management UI (port 15672) provides detailed monitoring: consumers, connections, channel load, message rates.
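Routing a job to this connection is explicit at dispatch time. A sketch (ProcessReportJob and $reportId are hypothetical):

```php
ProcessReportJob::dispatch($reportId)
    ->onConnection('rabbitmq') // the RabbitMQ connection from config/queue.php
    ->onQueue('reports');      // published through app-exchange
```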
Timeline
Setting up Redis Sentinel/Cluster or RabbitMQ, plus Horizon and Supervisor configuration across servers — about one working day. Distributed locks and idempotency checks in critical jobs — 6–8 hours. Kubernetes HPA with Prometheus integration — a separate 1–2 day project.