API Throttling for Web Applications
Throttling is the management of request processing speed on the server side, unlike rate limiting, which restricts the client. Throttling slows down or queues incoming requests to protect the backend from overload. The difference is fundamental: a rate limit says "you made too many requests", while throttling says "we process as much as we can".
Throttling vs Rate Limiting
| Aspect | Rate Limiting | Throttling |
|---|---|---|
| Subject | Client (IP, user_id) | Server (CPU, queue) |
| Action on limit | 429, request denied | Request delayed or queued |
| Purpose | Protection from abuse | Backend resource protection |
| Response to client | Immediate 429 | Delay or 503 |
In practice, both mechanisms are applied together.
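The combination can be sketched side by side (a minimal illustration with invented class names, not a production implementation): a per-client token bucket handles the rate limit, while a server-wide concurrency gate does the throttling.

```typescript
// Rate limiting: one token bucket per client (IP, user_id)
class TokenBucket {
  private tokens: number;
  private last = Date.now();

  constructor(private capacity: number, private refillPerSec: number) {
    this.tokens = capacity;
  }

  // Returns false when the client exceeds its rate: respond with 429
  tryTake(): boolean {
    const now = Date.now();
    this.tokens = Math.min(
      this.capacity,
      this.tokens + ((now - this.last) / 1000) * this.refillPerSec,
    );
    this.last = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Throttling: a server-wide cap on in-flight requests
class ConcurrencyGate {
  private active = 0;

  constructor(private max: number) {}

  // Returns false when the server is saturated: queue the request or return 503
  tryEnter(): boolean {
    if (this.active >= this.max) return false;
    this.active += 1;
    return true;
  }

  leave(): void {
    this.active -= 1;
  }
}
```

A request would first pass its client's bucket (429 on failure), then the shared gate (queue or 503 on failure), which keeps the two concerns independent.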
Throttling Heavy Operations
Some operations — report export, file processing, email distribution — should not run in unlimited parallel:
```typescript
// BullMQ: throttling via concurrency + a rate limiter
import { Queue, Worker } from 'bullmq';

const queue = new Queue('reports', { connection: redis });

const worker = new Worker('reports', processReport, {
  connection: redis,
  concurrency: 5,     // at most 5 jobs in parallel
  limiter: {
    max: 10,          // 10 jobs
    duration: 60_000, // per 60 seconds
  },
});

// Add a job with priority (in BullMQ, a lower number means higher priority)
await queue.add('generate-csv', { userId, filters }, {
  priority: user.plan === 'enterprise' ? 1 : 10,
  attempts: 3,
  backoff: { type: 'exponential', delay: 2000 },
});
```
Adaptive Throttling
Adaptive throttling reduces limits when latency or errors increase:
```typescript
interface Metrics {
  p95Latency: number; // ms
  errorRate: number;  // 0..1
}

class AdaptiveThrottler {
  private limit = 100;
  private readonly minLimit = 10;
  private readonly maxLimit = 100;

  constructor(
    private getMetrics: () => Promise<Metrics>,  // e.g. backed by Prometheus
    private counter: { increment(): number },    // e.g. a sliding-window counter in Redis
  ) {}

  async check(): Promise<boolean> {
    const metrics = await this.getMetrics();
    if (metrics.p95Latency > 500) {
      // High p95 latency: shrink the limit multiplicatively
      this.limit = Math.max(this.minLimit, this.limit * 0.8);
    } else if (metrics.p95Latency < 200 && metrics.errorRate < 0.01) {
      // Healthy backend: grow the limit back slowly
      this.limit = Math.min(this.maxLimit, this.limit * 1.1);
    }
    return this.counter.increment() <= this.limit;
  }
}
```
Google uses a similar mechanism in its services (see "Client-Side Throttling" in the "Handling Overload" chapter of the SRE book).
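The SRE book's client-side variant comes down to a single formula (simplified here; the book tracks `requests` and `accepts` over a sliding window): with a multiplier K = 2, the client rejects locally with probability max(0, (requests − K · accepts) / (requests + 1)).

```typescript
// Sketch of the SRE book's client-side throttling formula.
// requests: total attempts seen in the window; accepts: how many the backend accepted.
// While the backend accepts most traffic, the probability stays at 0;
// as accepts drop, the client starts rejecting locally before hitting the backend.
function rejectionProbability(requests: number, accepts: number, K = 2): number {
  return Math.max(0, (requests - K * accepts) / (requests + 1));
}
```

A healthy backend (100 requests, 50+ accepts with K = 2) yields probability 0; if only 10 of 100 are accepted, roughly 79% of new attempts are rejected client-side.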
Circuit Breaker
For outgoing requests to external APIs, throttling takes the form of the Circuit Breaker pattern:
```typescript
import CircuitBreaker from 'opossum';

const options = {
  timeout: 3000,                // a request taking > 3 s counts as a failure
  errorThresholdPercentage: 50, // 50% errors → open the circuit
  resetTimeout: 30_000,         // retry after 30 s (half-open)
  volumeThreshold: 10,          // at least 10 requests before calculating
};

const breaker = new CircuitBreaker(callExternalAPI, options);

breaker.on('open', () => logger.warn('Circuit breaker OPEN: external API unavailable'));
breaker.on('halfOpen', () => logger.info('Circuit breaker HALF-OPEN: testing'));
breaker.on('close', () => logger.info('Circuit breaker CLOSED: external API recovered'));

// Fallback while the circuit is open
breaker.fallback(() => ({ status: 'cached', data: getCachedData() }));
```
States: Closed (normal operation) → Open (too many errors, requests are not sent) → Half-Open (a trial request) → Closed (if the trial succeeds).
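The transitions can be illustrated with a tiny hand-rolled state machine (a sketch for clarity, not opossum's internals, which also track the error percentage and request volume):

```typescript
type State = 'closed' | 'open' | 'half-open';

class MiniBreaker {
  private state: State = 'closed';
  private failures = 0;
  private openedAt = 0;

  constructor(
    private failureThreshold: number, // consecutive failures before opening
    private resetTimeoutMs: number,   // how long to stay open
  ) {}

  // Open circuits transition to half-open once resetTimeout has elapsed
  currentState(now: number): State {
    if (this.state === 'open' && now - this.openedAt >= this.resetTimeoutMs) {
      this.state = 'half-open'; // allow one trial request
    }
    return this.state;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = 'closed';
  }

  recordFailure(now: number): void {
    this.failures += 1;
    // A failed trial in half-open, or too many consecutive failures, opens the circuit
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAt = now;
    }
  }
}
```

Timestamps are passed in explicitly so the transitions are easy to test without real clocks.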
Throttling Incoming Webhooks
Partners can send thousands of webhooks at once (e.g., on a bulk status update). The correct pattern is to accept quickly (202) and process through a queue:
```php
// WebhookController.php — respond immediately, process asynchronously
public function handle(Request $request)
{
    $payload = $request->all();
    $signature = $request->header('X-Signature');

    if (!$this->verifySignature($payload, $signature)) {
        return response()->json(['error' => 'Invalid signature'], 401);
    }

    // Queue the payload: dispatch is asynchronous, the worker pool throttles processing
    ProcessWebhook::dispatch($payload)->onQueue('webhooks');

    return response()->json(['accepted' => true], 202);
}
```
```php
// config/horizon.php — worker limits for the webhooks queue
'supervisor-webhooks' => [
    'connection' => 'redis',
    'queue' => ['webhooks'],
    'balance' => 'auto',
    'maxProcesses' => 10, // at most 10 parallel workers
],
```
Throttling in Nginx Upstream
upstream backend {
server app1:3000;
server app2:3000;
# Limit concurrent connections to upstream
keepalive 32;
}
# queue — buffering on overload
location /api/ {
proxy_pass http://backend;
proxy_connect_timeout 1s;
proxy_read_timeout 30s;
# If backend slow to respond — 503 instead of hanging
proxy_next_upstream error timeout http_503;
proxy_next_upstream_tries 2;
}
Monitoring Throttling
Metrics for the dashboard:

```typescript
// Prometheus metrics (prom-client): counter, histogram, gauge
throttleRejected.inc({ reason: 'queue_full', endpoint: '/api/export' });
throttleDelayed.observe({ endpoint: '/api/export' }, delayMs);
queueDepth.set({ queue: 'reports' }, await queue.count());
```

Alert: queue depth > 1000 for 5 minutes → scale up workers or page on-call.
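That alert can be expressed as a Prometheus alerting rule (a sketch; the metric name `queue_depth` and its labels follow the snippet above and would need to match your actual exporter):

```yaml
groups:
  - name: throttling
    rules:
      - alert: QueueDepthHigh
        expr: queue_depth{queue="reports"} > 1000
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "Queue {{ $labels.queue }} depth above 1000 for 5 minutes"
```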
Timeline
BullMQ with concurrency + rateLimit, a circuit breaker for external APIs, and a webhook queue: 3–5 days. With adaptive throttling, Prometheus metrics, a Grafana dashboard, and alerts: 1–2 weeks.