Implementing Rate Limiting at the API Gateway Level
Rate limiting protects backend services from overload and prevents API abuse. Enforcing it at the gateway is more efficient than implementing it in each service separately: the gateway is a single point of policy application for all traffic.
Rate Limiting Algorithms
Token Bucket: a bucket of tokens replenished at a fixed rate; each request consumes one token. Allows short-term bursts. Used in most API gateways.
Leaky Bucket: requests drain at a constant rate, so bursts are smoothed out and the upstream sees a predictable load.
Fixed Window: counts requests within a fixed time window. Problem: up to double the allowed load at window boundaries, when a burst straddles two adjacent windows.
Sliding Window: the counting window slides continuously with time. More accurate than Fixed Window, with no boundary effect.
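The Token Bucket logic fits in a few lines. Below is a minimal single-process sketch; real gateways keep these counters in shared memory or Redis, and the class and parameter names here are illustrative:

```python
import time

class TokenBucket:
    """Holds at most `capacity` tokens, refilled continuously at `rate` tokens/second."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(rate=5, capacity=10)
results = [bucket.allow() for _ in range(11)]
# A full bucket absorbs a burst of 10; the 11th request is rejected
```

The burst tolerance comes from the capacity: a full bucket admits `capacity` requests at once, after which traffic is throttled to the steady refill rate.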
Kong: Multi-Level Rate Limiting
# Global limit (all services)
curl -X POST http://localhost:8001/plugins \
-d "name=rate-limiting" \
-d "config.minute=1000" \
-d "config.hour=20000" \
-d "config.policy=redis" \
-d "config.redis_host=redis" \
-d "config.limit_by=ip"
# Service-level limit
curl -X POST http://localhost:8001/services/payments-api/plugins \
-d "name=rate-limiting" \
-d "config.second=10" \
-d "config.minute=200" \
-d "config.limit_by=consumer"
# Limit for specific consumer
curl -X POST http://localhost:8001/consumers/free-tier/plugins \
-d "name=rate-limiting" \
-d "config.minute=60" \
-d "config.hour=500"
Response headers (client sees limits):
X-RateLimit-Limit-Minute: 60
X-RateLimit-Remaining-Minute: 43
RateLimit-Reset: 37
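A well-behaved client should read these headers and back off instead of retrying blindly. A minimal sketch of the wait calculation, assuming the header names Kong emits above:

```python
def backoff_seconds(headers: dict) -> int:
    """How long a client should wait after a 429, based on rate-limit headers."""
    # Prefer the standard Retry-After, fall back to the draft RateLimit-Reset
    for name in ("Retry-After", "RateLimit-Reset"):
        value = headers.get(name)
        if value is not None:
            return max(int(value), 1)
    return 1  # the gateway gave no hint; retry conservatively

print(backoff_seconds({"RateLimit-Reset": "37"}))  # 37
```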
APISIX: Distributed Rate Limiting
{
"plugins": {
"limit-count": {
"count": 100,
"time_window": 60,
"rejected_code": 429,
"rejected_msg": "Too many requests",
"key": "consumer_name",
"policy": "redis",
"redis_host": "redis",
"redis_port": 6379,
"redis_database": 0,
"show_limit_quota_header": true
},
"limit-req": {
"rate": 10,
"burst": 5,
"key": "remote_addr",
"rejected_code": 429
},
"limit-conn": {
"conn": 50,
"burst": 10,
"key": "remote_addr",
"rejected_code": 503
}
}
}
limit-req implements Leaky Bucket: requests above the rate are absorbed up to burst, and anything beyond that is rejected with 429.
limit-conn limits the number of concurrent connections.
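The rate-plus-burst semantics of limit-req can be illustrated with a small sketch. It approximates the "nodelay" behavior (requests within the burst pass immediately instead of being queued), and timestamps are passed in explicitly to keep the example deterministic:

```python
class LeakyBucket:
    """Requests drain at `rate` req/s; up to `burst` requests above
    the rate are admitted, the rest are rejected (429)."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate
        self.burst = burst
        self.level = 0.0   # requests currently in the bucket
        self.last = 0.0    # timestamp of the previous request

    def allow(self, now: float) -> bool:
        # The bucket leaks at a constant rate between requests
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level < self.burst:
            self.level += 1.0
            return True
        return False  # bucket full: reject with 429

bucket = LeakyBucket(rate=10, burst=5)
results = [bucket.allow(now=0.0) for _ in range(6)]  # instantaneous burst
```

With burst=5, an instantaneous burst of 6 requests yields five admissions and one rejection; spaced-out requests drain the bucket and always pass.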
AWS API Gateway: Usage Plans
resource "aws_api_gateway_usage_plan" "tiers" {
for_each = {
free = { rate = 10, burst = 5, quota = 1000, period = "DAY" }
basic = { rate = 50, burst = 25, quota = 10000, period = "DAY" }
pro = { rate = 200, burst = 100, quota = 100000, period = "DAY" }
enterprise = { rate = 1000, burst = 500, quota = 0, period = "DAY" } # unlimited quota
}
name = "plan-${each.key}"
api_stages {
api_id = aws_api_gateway_rest_api.main.id
stage = "prod"
}
throttle_settings {
rate_limit = each.value.rate
burst_limit = each.value.burst
}
dynamic "quota_settings" {
for_each = each.value.quota > 0 ? [1] : []
content {
limit = each.value.quota
period = each.value.period
}
}
}
Dynamic Rate Limiting by Business Attributes
Rate limits do not always depend only on the IP or API key. Often the logic involves business attributes: the user's subscription tier, the resource type, the time of day.
-- Kong custom plugin: resolve per-consumer limits from a billing service
local http = require("resty.http")
local cjson = require("cjson.safe")

-- Loader invoked by kong.cache only on a cache miss
local function load_limits(consumer_id)
local client = http.new()
local res, err = client:request_uri("http://billing-service/limits/" .. consumer_id)
if not res then
return nil, err
end
return cjson.decode(res.body)
end

local function get_rate_limit(consumer_id)
-- kong.cache caches the loader's result for 5 minutes
local limits, err = kong.cache:get("rate:" .. consumer_id, { ttl = 300 },
load_limits, consumer_id)
if err then
return nil, err
end
return limits
end

local limits = get_rate_limit(consumer_id)
-- limits = { minute = 1000, hour = 10000 }
Exempting Legitimate Traffic from Rate Limiting
Whitelist for internal services and monitoring. The rate-limiting plugin itself has no IP allowlist, so the usual pattern is to identify internal callers as a dedicated consumer and override the global policy for it:
# Kong: consumer-level plugin config takes precedence over the global one
# (the consumer name here is an example)
curl -X POST http://localhost:8001/consumers/internal-services/plugins \
-d "name=rate-limiting" \
-d "config.minute=100000"
# Or route internal traffic (e.g. 10.0.0.0/8) through a separate route
# with no rate-limiting plugin attached
Response When Limit Exceeded
Standard HTTP 429 Too Many Requests with an informative body:
{
"error": "rate_limit_exceeded",
"message": "You have exceeded the 60 requests/minute limit",
"retry_after": 23,
"limit": 60,
"window": "minute",
"upgrade_url": "https://company.com/pricing"
}
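A sketch of how a gateway or service could build such a response; the function, its window parameters, and the upgrade URL mirror the JSON above and are illustrative:

```python
import json

def rate_limit_response(limit: int, window_seconds: int,
                        window_start: float, now: float):
    """Build a 429 response with a machine-readable body.

    window_start / now are Unix timestamps.
    """
    retry_after = max(1, int(window_start + window_seconds - now))
    body = {
        "error": "rate_limit_exceeded",
        "message": f"You have exceeded the {limit} requests/minute limit",
        "retry_after": retry_after,
        "limit": limit,
        "window": "minute",
        "upgrade_url": "https://company.com/pricing",
    }
    # Retry-After duplicates the body hint for generic HTTP clients
    headers = {"Retry-After": str(retry_after),
               "Content-Type": "application/json"}
    return 429, headers, json.dumps(body)

status, headers, body = rate_limit_response(
    limit=60, window_seconds=60, window_start=1000.0, now=1037.0)
```

Sending Retry-After alongside the JSON body lets off-the-shelf HTTP clients back off correctly even if they never parse the payload.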
Implementation Timeline
Setting up multi-level rate limiting (by IP, consumer, and service) backed by Redis takes 1–2 business days.