Implementing Response Caching at API Gateway Level
Gateway-level caching intercepts requests before they reach the backend. Properly configured, it can shed 30–70% of the load from services without any code changes. Misconfigured, it serves one user's data to another or keeps stale responses around for days.
What to Cache and What Not
Safe to cache:
- GET requests with public data (catalog, references, articles)
- Responses with an explicit Cache-Control: public, max-age=N from upstream
- Endpoints that don't depend on a session ID
Don't cache:
- Any POST, PUT, PATCH, DELETE
- Requests with an Authorization: Bearer ... header, unless user-level isolation is configured
- Responses with 4xx/5xx status (except 404 for public resources, if intentional)
- Data with personal information
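The rules above can be condensed into a single gateway-side check. This is an illustrative Python sketch of the decision logic, not any real gateway's implementation; all names are hypothetical:

```python
# Hypothetical sketch of the cacheability rules above. The status set
# mirrors the response_code list in the Kong example in this article.
CACHEABLE_STATUSES = {200, 301, 404}

def is_cacheable(method: str, request_headers: dict, status: int) -> bool:
    """True only when every rule from the safe/unsafe lists passes."""
    if method.upper() not in {"GET", "HEAD"}:
        return False  # never cache POST/PUT/PATCH/DELETE
    if any(k.lower() == "authorization" for k in request_headers):
        return False  # per-user responses need user-level isolation first
    return status in CACHEABLE_STATUSES
```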
Kong Gateway: proxy-cache Plugin
```yaml
plugins:
  - name: proxy-cache
    config:
      response_code: [200, 301, 404]
      request_method: [GET, HEAD]
      content_type:
        - application/json
        - "application/json; charset=utf-8"
      cache_ttl: 300
      strategy: memory
      memory:
        dictionary_name: kong_db_cache
```
For production, use the redis strategy instead of memory (note: in Kong, a Redis-backed cache is provided by the proxy-cache-advanced plugin):
```yaml
strategy: redis
redis:
  host: redis.internal
  port: 6379
  timeout: 2000
  database: 0
  password: ${REDIS_PASSWORD}
```
Kong adds an X-Cache-Status header to each response: Hit, Miss, Bypass, or Refresh. Useful for debugging.
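Since the header appears on every response, cache hit rate can be computed from values pulled out of access logs. A small illustrative Python sketch (the extraction step is format-specific and omitted; the function name is hypothetical):

```python
# Sketch: hit rate from a sequence of X-Cache-Status values. Bypass
# responses are excluded, since they were never cache-eligible.
from collections import Counter

def hit_rate(statuses: list[str]) -> float:
    """Fraction of cache-eligible requests answered from cache."""
    counts = Counter(s.capitalize() for s in statuses)
    eligible = sum(counts.values()) - counts["Bypass"]
    return counts["Hit"] / eligible if eligible else 0.0
```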
Cache invalidation via Admin API:
```shell
# Delete a specific key
curl -X DELETE http://kong-admin:8001/proxy-cache/caches/{cache_key}

# Clear the entire cache
curl -X DELETE http://kong-admin:8001/proxy-cache/
```
AWS API Gateway + ElastiCache
AWS API Gateway has no native caching for HTTP APIs, but REST APIs support it:
```json
{
  "cacheClusterEnabled": true,
  "cacheClusterSize": "0.5",
  "methodSettings": {
    "~1products/GET": {
      "cachingEnabled": true,
      "cacheTtlInSeconds": 300,
      "cacheDataEncrypted": false,
      "requireAuthorizationForCacheControl": false
    }
  }
}
```
(In methodSettings the key is {resource_path}/{http_method}, with / in the path escaped as ~1.)
Terraform:
```hcl
resource "aws_api_gateway_stage" "main" {
  deployment_id         = aws_api_gateway_deployment.main.id
  rest_api_id           = aws_api_gateway_rest_api.main.id
  stage_name            = "v1"
  cache_cluster_enabled = true
  cache_cluster_size    = "0.5"
}

# Per-method settings live in a separate resource, not inside the stage
resource "aws_api_gateway_method_settings" "products" {
  rest_api_id = aws_api_gateway_rest_api.main.id
  stage_name  = aws_api_gateway_stage.main.stage_name
  method_path = "products/GET"

  settings {
    caching_enabled      = true
    cache_ttl_in_seconds = 300
  }
}
```
Invalidation: the client sends a request with a Cache-Control: max-age=0 header, provided it has the execute-api:InvalidateCache permission.
Nginx: proxy_cache
If gateway is on Nginx:
```nginx
proxy_cache_path /var/cache/nginx/api
                 levels=1:2
                 keys_zone=api_cache:10m
                 max_size=1g
                 inactive=10m
                 use_temp_path=off;
```
```nginx
server {
    location /api/v1/catalog/ {
        proxy_cache api_cache;
        proxy_cache_valid 200 5m;
        proxy_cache_valid 404 1m;
        proxy_cache_use_stale error timeout updating http_500 http_502 http_503;
        proxy_cache_background_update on;
        proxy_cache_lock on;

        # Cache key without auth params
        proxy_cache_key "$scheme$request_method$host$uri$is_args$args";

        # Don't cache if the client sends a session cookie
        proxy_cache_bypass $cookie_session_id;
        proxy_no_cache $cookie_session_id;

        add_header X-Cache-Status $upstream_cache_status;
        proxy_pass http://backend;
    }
}
```
proxy_cache_use_stale updating together with proxy_cache_background_update on gives you the stale-while-revalidate pattern: the user gets the old response instantly, and the update happens in the background. Critical for heavy endpoints.
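Outside Nginx, the same pattern is a few lines of logic: return the expired entry immediately and refresh it off the request path. An illustrative in-process Python sketch (class and parameter names are hypothetical, and the toy locking is not production-grade):

```python
# Sketch of stale-while-revalidate: serve the stale value right away
# and refresh it on a background thread. In-process toy, not a gateway.
import threading
import time

class SWRCache:
    def __init__(self, ttl: float, fetch):
        self.ttl, self.fetch = ttl, fetch          # fetch() hits upstream
        self.lock = threading.Lock()
        self.value, self.expires, self.refreshing = None, 0.0, False

    def _refresh(self):
        value = self.fetch()                       # slow upstream call
        with self.lock:
            self.value, self.expires = value, time.time() + self.ttl
            self.refreshing = False

    def get(self):
        with self.lock:
            fresh = time.time() < self.expires
            if self.value is not None and (fresh or self.refreshing):
                return self.value                  # fresh, or stale while updating
            if self.value is not None:             # stale: kick off one refresh
                self.refreshing = True
                threading.Thread(target=self._refresh, daemon=True).start()
                return self.value                  # old response, instantly
        self._refresh()                            # cold cache: must block once
        return self.value
```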
Vary and Cache Spaces
If the API returns different responses based on headers, configure the cache key accordingly. A typical example is a multilingual API:
```nginx
# Split cache by language
proxy_cache_key "$scheme$request_method$host$uri$is_args$args$http_accept_language";
```
In Kong via vary_headers:
```yaml
config:
  vary_headers:
    - Accept-Language
    - Accept-Encoding
```
Without this, the first user with Accept-Language: en "claims" the cache, and everyone else gets English responses.
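Building the key by hand makes the failure mode obvious. A Python sketch of a vary-aware cache key (the function and its parameters are illustrative, not any gateway's API):

```python
# Sketch: a cache key that honours a list of vary headers. With an empty
# vary list, requests differing only in Accept-Language collide.
import hashlib

def cache_key(method: str, url: str, headers: dict, vary: list[str]) -> str:
    parts = [method.upper(), url]
    for name in vary:                       # e.g. ["Accept-Language"]
        parts.append(f"{name.lower()}={headers.get(name, '')}")
    return hashlib.sha256("|".join(parts).encode()).hexdigest()
```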
Cache Stampede
When a popular key's TTL expires, multiple workers hit upstream simultaneously (a "cache storm"). Protection:
- Mutex/lock: only one worker updates while the others wait (proxy_cache_lock on in Nginx)
- Probabilistic early expiration: refresh the cache slightly before TTL, with probability increasing as expiration approaches
- Stale responses: serve the stale response while the update is in progress
Timeline
Basic cache setup for 2–3 routes: 1 day. Full strategy with invalidation, Vary, hit rate monitoring, and stale-while-revalidate: 3–5 days.