Implementing Response Caching at API Gateway Level

Gateway-level caching intercepts requests before they reach the backend. Properly configured, it can take 30–70% of the load off services without any code changes. Misconfigured, it serves one user's data to another or keeps stale responses around for days.

What to Cache and What Not

Safe to cache:

  • GET requests with public data (catalog, references, articles)
  • Responses with explicit Cache-Control: public, max-age=N from upstream
  • Endpoints that don't depend on session ID

Don't cache:

  • Any POST, PUT, PATCH, DELETE
  • Requests with Authorization: Bearer ... header — unless user-level isolation is configured
  • Responses with 4xx/5xx status (except 404 for public resources if intentional)
  • Data with personal information
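The rules above can be sketched as a simple decision function. This is a hedged illustration only (the function name and parameters are mine); real gateways also account for Cache-Control directives, Vary headers, and plugin-specific settings:

```python
def is_cacheable(method: str, status: int, headers: dict,
                 user_isolated_cache: bool = False) -> bool:
    """Rough cacheability check following the rules above.

    headers: request headers with lowercase keys assumed.
    user_isolated_cache: True if the cache key includes the user identity.
    """
    if method not in ("GET", "HEAD"):
        return False  # never cache mutating methods
    if "authorization" in headers and not user_isolated_cache:
        return False  # authorized responses need per-user isolation
    if status >= 400 and status != 404:
        return False  # don't cache errors; 404 allowed for public resources
    return True
```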

Kong Gateway: proxy-cache Plugin

plugins:
  - name: proxy-cache
    config:
      response_code: [200, 301, 404]
      request_method: [GET, HEAD]
      content_type:
        - application/json
        - application/json; charset=utf-8
      cache_ttl: 300
      strategy: memory
      memory:
        dictionary_name: kong_db_cache

For production, use the redis strategy instead of memory (note: the open-source proxy-cache plugin supports only memory; redis requires the proxy-cache-advanced plugin from Kong Enterprise):

      strategy: redis
      redis:
        host: redis.internal
        port: 6379
        timeout: 2000
        database: 0
        password: ${REDIS_PASSWORD}

Kong adds an X-Cache-Status header to each response: Hit, Miss, Bypass, or Refresh. Useful for debugging.
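When debugging, it helps to compute the hit rate over a sample of those header values. A minimal sketch in plain Python (the function name and the choice to count Refresh as a lookup but not a hit are mine):

```python
from collections import Counter

def cache_hit_rate(statuses) -> float:
    """statuses: iterable of X-Cache-Status values
    ('Hit', 'Miss', 'Bypass', 'Refresh'), case-insensitive."""
    counts = Counter(s.lower() for s in statuses)
    # Bypass means the cache was never consulted, so exclude it
    looked_up = counts["hit"] + counts["miss"] + counts["refresh"]
    return counts["hit"] / looked_up if looked_up else 0.0
```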

Cache invalidation via Admin API:

# Delete specific key
curl -X DELETE http://kong-admin:8001/proxy-cache/{cache_key}

# Clear all cache
curl -X DELETE http://kong-admin:8001/proxy-cache/

AWS API Gateway: Built-in Stage Cache

AWS has no native cache for HTTP APIs, but REST APIs support a built-in stage cache. It is configured per method in the stage's methodSettings, where keys take the form {resource_path}/{http_method} with slashes escaped as ~1:

{
  "cacheClusterEnabled": true,
  "cacheClusterSize": "0.5",
  "methodSettings": {
    "~1products/GET": {
      "cachingEnabled": true,
      "cacheTtlInSeconds": 300,
      "cacheDataEncrypted": false,
      "requireAuthorizationForCacheControl": false
    }
  }
}

In Terraform, the cache cluster is enabled on the stage, while per-method cache settings live in a separate aws_api_gateway_method_settings resource (the aws_api_gateway_stage resource has no method_settings block):

resource "aws_api_gateway_stage" "main" {
  deployment_id         = aws_api_gateway_deployment.main.id
  rest_api_id           = aws_api_gateway_rest_api.main.id
  stage_name            = "v1"
  cache_cluster_enabled = true
  cache_cluster_size    = "0.5"
}

resource "aws_api_gateway_method_settings" "products_cache" {
  rest_api_id = aws_api_gateway_rest_api.main.id
  stage_name  = aws_api_gateway_stage.main.stage_name
  method_path = "products/GET"

  settings {
    caching_enabled      = true
    cache_ttl_in_seconds = 300
  }
}

Invalidation: the client sends a request with the Cache-Control: max-age=0 header, which is honored if its IAM identity has the execute-api:InvalidateCache permission.

Nginx: proxy_cache

If the gateway runs on Nginx:

proxy_cache_path /var/cache/nginx/api
  levels=1:2
  keys_zone=api_cache:10m
  max_size=1g
  inactive=10m
  use_temp_path=off;

server {
  location /api/v1/catalog/ {
    proxy_cache api_cache;
    proxy_cache_valid 200 5m;
    proxy_cache_valid 404 1m;
    proxy_cache_use_stale error timeout updating http_500 http_502 http_503;
    proxy_cache_background_update on;
    proxy_cache_lock on;

    # Cache key — without auth params
    proxy_cache_key "$scheme$request_method$host$uri$is_args$args";

    # Don't cache if client sends session cookie
    proxy_cache_bypass $cookie_session_id;
    proxy_no_cache $cookie_session_id;

    add_header X-Cache-Status $upstream_cache_status;
    proxy_pass http://backend;
  }
}

proxy_cache_use_stale ... updating together with proxy_cache_background_update on implements the stale-while-revalidate pattern: the user instantly gets the old response while the update happens in the background. Critical for heavy endpoints.

Vary and Cache Partitioning

If the API returns different responses depending on request headers, the cache key must account for them. A typical example is a multilingual API:

# Split cache by language
proxy_cache_key "$scheme$request_method$host$uri$is_args$args$http_accept_language";
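Keying on the raw Accept-Language header fragments the cache badly, because clients send many variants (en-US,en;q=0.9, en-GB, and so on). One mitigation, sketched under the assumption of a fixed set of supported languages: normalize the header with a map block (which must live in the http context; the $lang variable and the language list are illustrative):

```nginx
# Collapse Accept-Language variants into a few canonical values
map $http_accept_language $lang {
  default  en;
  ~*^de    de;
  ~*^fr    fr;
}
```

Then use $lang instead of $http_accept_language in proxy_cache_key, so each supported language gets exactly one cache entry per URL.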

In Kong via vary_headers:

    config:
      vary_headers:
        - Accept-Language
        - Accept-Encoding

Without this, the first user with Accept-Language: en "claims" the cache, and everyone else gets English responses.

Cache Stampede

When the TTL of a popular key expires, many workers hit the upstream simultaneously (a cache stampede, also known as a thundering herd). Protection:

  • Mutex/lock: only one worker updates, others wait (proxy_cache_lock on in Nginx)
  • Probabilistic early expiration: refresh cache slightly before TTL with increasing probability as expiration approaches
  • Stale responses: serve stale response while update is in progress

Timeline

Basic cache setup for 2–3 routes: 1 day. Full strategy with invalidation, Vary, hit rate monitoring, and stale-while-revalidate: 3–5 days.