Serverless Warming Implementation for Latency Reduction
Cold starts are the main latency problem for serverless functions in latency-sensitive applications. The first invocation after an idle period takes anywhere from 200 ms to 2 s depending on the runtime, package size, and configuration. For an API serving real user requests this is unacceptable. Warming mitigates the problem by keeping function instances "warm".
Anatomy of a Cold Start
What happens at cold start:
- Cloud finds available container/VM
- Loads function image
- Initializes runtime (Node.js, Python, JVM)
- Executes initialization code (outside handler)
- Executes handler
Steps 1-4 are pure overhead; only step 5 does useful work. The provider controls steps 1-3, while the developer controls steps 4-5.
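The split between initialization and handler execution can be observed directly. A minimal sketch (the flag-based detection below is an illustration, not from the original) that reports whether a given invocation paid the cold-start cost:

```python
import time

_init_started = time.monotonic()  # runs during init (step 4), once per container
_is_cold = True                   # True only until the first invocation completes

def handler(event, context):
    global _is_cold
    was_cold, _is_cold = _is_cold, False
    return {
        'cold_start': was_cold,  # True exactly once per container
        'ms_since_init': round((time.monotonic() - _init_started) * 1000, 1),
    }
```

Because module-level code runs once per container, the flag flips only on the first invocation; warm invocations report `cold_start: False`.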
Typical cold start times:
- Python 3.12 (AWS Lambda, 256 MB): 200-400 ms
- Node.js 20: 100-300 ms
- Java 17: 800 ms-2 s (JVM startup)
- Go: 50-150 ms
Scheduled Warming
The simplest approach: invoke the function every 5 minutes via a CloudWatch Events / EventBridge schedule to keep an instance warm.
# lambda_warmer.py: return early on warming pings
import json

def handler(event, context):
    if event.get('source') == 'warming':
        # This is a ping from the warmer, not a real request
        return {'statusCode': 200, 'body': json.dumps({'warm': True})}
    # Real function logic
    return process_request(event)
# Terraform: EventBridge rule for warming
resource "aws_cloudwatch_event_rule" "warmer" {
  name                = "lambda-warmer"
  schedule_expression = "rate(5 minutes)"
}

resource "aws_cloudwatch_event_target" "warmer" {
  rule  = aws_cloudwatch_event_rule.warmer.name
  arn   = aws_lambda_function.api.arn
  input = jsonencode({"source": "warming"})
}

# Without this permission EventBridge cannot invoke the function
resource "aws_lambda_permission" "warmer" {
  statement_id  = "AllowEventBridgeWarmer"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.api.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.warmer.arn
}
Limitation: each EventBridge trigger warms only a single concurrent instance. To keep N instances warm, you need N parallel invocations.
Warming Multiple Parallel Instances
# warmer.py: fan out N parallel warming invocations
import asyncio
import json

import boto3

lambda_client = boto3.client('lambda')

async def warm_instance(function_name: str, instance_num: int):
    # boto3 calls are blocking, so run them in a thread;
    # otherwise asyncio.gather would execute them sequentially
    await asyncio.to_thread(
        lambda_client.invoke,
        FunctionName=function_name,
        InvocationType='RequestResponse',
        Payload=json.dumps({
            'source': 'warming',
            'instance': instance_num,
            'sleep': 10,  # keep the instance busy for 10 seconds
        }),
    )

async def warm_function(function_name: str, concurrent_count: int = 5):
    """Spawn N parallel warmup invocations."""
    tasks = [warm_instance(function_name, i) for i in range(concurrent_count)]
    await asyncio.gather(*tasks)
While one invocation keeps an instance busy (sleeping 10 s), Lambda has to create a new container for each additional parallel invocation. The result: 5 warm instances. Note that the warming handler must actually honor the `sleep` field for this to work.
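For the parallel trick to work, the warming handler has to sleep for the requested duration so each invocation occupies its instance. A sketch extending the earlier ping handler (the `sleep` handling is an assumption; `process_request` is the application's real logic):

```python
import json
import time

def handler(event, context):
    if event.get('source') == 'warming':
        # Block for the requested duration so Lambda is forced to spin up
        # fresh containers for the other parallel warming invocations
        time.sleep(event.get('sleep', 0))
        return {'statusCode': 200, 'body': json.dumps({'warm': True})}
    return process_request(event)  # real function logic
```

Keep the sleep short: the function is billed for the full duration of every warming invocation.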
AWS Lambda Provisioned Concurrency
The official AWS solution: reserve pre-initialized instances. It costs extra, but eliminates cold starts for the provisioned capacity, keeping P99 latency predictable.
resource "aws_lambda_provisioned_concurrency_config" "api" {
  function_name                     = aws_lambda_function.api.function_name
  qualifier                         = aws_lambda_alias.live.name
  provisioned_concurrent_executions = 5
}
Provisioned Concurrency can also be auto-scaled, either on a schedule (more in the morning, less at night) or, as below, by target tracking on utilization:
resource "aws_appautoscaling_target" "lambda_pc" {
  max_capacity       = 20
  min_capacity       = 2
  resource_id        = "function:${aws_lambda_function.api.function_name}:live"
  scalable_dimension = "lambda:function:ProvisionedConcurrency"
  service_namespace  = "lambda"
}

resource "aws_appautoscaling_policy" "lambda_pc_tracking" {
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.lambda_pc.resource_id
  scalable_dimension = aws_appautoscaling_target.lambda_pc.scalable_dimension
  service_namespace  = aws_appautoscaling_target.lambda_pc.service_namespace

  target_tracking_scaling_policy_configuration {
    target_value = 0.7 # 70% utilization of provisioned capacity

    predefined_metric_specification {
      predefined_metric_type = "LambdaProvisionedConcurrencyUtilization"
    }
  }
}
Optimizing Initialization Code
Warming helps, but reducing the cold start itself is the better strategy:
# BAD: create clients inside the handler
def handler(event, context):
    dynamodb = boto3.resource('dynamodb')  # recreated on every invocation
    db_client = psycopg2.connect(DSN)      # opens a new connection each time
    ...

# GOOD: create clients at module level (runs once per container)
import boto3
import psycopg2

dynamodb = boto3.resource('dynamodb')  # initialized once, at cold start

_connection = None  # lazy database connection

def get_connection():
    global _connection
    if _connection is None or _connection.closed:
        _connection = psycopg2.connect(DSN)
    return _connection

def handler(event, context):
    conn = get_connection()  # reuses the existing connection
    ...
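The same lazy pattern applies to heavy imports: defer them into the code path that needs them, so cold starts and most invocations skip the cost. A sketch using `statistics` as a stand-in for a genuinely heavy dependency (the function is illustrative, not from the original):

```python
def summarize(values):
    # Deferred import: the module is loaded on the first call only;
    # sys.modules caches it, so later calls on a warm instance are free
    import statistics  # stand-in for a heavy dependency like pandas
    return {'mean': statistics.mean(values), 'stdev': statistics.pstdev(values)}
```

This trades a few milliseconds on the first real call for a faster init phase on every cold start.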
Lambda SnapStart (Java)
AWS Lambda SnapStart for Java creates a snapshot of the initialized function state and restores it on demand, reducing Java cold starts from 1-2 s to roughly 100-200 ms.
resource "aws_lambda_function" "java_api" {
  ...

  snap_start {
    apply_on = "PublishedVersions"
  }
}
Timeline
- Scheduled warming (EventBridge): 0.5 day
- Parallel warming script: 1 day
- Provisioned Concurrency + Auto Scaling: 1-2 days
- Initialization code optimization: 1-3 days (depends on the codebase)