Implementing Serverless File Processing (Lambda + S3 Trigger)
Lambda + S3 trigger is a classic serverless pattern for file processing: a file is uploaded to S3, the bucket event automatically invokes a Lambda function, which processes the file and saves the results. No constantly running server, and scaling is automatic.
Typical Scenarios
- Generate thumbnail on image upload
- Convert video to different formats and resolutions
- Process CSV/Excel files, import data to DB
- PDF generation from templates
- Antivirus scanning of uploaded files
- OCR and text extraction from documents
- Data transformation (XML → JSON, normalization)
Basic Architecture
[User] → S3 upload → [S3 Event Notification]
                              ↓
                      [Lambda Function]
                              ↓
               [Processed file → S3 Output]
               [Metadata → DynamoDB]
               [Notification → SQS/SNS]
# S3 bucket for incoming files
resource "aws_s3_bucket" "uploads" {
  bucket = "myapp-uploads"
}

# S3 bucket for processed files
resource "aws_s3_bucket" "processed" {
  bucket = "myapp-processed"
}

# Allow S3 to invoke the Lambda function — without this
# permission the notification cannot be created
resource "aws_lambda_permission" "allow_s3" {
  statement_id  = "AllowS3Invoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.processor.function_name
  principal     = "s3.amazonaws.com"
  source_arn    = aws_s3_bucket.uploads.arn
}

# Lambda notification from S3
resource "aws_s3_bucket_notification" "upload_trigger" {
  bucket = aws_s3_bucket.uploads.id

  lambda_function {
    lambda_function_arn = aws_lambda_function.processor.arn
    events              = ["s3:ObjectCreated:*"]
    filter_prefix       = "images/"  # Only files in this folder
    filter_suffix       = ".jpg"     # Only JPG files
  }

  depends_on = [aws_lambda_permission.allow_s3]
}
Lambda Handler
import boto3
import os
from urllib.parse import unquote_plus
from PIL import Image
import io

s3 = boto3.client('s3')

def handler(event, context):
    results = []
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        # Keys arrive URL-encoded in the event (spaces become '+')
        key = unquote_plus(record['s3']['object']['key'])
        try:
            result = process_image(bucket, key)
            results.append({'key': key, 'status': 'success', **result})
        except Exception as e:
            print(f"Error processing {key}: {e}")
            results.append({'key': key, 'status': 'error', 'error': str(e)})
    return results
def process_image(bucket: str, key: str) -> dict:
    # Download original
    obj = s3.get_object(Bucket=bucket, Key=key)
    image = Image.open(io.BytesIO(obj['Body'].read()))
    fmt = image.format or 'JPEG'

    thumbnails = {}
    for size_name, size in [('sm', (150, 150)), ('md', (400, 400)), ('lg', (800, 800))]:
        thumb = image.copy()
        thumb.thumbnail(size, Image.LANCZOS)

        buffer = io.BytesIO()
        thumb.save(buffer, format=fmt, quality=85)
        buffer.seek(0)

        output_key = key.replace('images/', f'thumbnails/{size_name}/')
        s3.put_object(
            Bucket=os.environ['OUTPUT_BUCKET'],
            Key=output_key,
            Body=buffer,
            ContentType=f'image/{fmt.lower()}'
        )
        thumbnails[size_name] = output_key

    return {'thumbnails': thumbnails, 'original_size': image.size}
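Why unquote_plus? S3 URL-encodes object keys in the event payload, with spaces turned into +. A quick stdlib-only sketch of the decoding and of the minimal event shape the handler receives (bucket and key names here are illustrative):

```python
from urllib.parse import unquote_plus

# S3 delivers the key URL-encoded, with spaces as '+'
raw_key = "images/my+photo+%282024%29.jpg"
print(unquote_plus(raw_key))  # images/my photo (2024).jpg

# Minimal shape of an S3 event record (one record per uploaded object)
event = {
    "Records": [
        {"s3": {"bucket": {"name": "myapp-uploads"},
                "object": {"key": raw_key}}}
    ]
}
bucket = event["Records"][0]["s3"]["bucket"]["name"]
key = unquote_plus(event["Records"][0]["s3"]["object"]["key"])
print(bucket, key)  # myapp-uploads images/my photo (2024).jpg
```

Forgetting the decode is a classic bug: get_object then fails with NoSuchKey for any filename containing spaces or non-ASCII characters.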
Processing Large Files
Lambda has limits: /tmp up to 10GB, timeout up to 15 minutes, memory up to 10GB. For files >100MB — streaming processing:
import boto3

def process_large_csv(bucket: str, key: str):
    s3 = boto3.client('s3')

    # StreamingBody — read in chunks without loading the whole file into memory
    obj = s3.get_object(Bucket=bucket, Key=key)

    batch = []
    batch_size = 1000
    for line in obj['Body'].iter_lines():
        row = line.decode('utf-8')
        batch.append(parse_csv_row(row))  # parse_csv_row / save_batch_to_db are application-specific
        if len(batch) >= batch_size:
            save_batch_to_db(batch)
            batch = []

    if batch:
        save_batch_to_db(batch)
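One caveat with line-by-line iteration: it breaks on quoted CSV fields that contain embedded newlines. A sketch of batching with the stdlib csv module instead, which parses such fields correctly (demonstrated on an in-memory stream; for S3, wrapping obj['Body'] in io.TextIOWrapper gives the same text-stream interface in recent botocore versions — verify against your version):

```python
import csv
import io

def iter_csv_batches(text_stream, batch_size=1000):
    """Yield batches of parsed rows; csv.reader correctly handles
    quoted fields, including ones containing newlines."""
    reader = csv.reader(text_stream)
    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

# The second field of row 1 contains an embedded newline that
# naive line-based splitting would cut in half.
data = io.StringIO('id,comment\n1,"line one\nline two"\n2,plain\n')
batches = list(iter_csv_batches(data, batch_size=10))
print(batches[0])  # [['id', 'comment'], ['1', 'line one\nline two'], ['2', 'plain']]
```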
For video transcoding — use AWS Elemental MediaConvert (or the older Elastic Transcoder) instead of Lambda: these managed services are not bound by Lambda's 15-minute timeout.
Error Handling and DLQ
A direct S3 → Lambda trigger gives you only Lambda's built-in async retries; there is no dead-letter queue on the S3 side. A more reliable scheme routes events through SNS and SQS:

S3 → SNS Topic → SQS Queue → Lambda
                     ↓ (after maxReceiveCount)
                  SQS DLQ
resource "aws_s3_bucket_notification" "upload_trigger" {
  bucket = aws_s3_bucket.uploads.id

  topic {
    topic_arn = aws_sns_topic.file_events.arn
    events    = ["s3:ObjectCreated:*"]
  }
}

resource "aws_sns_topic_subscription" "to_sqs" {
  topic_arn = aws_sns_topic.file_events.arn
  protocol  = "sqs"
  endpoint  = aws_sqs_queue.file_processing.arn
}
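With this chain the Lambda no longer receives the raw S3 event: each SQS record's body is an SNS envelope whose Message field contains the S3 event JSON. A stdlib-only sketch of unwrapping it (the surrounding wiring — a topic policy allowing S3 to publish, a queue policy allowing SNS to send, and a redrive_policy on the queue pointing at the DLQ — is assumed configured; bucket and key values are illustrative):

```python
import json
from urllib.parse import unquote_plus

def extract_s3_objects(sqs_event):
    """Unwrap SQS → SNS → S3 and return (bucket, key) pairs."""
    objects = []
    for sqs_record in sqs_event['Records']:
        envelope = json.loads(sqs_record['body'])   # SNS envelope
        s3_event = json.loads(envelope['Message'])  # original S3 event
        for rec in s3_event['Records']:
            bucket = rec['s3']['bucket']['name']
            key = unquote_plus(rec['s3']['object']['key'])
            objects.append((bucket, key))
    return objects

# Simulated nested event for a local check
s3_event = {"Records": [{"s3": {"bucket": {"name": "myapp-uploads"},
                                "object": {"key": "images/cat+1.jpg"}}}]}
sqs_event = {"Records": [{"body": json.dumps({"Message": json.dumps(s3_event)})}]}
print(extract_s3_objects(sqs_event))  # [('myapp-uploads', 'images/cat 1.jpg')]
```

If raw message delivery is enabled on the SNS subscription, the SQS body is the S3 event itself and the envelope-parsing step is skipped.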
Lambda Configuration for File Processing
resource "aws_lambda_function" "processor" {
  filename      = "processor.zip"
  function_name = "file-processor"
  role          = aws_iam_role.processor.arn
  handler       = "handler.handler"
  runtime       = "python3.12"

  timeout     = 300   # 5 minutes
  memory_size = 1024  # 1GB for image processing

  ephemeral_storage {
    size = 2048  # 2GB /tmp for temp files
  }

  environment {
    variables = {
      OUTPUT_BUCKET = aws_s3_bucket.processed.bucket
    }
  }
}
Monitoring and Metrics
- Number of processed files per hour
- Average processing time by file type
- Error rate + DLQ contents
- Lambda duration distribution (outliers = problematic files)
Build a CloudWatch dashboard with these metrics, plus an alarm on DLQ growth (ApproximateNumberOfMessagesVisible > 0).
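Per-file-type counts and durations are not emitted automatically. A sketch of publishing them as custom CloudWatch metrics via boto3 put_metric_data (the namespace, metric names, and dimensions here are illustrative choices, not fixed AWS names):

```python
def build_processing_metrics(file_type: str, duration_ms: float, success: bool):
    """Build the CloudWatch metric payload for one processed file."""
    status = 'success' if success else 'error'
    return [
        {
            'MetricName': 'ProcessingDuration',
            'Dimensions': [{'Name': 'FileType', 'Value': file_type}],
            'Value': duration_ms,
            'Unit': 'Milliseconds',
        },
        {
            'MetricName': 'ProcessedFiles',
            'Dimensions': [{'Name': 'FileType', 'Value': file_type},
                           {'Name': 'Status', 'Value': status}],
            'Value': 1,
            'Unit': 'Count',
        },
    ]

def publish_metrics(metric_data, namespace='FileProcessing'):
    import boto3  # deferred so the payload builder is testable without AWS deps
    boto3.client('cloudwatch').put_metric_data(Namespace=namespace,
                                               MetricData=metric_data)

data = build_processing_metrics('jpg', 235.0, True)
print(data[0]['MetricName'])  # ProcessingDuration
```

Call build + publish at the end of each handler invocation; keeping the payload builder pure makes it unit-testable without AWS credentials.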
Implementation Timeline
- Basic S3 trigger + Lambda handler — 1-2 days
- Reliable scheme (SNS + SQS + DLQ) — 1-2 days
- Processing specific file types — 2-5 days
- Monitoring + testing — 1-2 days