Mobile App Backend Load Balancing Setup
Load balancing is far more than distributing requests across two servers. Misconfigured sessions, sticky-session handling, or health checks can leave users seeing 401 Unauthorized immediately after a successful login, because their next request was routed to a different instance.
Where Load Balancing Failures Occur
When a mobile client logs in, it receives a JWT. If its next request lands on a different pod and session state (refresh tokens, for example) lives in instance memory rather than Redis, the user is logged out. This is a real failure mode for stateful sessions without centralized storage.
Another critical issue: WebSocket connections. Long-lived connections for chat or live tracking must always reach the same pod. If the load balancer drops WebSocket connections during a new pod deployment, all active connections fail simultaneously.
Configuration for Mobile Traffic
L7 load balancing (HTTP/HTTPS). Sufficient for most REST APIs. Use Nginx, HAProxy, AWS ALB, or Google Cloud Load Balancing. Algorithm: Round Robin for stateless services, Least Connections for heavy requests (file uploads, complex aggregations).
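The algorithm choice above can be sketched as an Nginx upstream; the upstream name, addresses, and ports are illustrative:

```nginx
# least_conn suits heavy, uneven requests (uploads, aggregations);
# drop the directive to fall back to Round Robin for stateless APIs.
upstream api_backend {
    least_conn;
    server 10.0.1.10:8080 max_fails=2 fail_timeout=10s;
    server 10.0.1.11:8080 max_fails=2 fail_timeout=10s;
    server 10.0.1.12:8080 max_fails=2 fail_timeout=10s;
}

server {
    listen 443 ssl;

    location /api/ {
        proxy_pass http://api_backend;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

`max_fails`/`fail_timeout` give Nginx passive health checking: a backend that fails twice within 10 seconds is skipped for the next 10 seconds.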
Sticky sessions—avoid them. Binding a user to a pod via SERVERID cookie or IP hash loses horizontal scalability. If the pod fails, the user's session is lost. Better approach: stateless service + JWT + Redis for shared state.
WebSocket. For Nginx: proxy_http_version 1.1; proxy_set_header Upgrade $http_upgrade; proxy_set_header Connection "upgrade";. AWS ALB supports WebSocket natively. Set WebSocket timeout explicitly (proxy_read_timeout 3600s)—otherwise Nginx closes idle connections after 60 seconds.
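Putting those directives together as a complete location block (the `/ws/` path and upstream name are illustrative):

```nginx
location /ws/ {
    proxy_pass http://api_backend;

    # Required for the HTTP Upgrade handshake
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";

    # Keep idle connections open; Nginx defaults to 60s for both
    proxy_read_timeout 3600s;
    proxy_send_timeout 3600s;
}
```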
Health checks. Not GET /—that might return an HTML "service running" page without verifying database connectivity. Use a dedicated endpoint /health/ready that verifies database, Redis, and external dependencies. The load balancer removes a pod from rotation after two consecutive failures, restores it after two successes.
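A readiness endpoint reduces to aggregating dependency probes. A minimal sketch, where the check callables are hypothetical stand-ins for real probes (`SELECT 1` against PostgreSQL, `PING` to Redis, and so on):

```python
from typing import Callable

def readiness(checks: dict[str, Callable[[], bool]]) -> tuple[int, dict[str, str]]:
    """Run every dependency check; return 200 only if all pass, else 503."""
    report = {}
    healthy = True
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            ok = False  # a failing dependency must not crash the endpoint itself
        report[name] = "ok" if ok else "fail"
        healthy = healthy and ok
    return (200 if healthy else 503), report
```

Wire this into `/health/ready` in your framework of choice; the load balancer only needs the status code, while the per-dependency report helps on-call debugging.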
Kubernetes Ingress as Load Balancer
In Kubernetes, load balancing occurs at the Service level (kube-proxy, iptables / IPVS) plus an Ingress controller for external traffic. Ingress-NGINX is the de facto standard: supports WebSocket, rate limiting via nginx.ingress.kubernetes.io/limit-rps annotation, and upstream hashing for specific endpoints.
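An Ingress-NGINX manifest combining these features might look like the following; host, service name, port, and annotation values are illustrative:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress                    # illustrative name
  annotations:
    nginx.ingress.kubernetes.io/limit-rps: "50"
    # Long-lived WebSocket connections need extended proxy timeouts
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
spec:
  ingressClassName: nginx
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-service      # illustrative Service name
                port:
                  number: 8080
```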
IPVS mode instead of iptables in kube-proxy: with 1000+ services, iptables rule processing grows linearly; IPVS lookup is O(1). Enable it by setting mode: ipvs in the kube-proxy ConfigMap (equivalently, the --proxy-mode=ipvs flag when kube-proxy runs with command-line flags).
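The relevant fragment of the KubeProxyConfiguration embedded in the kube-proxy ConfigMap (kube-system namespace); the scheduler choice is illustrative:

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs          # default is "iptables"
ipvs:
  scheduler: lc     # least-connection; "rr" (round robin) is the default
```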
Real-world case: a delivery mobile app with a peak load of 8000 rps at lunchtime, where a single backend instance hit 80% CPU. Load was spread across 3 pods behind AWS ALB, with /api/health/ready verifying PostgreSQL connectivity. The first deployment still caused about 20 seconds of downtime: the old pod was killed before the new pod was ready. After configuring minReadySeconds: 30 and a rolling update strategy with maxUnavailable: 0, the subsequent 50+ deployments achieved zero downtime.
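The zero-downtime settings from that case map onto a Deployment like this; image, port, and names are illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  minReadySeconds: 30            # pod must stay Ready for 30s before counting as available
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0          # never drop below the desired replica count
      maxSurge: 1                # bring up one extra pod during the rollout
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: registry.example.com/api:latest
          readinessProbe:
            httpGet:
              path: /api/health/ready
              port: 8080
            periodSeconds: 5
            failureThreshold: 2
```

With maxUnavailable: 0 the old pod is only terminated after its replacement has passed the readiness probe and survived minReadySeconds, which is what closed the 20-second gap.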
Timeline: basic Nginx/HAProxy load balancing setup with health checks—1–2 days. Full Kubernetes Ingress configuration with mTLS, rate limiting, and zero-downtime deploys—1–2 weeks.