Multi-Region Failover for Global Web Applications

Setting Up Multi-Region Failover for Global Web Applications

Multi-region failover protects against the loss of an entire region: an AWS us-east-1 data center outage, an undersea cable break, or IP blocking in specific countries. It is the next level beyond single-server failover: more complex and more expensive, but necessary for applications with users worldwide or strict availability requirements.

Deployment Strategies

Active-Passive. The primary region serves all traffic. The standby region is hot and receives replicated data, but does not accept traffic. When the primary fails, Route 53 or Cloudflare switches DNS to the standby.

Pros: simpler management, cheaper (the standby region can run at reduced capacity). Cons: RTO of 1-5 minutes, and users located near the standby region see higher latency because all traffic still goes to the primary.

Active-Active. Both (or all) regions serve traffic simultaneously. GeoDNS routes each user to the nearest region. If one region fails, its traffic is redistributed to the others.

Pros: better global latency, near-zero RTO for users of unaffected regions. Cons: complex data synchronization between regions and write conflicts in the distributed database.
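The difference between the two strategies comes down to how the serving region is chosen. A minimal sketch in Python, assuming a hypothetical pick_region helper and an ordered region list whose first entry is the primary. In practice this decision is made by DNS or a load balancer, not by application code:

```python
from typing import Dict, List


def pick_region(mode: str, regions: List[str],
                healthy: Dict[str, bool], nearest: str) -> str:
    """Return the region that should serve a request.

    regions is in priority order; regions[0] is the primary.
    nearest is the region closest to the user (used by active-active only).
    """
    candidates = [r for r in regions if healthy.get(r, False)]
    if not candidates:
        raise RuntimeError("no healthy region available")
    # Active-active: prefer the user's nearest region while it is healthy.
    if mode == "active-active" and nearest in candidates:
        return nearest
    # Active-passive (or nearest region is down): first healthy by priority.
    return candidates[0]


REGIONS = ["us-east-1", "eu-west-1"]
ALL_UP = {"us-east-1": True, "eu-west-1": True}
PRIMARY_DOWN = {"us-east-1": False, "eu-west-1": True}

passive_normal = pick_region("active-passive", REGIONS, ALL_UP, "eu-west-1")
active_normal = pick_region("active-active", REGIONS, ALL_UP, "eu-west-1")
passive_failover = pick_region("active-passive", REGIONS, PRIMARY_DOWN, "us-east-1")
```

With both regions healthy, a European user is served from us-east-1 under active-passive and from eu-west-1 under active-active, which is the latency trade-off described above.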

DNS Routing with Geolocation

AWS Route 53 Latency-Based Routing + Health Checks:

Route 53 → Latency policy
  us-east-1: ALB endpoint + Health check
  eu-west-1: ALB endpoint + Health check
  ap-southeast-1: ALB endpoint + Health check

If region health check fails →
  traffic automatically shifts to remaining regions
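As a rough illustration, the layout above maps to one latency record per region. The sketch below only builds the ChangeBatch payload in the shape accepted by the Route 53 ChangeResourceRecordSets API and does not call AWS. The domain name, ALB DNS names, zone IDs and health check IDs are placeholders, and latency_record and build_change_batch are hypothetical helpers:

```python
from typing import Any, Dict, List


def latency_record(name: str, region: str, alb_dns: str,
                   alb_zone_id: str, health_check_id: str) -> Dict[str, Any]:
    """One latency-routed alias record pointing at a regional ALB."""
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "A",
            "SetIdentifier": region,      # must be unique per record in the set
            "Region": region,             # selects the latency routing policy
            "HealthCheckId": health_check_id,
            "AliasTarget": {
                "HostedZoneId": alb_zone_id,
                "DNSName": alb_dns,
                "EvaluateTargetHealth": True,
            },
        },
    }


def build_change_batch(name: str, endpoints: List[Dict[str, str]]) -> Dict[str, Any]:
    """Build the ChangeBatch for all regional endpoints."""
    return {
        "Comment": "latency routing with health checks",
        "Changes": [latency_record(name, **endpoint) for endpoint in endpoints],
    }


# Placeholder values only; none of these identifiers are real.
ENDPOINTS = [
    {"region": "us-east-1", "alb_dns": "alb-use1.example.com",
     "alb_zone_id": "ZONE-ID-USE1", "health_check_id": "hc-use1"},
    {"region": "eu-west-1", "alb_dns": "alb-euw1.example.com",
     "alb_zone_id": "ZONE-ID-EUW1", "health_check_id": "hc-euw1"},
    {"region": "ap-southeast-1", "alb_dns": "alb-apse1.example.com",
     "alb_zone_id": "ZONE-ID-APSE1", "health_check_id": "hc-apse1"},
]

batch = build_change_batch("app.example.com", ENDPOINTS)
```

With boto3 installed and real identifiers filled in, the dictionary could be submitted with boto3.client("route53").change_resource_record_sets(HostedZoneId=..., ChangeBatch=batch).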

Cloudflare Load Balancing with Traffic Steering: use Geo Steering or Dynamic Steering (based on measured RTT). Failures are detected in 10-60 seconds, and traffic switches within seconds after that.
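Detection time in the 10-60 second range follows from how often probes run and how many consecutive failures are required before an endpoint is marked down. The sketch below is a generic model of that logic, not Cloudflare's actual algorithm; HealthTracker, the threshold of 3 and the 10-second interval are illustrative assumptions:

```python
class HealthTracker:
    """Marks an endpoint unhealthy after `threshold` consecutive failed probes."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.failures = 0
        self.healthy = True

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result and return the current health state."""
        if probe_ok:
            # Simplification: a single successful probe restores health.
            self.failures = 0
            self.healthy = True
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.healthy = False
        return self.healthy


def detection_seconds(interval_s: int, threshold: int) -> int:
    """Approximate time to detect a hard failure (ignores probe timeouts)."""
    return interval_s * threshold


tracker = HealthTracker(threshold=3)
states = [tracker.record(False) for _ in range(3)]
estimate = detection_seconds(interval_s=10, threshold=3)
```

Shorter intervals and lower thresholds detect failures faster but make false positives from transient network errors more likely.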

Data Replication Between Regions

The main problem in multi-region setups is data. A user writes data in us-east-1, failover sends them to eu-west-1, and the data is missing there unless it has already been replicated.

For PostgreSQL: AWS Aurora Global Database, with replication lag typically under 1 second and promotion of the standby region in about 1 minute. Alternatively, use CockroachDB or Spanner as a natively geo-distributed database.
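These figures can be combined into a rough failover budget. The sketch below uses the numbers quoted in this article (detection under 60 s, promotion of about 1 minute, DNS switch of up to 120 s, lag under 1 s) and assumes the stages run strictly one after another, which is pessimistic; the write rate of 500 per second is a made-up example, and both function names are hypothetical:

```python
def worst_case_rto(detection_s: int, promotion_s: int, dns_switch_s: int) -> int:
    """Upper bound on recovery time if the stages do not overlap."""
    return detection_s + promotion_s + dns_switch_s


def writes_at_risk(writes_per_second: float, replication_lag_s: float) -> float:
    """Writes acknowledged in the failed region but not yet replicated."""
    return writes_per_second * replication_lag_s


rto_s = worst_case_rto(detection_s=60, promotion_s=60, dns_switch_s=120)
lost_writes = writes_at_risk(writes_per_second=500, replication_lag_s=1.0)
```

Under these assumptions the worst case is 240 seconds, which is consistent with the 1-5 minute RTO given for active-passive, and up to about 500 recent writes could be missing after promotion.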

For files and other static objects: S3 Cross-Region Replication copies objects to the other region automatically, and CloudFront can be configured with multiple origins.

For sessions: Redis with cross-region replication (AWS ElastiCache Global Datastore), or JWT tokens, which are stateless and can be verified in any region that holds the signing key.
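To show why JWT sessions survive a region switch, here is a minimal HS256 sketch using only the Python standard library. It assumes every region is provisioned with the same signing secret; issue_token and verify_token are hypothetical helper names, and a production system would normally use a vetted JWT library instead:

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(claims: dict, secret: bytes) -> str:
    """Create a compact JWT signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [_b64url(json.dumps(p, separators=(",", ":")).encode("utf-8"))
             for p in (header, claims)]
    signing_input = ".".join(parts)
    return signing_input + "." + _b64url(_sign(signing_input, secret))


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return the claims if signature, algorithm and expiry are valid."""
    pieces = token.split(".")
    if len(pieces) != 3:
        raise ValueError("malformed token")
    head_b64, payload_b64, sig_b64 = pieces
    expected = _sign(head_b64 + "." + payload_b64, secret)
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(head_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in claims and claims["exp"] < current:
        raise ValueError("token expired")
    return claims


# Issued in one region, verified in another that shares the secret.
SHARED_SECRET = b"example-secret-replicated-to-all-regions"
token = issue_token({"sub": "user-42", "exp": 2000}, SHARED_SECRET)
claims_in_other_region = verify_token(token, SHARED_SECRET, now=1000)
```

Because verification needs only the token and the shared secret, no session store has to be replicated; the cost is that individual tokens cannot be revoked before they expire without extra state.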

For queues: AWS SQS does not replicate across regions automatically. Either design for regional isolation, with each region producing to and consuming from its own queues, or use Kafka with MirrorMaker 2.

Testing: Chaos Engineering at Regional Level

Verify multi-region failover without waiting for a real regional outage:

  1. Traffic blocking at ALB level, so that the target group has 0 healthy instances
  2. AWS Fault Injection Simulator: simulate delays and failures of region components
  3. Route 53 health check forced failure: manually set the health check to unhealthy via the API
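For the third option, one way to force a failure is the Inverted flag of the Route 53 UpdateHealthCheck API, which makes a healthy endpoint report as unhealthy. The sketch below assumes the health check is not normally inverted; force_health_check_failure is a hypothetical helper, the health check ID is a placeholder, and RecordingClient is a stand-in used for a dry run without AWS access:

```python
from typing import Any, Dict, List


def force_health_check_failure(client: Any, health_check_id: str,
                               fail: bool = True) -> Any:
    """Invert (fail=True) or restore (fail=False) a Route 53 health check.

    client is expected to behave like boto3.client("route53").
    Assumption: the check is not inverted during normal operation.
    """
    return client.update_health_check(HealthCheckId=health_check_id,
                                      Inverted=fail)


class RecordingClient:
    """Dry-run stand-in that records calls instead of contacting AWS."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def update_health_check(self, **kwargs: Any) -> Dict[str, Any]:
        self.calls.append(kwargs)
        return {"HealthCheck": {"Id": kwargs["HealthCheckId"]}}


dry_run = RecordingClient()
force_health_check_failure(dry_run, "hc-placeholder-id", fail=True)   # start the drill
force_health_check_failure(dry_run, "hc-placeholder-id", fail=False)  # end the drill
```

Restoring the flag at the end of the drill is as important as setting it; wrapping the drill in try/finally is a reasonable safeguard.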

During each test, record the failure detection time (must be under 60 s), the DNS switch time (TTL-dependent, usually 60-120 s), and the effect on active users (lost sessions, lost in-flight data).
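These measurements can be derived from a log of client-side probes taken during the drill. The sketch below assumes each probe is recorded as (seconds since fault injection, region that answered, or None on error); the sample data and the summarize_failover helper are illustrative only. Note that a client-side probe sees the outage window, not the internal moment the health check flips:

```python
from typing import Dict, List, Optional, Tuple

# (seconds since fault injection, region that answered or None on error)
Probe = Tuple[float, Optional[str]]


def summarize_failover(probes: List[Probe], standby: str) -> Dict[str, Optional[float]]:
    """Summarize the outage as seen by a client during a failover drill."""
    outage_start = next((t for t, region in probes if region is None), None)
    recovered = next((t for t, region in probes if region == standby), None)
    duration = None
    if outage_start is not None and recovered is not None:
        duration = recovered - outage_start
    return {
        "outage_start_s": outage_start,
        "served_by_standby_s": recovered,
        "outage_duration_s": duration,
        "failed_probes": float(sum(1 for _, region in probes if region is None)),
    }


# Made-up sample: primary answers, then errors, then the standby takes over.
SAMPLE: List[Probe] = [
    (0.0, "us-east-1"),
    (5.0, None),
    (10.0, None),
    (95.0, "eu-west-1"),
    (100.0, "eu-west-1"),
]

summary = summarize_failover(SAMPLE, standby="eu-west-1")
```

For this sample the client sees errors from 5 s to 95 s after injection, a 90-second outage, which can then be compared against the targets above.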

Configuration Management

Each region must be configured identically, so Infrastructure as Code is mandatory:

  • Terraform with workspace per region or separate state files
  • Same Docker images (ECR replication or private registry per region)
  • Secrets Manager replication (AWS Secrets Manager multi-region)

Config drift between regions is the main reason failover works in tests but breaks in production.
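A simple way to catch drift early is to diff the effective configuration of each region on a schedule. The sketch below assumes the configuration has already been exported as flat dictionaries (for example from Terraform outputs); the keys, values and the config_drift helper are hypothetical:

```python
from typing import Any, Dict, Iterable, Tuple

MISSING = "<missing>"


def config_drift(a: Dict[str, Any], b: Dict[str, Any],
                 ignore: Iterable[str] = ()) -> Dict[str, Tuple[Any, Any]]:
    """Return keys whose values differ between two regions.

    Keys listed in `ignore` are expected to differ (e.g. the region name).
    """
    skipped = set(ignore)
    drift: Dict[str, Tuple[Any, Any]] = {}
    for key in sorted(set(a) | set(b)):
        if key in skipped:
            continue
        left, right = a.get(key, MISSING), b.get(key, MISSING)
        if left != right:
            drift[key] = (left, right)
    return drift


# Made-up example values.
US_EAST = {"region": "us-east-1", "image_tag": "app:1.4.2",
           "instance_type": "m5.large", "db_pool_size": 50}
EU_WEST = {"region": "eu-west-1", "image_tag": "app:1.4.1",
           "instance_type": "m5.large"}

drift = config_drift(US_EAST, EU_WEST, ignore=["region"])
```

Here the check reports a stale image tag and a missing pool setting in the second region, the kind of difference that only shows up when production traffic actually lands there.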

Cost and Trade-offs

Active-passive adds 40-60% to the infrastructure cost of a single region. Active-active adds 80-120% (a full copy of the stack in each region plus cross-region traffic).
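As a quick worked example of these percentages, the sketch below applies them to a single-region bill. The monthly figure of 10,000 is a made-up input and cost_range is a hypothetical helper:

```python
from typing import Tuple


def cost_range(single_region_cost: int, low_pct: int, high_pct: int) -> Tuple[float, float]:
    """Total cost after adding low_pct..high_pct overhead to one region."""
    low = single_region_cost * (100 + low_pct) / 100
    high = single_region_cost * (100 + high_pct) / 100
    return low, high


BASE_MONTHLY = 10_000  # illustrative single-region cost, any currency

active_passive = cost_range(BASE_MONTHLY, 40, 60)
active_active = cost_range(BASE_MONTHLY, 80, 120)
```

For a 10,000 baseline this gives 14,000-16,000 for active-passive and 18,000-22,000 for active-active.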

For most projects, active-passive with a hot standby is sufficient. Active-active is needed for more than 100k RPS, a global audience with latency requirements, or an SLA of 99.99% and above.

Implementation Timeline

  • Active-passive (2 regions, DNS failover) — 1-2 weeks
  • Aurora Global Database + application — 2-3 weeks
  • Active-active with data sync — 4-8 weeks
  • Complete testing + runbook + monitoring — +1 week