Setting Up Disaster Recovery (DR) Site for Web Application

A DR site is a copy of the infrastructure in a separate physical or cloud data center, ready to take traffic if the primary site fails catastrophically. The term "disaster recovery site" covers different readiness levels, from cold standby (hours to bring up) to hot standby (minutes).

DR Site Classification by Readiness

Cold Standby. The infrastructure is not running. Data is replicated to the DR location and the configuration is stored as IaC. On failure: spin up the environment from Terraform → restore data from backup → start the application. RTO: 2-8 hours.

Warm Standby. The basic infrastructure runs at reduced size (for example, 1 instance instead of 10). Data is kept current via replication. On failure: scale to production size → switch DNS. RTO: 15-60 minutes.

Hot Standby. A full copy of the infrastructure runs continuously. Data is synchronized with a lag under 1 minute. On failure: switch DNS or the load balancer. RTO: 1-5 minutes.
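
The classification above can be turned into a simple selection rule. The following is a minimal sketch, assuming the worst-case end of each RTO range is what matters and that tiers are ordered from cheapest to most expensive; the function name choose_tier is illustrative only.

```python
from typing import Optional

# Worst-case RTO per tier in minutes, taken from the ranges above
# (cold: 2-8 hours, warm: 15-60 minutes, hot: 1-5 minutes).
# Ordered from cheapest to most expensive.
TIERS = [
    ("cold", 8 * 60),
    ("warm", 60),
    ("hot", 5),
]


def choose_tier(required_rto_minutes: float) -> Optional[str]:
    """Return the cheapest tier whose worst-case RTO meets the requirement."""
    for name, worst_case_rto in TIERS:
        if worst_case_rto <= required_rto_minutes:
            return name
    return None  # even hot standby does not cover this RTO


if __name__ == "__main__":
    for target in (600, 60, 10, 2):
        print(target, "->", choose_tier(target))
```

A result of None means the requirement is tighter than the hot standby range and calls for a different design, such as an active-active setup, which is outside the scope of this classification.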

Selecting DR Site Location

Key requirements:

  • Physically independent power grid and internet channels
  • At least 100 km from the primary site (protection from regional disasters)
  • Legal compliance (for example, data of users from Russia stored in Russia, GDPR for European users)
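
The distance requirement can be checked from site coordinates. Below is a rough sketch using the haversine formula; the coordinates in the usage note are approximate and chosen only for illustration, and the helper names are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius
MIN_SEPARATION_KM = 100.0  # minimum distance from the requirement above


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def far_enough(primary, dr):
    """True if the two (lat, lon) sites are at least MIN_SEPARATION_KM apart."""
    return distance_km(*primary, *dr) >= MIN_SEPARATION_KM
```

For example, far_enough((50.11, 8.68), (53.35, -6.26)) compares approximate coordinates for Frankfurt and Dublin and passes, while two data centers in the same metro area would fail. Distance alone does not prove independent power and network paths, so those still need a separate check.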

Options:

  • Second AWS/GCP/Azure region (simplest)
  • Different cloud provider (protection from vendor outage)
  • Own or leased co-location (for regulated industries)

Data Replication

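PostgreSQL → DR Site: streaming replication with an asynchronous standby in DR. For critical data, list the standby in synchronous_standby_names and set synchronous_commit = remote_apply: a commit then returns only after the standby has applied it, so committed data survives a primary failure, at the cost of higher write latency.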

Monitor replication lag (run on the standby):

SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;

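Alert if lag > 30 seconds. This value also grows while the primary has no write activity, so on a quiet system confirm with WAL positions before treating it as real lag.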
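
The lag check can be wrapped in a small monitoring helper. This is a sketch, assuming any DB-API 2.0 cursor connected to the standby (for example, one from psycopg); the function names and status strings are illustrative, and the query is a variant of the one above that returns seconds as a number.

```python
LAG_QUERY = """
SELECT EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())
"""
ALERT_THRESHOLD_SECONDS = 30.0  # threshold from the text above


def replication_lag_seconds(cursor):
    """Run the lag query on a standby via a DB-API 2.0 cursor.

    Returns None when the server reports NULL, which happens when no
    transactions have been replayed during recovery.
    """
    cursor.execute(LAG_QUERY)
    row = cursor.fetchone()
    if row is None or row[0] is None:
        return None
    return float(row[0])  # handles both float and Decimal results


def lag_status(lag_seconds, threshold=ALERT_THRESHOLD_SECONDS):
    """Map a lag value to 'ok', 'alert', or 'unknown'."""
    if lag_seconds is None:
        return "unknown"
    return "alert" if lag_seconds > threshold else "ok"
```

Treating a NULL result as "unknown" rather than "ok" is a deliberate choice here: a monitoring gap on the standby should be visible instead of silently passing.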

File Storage:

  • S3 Cross-Region Replication (AWS): automatic; an RPO under 15 minutes is backed by an SLA only with S3 Replication Time Control enabled
  • rclone sync on a schedule: for infrequently changing objects
  • lsyncd: for near-real-time filesystem sync between servers

Redis: Redis Sentinel with replica in DR or Redis Cluster with geo-distribution.

Infrastructure as Code for DR

The entire DR site is described in Terraform. The primary and standby environments are separate workspaces or separate configuration directories, parameterized via variables:

module "app_cluster" {
  source = "./modules/app"
  region = var.region

  # In DR mode the cluster runs a reduced footprint; production runs at full size.
  instance_type = var.dr_mode ? "t3.medium" : "c6i.2xlarge"
  replica_count = var.dr_mode ? 1 : 5
}

Cold standby: run terraform apply only on DR activation. Warm standby: run terraform apply in advance with dr_mode = true.

DR Site Activation Procedure

The runbook must be documented with exact commands, as specific steps rather than general guidance:

  1. Confirm primary site failure (not false alarm)
  2. Declare DR incident, assign incident manager
  3. Check DB replication lag before switching
  4. If warm/hot: promote DB replica (pg_promote())
  5. Update DNS (Route 53 / Cloudflare) to DR addresses
  6. Verify functionality via DR Site
  7. Notify team and, if needed, users
  8. Record RTO
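
The steps above can be sketched as an ordered runner that stops at the first failed step and measures elapsed time. This is a minimal illustration, not a complete automation: the step actions below are hypothetical placeholders, and in a real runbook each would execute the documented command for that step.

```python
import time
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_runbook(steps: List[Step], clock: Callable[[], float] = time.monotonic):
    """Run steps in order and stop at the first one that fails.

    Returns (completed_step_names, failed_step_name_or_None, elapsed_seconds).
    The elapsed time of a fully successful run is the measured RTO.
    """
    started = clock()
    completed = []
    for name, action in steps:
        if not action():
            return completed, name, clock() - started
        completed.append(name)
    return completed, None, clock() - started


if __name__ == "__main__":
    # Hypothetical placeholder actions that always succeed.
    steps = [
        ("confirm primary failure", lambda: True),
        ("check replication lag", lambda: True),
        ("promote DB replica", lambda: True),
        ("update DNS", lambda: True),
        ("verify via DR site", lambda: True),
    ]
    done, failed, elapsed = run_runbook(steps)
    print(done, failed, f"elapsed: {elapsed:.1f}s")
```

Stopping at the first failure matters most for the lag check: promoting a replica that is far behind would turn an outage into data loss.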

Network Connectivity

A dedicated channel is needed between the primary site and the DR site for data replication:

  • AWS VPC Peering or Transit Gateway (within AWS)
  • AWS Direct Connect / GCP Interconnect (on-premise to cloud)
  • Site-to-site VPN (budget option, less reliable)

The replication channel must be isolated from user traffic so that peak application load does not affect replication.

DR Site Cost

Type | Ongoing cost | Example (AWS)
Cold Standby | Storage + replication | $50-200/month
Warm Standby | ~30% of prod | $500-2000/month
Hot Standby | ~80-100% of prod | $2000-8000/month
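
The percentage figures in the table can be applied to a known production bill. The sketch below assumes only the shares listed above; the function name and the $5000 figure in the usage note are illustrative, and cold standby is left out because it is priced by storage and replication rather than as a share of production.

```python
# Share of production cost per tier in percent (low, high), from the table above.
COST_SHARE_PERCENT = {
    "warm": (30, 30),
    "hot": (80, 100),
}


def estimate_dr_cost(prod_monthly_usd, tier):
    """Return the (low, high) monthly DR cost estimate for a tier."""
    low, high = COST_SHARE_PERCENT[tier]
    return prod_monthly_usd * low / 100, prod_monthly_usd * high / 100
```

For a production environment costing $5000 per month, estimate_dr_cost(5000, "warm") gives roughly $1500 and estimate_dr_cost(5000, "hot") gives $4000 to $5000.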

For most web applications, warm standby is the best trade-off: moderate cost with an RTO of 15-60 minutes.

Implementation Timeline

  • Analyze current infrastructure and choose strategy — 2-3 days
  • Configure data replication — 3-7 days
  • Deploy DR infrastructure in IaC — 5-10 days
  • Network connectivity and security — 2-5 days
  • Procedures, runbook, testing — 3-5 days

Total: 2-5 weeks depending on infrastructure complexity and DR type.
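
As a sanity check on the total, the stage estimates can be summed directly. This sketch assumes the figures are working days; run strictly in sequence they add up to more than the stated total, so the 2-5 week figure presumably assumes that some stages overlap.

```python
# (stage, min_days, max_days) from the list above
STAGES = [
    ("Analysis and strategy", 2, 3),
    ("Data replication", 3, 7),
    ("DR infrastructure in IaC", 5, 10),
    ("Network connectivity and security", 2, 5),
    ("Procedures, runbook, testing", 3, 5),
]


def sequential_total(stages):
    """Sum min and max durations as if stages ran strictly one after another."""
    return sum(s[1] for s in stages), sum(s[2] for s in stages)


if __name__ == "__main__":
    low, high = sequential_total(STAGES)
    print(f"{low}-{high} working days, about {low // 5}-{high // 5} weeks in sequence")
```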