Stress Testing: Determining Load Limits
A stress test intentionally overloads the system beyond normal operation to find its breaking point. It answers questions such as: at what RPS do errors start rising? How does the system recover after overload? Where is the bottleneck: the database, CPU, memory, or network?
Methodology
Step 1: Define baseline. Run normal load (50–70% of expected peak) and record metrics: p95 latency, error rate, CPU/memory.
Step 2: Stepwise increase. Raise load in steps of 10–20% every 2–5 minutes. Record point where errors or latency start rising.
Step 3: Find breaking point. Continue until degradation (error rate > 5% or latency > 5x baseline).
Step 4: Recovery. Remove load and observe how quickly the system returns to normal.
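The stop criteria from steps 2–3 can be sketched as a simple predicate. This is an illustrative helper, not part of any tool: the function names and the `(rps, error_rate, p95_ms)` tuple shape are assumptions; the 5% and 5x thresholds come from step 3.

```python
# Hypothetical helper: decide whether a load step shows degradation,
# using the step-3 criteria (error rate > 5% or p95 > 5x baseline).
def is_degraded(error_rate: float, p95_ms: float, baseline_p95_ms: float) -> bool:
    return error_rate > 0.05 or p95_ms > 5 * baseline_p95_ms

def find_breaking_point(steps, baseline_p95_ms):
    """steps: list of (rps, error_rate, p95_ms) per load step, ascending load."""
    for rps, err, p95 in steps:
        if is_degraded(err, p95, baseline_p95_ms):
            return rps
    return None  # no breaking point reached within the tested range

steps = [(100, 0.001, 120), (200, 0.004, 180), (400, 0.08, 900)]
print(find_breaking_point(steps, baseline_p95_ms=110))  # → 400
```

With a baseline p95 of 110 ms, the 400 RPS step is the first to breach a criterion (8% errors), so it is reported as the breaking point.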
k6 Stress Test Scenario
// tests/stress/breaking-point.js
import http from 'k6/http'
import { check, sleep } from 'k6'
import { Rate, Counter } from 'k6/metrics'

const errorRate = new Rate('errors')
const totalRequests = new Counter('total_requests') // k6 derives the per-second rate in the summary

export const options = {
  stages: [
    // Warmup to normal traffic
    { duration: '2m', target: 50 },
    { duration: '3m', target: 50 }, // baseline level
    // Stepwise increase
    { duration: '2m', target: 100 },
    { duration: '3m', target: 100 },
    { duration: '2m', target: 200 },
    { duration: '3m', target: 200 },
    { duration: '2m', target: 400 },
    { duration: '3m', target: 400 },
    { duration: '2m', target: 800 },
    { duration: '3m', target: 800 },
    { duration: '2m', target: 1600 },
    { duration: '3m', target: 1600 },
    // Cooldown and observe recovery
    { duration: '5m', target: 50 },
    { duration: '3m', target: 0 },
  ],
  // Don't abort on threshold breach: we need the full picture
  thresholds: {
    http_req_duration: [
      { threshold: 'p(95)<2000', abortOnFail: false },
    ],
    errors: [
      { threshold: 'rate<0.1', abortOnFail: false },
    ],
  },
}

const BASE_URL = __ENV.BASE_URL || 'http://localhost:3000'

export default function () {
  const responses = http.batch([
    ['GET', `${BASE_URL}/api/products?limit=20`],
    ['GET', `${BASE_URL}/api/categories`],
  ])
  responses.forEach((r) => {
    check(r, { 'status 2xx': (res) => res.status >= 200 && res.status < 300 })
    errorRate.add(r.status >= 400)
  })
  totalRequests.add(2)
  sleep(0.1)
}
export function handleSummary(data) {
  // The end-of-test summary only contains run-wide aggregates; a
  // per-stage breakdown needs the time-series output (see the
  // Prometheus section below).
  const p95 = data.metrics.http_req_duration.values['p(95)']
  const errRate = data.metrics.errors ? data.metrics.errors.values.rate : 0
  const report = `
=== STRESS TEST REPORT ===
p95: ${p95.toFixed(0)}ms | Error rate: ${(errRate * 100).toFixed(1)}%
`
  return {
    'stress-results.json': JSON.stringify(data, null, 2),
    stdout: report,
  }
}
Monitoring During Test
Run system metrics collection in parallel:
#!/bin/bash
# scripts/monitor-stress-test.sh
TARGET_HOST="app-server-ip"
INTERVAL=10  # seconds

while true; do
  TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ)

  # CPU, memory, load average, established connections (one line per sample)
  ssh "$TARGET_HOST" "
    echo -n '$TIMESTAMP '
    top -bn1 | grep 'Cpu(s)' | awk '{printf \"cpu:%s \", \$2}'
    free | grep Mem | awk '{printf \"mem:%.1f \", \$3/\$2 * 100}'
    awk '{printf \"load:%s \", \$1}' /proc/loadavg
    ss -s | grep -o 'estab [0-9]*' | awk '{printf \"conns:%s\\n\", \$2}'
  "

  # PostgreSQL: active queries and locks
  ssh "$TARGET_HOST" "
    PGPASSWORD=pass psql -U app -d appdb -t -c \"
      SELECT 'active_queries:', count(*) FROM pg_stat_activity
      WHERE state = 'active' AND query NOT LIKE '%pg_stat%';
      SELECT 'long_queries:', count(*) FROM pg_stat_activity
      WHERE state = 'active' AND query_start < NOW() - interval '5 seconds';
      SELECT 'locks:', count(*) FROM pg_locks WHERE NOT granted;
    \"
  "

  sleep $INTERVAL
done | tee stress-monitor.log
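To turn stress-monitor.log into a timeline you can correlate with the load stages, a small parser helps. This is a sketch assuming each sample line has the form `<timestamp> cpu:<n> mem:<n> load:<n> conns:<n>`; adjust the regex to your actual log format.

```python
import re

# Assumed log-line shape (one sample per line):
# "2024-01-01T12:00:00Z cpu:12.3 mem:45.6 load:1.23 conns:321"
LINE_RE = re.compile(
    r'^(?P<ts>\S+) cpu:(?P<cpu>[\d.]+) mem:(?P<mem>[\d.]+) '
    r'load:(?P<load>[\d.]+) conns:(?P<conns>\d+)$'
)

def parse_line(line):
    """Parse one monitor sample; returns None for non-matching lines
    (e.g. the psql output interleaved in the same log)."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    d = m.groupdict()
    return {
        'ts': d['ts'],
        'cpu': float(d['cpu']),      # %
        'mem': float(d['mem']),      # %
        'load': float(d['load']),    # 1-min load average
        'conns': int(d['conns']),    # established TCP connections
    }

sample = "2024-01-01T12:00:00Z cpu:12.3 mem:45.6 load:1.23 conns:321"
print(parse_line(sample)['cpu'])  # → 12.3
```

Feeding the parsed samples into the same plots as the k6 metrics makes it easy to see whether CPU, memory, or connection count saturates first.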
Analyzing Results with Prometheus + Grafana
# k6 with Prometheus Remote Write. Note: the output reads its settings
# from the process environment; k6's --env flag only populates __ENV
# inside the script and would not configure the output.
K6_PROMETHEUS_RW_SERVER_URL=http://prometheus:9090/api/v1/write \
K6_PROMETHEUS_RW_TREND_AS_NATIVE_HISTOGRAM=true \
k6 run -o experimental-prometheus-rw tests/stress/breaking-point.js
# Grafana queries for stress test analysis
# RPS in real time
rate(k6_http_reqs_total[30s])
# Error rate over time (find degradation moment)
rate(k6_http_req_failed_total[30s]) / rate(k6_http_reqs_total[30s])
# p95 latency in real time (native histogram is enabled above, so no _bucket suffix)
histogram_quantile(0.95, rate(k6_http_req_duration_seconds[30s]))
# Correlation: load vs latency vs errors
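The same queries can be pulled programmatically through Prometheus's HTTP API (`/api/v1/query_range`) to pinpoint when degradation started. A hedged sketch; the base URL, time range, and 5% threshold are placeholders for your environment:

```python
import json
from urllib.request import urlopen
from urllib.parse import urlencode

def query_range(base_url, promql, start, end, step='30s'):
    """Fetch a range query from Prometheus and flatten it to
    (unix_ts, value) pairs. start/end are unix timestamps."""
    params = urlencode({'query': promql, 'start': start, 'end': end, 'step': step})
    with urlopen(f'{base_url}/api/v1/query_range?{params}') as resp:
        body = json.load(resp)
    # Each series carries values as [unix_ts, "value"] pairs
    return [(float(ts), float(v))
            for series in body['data']['result']
            for ts, v in series['values']]

def first_degraded_ts(points, threshold=0.05):
    """First timestamp where the error-rate series crosses the threshold."""
    for ts, value in points:
        if value > threshold:
            return ts
    return None

# Example (assumes Prometheus is reachable at this address):
# points = query_range('http://prometheus:9090',
#                      'rate(k6_http_req_failed_total[30s]) / rate(k6_http_reqs_total[30s])',
#                      start=1704100000, end=1704103600)
# print(first_degraded_ts(points))
```

Cross-referencing that timestamp with the VU ramp schedule gives the RPS level at which errors began.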
Identifying Bottleneck
# analyze_stress_results.py
import json

# NOTE: the find_* helpers below depend on the shape of your k6 output
# and are defined elsewhere in the project.

def analyze_breaking_point(results_file):
    with open(results_file) as f:
        data = json.load(f)

    # Extract time series
    metrics = data['metrics']
    analysis = {
        'max_rps_before_errors': find_max_sustainable_rps(metrics),
        'error_threshold_rps': find_error_threshold(metrics),
        'latency_degradation_point': find_latency_degradation(metrics),
        'recovery_time_seconds': find_recovery_time(metrics),
    }

    print("=== Breaking Point Analysis ===")
    print(f"Max sustainable RPS (< 1% errors): {analysis['max_rps_before_errors']}")
    print(f"Error threshold RPS: {analysis['error_threshold_rps']}")
    print(f"p95 > 1s at RPS: {analysis['latency_degradation_point']}")
    print(f"Recovery time after load removal: {analysis['recovery_time_seconds']}s")

    # Recommendations: independent checks, so plain `if`, not `elif`
    if analysis['max_rps_before_errors'] < 100:
        print("\n[!] LOW capacity. Consider: DB connection pooling, caching, horizontal scaling")
    if analysis['recovery_time_seconds'] > 120:
        print("\n[!] SLOW recovery. Consider: circuit breakers, graceful degradation")

    return analysis
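The helper functions are left to the reader; as an illustration, one plausible shape for `find_recovery_time` is shown below, with a simplified signature (a `(unix_ts, p95_ms)` series plus the load-removal timestamp) and an assumed "recovered" criterion of p95 back within 1.5x baseline:

```python
def find_recovery_time(samples, load_removed_at, baseline_p95_ms):
    """samples: list of (unix_ts, p95_ms), sorted by time.
    Returns seconds from load removal until p95 is back within
    1.5x baseline, or None if it never recovers in the data."""
    for ts, p95 in samples:
        if ts >= load_removed_at and p95 <= 1.5 * baseline_p95_ms:
            return ts - load_removed_at
    return None

# Synthetic example: p95 drops back under 165 ms (1.5 x 110) at t=160
samples = [(100, 900.0), (130, 400.0), (160, 150.0), (190, 120.0)]
print(find_recovery_time(samples, load_removed_at=100, baseline_p95_ms=110))  # → 60
```

The 1.5x factor is arbitrary; pick whatever your SLO treats as "back to normal".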
Typical Bottlenecks and Diagnostics
| Symptom | Probable Cause | Diagnostics |
|---|---|---|
| Latency grows, CPU low | DB locks or slow queries | pg_stat_activity, slow query log |
| CPU 100%, few errors | Computational bottleneck | top, application profiler |
| ENOMEM errors | Memory leak or OOM | free -m, /proc/meminfo |
| Connection refused | Connection pool exhausted | pgBouncer stats, netstat |
| 502 Bad Gateway | Worker processes overloaded | Nginx error log, worker_processes |
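The decision logic in the table can be encoded as a first-pass triage function. A sketch only: the thresholds are illustrative, not tuned, and real diagnosis should always confirm with the tools in the Diagnostics column.

```python
# First-pass triage mapping the symptom table to a diagnostic hint.
# Thresholds (90% memory, 50%/95% CPU) are illustrative assumptions.
def classify_bottleneck(p95_growing, cpu_pct, mem_pct, conn_refused, gateway_502):
    if conn_refused:
        return 'Connection pool exhausted: check pgBouncer stats, netstat'
    if gateway_502:
        return 'Workers overloaded: check Nginx error log, worker_processes'
    if mem_pct > 90:
        return 'Memory pressure: check free -m, /proc/meminfo'
    if p95_growing and cpu_pct < 50:
        return 'DB locks or slow queries: check pg_stat_activity, slow query log'
    if cpu_pct > 95:
        return 'Computational bottleneck: check top, application profiler'
    return 'No single obvious bottleneck: correlate metrics over time'

# Latency rising while CPU idles points at the database first
print(classify_bottleneck(True, 30, 60, False, False))
```

The ordering matters: hard failures (refused connections, 502s) are checked before soft symptoms, since they usually mask everything downstream.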
Timeline
A stress test with a stepwise load profile, monitoring, and breaking-point analysis takes 2–3 business days.