Implementing Proxy Rotation for Web Scraping
Proxy rotation is a standard part of any large-scale scraping setup. A single IP address that sends requests at high frequency will usually be rate-limited or blocked. The rotator's job is to spread requests across a pool of addresses and to take blocked proxies out of rotation until their quarantine expires.
Rotator Architecture
Request → Proxy Selector
                ↓
          [Proxy Pool]
            192.168.1.1:8080 ← status: OK, 245 requests
            10.0.0.5:3128    ← status: OK, 198 requests
            172.16.0.2:8080  ← status: BLOCKED, in quarantine
                ↓
           Target Site
                ↓
         Response Checker
         (status 200 → success; status 403 or 429 → fail)
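The Response Checker step in the diagram can be sketched as a small status classifier. This is only an illustration: the names `Outcome`, `BLOCK_STATUSES`, and `classify` are hypothetical, and the third category for statuses that say nothing about the proxy (for example a 404) is an assumption that goes beyond the success/fail split shown above.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"   # request went through
    BLOCKED = "blocked"   # the target site rejected or throttled this IP
    OTHER = "other"       # assumption: anything else is not the proxy's fault


# Statuses the diagram treats as a failed proxy
BLOCK_STATUSES = frozenset({403, 429})


def classify(status: int) -> Outcome:
    """Map an HTTP status code to an outcome for the rotator."""
    if status in BLOCK_STATUSES:
        return Outcome.BLOCKED
    if 200 <= status < 300:
        return Outcome.SUCCESS
    return Outcome.OTHER
```

Keeping a separate OTHER outcome means a missing page or a server error on the target does not push a healthy proxy into quarantine.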
Python Implementation
import asyncio
import random
import aiohttp
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class ProxyEntry:
    url: str
    success: int = 0
    fail: int = 0
    # datetime.min means "never blocked"
    blocked_until: datetime = field(default_factory=lambda: datetime.min)

    @property
    def is_available(self) -> bool:
        return datetime.now() > self.blocked_until

    @property
    def success_rate(self) -> float:
        total = self.success + self.fail
        # Untested proxies get a neutral 0.5 so they still receive traffic
        return self.success / total if total > 0 else 0.5


class ProxyRotator:
    def __init__(self, proxies: list[str], quarantine_minutes: int = 15):
        self.pool = [ProxyEntry(url=p) for p in proxies]
        self.quarantine = timedelta(minutes=quarantine_minutes)
        self._lock = asyncio.Lock()

    async def get(self) -> ProxyEntry:
        async with self._lock:
            available = [p for p in self.pool if p.is_available]
            if not available:
                raise RuntimeError("All proxies are blocked")
            # Weighted selection by success rate. The small floor keeps every
            # weight positive: random.choices raises ValueError if all weights are zero.
            weights = [max(p.success_rate, 0.05) for p in available]
            return random.choices(available, weights=weights)[0]

    async def report(self, proxy: ProxyEntry, success: bool) -> None:
        async with self._lock:
            if success:
                proxy.success += 1
            else:
                proxy.fail += 1
                # Any failure sends the proxy to quarantine
                proxy.blocked_until = datetime.now() + self.quarantine
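To see what weighted selection by success rate does in practice, the selection step can be run on its own. The following is a minimal sketch, not part of the rotator: `pick`, `MIN_WEIGHT`, and the proxy names are hypothetical, and the weight floor is an assumed safeguard so that a proxy with a 0.0 success rate is still tried occasionally.

```python
import random
from collections import Counter

# Assumed floor: keeps every weight positive so random.choices never
# receives an all-zero weight list.
MIN_WEIGHT = 0.05


def pick(rates: dict[str, float], rng: random.Random) -> str:
    """Pick one proxy name, favouring higher success rates."""
    names = list(rates)
    weights = [max(rates[name], MIN_WEIGHT) for name in names]
    return rng.choices(names, weights=weights)[0]


# Seeded generator so repeated runs draw the same sequence
rng = random.Random(42)
rates = {"proxy-a": 0.9, "proxy-b": 0.5, "proxy-c": 0.0}

# Draw many times and count how often each proxy is chosen
counts = Counter(pick(rates, rng) for _ in range(10_000))
```

Over many draws the counts follow the weights, so the most reliable proxy carries most of the traffic while weaker ones still receive enough requests to update their statistics.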
Integration with aiohttp
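async def fetch(session, url, rotator):
    proxy = await rotator.get()
    try:
        # aiohttp takes timeouts as a ClientTimeout object
        timeout = aiohttp.ClientTimeout(total=10)
        async with session.get(url, proxy=proxy.url, timeout=timeout) as resp:
            # 403/429/503 are treated as a blocked or throttled proxy
            if resp.status in (403, 429, 503):
                await rotator.report(proxy, success=False)
                return None
            body = await resp.text()
        # Report success only after the body has been read in full,
        # so one request is never counted as both a success and a failure
        await rotator.report(proxy, success=True)
        return body
    except (aiohttp.ClientError, asyncio.TimeoutError):
        # Connection errors and timeouts also count against the proxy
        await rotator.report(proxy, success=False)
        return None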
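Because `fetch` returns None on failure, a caller would typically retry the URL, which gives the rotator a chance to hand out a different proxy. The sketch below shows one possible way to do that with bounded concurrency. The names `fetch_with_retries`, `crawl`, and `fetch_once`, along with the retry count and back-off values, are assumptions for illustration only.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Any coroutine function that takes a URL and returns the body or None
FetchOnce = Callable[[str], Awaitable[Optional[str]]]


async def fetch_with_retries(
    fetch_once: FetchOnce,
    url: str,
    attempts: int = 3,
    base_delay: float = 0.5,
) -> Optional[str]:
    """Call fetch_once up to `attempts` times, backing off between tries."""
    for attempt in range(attempts):
        body = await fetch_once(url)
        if body is not None:
            return body
        if attempt < attempts - 1:
            # Exponential back-off: base_delay, 2*base_delay, 4*base_delay, ...
            await asyncio.sleep(base_delay * 2 ** attempt)
    return None


async def crawl(
    fetch_once: FetchOnce,
    urls: list[str],
    concurrency: int = 10,
    attempts: int = 3,
    base_delay: float = 0.5,
) -> dict[str, Optional[str]]:
    """Fetch all URLs with at most `concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def worker(url: str) -> Optional[str]:
        async with semaphore:
            return await fetch_with_retries(fetch_once, url, attempts, base_delay)

    results = await asyncio.gather(*(worker(u) for u in urls))
    return dict(zip(urls, results))
```

With the `fetch` function above, `fetch_once` could be `lambda url: fetch(session, url, rotator)`, so every retry asks the rotator for a fresh proxy.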
Proxy Sources
Paid providers (recommended for production workloads):
- Bright Data: advertises the largest residential proxy pool, 72M+ IPs
- Oxylabs: good coverage of the CIS region, ISP proxies
- Smartproxy: mobile and residential proxies, reasonable pricing
Free proxy lists are unstable and not suitable for production scraping.
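Whatever the source, proxy lists often arrive as bare host:port pairs like the ones in the pool diagram, while aiohttp's `proxy` argument takes a full URL such as `http://host:port`. A minimal sketch of a loader that normalizes and de-duplicates entries is shown here. `normalize_proxy` and `load_proxies` are hypothetical helpers, and defaulting to the `http` scheme is an assumption.

```python
from typing import Iterable
from urllib.parse import urlsplit


def normalize_proxy(raw: str, default_scheme: str = "http") -> str:
    """Turn 'host:port' into 'http://host:port' and validate the result."""
    raw = raw.strip()
    if "://" not in raw:
        raw = f"{default_scheme}://{raw}"
    parts = urlsplit(raw)
    # urlsplit itself raises ValueError for a non-numeric or out-of-range port
    if not parts.hostname or parts.port is None:
        raise ValueError(f"proxy needs a host and a port: {raw!r}")
    return raw


def load_proxies(lines: Iterable[str]) -> list[str]:
    """Read proxy entries, skipping blanks, '#' comments, and duplicates."""
    seen: dict[str, None] = {}  # dict preserves insertion order
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        seen.setdefault(normalize_proxy(line), None)
    return list(seen)
```

The resulting list can be passed straight to `ProxyRotator(proxies)`.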
Timeline
Proxy rotation system with monitoring: 2–3 business days.







