Scraping Statistics Dashboard Implementation
The scraping statistics dashboard monitors the health of the data collection system. It shows successful requests, blocked requests, crawl speed per source, and how efficiency changes over time.
Key Metrics
| Metric | Description |
|---|---|
| Success Rate | % successful requests per period |
| Requests/min | Crawl speed |
| Items/hour | Data collection speed |
| Error Rate | % 4xx/5xx errors |
| Proxy Health | % working proxies in pool |
| Queue Depth | Length of URL queue |
| Avg Response Time | Average source response time |
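
As an illustration, the rate-style metrics in the table might be derived from raw counters collected over one reporting window. This is a minimal sketch: the `WindowCounts` type and the `summarize` function are hypothetical names, and it assumes every failed request counts toward the error rate.

```python
from dataclasses import dataclass


@dataclass
class WindowCounts:
    """Raw counters for one reporting window (hypothetical shape)."""
    requests_ok: int
    requests_fail: int
    items_scraped: int
    window_seconds: float


def summarize(c: WindowCounts) -> dict:
    """Derive success rate, error rate, requests/min and items/hour."""
    total = c.requests_ok + c.requests_fail
    minutes = c.window_seconds / 60.0
    hours = c.window_seconds / 3600.0
    return {
        # Percentages are undefined without traffic, so report None rather than 0.
        "success_rate_pct": 100.0 * c.requests_ok / total if total else None,
        "error_rate_pct": 100.0 * c.requests_fail / total if total else None,
        "requests_per_min": total / minutes if minutes > 0 else None,
        "items_per_hour": c.items_scraped / hours if hours > 0 else None,
    }
```

Returning `None` for an empty window keeps "no traffic" distinguishable from "0% success" on a chart.
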
Metrics Storage
Purpose-built time-series storage handles this workload more efficiently than plain PostgreSQL. One option is TimescaleDB, a PostgreSQL extension:

    -- TimescaleDB (PostgreSQL extension)
    CREATE TABLE scraper_metrics (
        time           TIMESTAMPTZ NOT NULL DEFAULT NOW(),
        scraper_id     INTEGER,
        requests_ok    INTEGER DEFAULT 0,
        requests_fail  INTEGER DEFAULT 0,
        items_scraped  INTEGER DEFAULT 0,
        avg_resp_ms    FLOAT,
        proxy_used     TEXT
    );
    -- Convert the table into a hypertable partitioned on the time column
    SELECT create_hypertable('scraper_metrics', 'time');
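
One way to feed this table is to aggregate per-request results in memory and write one row per scraper and proxy at each flush interval. The sketch below only builds the parameter tuples. `MetricsBuffer` and its methods are hypothetical names, and the `%s` placeholders assume a psycopg-style PostgreSQL driver.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Parameterized INSERT matching the scraper_metrics columns
# (assumption: %s placeholders, as used by psycopg-style drivers).
INSERT_SQL = (
    "INSERT INTO scraper_metrics "
    "(time, scraper_id, requests_ok, requests_fail, items_scraped, avg_resp_ms, proxy_used) "
    "VALUES (%s, %s, %s, %s, %s, %s, %s)"
)


class MetricsBuffer:
    """Accumulates request results in memory; flush() turns them into rows."""

    def __init__(self):
        self._stats = defaultdict(
            lambda: {"ok": 0, "fail": 0, "items": 0, "resp_ms": []}
        )

    def record(self, scraper_id, proxy, ok, items, resp_ms):
        """Register the outcome of a single request."""
        s = self._stats[(scraper_id, proxy)]
        s["ok" if ok else "fail"] += 1
        s["items"] += items
        s["resp_ms"].append(resp_ms)

    def flush(self, now=None):
        """Return one parameter tuple per (scraper, proxy) and reset the buffer."""
        now = now or datetime.now(timezone.utc)
        rows = []
        for (scraper_id, proxy), s in self._stats.items():
            avg = sum(s["resp_ms"]) / len(s["resp_ms"])
            rows.append((now, scraper_id, s["ok"], s["fail"], s["items"], avg, proxy))
        self._stats.clear()
        return rows
```

With a DB-API cursor, the rows could then be written with `cursor.executemany(INSERT_SQL, buffer.flush())`, which keeps database writes to one batch per flush interval instead of one per request.
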
Grafana Dashboard

    {
      "dashboard": {
        "title": "Scraper Health",
        "panels": [
          {
            "title": "Success Rate %",
            "targets": [{
              "expr": "rate(scraper_requests_ok[1m]) / (rate(scraper_requests_ok[1m]) + rate(scraper_requests_fail[1m])) * 100"
            }]
          },
          {
            "title": "Queue Depth",
            "targets": [{
              "expr": "scraper_queue_depth"
            }]
          }
        ]
      }
    }
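
The `expr` fields are PromQL, so the panels assume the scraper exposes `scraper_requests_ok`, `scraper_requests_fail`, and `scraper_queue_depth` to Prometheus. The following is a minimal sketch of rendering those metrics in the Prometheus text exposition format using only the standard library. `render_metrics` is a hypothetical helper, and in practice a client library such as `prometheus_client` would normally do this.

```python
def render_metrics(counters, queue_depth):
    """Render cumulative counters and the queue gauge as Prometheus text.

    counters: {scraper_id: {"ok": int, "fail": int}} with totals since start
    (assumed shape); queue_depth: current length of the URL queue.
    """
    lines = ["# TYPE scraper_requests_ok counter"]
    for sid, c in counters.items():
        lines.append(f'scraper_requests_ok{{scraper_id="{sid}"}} {c["ok"]}')
    lines.append("# TYPE scraper_requests_fail counter")
    for sid, c in counters.items():
        lines.append(f'scraper_requests_fail{{scraper_id="{sid}"}} {c["fail"]}')
    lines.append("# TYPE scraper_queue_depth gauge")
    lines.append(f"scraper_queue_depth {queue_depth}")
    return "\n".join(lines) + "\n"
```

The counters must be cumulative and only ever increase, because `rate()` computes the per-second increase over the selected range.
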
Alerts

    - alert: HighErrorRate
      expr: rate(scraper_requests_fail[5m]) / (rate(scraper_requests_ok[5m]) + rate(scraper_requests_fail[5m])) > 0.1
      for: 10m
      annotations:
        summary: "Scraper {{ $labels.scraper_id }} error rate high"
    - alert: ProxyPoolDepleted
      expr: scraper_proxy_health < 0.3
      annotations:
        summary: "Less than 30% proxies working"
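
The `for: 10m` clause means HighErrorRate fires only after the condition has held continuously for ten minutes, and a single healthy evaluation resets the timer. Those semantics can be sketched as follows. `ForDurationAlert` is a hypothetical class for illustration, not part of Prometheus.

```python
class ForDurationAlert:
    """Fires only after the condition has held continuously for hold_seconds."""

    def __init__(self, threshold, hold_seconds):
        self.threshold = threshold
        self.hold_seconds = hold_seconds
        self._pending_since = None  # time at which the condition first became true

    def observe(self, now, fail_rate, total_rate):
        """Feed one evaluation; returns True while the alert is firing."""
        # With no traffic there is no ratio to compare, so treat it as healthy.
        ratio = fail_rate / total_rate if total_rate > 0 else 0.0
        if ratio <= self.threshold:
            self._pending_since = None  # condition cleared, reset the timer
            return False
        if self._pending_since is None:
            self._pending_since = now
        return now - self._pending_since >= self.hold_seconds
```

For example, `ForDurationAlert(threshold=0.1, hold_seconds=600)` mirrors the HighErrorRate rule, with `now` given in seconds. The ProxyPoolDepleted rule has no `for` clause, so it fires on the first evaluation below the threshold.
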
Timeline
Basic metrics collection: 2–3 days. Dashboard with alerting: 3–5 days.







