Scraping Statistics Dashboard (Successes, Errors, Speed)


Scraping Statistics Dashboard Implementation

A scraping statistics dashboard monitors the health of the data collection system: it shows successful requests, blocks, per-source speed, and how efficiency changes over time.

Key Metrics

Metric             Description
Success Rate       Percentage of successful requests per period
Requests/min       Crawl speed
Items/hour         Data collection speed
Error Rate         Percentage of 4xx/5xx responses
Proxy Health       Percentage of working proxies in the pool
Queue Depth        Length of the URL queue
Avg Response Time  Average response time of the source
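
The rates in the table can be derived from raw request records. A minimal sketch in Python (the field names `status` and `resp_ms` are illustrative, not from any particular library):

```python
def compute_metrics(requests, period_s):
    """requests: list of dicts with 'status' (HTTP code) and 'resp_ms'.
    Returns the key rates for one reporting period of period_s seconds."""
    total = len(requests)
    if total == 0:
        return {"success_rate": 0.0, "error_rate": 0.0,
                "requests_per_min": 0.0, "avg_resp_ms": 0.0}
    ok = sum(1 for r in requests if 200 <= r["status"] < 300)
    errors = sum(1 for r in requests if r["status"] >= 400)
    return {
        "success_rate": 100.0 * ok / total,
        "error_rate": 100.0 * errors / total,
        "requests_per_min": total / (period_s / 60.0),
        "avg_resp_ms": sum(r["resp_ms"] for r in requests) / total,
    }
```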

Metrics Storage

For time-series metrics, specialized storage is more efficient than plain PostgreSQL tables. TimescaleDB adds time-series features as a PostgreSQL extension:

-- TimescaleDB (PostgreSQL extension)
CREATE TABLE scraper_metrics (
    time          TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    scraper_id    INTEGER,
    requests_ok   INTEGER DEFAULT 0,
    requests_fail INTEGER DEFAULT 0,
    items_scraped INTEGER DEFAULT 0,
    avg_resp_ms   FLOAT,
    proxy_used    TEXT
);

SELECT create_hypertable('scraper_metrics', 'time');
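
Per-request INSERTs are wasteful at crawl speed; a common pattern is to accumulate counters in memory and flush one row per scraper per interval. A sketch of such a buffer (the `MetricsBuffer` class is illustrative; `flush_fn` would typically wrap an `executemany` INSERT into the `scraper_metrics` table above via a real driver such as psycopg2):

```python
import threading

class MetricsBuffer:
    """Accumulates per-scraper counters and flushes them as one row each."""

    def __init__(self, flush_fn):
        self._flush_fn = flush_fn   # called with a list of row dicts
        self._lock = threading.Lock()
        self._counters = {}         # scraper_id -> {"ok", "fail", "items"}

    def record(self, scraper_id, ok=0, fail=0, items=0):
        with self._lock:
            c = self._counters.setdefault(
                scraper_id, {"ok": 0, "fail": 0, "items": 0})
            c["ok"] += ok
            c["fail"] += fail
            c["items"] += items

    def flush(self):
        with self._lock:
            snapshot, self._counters = self._counters, {}
        rows = [
            {"scraper_id": sid, "requests_ok": c["ok"],
             "requests_fail": c["fail"], "items_scraped": c["items"]}
            for sid, c in snapshot.items()
        ]
        if rows:
            self._flush_fn(rows)
        return rows
```

Calling flush() from a background timer every 10–60 seconds keeps write volume predictable regardless of crawl rate.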

Grafana Dashboard

{
  "dashboard": {
    "title": "Scraper Health",
    "panels": [
      {
        "title": "Success Rate %",
        "targets": [{
          "expr": "rate(scraper_requests_ok[1m]) / (rate(scraper_requests_ok[1m]) + rate(scraper_requests_fail[1m])) * 100"
        }]
      },
      {
        "title": "Queue Depth",
        "targets": [{
          "expr": "scraper_queue_depth"
        }]
      }
    ]
  }
}
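
The PromQL expressions in the panels assume the scraper exposes counters such as scraper_requests_ok in the Prometheus text format. A stdlib-only exporter sketch (metric names match the dashboard; a production setup would normally use the prometheus_client library instead):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

METRICS = {                       # name -> current value, updated by the scraper
    "scraper_requests_ok": 0,
    "scraper_requests_fail": 0,
    "scraper_queue_depth": 0,
}

def render_metrics(metrics):
    """Serialize values in the Prometheus text exposition format."""
    return "".join(f"{name} {value}\n" for name, value in sorted(metrics.items()))

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(METRICS).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# HTTPServer(("0.0.0.0", 9100), MetricsHandler).serve_forever()
```

Prometheus then scrapes port 9100 and Grafana queries Prometheus; the scraper itself never talks to Grafana directly.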

Alerts

- alert: HighErrorRate
  expr: rate(scraper_requests_fail[5m]) / rate(scraper_requests_total[5m]) > 0.1
  for: 10m
  annotations:
    summary: "Scraper {{ $labels.scraper_id }} error rate high"

- alert: ProxyPoolDepleted
  expr: scraper_proxy_health < 0.3
  annotations:
    summary: "Less than 30% proxies working"
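
The same thresholds can also be checked in application code, e.g. to pause a scraper before Alertmanager reacts. A sketch mirroring the two rules above (the 0.1 and 0.3 thresholds come from the alert definitions; the function name is illustrative):

```python
def check_alerts(ok, fail, proxy_health):
    """Return the alert names that would fire for the current window.
    ok/fail: request counts over the window; proxy_health: fraction 0..1."""
    alerts = []
    total = ok + fail
    if total > 0 and fail / total > 0.1:   # mirrors HighErrorRate
        alerts.append("HighErrorRate")
    if proxy_health < 0.3:                 # mirrors ProxyPoolDepleted
        alerts.append("ProxyPoolDepleted")
    return alerts
```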

Timeline

Basic metrics collection takes 2–3 days; a dashboard with alerting takes 3–5 days.