Implementing Alerts on Scraping Failures (email/Telegram)
A scraper crashes overnight; by morning the data is stale and nobody knows why. An alerting system solves this: the right person is notified at the moment of failure, with enough context to diagnose it.
What Counts as a Failure
Not every error needs an alert. A single timeout is normal, and the worker will retry. An alert is needed when:
- A task has exhausted all retries (moved to the DLQ / permanently failed)
- The worker itself crashed (process crash, OOM)
- The error rate over the last 15 minutes exceeded a threshold (e.g., > 20%)
- Scraping a site did not complete in the expected time (watchdog timeout)
- The page structure changed and the parser returns empty data
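The error-rate criterion from the list above can be sketched as a sliding-window counter. This is a minimal single-process illustration, not a definitive implementation: the class name `ErrorRateMonitor` and the `min_samples` guard (which avoids alerting on one failure out of two tasks) are assumptions that do not come from the original text; the 15-minute window and 20% threshold do.

```python
import time
from collections import deque
from typing import Optional


class ErrorRateMonitor:
    """Hypothetical helper: tracks task outcomes in a sliding time window."""

    def __init__(self, window_seconds: float = 15 * 60, threshold: float = 0.20,
                 min_samples: int = 20):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.min_samples = min_samples  # assumption: ignore tiny sample sizes
        self._events = deque()  # (timestamp, failed) pairs, oldest first

    def _evict(self, now: float) -> None:
        # Drop outcomes that fell out of the window.
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()

    def record(self, failed: bool, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._events.append((now, failed))
        self._evict(now)

    def error_rate(self, now: Optional[float] = None) -> float:
        now = time.time() if now is None else now
        self._evict(now)
        if not self._events:
            return 0.0
        failures = sum(1 for _, failed in self._events if failed)
        return failures / len(self._events)

    def should_alert(self, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        rate = self.error_rate(now)
        # Alert only when there is enough data and the rate is strictly above the threshold.
        return len(self._events) >= self.min_samples and rate > self.threshold
```

A worker would call `record(failed=...)` after every task and check `should_alert()` periodically. With several worker processes, the counters would have to live in shared storage rather than in process memory.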
Telegram Notification
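import html

import httpx


async def send_telegram_alert(bot_token: str, chat_id: str, event: dict) -> None:
    def esc(value) -> str:
        # parse_mode="HTML" requires <, > and & in dynamic values to be escaped.
        return html.escape(str(value), quote=False)

    text = "\n".join([
        "🔴 <b>Scraping Failure</b>",
        f"<b>Site:</b> {esc(event['site_name'])}",
        f"<b>URL:</b> <code>{esc(event['url'])}</code>",
        f"<b>Error:</b> {esc(event['error_type'])}",
        # Truncate before escaping so an HTML entity is never cut in half.
        f"<b>Message:</b> <code>{esc(str(event['error_message'])[:300])}</code>",
        f"<b>Attempts:</b> {esc(event['attempts'])}",
        f"<b>Time:</b> {esc(event['timestamp'])}",
    ])
    async with httpx.AsyncClient() as client:
        response = await client.post(
            f"https://api.telegram.org/bot{bot_token}/sendMessage",
            json={"chat_id": chat_id, "text": text, "parse_mode": "HTML"},
            timeout=10,
        )
        # Raise on a non-2xx reply so a failed alert does not go unnoticed.
        response.raise_for_status()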
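The Telegram function above expects an `event` dict with six keys. One possible way to assemble it at the point where a task finally fails is sketched below. The helper name `build_failure_event` and the UTC timestamp format are assumptions made for illustration; only the key names come from the code above.

```python
from datetime import datetime, timezone


def build_failure_event(site_name: str, url: str, exc: BaseException, attempts: int) -> dict:
    """Hypothetical helper: collect the diagnostic context for one failed task."""
    return {
        "site_name": site_name,
        "url": url,
        # The exception class name is usually the quickest hint for triage.
        "error_type": type(exc).__name__,
        "error_message": str(exc),
        "attempts": attempts,
        # Assumption: timestamps are reported in UTC to avoid timezone ambiguity.
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC"),
    }
```

The resulting dict can then be passed to the alert function, for example with `asyncio.run(send_telegram_alert(bot_token, chat_id, event))` from synchronous code.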
Email via SMTP / SendGrid
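import os

from sendgrid import SendGridAPIClient
from sendgrid.helpers.mail import Mail

# Assumption: the API key is supplied via an environment variable.
SENDGRID_API_KEY = os.environ["SENDGRID_API_KEY"]


def send_email_alert(to_email: str, event: dict) -> None:
    message = Mail(
        from_email='[email protected]',  # address is obscured in the source; use a verified sender
        to_emails=to_email,
        subject=f"[Scraping] Failure: {event['site_name']}",
        # render_alert_template is assumed to be defined elsewhere and to return an HTML string.
        html_content=render_alert_template(event),
    )
    sg = SendGridAPIClient(api_key=SENDGRID_API_KEY)
    sg.send(message)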
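For the plain SMTP route named in the heading, a minimal sketch using only the standard library might look as follows. The function names, the plain-text body layout, and the use of STARTTLS with a username and password are assumptions; the actual mail server may require different settings.

```python
import smtplib
from email.message import EmailMessage


def build_alert_email(from_email: str, to_email: str, event: dict) -> EmailMessage:
    """Hypothetical helper: build a plain-text alert message from an event dict."""
    msg = EmailMessage()
    msg["From"] = from_email
    msg["To"] = to_email
    msg["Subject"] = f"[Scraping] Failure: {event['site_name']}"
    msg.set_content(
        f"Site: {event['site_name']}\n"
        f"URL: {event['url']}\n"
        f"Error: {event['error_type']}\n"
        f"Message: {str(event['error_message'])[:300]}\n"
        f"Attempts: {event['attempts']}\n"
        f"Time: {event['timestamp']}\n"
    )
    return msg


def send_smtp_alert(host: str, port: int, username: str, password: str,
                    from_email: str, to_email: str, event: dict) -> None:
    msg = build_alert_email(from_email, to_email, event)
    # Assumption: the server accepts STARTTLS on the given port (commonly 587).
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        smtp.starttls()
        smtp.login(username, password)
        smtp.send_message(msg)
```

Separating message construction from delivery keeps the formatting testable without a mail server.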
Alert Deduplication
Without deduplication, a mass failure (for example, the proxy provider going down) can produce 500 emails within a minute. The solution is to group alerts by key and apply a cooldown:
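import redis

redis_client = redis.Redis()  # assumes a Redis server reachable with default settings


def should_send_alert(site_id: int, error_type: str, cooldown_minutes: int = 30) -> bool:
    key = f"alert_sent:{site_id}:{error_type}"
    # SET with NX and EX is atomic: only the first caller in the cooldown window
    # creates the key, so two workers failing at once cannot both send.
    created = redis_client.set(key, "1", nx=True, ex=cooldown_minutes * 60)
    return bool(created)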
One alert per site and error type every 30 minutes is a reasonable balance between informativeness and noise.
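The same key-plus-cooldown idea can be sketched without Redis, which is convenient for a single-process scraper or for unit tests. The class below is an illustrative assumption, not part of the original design: the name `AlertCooldown`, the injectable clock, and the suppressed-alert counter (which lets the next alert mention how many were skipped) are all hypothetical.

```python
import time
from typing import Callable, Dict, Tuple

Key = Tuple[int, str]


class AlertCooldown:
    """Hypothetical in-memory variant of the cooldown check."""

    def __init__(self, cooldown_seconds: float = 30 * 60,
                 clock: Callable[[], float] = time.monotonic):
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock  # injectable so tests can control time
        self._last_sent: Dict[Key, float] = {}
        self._suppressed: Dict[Key, int] = {}

    def should_send(self, site_id: int, error_type: str) -> bool:
        key = (site_id, error_type)
        now = self._clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown_seconds:
            # Still inside the cooldown: count the alert instead of sending it.
            self._suppressed[key] = self._suppressed.get(key, 0) + 1
            return False
        self._last_sent[key] = now
        return True

    def pop_suppressed(self, site_id: int, error_type: str) -> int:
        # Number of alerts skipped for this key since the last call.
        return self._suppressed.pop((site_id, error_type), 0)
```

Unlike the Redis version, this state is lost on restart and is not shared between workers, so it only fits setups where a single process sends the alerts.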
Implementation Timeline
Telegram and email alerts with deduplication: 1–2 business days.







