Implementing Scraped Data Export (CSV/JSON/API)
Different consumers need scraped data in different formats: analytics teams need CSV for Excel, developers need JSON over an API, and integrations with other systems need webhooks or direct endpoint access.
CSV Export
For large volumes (>10k rows) we don't generate the entire file in memory — we stream it:
import csv
import io
from flask import Response, stream_with_context

def export_csv(site_id: int, filters: dict):
    def generate():
        # One small reusable buffer: it never holds more than a single row.
        output = io.StringIO()
        writer = csv.DictWriter(output, fieldnames=[
            'id', 'name', 'price', 'currency', 'url', 'in_stock', 'scraped_at'
        ])
        writer.writeheader()
        yield output.getvalue()
        output.seek(0)
        output.truncate(0)

        # stream_products is defined elsewhere; it must yield one dict per row.
        for product in stream_products(site_id, filters):
            writer.writerow(product)
            yield output.getvalue()
            output.seek(0)
            output.truncate(0)

    return Response(
        stream_with_context(generate()),
        mimetype='text/csv',
        headers={'Content-Disposition': 'attachment; filename=export.csv'},
    )
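The export above relies on stream_products yielding rows one at a time instead of loading the whole result set. The original does not show that helper, so the following is only a minimal sketch of one way to do it, assuming a DB-API connection (sqlite3 here for self-containment) and a hypothetical products table whose columns match the CSV field names. The function name stream_rows, the table name, and the batch size are all assumptions.

```python
import sqlite3
from typing import Iterator

# Assumed column list; mirrors the CSV fieldnames used in export_csv.
FIELDS = ['id', 'name', 'price', 'currency', 'url', 'in_stock', 'scraped_at']

def stream_rows(conn: sqlite3.Connection, site_id: int,
                batch_size: int = 1000) -> Iterator[dict]:
    """Yield product rows as dicts, fetching batch_size rows per round trip."""
    cur = conn.cursor()
    # Column names come from the fixed FIELDS list, never from user input.
    cur.execute(
        f"SELECT {', '.join(FIELDS)} FROM products "
        "WHERE site_id = ? ORDER BY id",
        (site_id,),
    )
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        for row in batch:
            yield dict(zip(FIELDS, row))
```

With a client/server database, a server-side cursor would probably be needed for the same effect, since some drivers buffer the full result on the client by default.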
JSON API
REST endpoint with filtering, pagination and field selection:
GET /api/v1/scraped-products?site_id=7&in_stock=true&fields=name,price,url&page=1&per_page=100
{
  "data": [
    { "name": "Nike Air Max Sneakers", "price": 89.99, "url": "https://..." }
  ],
  "meta": {
    "page": 1,
    "per_page": 100,
    "total": 4823
  }
}
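To illustrate how the fields, page and per_page parameters could produce the response shape above, here is a minimal sketch. The function name build_page, the allowlist of fields, and the per_page cap are assumptions, not part of the original API. It slices an in-memory list for simplicity; a real endpoint would more likely push the limit and offset into the database query.

```python
from typing import Dict, List, Optional

# Assumed allowlist; unknown field names in the query string are ignored.
ALLOWED_FIELDS = ['id', 'name', 'price', 'currency', 'url', 'in_stock', 'scraped_at']
MAX_PER_PAGE = 100  # assumed upper bound

def build_page(rows: List[Dict], fields: Optional[str],
               page: int = 1, per_page: int = 100) -> Dict:
    """Return one page of rows, keeping only the requested fields."""
    if fields:
        selected = [f for f in fields.split(',') if f in ALLOWED_FIELDS]
    else:
        selected = ALLOWED_FIELDS
    # Clamp paging parameters to sane values.
    page = max(page, 1)
    per_page = min(max(per_page, 1), MAX_PER_PAGE)
    start = (page - 1) * per_page
    chunk = rows[start:start + per_page]
    return {
        'data': [{f: row[f] for f in selected if f in row} for row in chunk],
        'meta': {'page': page, 'per_page': per_page, 'total': len(rows)},
    }
```

Filtering the requested fields against an allowlist keeps internal columns from leaking through the fields parameter.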
Authentication: the client sends an API key in the X-API-Key header. Access is additionally restricted by IP address and rate-limited to 100 requests per minute.
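The key check and the per-minute limit can be sketched as follows. This is an illustrative in-memory sliding-window limiter, not the service's actual implementation; the names RateLimiter and is_valid_key are hypothetical, and a deployment with several worker processes would need a shared store instead of a local dictionary.

```python
import hmac
import time
from collections import defaultdict, deque

class RateLimiter:
    """Sliding-window limiter: at most `limit` requests per `window` seconds per key."""

    def __init__(self, limit: int = 100, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = defaultdict(deque)

    def allow(self, api_key: str) -> bool:
        now = self.clock()
        hits = self._hits[api_key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True

def is_valid_key(presented: str, expected: str) -> bool:
    # Constant-time comparison avoids leaking key contents through timing.
    return hmac.compare_digest(presented.encode(), expected.encode())
```

A request handler would call is_valid_key on the X-API-Key header first, then allow(api_key), and answer with HTTP 401 or 429 respectively when either check fails.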
Webhook Notification
When a scraping run completes or the data is updated, the system sends a POST request to the client's webhook URL:
import hashlib
import hmac
import json

import httpx

def send_webhook(url: str, secret: str, payload: dict):
    # Sign the exact bytes that are sent, so the recipient can recompute the HMAC.
    body = json.dumps(payload).encode()
    sig = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    httpx.post(url, content=body, headers={
        'Content-Type': 'application/json',
        'X-Signature-SHA256': f'sha256={sig}',
    }, timeout=10)
The signature allows the recipient to verify the authenticity of the request.
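On the receiving side, verification amounts to recomputing the HMAC over the raw request body and comparing it with the header value. The function below is a sketch of that check; the name verify_signature is an assumption, and it expects the same sha256=<hex> format that send_webhook produces.

```python
import hashlib
import hmac

def verify_signature(secret: str, body: bytes, header_value: str) -> bool:
    """Return True if header_value matches the HMAC-SHA256 of the raw body."""
    expected = 'sha256=' + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # Constant-time comparison; compare bytes so non-ASCII input cannot raise.
    return hmac.compare_digest(expected.encode(), header_value.encode())
```

The check must run on the raw bytes as received, before any JSON parsing, because re-serializing the payload can change whitespace or key order and invalidate the signature.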
Implementation Timeline
CSV export and JSON API with basic authentication: 1–2 business days. Webhooks with signatures, filters, and field selection: 1 additional day.







