Implementing REST API for Scraper Bot Management
When multiple scrapers need to be managed programmatically (starting, stopping, changing configuration, monitoring status), a REST API is the natural interface. It lets you plug scraper management into any external system: an internal dashboard, a CI/CD pipeline, or third-party services.
API Structure
POST /api/v1/scrapers — create new scraper
GET /api/v1/scrapers — list all scrapers
GET /api/v1/scrapers/{id} — scraper configuration
PATCH /api/v1/scrapers/{id} — update configuration
DELETE /api/v1/scrapers/{id} — delete scraper
POST /api/v1/scrapers/{id}/run — run immediately
POST /api/v1/scrapers/{id}/stop — stop running scraper
GET /api/v1/scrapers/{id}/status — current status
GET /api/v1/scrapers/{id}/runs — run history
GET /api/v1/scrapers/{id}/runs/{runId} — run details
GET /api/v1/scrapers/{id}/results — parsing results
FastAPI Implementation
from fastapi import FastAPI, HTTPException, BackgroundTasks
from pydantic import BaseModel
from typing import Optional

app = FastAPI()

class ScraperConfig(BaseModel):
    name: str
    url: str
    schedule: Optional[str] = None  # cron expression
    proxy_pool: Optional[str] = None
    rate_limit: int = 5  # req/sec
    headers: dict = {}

@app.post('/api/v1/scrapers', status_code=201)
async def create_scraper(config: ScraperConfig):
    # ScraperRepository / Scheduler are the application's persistence and scheduling layer
    scraper = await ScraperRepository.create(config.model_dump())  # .dict() on Pydantic v1
    if config.schedule:
        await Scheduler.register(scraper.id, config.schedule)
    return scraper

@app.post('/api/v1/scrapers/{scraper_id}/run')
async def run_scraper(scraper_id: int, background_tasks: BackgroundTasks):
    scraper = await ScraperRepository.get_or_404(scraper_id)
    if scraper.status == 'running':
        raise HTTPException(409, 'Scraper is already running')
    run = await ScraperRun.create(scraper_id=scraper_id, status='pending')
    background_tasks.add_task(execute_scraper, scraper, run.id)
    return {'run_id': run.id, 'status': 'started'}

@app.get('/api/v1/scrapers/{scraper_id}/status')
async def get_status(scraper_id: int):
    scraper = await ScraperRepository.get_or_404(scraper_id)  # 404 if unknown id
    last_run = await ScraperRun.get_latest(scraper_id)
    return {
        'id': scraper_id,
        'status': last_run.status if last_run else 'idle',
        'last_run': last_run.started_at if last_run else None,
        'items_count': last_run.items_collected if last_run else 0,
    }
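The `execute_scraper` task handed to `background_tasks` is left to the application. A minimal sketch of its lifecycle (status transitions and item counting) could look like the following; `runs_repo` and `fetch_items` are injected stand-ins for `ScraperRun` and the actual scraping engine, which this spec does not define.

```python
import asyncio
from datetime import datetime, timezone


async def execute_scraper(scraper, run_id, runs_repo, fetch_items):
    """Run one scraping pass, recording status transitions on the run record.

    `runs_repo` and `fetch_items` are assumptions kept injectable so the
    sketch is self-contained; real code would use ScraperRun and the engine.
    """
    await runs_repo.update(run_id, status='running',
                           started_at=datetime.now(timezone.utc))
    try:
        items = await fetch_items(scraper)  # the actual scraping work
        await runs_repo.update(run_id, status='completed',
                               items_collected=len(items))
    except Exception as exc:  # record any failure on the run, never crash the worker
        await runs_repo.update(run_id, status='failed', error=str(exc))
```

Recording the failure on the run record (rather than re-raising) is what lets `GET .../runs/{runId}` and the status endpoint report errors after the fact.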
Authentication
Access is controlled by API keys with three levels: read, write, admin. Keys are stored as bcrypt hashes and passed in the Authorization: Bearer {key} header.
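A sketch of the verification logic behind that header might look like this. Note two deliberate simplifications: SHA-256 stands in for bcrypt only to keep the example stdlib-only (production code would use the bcrypt library as stated above), and the in-memory `key_store` dict stands in for the real key storage.

```python
import hashlib
import hmac

# Ordered scopes: a higher scope implies all lower ones (admin > write > read)
SCOPES = ('read', 'write', 'admin')


def hash_key(raw_key: str) -> str:
    # Stand-in for bcrypt, chosen only so the sketch needs no third-party deps
    return hashlib.sha256(raw_key.encode()).hexdigest()


def verify_key(raw_key: str, key_store: dict[str, str], required: str) -> bool:
    """Return True if the key is known and its scope covers `required`.

    `key_store` maps key hash -> scope, an assumed in-memory layout.
    """
    candidate = hash_key(raw_key)
    for stored_hash, scope in key_store.items():
        # Constant-time comparison to avoid leaking hash prefixes via timing
        if hmac.compare_digest(candidate, stored_hash):
            return SCOPES.index(scope) >= SCOPES.index(required)
    return False
```

In the FastAPI app this check would live in a dependency that reads the Bearer token and raises a 401/403 on failure.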
Webhooks
Subscribe to events: run completion, errors, new data:
class WebhookSubscription(BaseModel):
    url: str
    events: list[str]  # e.g. ['run.completed', 'run.failed', 'data.new']

@app.post('/api/v1/webhooks', status_code=201)
async def create_webhook(sub: WebhookSubscription):
    return await WebhookRepository.create(url=sub.url, events=sub.events)
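On the delivery side, a dispatcher has to match each event against the stored subscriptions and fan the payload out to every matching URL. A minimal sketch, with `subscriptions` as (url, events) pairs and `send` as an injected HTTP POST coroutine (both assumptions, since storage and transport are application-specific):

```python
import asyncio


async def dispatch_event(event: str, payload: dict, subscriptions, send):
    """Deliver `payload` to every webhook subscribed to `event`.

    `subscriptions` is an iterable of (url, events) pairs; `send(url, body)`
    performs the HTTP POST. Both are injected stand-ins for the real layers.
    """
    targets = [url for url, events in subscriptions if event in events]
    # Fan out concurrently; one failing endpoint must not block the others
    results = await asyncio.gather(
        *(send(url, {'event': event, 'data': payload}) for url in targets),
        return_exceptions=True,
    )
    return dict(zip(targets, results))
```

Returning per-URL results (including exceptions) gives the caller what it needs to retry failed deliveries later.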
Timeline
REST API for scraper management with authentication and webhooks: 5–8 business days.