Marketplace Product Scraper Bot
Scraping marketplaces (Ozon, Wildberries, Amazon) is a different class of problem from parsing an ordinary supplier site. Ozon and Wildberries actively counteract scraping with Cloudflare protection, dynamically rendered JavaScript, browser fingerprinting, and rate limiting. Each marketplace requires a separate strategy.
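Of the countermeasures listed, rate limiting is the one most directly handled in code. A minimal sketch, assuming the server signals throttling with HTTP 429: `backoff_delays` and `fetch_with_backoff` are hypothetical helpers, and the delay values are arbitrary defaults rather than anything documented by the marketplaces.

```python
import asyncio
from typing import Awaitable, Callable


def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential delays in seconds: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


async def fetch_with_backoff(
    fetch: Callable[[], Awaitable[int]],
    retries: int = 4,
    base: float = 1.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> int:
    """Call `fetch` (hypothetical: returns an HTTP status code) until it stops
    returning 429 or the retry budget is exhausted; return the last status."""
    status = await fetch()
    for delay in backoff_delays(retries, base):
        if status != 429:
            break
        await sleep(delay)  # wait longer after every throttled attempt
        status = await fetch()
    return status
```

The `sleep` parameter is injectable only so the retry logic can be exercised without real waiting; in production the default `asyncio.sleep` would be used.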
Official APIs vs scraping
Before scraping, check official options:
| Marketplace | Official API | Limitations |
|---|---|---|
| Ozon | Seller API (for sellers) | Own products only |
| Wildberries | Seller API, Statistics API | Own data only |
| Amazon | Product Advertising API | Partnership required |
Scraping competitors' product data is a legal gray zone. It is typically used for competitive analysis, price monitoring, and market research.
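Price monitoring usually comes down to comparing two snapshots of scraped prices. A minimal sketch of that comparison, assuming each snapshot is a plain `{product_id: price}` mapping; `PriceChange`, `detect_price_changes`, and the 5% threshold are illustrative names and values, not part of any marketplace API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceChange:
    product_id: str
    old_price: float
    new_price: float

    @property
    def percent(self) -> float:
        """Signed change relative to the old price, in percent."""
        return (self.new_price - self.old_price) / self.old_price * 100


def detect_price_changes(previous: dict, current: dict, threshold_percent: float = 5.0):
    """Compare two {product_id: price} snapshots; report moves >= threshold."""
    changes = []
    for product_id, new_price in current.items():
        old_price = previous.get(product_id)
        if not old_price:  # product is new, or the old price is missing/zero
            continue
        change = PriceChange(product_id, old_price, new_price)
        if abs(change.percent) >= threshold_percent:
            changes.append(change)
    return changes
```

The returned list can feed whatever alert channel is in use; products that appear only in the newer snapshot are skipped here and would need separate handling.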
Wildberries: JSON API parsing
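```python
# scraper/wildberries.py
import httpx


class WildberriesScraper:
    CARD_URL = "https://card.wb.ru/cards/v2/detail"

    def __init__(self, client: httpx.AsyncClient, search_url: str):
        # The search endpoint is not specified here, so it is passed in by the caller
        self.client = client
        self.SEARCH_URL = search_url

    async def get_product(self, nm_id: int):
        params = {"appType": 1, "curr": "rub", "nm": nm_id}
        resp = await self.client.get(self.CARD_URL, params=params)
        resp.raise_for_status()
        products = resp.json().get("data", {}).get("products", [])
        # Return None instead of raising IndexError when no product comes back
        return self._normalize(products[0]) if products else None

    async def search_products(self, query: str):
        params = {"appType": 1, "query": query}
        resp = await self.client.get(self.SEARCH_URL, params=params)
        resp.raise_for_status()
        products = resp.json().get("data", {}).get("products", [])
        # _normalize (defined elsewhere) maps a raw product dict to the internal format
        return [self._normalize(p) for p in products]
```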
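When many product cards are fetched at once, the number of requests in flight should be capped so the scraper does not trip rate limiting. A minimal sketch using `asyncio.Semaphore`; `fetch_many` is a hypothetical helper, and the default limit of 5 is an arbitrary assumption, not a documented Wildberries limit.

```python
import asyncio
from typing import Awaitable, Callable, Iterable


async def fetch_many(
    nm_ids: Iterable[int],
    fetch_one: Callable[[int], Awaitable[dict]],
    max_concurrency: int = 5,
) -> dict[int, dict]:
    """Fetch every id with at most `max_concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(nm_id: int) -> tuple[int, dict]:
        # The semaphore blocks here once the concurrency budget is used up
        async with semaphore:
            return nm_id, await fetch_one(nm_id)

    pairs = await asyncio.gather(*(guarded(nm_id) for nm_id in nm_ids))
    return dict(pairs)
```

With the scraper class above, `fetch_one` would typically be the bound method `scraper.get_product`.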
Ozon: Playwright for SPA
```python
# scraper/ozon.py
import json

from playwright.async_api import async_playwright


class OzonScraper:
    async def scrape_product(self, url: str):
        async with async_playwright() as p:
            browser = await p.chromium.launch(headless=True)
            page = await browser.new_page()
            product_data = {}

            async def handle_response(response):
                # Intercept the page-JSON response that the SPA requests while rendering
                if "/api/entrypoint-api.bx/page/json" in response.url:
                    data = await response.json()
                    widget_states = data.get("widgetStates", {})
                    for key, value in widget_states.items():
                        if "webProductHeading" in key:
                            # Each widget state is itself a JSON-encoded string
                            product_data["heading"] = json.loads(value)

            page.on("response", handle_response)
            try:
                await page.goto(url, wait_until="networkidle")
            finally:
                # Close the browser even if navigation fails or times out
                await browser.close()
            return self._normalize_ozon(product_data)
```
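The response handler above keeps only the `webProductHeading` widget. The same key-matching logic can be pulled out into a pure function, which makes it testable without launching a browser. A minimal sketch: `extract_widgets` is a hypothetical helper, and it assumes, as the handler does, that widget keys contain the widget name and that values are JSON-encoded strings.

```python
import json


def extract_widgets(widget_states: dict, wanted: tuple[str, ...]) -> dict:
    """Decode the widgets whose key contains one of the `wanted` names.

    Returns {widget_name: decoded_state}; values that are not valid JSON
    strings are skipped rather than raising.
    """
    result = {}
    for key, raw in widget_states.items():
        name = next((w for w in wanted if w in key), None)
        if name is None:
            continue  # not a widget we were asked for
        try:
            result[name] = json.loads(raw)
        except (TypeError, ValueError):
            continue  # value is not a JSON string: ignore this widget
    return result
```

Inside `handle_response`, the loop over `widget_states` could then be replaced by `product_data.update(extract_widgets(widget_states, ("webProductHeading",)))`, with further widget names added as needed.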
Amazon via official API
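For Amazon, use the Product Advertising API 5.0:
```python
# scraper/amazon_pa.py
from paapi5_python_sdk import DefaultApi, PartnerType, SearchItemsRequest


class AmazonScraper:
    def __init__(self, api: DefaultApi, partner_tag: str):
        # api is a configured DefaultApi client; partner_tag is the Associates tag
        self.api = api
        self.partner_tag = partner_tag

    def search_products(self, keywords: str):
        request = SearchItemsRequest(
            partner_tag=self.partner_tag,
            partner_type=PartnerType.ASSOCIATES,  # required alongside partner_tag
            keywords=keywords,
            resources=["ItemInfo.Title", "Offers.Listings.Price"],
        )
        response = self.api.search_items(request)
        # search_result may be missing when nothing matches the keywords
        result = response.search_result
        return [self._normalize(item) for item in result.items] if result else []
```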
Development timeline
| Marketplace | Complexity | Timeline |
|---|---|---|
| Wildberries (JSON API) | Medium | 3-5 days |
| Ozon (Playwright) | High | 5-8 days |
| Amazon (PA API) | Low | 2-3 days |
A marketplace scraper with monitoring and alerts takes 5-8 business days for 3-5 competitors.







