Parsing Products from Competitor Websites for 1C-Bitrix
An online store without current competitor data loses ground on price and assortment. Manual monitoring of 500+ product lines is not cost-effective. The solution is a parser that collects product data from competitor websites and loads it into a 1C-Bitrix infoblock for subsequent analysis or automated responses.
Solution architecture
A competitor parser is a separate module that does not interact with the main catalog directly. Typical scheme:
- Collector: a PHP or Python script that crawls the competitor's pages (in PHP, typically with Guzzle or Symfony HttpClient)
- Intermediate storage: a dedicated infoblock COMPETITORS_CATALOG or a database table
- Analytics layer: comparison against b_catalog_price of your own store
- Action trigger: updating prices via CPrice::Update() or sending a notification to the manager
Storing competitor data directly in the main catalog is bad practice: it pollutes b_iblock_element and breaks search indexes.
Technical challenges and how to handle them
Anti-scraping protection is the primary obstacle. Large stores use Cloudflare, dynamic JS rendering (React/Vue SPA), and CAPTCHAs. Simple curl with User-Agent rotation works against static sites. Against JS-rendered pages, a headless browser is required: Puppeteer via Node.js or Playwright.
Stack for JS-rendered sites:
Playwright → stdout JSON → PHP reads via exec() → CIBlockElement::Add()
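A minimal sketch of the first two stages of this pipeline, assuming Playwright's Python bindings are installed (the text names Playwright but not the language it is driven from). The function names and the JSON shape (`url`, `count`, `items`) are hypothetical and would need to match whatever the PHP side expects.

```python
import json
import sys


def render_and_extract(url, selector):
    """Load a JS-rendered page in headless Chromium and return the text of matching nodes."""
    # Imported lazily so the JSON helper below works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # "networkidle" waits until the SPA has stopped issuing requests.
            page.goto(url, wait_until="networkidle")
            return page.locator(selector).all_inner_texts()
        finally:
            browser.close()


def emit_json(url, texts, stream=sys.stdout):
    """Write one JSON document per run so the PHP side can decode it in one call."""
    payload = {"url": url, "count": len(texts), "items": texts}
    json.dump(payload, stream, ensure_ascii=False)
    stream.write("\n")
    return payload
```

On the PHP side, the output captured by exec() would go through json_decode() before any elements are added. A `count` of zero is a useful signal that the selector no longer matches.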
Unstable HTML structure: when a competitor changes their markup, the parser breaks. Solution: prefer CSS selectors over XPath for flat structures, and set up mandatory monitoring that raises an alert when a selection returns zero results.
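The zero-result check can be sketched as follows. To stay dependency-free, this example matches elements by class name with the standard-library HTML parser, which is only a stand-in for a real CSS selector such as `.price`. The class name `price` and the exception name are assumptions for illustration.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "input", "hr", "meta", "link"}


class ClassTextCollector(HTMLParser):
    """Collect the text of every element carrying a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.results = []
        self._depth = 0  # > 0 while inside a matching element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void tags have no closing tag, so they must not affect depth
        if self._depth:
            self._depth += 1
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            self.results.append("".join(self._chunks).strip())

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


class EmptySelectionError(RuntimeError):
    """Raised when nothing matches, which usually means the markup changed."""


def select_prices(html, css_class="price"):
    parser = ClassTextCollector(css_class)
    parser.feed(html)
    parser.close()
    if not parser.results:
        raise EmptySelectionError(f"no elements with class {css_class!r}")
    return parser.results
```

Raising an exception instead of returning an empty list means a markup change stops the run and reaches the alerting channel, rather than quietly writing nothing to the infoblock.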
IP blocking is handled by rotating requests through a pool of residential proxies. Use at least 10–15 IPs in the pool for a catalog of 1,000+ items, and keep the request rate to no more than one request per 3–5 seconds per domain.
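A small sketch of how the rotation and the per-domain delay could be combined. The class name, the round-robin strategy, and the 3-second default (the lower bound of the range above) are assumptions. The clock and sleep functions are injectable so the logic can be checked without waiting.

```python
import itertools
import time
from urllib.parse import urlparse


class PoliteScheduler:
    """Round-robin proxy choice plus a minimum delay between requests to one domain."""

    def __init__(self, proxies, min_delay=3.0, clock=time.monotonic, sleep=time.sleep):
        if not proxies:
            raise ValueError("proxy pool is empty")
        self._proxies = itertools.cycle(proxies)
        self._min_delay = min_delay
        self._clock = clock
        self._sleep = sleep
        self._last_hit = {}  # domain -> time of the previous request

    def next_slot(self, url):
        """Block until the domain may be hit again, then return the proxy to use."""
        domain = urlparse(url).netloc
        now = self._clock()
        last = self._last_hit.get(domain)
        if last is not None:
            wait = self._min_delay - (now - last)
            if wait > 0:
                self._sleep(wait)
                now = self._clock()
        self._last_hit[domain] = now
        return next(self._proxies)
```

The collector would call `next_slot(url)` before each request and pass the returned proxy to its HTTP client. Requests to different domains do not wait on each other.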
What to collect
Typical data set for competitor parsing:
- Product name and SKU
- Current price (regular + discounted)
- Availability / quantity
- Link to the source product
- Date of last update
In 1C-Bitrix, this maps to infoblock properties. I recommend adding a COMPETITOR_URL property of type "String" and COMPETITOR_PRICE_DATE of type "Date" to track data freshness.
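On the collector side, the data set above might be modelled as one record per product before it is handed to the infoblock loader. In this sketch only COMPETITOR_URL and COMPETITOR_PRICE_DATE come from the text. The other property codes and the date format are hypothetical and should follow the property codes and date format configured on your site.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class CompetitorOffer:
    name: str
    sku: str
    price: Decimal
    discount_price: Optional[Decimal]
    in_stock: bool
    url: str
    fetched_at: datetime

    def to_properties(self):
        """Flatten the record into a code -> value mapping for the loader."""
        discount = "" if self.discount_price is None else str(self.discount_price)
        return {
            "NAME": self.name,  # element field, not a property
            "SKU": self.sku,  # hypothetical code
            "COMPETITOR_PRICE": str(self.price),  # hypothetical code
            "COMPETITOR_DISCOUNT_PRICE": discount,  # hypothetical code
            "COMPETITOR_IN_STOCK": "Y" if self.in_stock else "N",  # hypothetical code
            "COMPETITOR_URL": self.url,
            # Assumed format; Bitrix expects the date format configured for the site.
            "COMPETITOR_PRICE_DATE": self.fetched_at.strftime("%Y-%m-%d %H:%M:%S"),
        }
```

Prices are kept as `Decimal` rather than `float` so that values such as 199.90 survive the round trip without rounding drift.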
Case study: electronics store, 3 competitors
Goal: monitor prices for 2,400 SKUs across three competitors, update every 6 hours.
Implementation:
- PHP + Guzzle parser for two competitors with static HTML
- Puppeteer for the third (JS-SPA on Vue)
- Cron every 6 hours, sequential runs per competitor with a 2-hour gap between them
- COMPETITORS_PRICES infoblock linked to the main catalog via XML_ID
- A 1C-Bitrix agent runs the comparison and generates a report in a Highload block
Result: the response time to a competitor's price change dropped to 6 hours, compared with weekly manual monitoring. The manager receives an email summary of deviations greater than 5%, sent via \Bitrix\Main\Mail\Event::send().
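The 5% filter behind that summary could look roughly like this. The case study does not say which price the deviation is measured against, so this sketch assumes it is relative to your own price. The function name and the tuple layout are hypothetical.

```python
from decimal import Decimal


def price_deviations(own_prices, competitor_prices, threshold_pct=Decimal("5")):
    """Return rows where the competitor price differs from ours by more than threshold_pct.

    Both arguments map XML_ID -> Decimal price.
    """
    rows = []
    for xml_id, own in own_prices.items():
        theirs = competitor_prices.get(xml_id)
        if theirs is None or own <= 0:
            continue  # no match on the competitor side, or no usable base price
        deviation = (theirs - own) / own * 100
        if abs(deviation) > threshold_pct:
            rows.append((xml_id, own, theirs, deviation.quantize(Decimal("0.1"))))
    # Largest gaps first so the worst cases appear at the top of the summary.
    rows.sort(key=lambda r: abs(r[3]), reverse=True)
    return rows
```

A negative deviation means the competitor is cheaper. A deviation of exactly 5% is not reported, matching "greater than 5%".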
Work timeline
| Phase | Duration |
|---|---|
| Analyzing competitor sites, selecting technology | 4–8 hours |
| Developing the parser (1 competitor, static HTML) | 1–2 days |
| Developing the parser with headless browser | 2–3 days |
| Integration with 1C-Bitrix infoblock | 1 day |
| Setting up cron, monitoring, alerts | 4–6 hours |
| Testing on real data | 1 day |
Total for 3 competitors with different technologies — 5–8 working days. Ongoing parser maintenance after launch is mandatory: competitor markup changes over time.

