Parsing products from competitors' websites for 1C-Bitrix

An online store without current competitor data loses ground on price and assortment. Manual monitoring of 500+ product lines is not cost-effective. The solution is a parser that collects product data from competitor websites and loads it into a 1C-Bitrix infoblock for subsequent analysis or automated responses.

Solution architecture

A competitor parser is a separate module that does not interact with the main catalog directly. Typical scheme:

  1. Collector: a PHP/Python script that crawls the competitor's pages (in PHP, typically with Guzzle or Symfony HttpClient)
  2. Intermediate storage: a dedicated infoblock COMPETITORS_CATALOG or a database table
  3. Analytics layer: comparison against your own store's prices in b_catalog_price
  4. Action trigger: updating prices via CPrice::Update() (or \Bitrix\Catalog\Model\Price::update() in the D7 API), or sending a notification to the manager

Storing competitor data directly in the main catalog infoblock is bad practice: competitor records get mixed in with your own products and leak into site search and catalog filters.
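As an illustration of the "database table" option for intermediate storage, here is a minimal sketch. It uses SQLite from the Python standard library purely as a stand-in for the site's own database. The table name competitor_offers and its columns are assumptions made for this example, not part of 1C-Bitrix.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical staging schema; one row per (competitor, SKU) pair.
SCHEMA = """
CREATE TABLE IF NOT EXISTS competitor_offers (
    competitor TEXT NOT NULL,
    sku        TEXT NOT NULL,
    name       TEXT NOT NULL,
    price      REAL NOT NULL,
    url        TEXT NOT NULL,
    fetched_at TEXT NOT NULL,
    PRIMARY KEY (competitor, sku)
)
"""


def open_storage(path=":memory:"):
    """Open the staging database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_offer(conn, competitor, sku, name, price, url, fetched_at=None):
    """Insert a scraped offer, replacing the previous row for the same SKU."""
    fetched_at = fetched_at or datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT OR REPLACE INTO competitor_offers "
        "(competitor, sku, name, price, url, fetched_at) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (competitor, sku, name, price, url, fetched_at),
    )
    conn.commit()


def latest_price(conn, competitor, sku):
    """Return the stored price for a SKU, or None if it was never scraped."""
    row = conn.execute(
        "SELECT price FROM competitor_offers WHERE competitor = ? AND sku = ?",
        (competitor, sku),
    ).fetchone()
    return row[0] if row else None
```

Keying the table on (competitor, sku) means a repeated crawl overwrites the previous snapshot instead of accumulating duplicates. If price history is needed, it would go in a separate table.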

Technical challenges and how to handle them

Anti-scraping protection is the primary obstacle. Large stores use Cloudflare, dynamic JS rendering (React/Vue SPA), and CAPTCHAs. Simple curl with User-Agent rotation works against static sites. Against JS-rendered pages, a headless browser is required: Puppeteer via Node.js or Playwright.

Stack for JS-rendered sites:

Playwright → stdout JSON → PHP reads via exec() → CIBlockElement::Add()
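The handoff in this chain is a simple contract: the rendering process prints one JSON document to stdout, and the caller runs it and parses the output. The sketch below shows both sides in Python, as a stand-in for the PHP exec() side. The {"items": [...]} envelope and the function names are assumptions chosen for the example.

```python
import json
import subprocess
import sys


def emit_items(items, stream=sys.stdout):
    """Renderer side: write all scraped items as one JSON document."""
    json.dump({"items": items}, stream)
    stream.write("\n")


def read_items(command, timeout=120):
    """Consumer side: run the renderer and parse what it printed to stdout."""
    result = subprocess.run(
        command, capture_output=True, text=True, timeout=timeout
    )
    if result.returncode != 0:
        # A non-zero exit code means the crawl failed; surface stderr.
        raise RuntimeError(f"renderer failed: {result.stderr.strip()}")
    payload = json.loads(result.stdout)
    items = payload.get("items")
    if not isinstance(items, list):
        raise ValueError("renderer output has no 'items' list")
    return items
```

A call might look like read_items(["node", "render.js", url]), where render.js is a hypothetical Playwright script. Checking the exit code and the shape of the payload before writing anything to the infoblock keeps a failed render from being saved as an empty catalog.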

Unstable HTML structure — the competitor changed their markup and the parser crashed. Solution: CSS selectors instead of XPath for flat structures, and mandatory monitoring with an alert when the selection returns zero results.
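A minimal sketch of the zero-result alert follows. To stay dependency-free it matches elements by class name with the standard-library HTML parser, which is a simplification of a real CSS selector engine. The class name "price", the EmptySelectionError exception and the alert callback are assumptions for the example.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag and must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class EmptySelectionError(RuntimeError):
    """Raised when a selector that should match products matches nothing."""


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element carrying a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.results = []
        self._depth = 0    # greater than 0 while inside a matching element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1
        elif self.css_class in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            self.results.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_prices(html, css_class="price", alert=print):
    """Return matched texts; alert and raise if the selection is empty."""
    parser = ClassTextExtractor(css_class)
    parser.feed(html)
    parser.close()
    if not parser.results:
        alert(f"selector .{css_class} matched nothing; markup may have changed")
        raise EmptySelectionError(css_class)
    return parser.results
```

Raising on an empty selection, rather than returning an empty list, stops the run before stale or missing prices reach the comparison step.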

IP blocking — rotation through a proxy pool (residential proxies). A minimum of 10–15 IPs in the pool for a catalog of 1,000+ items. Request frequency: no more than 1 request per 3–5 seconds per domain.
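The two rules above, round-robin proxies and a minimum pause per domain, can be sketched as follows. The class names are hypothetical, and the clock and sleep functions are injectable only so the logic can be checked without real waiting.

```python
import itertools
import time


class ProxyPool:
    """Hand out proxies round-robin so no single IP carries all requests."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("proxy pool is empty")
        self._cycle = itertools.cycle(proxies)

    def take(self):
        return next(self._cycle)


class DomainThrottle:
    """Enforce a minimum delay between requests to the same domain."""

    def __init__(self, min_interval=3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_request = {}  # domain -> time of the previous request

    def wait(self, domain):
        """Block until the domain may be hit again; return seconds slept."""
        now = self._clock()
        last = self._last_request.get(domain)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_interval - (now - last))
        if delay:
            self._sleep(delay)
        # Record the moment the request is actually allowed to go out.
        self._last_request[domain] = now + delay
        return delay
```

In a crawl loop, throttle.wait(domain) would be called before each request and pool.take() would supply the proxy for it. Adding a small random jitter on top of min_interval, to land within the 3–5 second range, is a straightforward extension.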

What to collect

Typical data set for competitor parsing:

  • Product name and SKU
  • Current price (regular + discounted)
  • Availability / quantity
  • Link to the source product
  • Date of last update

In 1C-Bitrix, this maps to infoblock properties. I recommend adding a COMPETITOR_URL property of type "String" and COMPETITOR_PRICE_DATE of type "Date" to track data freshness.
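To make the mapping concrete, the sketch below models one scraped offer and builds a field array in the shape that CIBlockElement::Add() accepts (IBLOCK_ID, NAME, XML_ID, PROPERTY_VALUES). Only COMPETITOR_URL and COMPETITOR_PRICE_DATE come from the text above. The other property codes, the DD.MM.YYYY date format and the 6-hour freshness window are assumptions for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class CompetitorOffer:
    name: str
    sku: str
    price: float
    discount_price: Optional[float]
    in_stock: bool
    url: str
    updated_at: datetime


def to_iblock_fields(offer, iblock_id):
    """Build a field array shaped like the one CIBlockElement::Add() takes."""
    return {
        "IBLOCK_ID": iblock_id,
        "NAME": offer.name,
        "XML_ID": offer.sku,
        "PROPERTY_VALUES": {
            "COMPETITOR_URL": offer.url,
            # Assumes the site date format is DD.MM.YYYY.
            "COMPETITOR_PRICE_DATE": offer.updated_at.strftime("%d.%m.%Y"),
            # The property codes below are hypothetical examples.
            "COMPETITOR_PRICE": offer.price,
            "COMPETITOR_DISCOUNT_PRICE": offer.discount_price,
            "COMPETITOR_IN_STOCK": "Y" if offer.in_stock else "N",
        },
    }


def is_stale(offer, now, max_age=timedelta(hours=6)):
    """True when the offer is older than the allowed freshness window."""
    return now - offer.updated_at > max_age
```

The resulting dictionary can be serialized to JSON and passed to the PHP side, which would hand it to CIBlockElement::Add() or Update().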

Case study: electronics store, 3 competitors

Goal: monitor prices for 2,400 SKUs across three competitors, update every 6 hours.

Implementation:

  • PHP + Guzzle parser for two competitors with static HTML
  • Puppeteer for the third (JS-SPA on Vue)
  • Cron every 6 hours, sequential runs per competitor with a 2-hour gap between them
  • COMPETITORS_PRICES infoblock linked to the main catalog via XML_ID
  • A 1C-Bitrix agent runs the comparison and generates a report in a Highload block

Result: response time to a competitor's price change dropped to 6 hours, down from a weekly manual check. The manager receives an email summary of deviations greater than 5%, sent via \Bitrix\Main\Mail\Event::send().
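The 5% deviation filter behind such a summary can be sketched as below. This is an illustrative reconstruction, not the project's actual agent code. It assumes prices are already matched by SKU and measures the deviation relative to the store's own price.

```python
def price_deviations(own_prices, competitor_prices, threshold=0.05):
    """Return SKUs whose competitor price differs from ours by more than threshold.

    Both arguments map SKU -> price. The deviation is relative to our price:
    a negative value means the competitor is cheaper.
    """
    report = []
    for sku, own in own_prices.items():
        theirs = competitor_prices.get(sku)
        if theirs is None or own <= 0:
            continue  # no matching offer, or an unusable base price
        deviation = (theirs - own) / own
        if abs(deviation) > threshold:
            report.append({
                "sku": sku,
                "own": own,
                "competitor": theirs,
                "deviation": round(deviation, 4),
            })
    # Largest gaps first, regardless of direction.
    report.sort(key=lambda row: abs(row["deviation"]), reverse=True)
    return report
```

For example, with own prices {"A": 100.0, "B": 200.0} and competitor prices {"A": 90.0, "B": 204.0}, only A is reported: its gap is 10%, while B's 2% gap stays under the threshold.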

Work timeline

Phase                                               Duration
Analyzing competitor sites, selecting technology    4–8 hours
Developing the parser (1 competitor, static HTML)   1–2 days
Developing the parser with headless browser         2–3 days
Integration with 1C-Bitrix infoblock                1 day
Setting up cron, monitoring, alerts                 4–6 hours
Testing on real data                                1 day

Total for 3 competitors with different technologies — 5–8 working days. Ongoing parser maintenance after launch is mandatory: competitor markup changes over time.