Parsing Product Specifications to Populate 1C-Bitrix

Specifications are the foundation of catalog filtering. Without correctly populated properties (b_iblock_element_property), the 1C-Bitrix smart filter does not work, faceted search returns zero results, and customers cannot find the products they need. Parsing specifications is technically more complex than parsing descriptions: the goal is not simply to extract text but to recognize the "parameter name — value" structure and place the data correctly into infoblock properties.

Formats of specifications on source sites

Source sites publish specifications in several forms:

HTML table (<table>) — the classic format, parsed via XPath //table//tr. The first cell in a row is the name, the second is the value.
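A minimal sketch of the table case, assuming PHP's bundled DOM extension. The function name parseSpecTable is hypothetical, and real pages usually need a narrower XPath than //table//tr so that layout tables are not picked up.

```php
// Hypothetical helper: extract "name => value" pairs from specification table rows.
function parseSpecTable(string $html): array
{
    $dom = new DOMDocument();
    // Real-world markup is rarely valid; collect libxml warnings instead of printing them.
    libxml_use_internal_errors(true);
    // The XML prolog is a common trick to make libxml read the input as UTF-8.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $specs = [];
    foreach ($xpath->query('//table//tr') as $row) {
        // Accept both <th> and <td>: many stores put the name in a <th>.
        $cells = $xpath->query('./th|./td', $row);
        if ($cells->length < 2) {
            continue; // group headers and spacer rows carry no name/value pair
        }
        $name = trim(preg_replace('/\s+/u', ' ', $cells->item(0)->textContent));
        $value = trim(preg_replace('/\s+/u', ' ', $cells->item(1)->textContent));
        if ($name !== '' && $value !== '') {
            $specs[rtrim($name, ':')] = $value;
        }
    }
    return $specs;
}
```

Rows with fewer than two cells are skipped on purpose: they are usually group headings such as "General" rather than specifications.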

dl/dt/dd list — commonly used in modern stores. Parse dt+dd pairs.
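For definition lists, the pairing can be sketched as follows. The function name parseSpecDl is hypothetical, and the code assumes that each dt is followed by its dd at the same nesting level.

```php
// Hypothetical helper: pair every <dt> with the first <dd> that follows it.
function parseSpecDl(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);           // tolerate invalid markup quietly
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $specs = [];
    foreach ($xpath->query('//dl//dt') as $dt) {
        $dd = $xpath->query('following-sibling::dd[1]', $dt)->item(0);
        if ($dd === null) {
            continue; // a name without a value is not a usable specification
        }
        $name = rtrim(trim($dt->textContent), ':');
        $value = trim($dd->textContent);
        if ($name !== '' && $value !== '') {
            $specs[$name] = $value;
        }
    }
    return $specs;
}
```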

JSON-LD or schema.org microdata — the ideal format. Data is already structured; no need to parse HTML:

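// Takes the first JSON-LD block; a page may contain several, in which case preg_match_all is the safer choice
if (preg_match('/<script[^>]+type=["\']application\/ld\+json["\'][^>]*>(.*?)<\/script>/is', $html, $m)) {
    $data = json_decode(trim($m[1]), true); // null if the JSON is malformed
}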
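Once the JSON-LD is decoded, specifications are often found in the additionalProperty list of the Product node, as schema.org PropertyValue objects with name, value and an optional unitText. Whether a particular source fills that field is an assumption to verify per site. The function name specsFromJsonLd below is hypothetical.

```php
// Hypothetical helper: collect specifications from a decoded JSON-LD structure.
function specsFromJsonLd(array $data): array
{
    // The block may hold one node, a plain list of nodes, or an @graph wrapper.
    $nodes = $data['@graph'] ?? (isset($data[0]) ? $data : [$data]);
    $specs = [];
    foreach ($nodes as $node) {
        // Simplification: @type is assumed to be a plain string, not an array of types.
        if (!is_array($node) || ($node['@type'] ?? null) !== 'Product') {
            continue;
        }
        foreach ($node['additionalProperty'] ?? [] as $prop) {
            if (!is_array($prop) || !isset($prop['name'], $prop['value']) || !is_scalar($prop['value'])) {
                continue;
            }
            $value = (string)$prop['value'];
            if (isset($prop['unitText'])) {
                $value .= ' ' . $prop['unitText']; // keep the unit next to the number
            }
            $specs[$prop['name']] = $value;
        }
    }
    return $specs;
}
```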

JS variables — data in window.productData or __REDUX_STATE__. Extract via regex.
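A sketch of the JS-variable case. A regex alone tends to cut nested objects at the first closing brace, so here the regex only locates the assignment and the object is then cut out by counting braces. The function name extractJsObject is hypothetical, and the approach assumes the assigned value is strict JSON (quoted keys, double-quoted strings), which is not guaranteed for every site.

```php
// Hypothetical helper: pull the JSON object assigned to a JS variable out of the page source.
function extractJsObject(string $html, string $varName): ?array
{
    // Step 1: the regex finds "<varName> = {".
    $pattern = '/' . preg_quote($varName, '/') . '\s*=\s*\{/';
    if (!preg_match($pattern, $html, $m, PREG_OFFSET_CAPTURE)) {
        return null;
    }
    $start = $m[0][1] + strlen($m[0][0]) - 1; // position of the opening "{"

    // Step 2: walk forward, counting braces and ignoring those inside string literals.
    $depth = 0;
    $inString = false;
    for ($i = $start, $len = strlen($html); $i < $len; $i++) {
        $ch = $html[$i];
        if ($inString) {
            if ($ch === '\\') {
                $i++;                 // skip the escaped character
            } elseif ($ch === '"') {
                $inString = false;
            }
        } elseif ($ch === '"') {
            $inString = true;
        } elseif ($ch === '{') {
            $depth++;
        } elseif ($ch === '}' && --$depth === 0) {
            $decoded = json_decode(substr($html, $start, $i - $start + 1), true);
            return is_array($decoded) ? $decoded : null;
        }
    }
    return null; // unbalanced braces: the script was probably truncated
}
```

A call such as extractJsObject($html, 'window.productData') returns the decoded array, or null when the variable is missing or its value is not valid JSON.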

Normalizing specification names

Different sources name the same attribute differently: "Weight", "Net mass", "Weight (kg)". Direct mapping to an infoblock property without normalization creates chaos.

Solution: an alias table, parser_property_aliases:

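CREATE TABLE parser_property_aliases (
    alias VARCHAR(255) NOT NULL PRIMARY KEY,  -- name as found on the source site, normalized
    canonical_name VARCHAR(255) NOT NULL,     -- human-readable canonical name
    property_code VARCHAR(100) NOT NULL       -- CODE of the target infoblock property
);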

During parsing, every found name is looked up in the alias table. If not found — log it as an "unknown property" for manual review and addition to the dictionary.
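The lookup step can be sketched as follows, with an in-memory array standing in for the alias table. The function names normalizeName and mapSpecs are hypothetical, and the normalization rules (lowercasing, collapsing whitespace, dropping a trailing colon) are an assumption about what a typical source needs. The mbstring extension is required.

```php
// Normalize a raw specification name so that "Weight:" and " weight " compare equal.
function normalizeName(string $name): string
{
    $name = mb_strtolower(trim($name));
    $name = preg_replace('/\s+/u', ' ', $name);
    return rtrim($name, ': ');
}

// Map parsed specs to property codes; names missing from the dictionary are counted in $unknown.
// $aliases is a hypothetical in-memory copy of the alias table: normalized alias => property code.
function mapSpecs(array $specs, array $aliases, array &$unknown): array
{
    $mapped = [];
    foreach ($specs as $name => $value) {
        $key = normalizeName((string)$name);
        if (isset($aliases[$key])) {
            $mapped[$aliases[$key]] = $value;
        } else {
            // Keep a counter so the most frequent unknown names are reviewed first.
            $unknown[$key] = ($unknown[$key] ?? 0) + 1;
        }
    }
    return $mapped;
}
```

Counting unknown names rather than only logging them makes it easier to decide which aliases to add to the dictionary first.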

Mapping to infoblock properties

Infoblock properties (b_iblock_property) have a base type. The main ones are S (string), N (number), L (list) and E (element link); F (file) and G (section link) also exist. For specifications, typically:

  • S: free-text values ("Color: red")
  • N: numeric values, stored without the unit of measurement (PROPERTY_TYPE = N, USER_TYPE empty)
  • L: a fixed list of values (important for filter performance)

For the 1C-Bitrix smart filter (bitrix:catalog.smart.filter), L type values perform faster than S: the options are stored once in b_iblock_property_enum, and each element references an option by its ID instead of repeating the text.
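When a new property has to be created during import, the type can be suggested from the values already parsed. The heuristic below is only a sketch: the function name suggestPropertyType and the threshold of 50 options are assumptions, and the result is meant as a hint for manual review, not a rule built into 1C-Bitrix.

```php
// Hypothetical heuristic: suggest S, N or L from a sample of parsed values.
function suggestPropertyType(array $values, int $maxListOptions = 50): string
{
    $values = array_values(array_filter(array_map('trim', $values), 'strlen'));
    if ($values === []) {
        return 'S'; // nothing to analyze, fall back to a plain string
    }
    $allNumeric = true;
    foreach ($values as $v) {
        // Accept a decimal comma as well as a dot.
        if (!is_numeric(str_replace(',', '.', $v))) {
            $allNumeric = false;
            break;
        }
    }
    if ($allNumeric) {
        return 'N';
    }
    // A small set of repeating values behaves like a list (brand, color, country).
    $distinct = count(array_unique(array_map('mb_strtolower', $values)));
    return ($distinct <= $maxListOptions && $distinct < count($values)) ? 'L' : 'S';
}
```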

Creating an L type value during import:

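// Get or create an enum value and keep its ID: an L type property stores the enum ID, not the text
$propEnum = CIBlockPropertyEnum::GetList([], [
    'PROPERTY_ID' => $propId,
    'VALUE' => $parsedValue
])->Fetch();
if ($propEnum) {
    $enumId = (int)$propEnum['ID'];
} else {
    // Add() returns the ID of the new enum value, or false on failure
    $enumId = CIBlockPropertyEnum::Add(['PROPERTY_ID' => $propId, 'VALUE' => $parsedValue]);
}
// Assign the value to the element; $elementId, $iblockId and $propCode come from the import loop
CIBlockElement::SetPropertyValuesEx($elementId, $iblockId, [$propCode => $enumId]);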

Handling units of measurement

Sources provide "10 kg", "10kg", "10 kilograms". A unit parser is needed: split the number from the unit, normalize the unit to a standard form. Simple regex: /^([\d.,]+)\s*(.*)$/.

An N type property holds only the number (b_iblock_element_property keeps it in VALUE and duplicates it in VALUE_NUM), so the unit has to live elsewhere: in a separate property or in the property CODE (WEIGHT_KG).
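A minimal sketch of such a unit parser, built around the regex above. The function name parseQuantity and the unit dictionary are hypothetical, and thousands separators ("1.234,5") are deliberately rejected rather than guessed. The mbstring extension is required.

```php
// Hypothetical helper: split "10 kg" into a number and a normalized unit.
function parseQuantity(string $raw): ?array
{
    // Hypothetical unit dictionary: spelling variants => standard form. Extend per catalog.
    static $units = [
        'kg' => 'kg', 'kilogram' => 'kg', 'kilograms' => 'kg',
        'g' => 'g', 'gram' => 'g', 'grams' => 'g',
        'mm' => 'mm', 'cm' => 'cm',
    ];
    if (!preg_match('/^([\d.,]+)\s*(.*)$/u', trim($raw), $m)) {
        return null; // the value does not start with a number
    }
    // Treat a decimal comma as a dot; anything still non-numeric is rejected.
    $number = str_replace(',', '.', $m[1]);
    if (!is_numeric($number)) {
        return null;
    }
    $unit = mb_strtolower(rtrim(trim($m[2]), '.'));
    return ['value' => (float)$number, 'unit' => $units[$unit] ?? $unit];
}
```

Units missing from the dictionary are returned lowercased and unchanged, so they can be logged and added later, in the same way as unknown property names.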

Case study: electronics, 15,000 SKUs, 120+ specification types

Goal: populate properties for the smart filter across laptops, phones, and TVs — three infoblocks with different property sets.

Implementation:

  • Parsing from the manufacturer's site via JSON-LD (70% of products) + HTML table (30%)
  • Alias dictionary of 380 entries, built during the first 3 days of development
  • All numeric specifications — type N, list-based values (brand, color, country) — type L
  • 5 workers running in parallel via PHP-CLI, each handling its own category

Result: the smart filter worked correctly across 48 parameters after 2 iterations of alias dictionary debugging.

Work timeline

Phase                                              Duration
Analyzing the source's specification structure     4–8 hours
Developing the specification parser                2–3 days
Building the alias dictionary, normalization       1–2 days
Configuring infoblock properties for the filter    1 day
Importing data, debugging property types           1–2 days
Verifying smart filter functionality               4–8 hours

Total: 7–12 working days — this is one of the most labor-intensive catalog population tasks.