Automatic Product Mapping with Supplier Catalog

Our company is engaged in the development, support and maintenance of sites of any complexity. From simple one-page sites to large-scale cluster systems built on micro services. Experience of developers is confirmed by certificates from vendors.
Development and maintenance of all types of websites:
Informational websites or web applications
Business card websites, landing pages, corporate websites, online catalogs, quizzes, promo websites, blogs, news resources, informational portals, forums, aggregators
E-commerce websites or web applications
Online stores, B2B portals, marketplaces, online exchanges, cashback websites, exchanges, dropshipping platforms, product parsers
Business process management web applications
CRM systems, ERP systems, corporate portals, production management systems, information parsers
Electronic service websites or web applications
Classified ads platforms, online schools, online cinemas, website builders, portals for electronic services, video hosting platforms, thematic portals

These are just some of the technical types of websites we work with, and each of them can have its own specific features and functionality, as well as be customized to meet the specific needs and goals of the client.

Showing 1 of 1 servicesAll 2065 services
Automatic Product Mapping with Supplier Catalog
Complex
~5 business days
FAQ
Our competencies:
Development stages
Latest works
  • image_web-applications_feedme_466_0.webp
    Development of a web application for FEEDME
    1161
  • image_ecommerce_furnoro_435_0.webp
    Development of an online store for the company FURNORO
    1041
  • image_crm_enviok_479_0.webp
    Development of a web application for Enviok
    822
  • image_crm_chasseurs_493_0.webp
    CRM development for Chasseurs
    847
  • image_website-sbh_0.png
    Website development for SBH Partners
    999
  • image_website-_0.png
    Website development for Red Pear
    451

Implementing Automatic Product Supplier Catalog Mapping

Supplier sends price with 50,000 items. Each has SKU, name and category in supplier system. Need to understand: which are already in site catalog, which need to be created, which are duplicates under different SKU. This is matching task (mapping), solved by combination of deterministic rules and fuzzy search.

Matching Levels

Mapping works layer by layer — from exact to approximate:

  1. Exact SKU match — most reliable way
  2. EAN/barcode match — if supplier provides it
  3. Normalized name match — after cleanup from extra characters
  4. Fuzzy match (Jaro-Winkler or Levenshtein) — for everything else
  5. Manual matching — for what wasn't found automatically
class ProductMatcher
{
    /** @return MatchResult */
    public function match(SupplierProduct $sp): MatchResult
    {
        // Level 1: exact SKU
        if ($p = Product::where('sku', $sp->article)->first()) {
            return MatchResult::exact($p->id, 'sku');
        }

        // Level 2: EAN
        if ($sp->ean && $p = Product::where('ean', $sp->ean)->first()) {
            return MatchResult::exact($p->id, 'ean');
        }

        // Level 3: normalized name
        $normalized = $this->normalize($sp->name);
        if ($p = Product::where('name_normalized', $normalized)->first()) {
            return MatchResult::exact($p->id, 'name_normalized');
        }

        // Level 4: fuzzy
        $candidate = $this->fuzzySearch($normalized);
        if ($candidate && $candidate->score >= 0.88) {
            return MatchResult::fuzzy($candidate->id, $candidate->score);
        }

        return MatchResult::unmatched();
    }
}

Name Normalization

Before comparing, bring strings to uniform format:

private function normalize(string $name): string
{
    $name = mb_strtolower($name);
    $name = preg_replace('/[\s\-\_\/]+/', ' ', $name);          // spaces
    $name = preg_replace('/[^\p{L}\p{N}\s]/u', '', $name);      // special chars
    $name = preg_replace('/\b(art|code|ref|no)\b\.?\s*/iu', '', $name); // service words
    return trim($name);
}

Normalized value stored in separate indexed field name_normalized — don't recalculate on each match.

Fuzzy Search via PostgreSQL

For fuzzy search in PostgreSQL connect pg_trgm extension:

CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX products_name_trgm_idx ON products USING gin (name_normalized gin_trgm_ops);

Query for similar:

SELECT id, name_normalized,
       similarity(name_normalized, :query) AS score
FROM products
WHERE similarity(name_normalized, :query) > 0.7
ORDER BY score DESC
LIMIT 5;

In PHP via Eloquent:

private function fuzzySearch(string $query): ?object
{
    return DB::selectOne(
        "SELECT id, similarity(name_normalized, ?) AS score
         FROM products
         WHERE similarity(name_normalized, ?) > 0.7
         ORDER BY score DESC
         LIMIT 1",
        [$query, $query]
    );
}

Storing Mapping

CREATE TABLE supplier_product_mapping (
    id              serial PRIMARY KEY,
    supplier_id     int NOT NULL,
    supplier_sku    varchar(100) NOT NULL,
    product_id      int REFERENCES products(id),
    match_type      varchar(20),    -- exact_sku | exact_ean | fuzzy | manual | new
    match_score     float,          -- for fuzzy
    confirmed       boolean DEFAULT false,
    confirmed_by    int,            -- user_id
    confirmed_at    timestamptz,
    created_at      timestamptz DEFAULT now(),
    UNIQUE (supplier_id, supplier_sku)
);

Confirmed mappings (confirmed = true) used directly. Unconfirmed fuzzy mappings require operator review.

Unmatched Items Processing Workflow

MatchResult::unmatched()
  └─> check: similar product exists but score < 0.88?
        ├─> YES → create mapping (confirmed=false) + notify operator
        └─> NO  → create draft product or skip

Operator in admin sees list of confirmed=false with suggested candidates and "Accept" / "Reject" / "Find Another" buttons.

Supplier Category Mapping

Supplier uses own categories — need to map to site category tree:

CREATE TABLE supplier_category_mapping (
    supplier_id          int,
    supplier_category    varchar(200),
    site_category_id     int REFERENCES categories(id),
    created_at           timestamptz DEFAULT now(),
    PRIMARY KEY (supplier_id, supplier_category)
);

After initial manual category mapping, new arriving products automatically go to correct site category.

Duplicate Detection

Separate task — find items under different supplier SKUs but same product in reality:

class DuplicateDetector
{
    public function findDuplicatePairs(int $supplierId): array
    {
        // Find items with same EAN from one supplier
        return DB::select(
            "SELECT a.supplier_sku AS sku_a, b.supplier_sku AS sku_b, a.ean
             FROM supplier_products a
             JOIN supplier_products b ON a.ean = b.ean AND a.supplier_sku < b.supplier_sku
             WHERE a.supplier_id = ? AND b.supplier_id = ?",
            [$supplierId, $supplierId]
        );
    }
}

Performance with Large Catalogs

With 100,000+ items, full fuzzy pass is too slow. Optimizations:

  • First batch exact match query (single SQL with IN (...))
  • Fuzzy search only for remaining unmatched
  • Cache sku → product_id mapping in Redis before processing
  • Partition processing into Queue chunks
// Pre-load known mappings into memory
$knownMappings = SupplierProductMapping::where([
    'supplier_id' => $supplierId,
    'confirmed'   => true,
])->pluck('product_id', 'supplier_sku')->all();
// Later — O(1) lookup by SKU

Implementation Timeline

  • Exact match by SKU/EAN, mapping storage, new drafts — 3 days
  • Normalization + fuzzy via pg_trgm + confirmation queue — +2 days
  • Category mapping, duplicate detector, admin UI for review — +2–3 days