Automatic Product Matching from Different Suppliers

Our company is engaged in the development, support and maintenance of sites of any complexity. From simple one-page sites to large-scale cluster systems built on micro services. Experience of developers is confirmed by certificates from vendors.
Development and maintenance of all types of websites:
Informational websites or web applications
Business card websites, landing pages, corporate websites, online catalogs, quizzes, promo websites, blogs, news resources, informational portals, forums, aggregators
E-commerce websites or web applications
Online stores, B2B portals, marketplaces, online exchanges, cashback websites, exchanges, dropshipping platforms, product parsers
Business process management web applications
CRM systems, ERP systems, corporate portals, production management systems, information parsers
Electronic service websites or web applications
Classified ads platforms, online schools, online cinemas, website builders, portals for electronic services, video hosting platforms, thematic portals

These are just some of the technical types of websites we work with, and each of them can have its own specific features and functionality, as well as be customized to meet the specific needs and goals of the client.

Showing 1 of 1 servicesAll 2065 services
Automatic Product Matching from Different Suppliers
Complex
~2-4 weeks
FAQ
Our competencies:
Development stages
Latest works
  • image_web-applications_feedme_466_0.webp
    Development of a web application for FEEDME
    1161
  • image_ecommerce_furnoro_435_0.webp
    Development of an online store for the company FURNORO
    1041
  • image_crm_enviok_479_0.webp
    Development of a web application for Enviok
    822
  • image_crm_chasseurs_493_0.webp
    CRM development for Chasseurs
    847
  • image_website-sbh_0.png
    Website development for SBH Partners
    999
  • image_website-_0.png
    Website development for Red Pear
    451

Implementing Automatic Product Matching Across Different Suppliers

Matching is the task of establishing correspondence between items from different suppliers that describe one physical product. Unlike deduplication (eliminating obvious duplicates in one catalog), matching works with initially different nomenclature systems. Supplier A calls a product "Smartphone Samsung S24 256GB", supplier B calls it "SAMSUNG Galaxy S24 (256 GB) Black SM-S921B". It's one product, but without matching, the system creates two cards.

Matching Methods

Methods are used sequentially — from strict to soft:

1. Exact Identifiers

  • EAN/GTIN — most reliable, covers 40–60% of electronics
  • Manufacturer part number (MPN) + brand — additional 20–30%
  • ISBN for books, ASIN for Amazon-compatible catalogs

2. Structured Attributes

  • Brand + model + key characteristics (capacity, color, size)
  • Works for standardized categories (electronics, clothing)

3. Fuzzy Text

  • Jaro-Winkler / Levenshtein on normalized names
  • TF-IDF + cosine similarity on descriptions
  • Covers non-standardized categories

4. Vector Matching (ML)

  • Embeddings via sentence-transformers or OpenAI API
  • Effective for products with complex descriptions

Data Schema

-- Matching table
CREATE TABLE product_matches (
    id              BIGSERIAL PRIMARY KEY,
    master_id       BIGINT NOT NULL REFERENCES products(id),
    supplier_id     INT NOT NULL REFERENCES suppliers(id),
    supplier_sku    VARCHAR(255) NOT NULL,
    match_method    VARCHAR(30) NOT NULL,   -- 'gtin', 'mpn_brand', 'fuzzy', 'ml', 'manual'
    confidence      FLOAT,                  -- 0.0–1.0
    status          VARCHAR(20) DEFAULT 'active', -- 'active', 'rejected', 'pending_review'
    created_at      TIMESTAMP DEFAULT NOW(),
    UNIQUE(supplier_id, supplier_sku)
);

CREATE INDEX idx_matches_master ON product_matches(master_id);
CREATE INDEX idx_matches_confidence ON product_matches(confidence) WHERE status = 'pending_review';

Matching Pipeline

class ProductMatchingPipeline
{
    private array $matchers = [];

    public function __construct(
        private GtinMatcher      $gtinMatcher,
        private MpnBrandMatcher  $mpnBrandMatcher,
        private FuzzyMatcher     $fuzzyMatcher,
        private VectorMatcher    $vectorMatcher,
    ) {
        $this->matchers = [
            ['matcher' => $gtinMatcher,     'threshold' => 1.0,  'auto_accept' => true],
            ['matcher' => $mpnBrandMatcher,  'threshold' => 1.0,  'auto_accept' => true],
            ['matcher' => $fuzzyMatcher,     'threshold' => 0.90, 'auto_accept' => true],
            ['matcher' => $vectorMatcher,    'threshold' => 0.85, 'auto_accept' => false],
        ];
    }

    public function match(SupplierProductDTO $dto): MatchResult
    {
        foreach ($this->matchers as $config) {
            $result = $config['matcher']->find($dto);

            if (!$result) continue;

            if ($result->confidence >= $config['threshold'] && $config['auto_accept']) {
                return new MatchResult(
                    masterProductId: $result->productId,
                    confidence:      $result->confidence,
                    method:          $result->method,
                    status:          'active',
                );
            }

            if ($result->confidence >= 0.70) {
                // Send to manual review queue
                return new MatchResult(
                    masterProductId: $result->productId,
                    confidence:      $result->confidence,
                    method:          $result->method,
                    status:          'pending_review',
                );
            }
        }

        // Not found — create new master product
        return new MatchResult(masterProductId: null, confidence: 0, method: 'none', status: 'new');
    }
}

GTIN Matcher

class GtinMatcher
{
    public function find(SupplierProductDTO $dto): ?MatchCandidate
    {
        if (!$dto->barcode) return null;

        $normalized = $this->normalizeGtin($dto->barcode);

        // Search in known fingerprints
        $fingerprint = ProductFingerprint::where('type', 'gtin')
            ->where('value', $normalized)
            ->first();

        if (!$fingerprint) return null;

        return new MatchCandidate(
            productId:  $fingerprint->product_id,
            confidence: 1.0,
            method:     'gtin',
        );
    }

    private function normalizeGtin(string $raw): string
    {
        $digits = preg_replace('/\D/', '', $raw);
        // EAN-8 → EAN-13
        if (strlen($digits) === 8) {
            $digits = str_pad($digits, 13, '0', STR_PAD_LEFT);
        }
        return $digits;
    }
}

Vector Matcher via OpenAI Embeddings

class VectorMatcher
{
    public function find(SupplierProductDTO $dto): ?MatchCandidate
    {
        $text = $this->buildText($dto);

        // Get embedding for new product
        $vector = $this->openai->embeddings()->create([
            'model' => 'text-embedding-3-small',
            'input' => $text,
        ])->embeddings[0]->embedding;

        // Find nearest neighbor in pgvector
        $result = DB::selectOne("
            SELECT product_id, 1 - (embedding <=> :vec) AS similarity
            FROM product_embeddings
            WHERE 1 - (embedding <=> :vec) > 0.80
            ORDER BY embedding <=> :vec
            LIMIT 1
        ", ['vec' => '[' . implode(',', $vector) . ']']);

        if (!$result) return null;

        return new MatchCandidate(
            productId:  $result->product_id,
            confidence: (float) $result->similarity,
            method:     'vector',
        );
    }

    private function buildText(SupplierProductDTO $dto): string
    {
        return implode(' ', array_filter([
            $dto->brand,
            $dto->name,
            $dto->sku,
            implode(' ', array_values($dto->attributes)),
        ]));
    }
}

PostgreSQL requires the pgvector extension:

CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE product_embeddings (
    product_id  BIGINT PRIMARY KEY REFERENCES products(id),
    embedding   vector(1536),  -- OpenAI text-embedding-3-small
    updated_at  TIMESTAMP
);
CREATE INDEX idx_embeddings_cosine ON product_embeddings
    USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

Manual Review Interface

Products with pending_review status go to moderator queue. Interface shows:

  • Left — supplier product (name, SKU, photo)
  • Right — catalog candidate with match percentage
  • Buttons: Confirm, Reject, Find Another
  • Hotkeys for speed (→ accept, ← reject)

An experienced moderator processes 100–150 pairs per hour.

Feedback Loop for Model Improvement

Each moderator decision is a training example:

class MatchFeedbackService
{
    public function recordDecision(int $matchId, string $decision, int $userId): void
    {
        $match = ProductMatch::findOrFail($matchId);

        $match->update([
            'status'      => $decision === 'accept' ? 'active' : 'rejected',
            'reviewed_by' => $userId,
        ]);

        // Save for retraining
        MatchTrainingExample::create([
            'supplier_product_data' => $match->supplierProduct->toArray(),
            'master_product_id'     => $match->master_id,
            'label'                 => $decision === 'accept' ? 1 : 0,
            'confidence_was'        => $match->confidence,
        ]);

        // If rejected — create new master product
        if ($decision === 'reject') {
            $this->createNewMaster($match->supplierProduct);
        }
    }
}

Performance

With 100,000+ position catalogs, matching can't run by iterating all pairs. Optimizations:

  • Blocking: first filter candidates by brand/category, then match only within block
  • Batch embeddings: request vectors in batches of 100 from OpenAI
  • pgvector IVFFlat index: approximate nearest neighbor in milliseconds

Timeline

  • GtinMatcher + MpnBrandMatcher + FuzzyMatcher: 2 days
  • VectorMatcher + pgvector: 2 days
  • Pipeline + manual review queue + interface: 2–3 days
  • Feedback loop + metrics: 1 day

Total: 7–8 business days.