Price Aggregator (Price Comparison) Development



Price Aggregator Development

A price aggregator collects prices for identical products from different stores and shows them side by side, so the user can spot the cheapest offer and click through to the store. Technically this means three stages: parsing, data normalization, and product matching. Each stage is non-trivial at scale.

Data Sources

Data arrives in three ways:

Price lists and feeds: the store provides a YML, XML, or CSV file with its current assortment. This is the most reliable source: the data is structured, the partnership is official, and there is no risk of being banned. Yandex.Market YML is the de facto standard for the Russian market.

Partner APIs: some stores provide a REST API. Documentation is often weak and request limits are strict.

Web parsing: for stores without feeds. The risk is high: CAPTCHAs, rate limiting, markup changes, and IP blocking. It requires constant maintenance.

Start with feeds and APIs, which are more stable. Add web parsing selectively, only for key sources.
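As an illustration of the feed route, here is a minimal sketch of reading offers from a YML-style XML feed with the Python standard library. The `Offer` dataclass, the `parse_yml_offers` function, and the sample feed are hypothetical, and the element names (`offer`, `name`, `price`, `vendor`, `barcode`) cover only a simplified subset of the format. Real feeds vary and should be checked against the store's actual file.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import Iterator, Optional


@dataclass
class Offer:
    offer_id: str
    name: str
    price: Decimal
    vendor: Optional[str]
    barcode: Optional[str]
    available: bool


def parse_yml_offers(xml_text: str) -> Iterator[Offer]:
    """Yield offers from a YML-style feed, skipping entries without a usable name or price."""
    root = ET.fromstring(xml_text)
    for node in root.iter("offer"):
        name = (node.findtext("name") or "").strip()
        price_text = (node.findtext("price") or "").strip()
        if not name or not price_text:
            continue  # incomplete offer: skip rather than guess
        try:
            price = Decimal(price_text)
        except InvalidOperation:
            continue  # non-numeric price such as "n/a"
        yield Offer(
            offer_id=node.get("id", ""),
            name=name,
            price=price,
            vendor=node.findtext("vendor"),
            barcode=node.findtext("barcode"),
            # Assumption: a missing "available" attribute means the item is in stock.
            available=node.get("available", "true") == "true",
        )


# Made-up sample feed for demonstration only.
SAMPLE_FEED = """<yml_catalog date="2024-01-01 00:00">
  <shop>
    <offers>
      <offer id="101" available="true">
        <name>Samsung Galaxy A55 128GB Blue</name>
        <vendor>Samsung</vendor>
        <price>32990</price>
      </offer>
      <offer id="102" available="false">
        <name>Offer with a broken price</name>
        <price>n/a</price>
      </offer>
    </offers>
  </shop>
</yml_catalog>"""

offers = list(parse_yml_offers(SAMPLE_FEED))
```

Prices are parsed as `Decimal` rather than `float` so that later comparisons and history records do not pick up rounding noise.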

Data Collector Architecture

Scheduler (Celery Beat / Laravel Scheduler)
  ↓ every N hours
FeedFetcher workers (one per source)
  ↓
RawData storage (S3 or local FS)
  ↓
Parser workers (XML/CSV/JSON → normalized)
  ↓
Normalizer (unit conversion, text cleanup)
  ↓
Matcher (map to DB products)
  ↓
PriceHistory (timeseries write)
  ↓
ElasticsearchIndexer (update index)

Queue: Celery + Redis for Python, Laravel Horizon for PHP. Each feed is processed independently, so an error in one source does not block the others.
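The isolation property can be sketched without any queue at all. In the hypothetical `run_collection` function below, each source is a plain callable and a failure is recorded instead of propagated. In a real deployment each call would more likely be a separate queued task, but the error-handling idea is the same.

```python
from typing import Callable, Dict, List, Tuple

Fetcher = Callable[[], List[dict]]


def run_collection(fetchers: Dict[str, Fetcher]) -> Tuple[Dict[str, List[dict]], Dict[str, str]]:
    """Run every source fetcher; collect results and failures separately."""
    results: Dict[str, List[dict]] = {}
    failures: Dict[str, str] = {}
    for source, fetch in fetchers.items():
        try:
            results[source] = fetch()
        except Exception as exc:
            # Isolate the failure: log it for this source and move on to the next one.
            failures[source] = f"{type(exc).__name__}: {exc}"
    return results, failures


def fetch_ok() -> List[dict]:
    return [{"offer_id": "101", "price": "32990"}]


def fetch_broken() -> List[dict]:
    raise TimeoutError("feed unavailable")


results, failures = run_collection(
    {"shop_a": fetch_ok, "shop_b": fetch_broken, "shop_c": fetch_ok}
)
```

Here `shop_b` ends up in `failures` while `shop_a` and `shop_c` are still collected.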

Product Matching

This is the hardest part. The task is to determine that "Samsung Galaxy A55 128GB Blue" from shop A and "Smartphone Samsung Galaxy A55 (SM-A556B) 128 Гб синий" (Russian for "128 GB blue") from shop B are the same product.
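Part of the difficulty goes away once attributes are pulled out of the titles before they are compared. The sketch below extracts storage capacity in both Latin and Cyrillic spelling, plus a model code. The function names and both regular expressions are illustrative assumptions. The model pattern in particular is tuned only to codes shaped like SM-A556B.

```python
import re
from typing import Optional

# Capacity units in Latin and Cyrillic spelling, mapped to a multiplier in GB.
_UNIT_TO_GB = {"gb": 1, "гб": 1, "tb": 1024, "тб": 1024}
_CAPACITY_RE = re.compile(r"(\d+)\s*(gb|гб|tb|тб)\b", re.IGNORECASE)
# Hypothetical pattern for model codes shaped like "SM-A556B".
_MODEL_RE = re.compile(r"\b([A-Z]{2}-[A-Z]\d{3}[A-Z]?)\b")


def extract_capacity_gb(title: str) -> Optional[int]:
    """Return the storage capacity in GB, or None if the title does not mention one."""
    match = _CAPACITY_RE.search(title)
    if match is None:
        return None
    return int(match.group(1)) * _UNIT_TO_GB[match.group(2).lower()]


def extract_model_code(title: str) -> Optional[str]:
    """Return a manufacturer model code if one is present in the title."""
    match = _MODEL_RE.search(title.upper())
    return match.group(1) if match else None


title_a = "Samsung Galaxy A55 128GB Blue"
title_b = "Smartphone Samsung Galaxy A55 (SM-A556B) 128 Гб синий"

capacity_a = extract_capacity_gb(title_a)
capacity_b = extract_capacity_gb(title_b)
model_b = extract_model_code(title_b)
```

Both titles yield the same capacity, which gives the matcher a structured field to compare alongside the raw text.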

Deterministic:

  • GTIN/EAN: if both offers have a barcode and the codes are equal, the match is exact
  • MPN: the manufacturer part number, unique within a brand
  • URL canonicalization: some stores include the GTIN in the product URL
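Barcode matching only works if the codes are validated and brought to one form first, since the same product can appear as a 12-digit UPC in one feed and a 13-digit EAN in another. A minimal sketch follows, assuming a hypothetical `normalize_gtin` helper that verifies the standard GTIN check digit and zero-pads to 14 digits:

```python
from typing import Optional


def normalize_gtin(raw: str) -> Optional[str]:
    """Return the code as a 14-digit GTIN, or None if length or check digit is invalid."""
    digits = "".join(ch for ch in raw if "0" <= ch <= "9")  # drop spaces and hyphens
    if len(digits) not in (8, 12, 13, 14):
        return None
    body, check = digits[:-1], int(digits[-1])
    # Weights alternate 3, 1, 3, 1, ... starting from the digit next to the check digit.
    total = sum(
        int(d) * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body))
    )
    if (10 - total % 10) % 10 != check:
        return None
    # Leading zeros do not change the checksum, so padding keeps the code valid.
    return digits.zfill(14)


ean13 = normalize_gtin("4006381333931")
upc_a = normalize_gtin("0 36000 29145 2")
bad = normalize_gtin("4006381333932")  # last digit altered, fails the check
```

Two offers can then be joined on the normalized value. A code that fails validation is better treated as missing than as a match key.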

Fuzzy:

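from rapidfuzz import fuzz

def match_score(title_a: str, title_b: str, brand_a: str, brand_b: str) -> float:
    """Return a title similarity in [0, 1]; 0.0 when the brands differ."""
    # Offers from different brands are never the same product, so skip the fuzzy step.
    if brand_a.strip().lower() != brand_b.strip().lower():
        return 0.0
    # token_sort_ratio ignores word order. Lowercase explicitly so the result
    # does not depend on the library's default preprocessing.
    title_similarity = fuzz.token_sort_ratio(title_a.lower(), title_b.lower())
    return title_similarity / 100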

Thresholds: a score of 0.85 or higher is matched automatically, 0.65–0.85 goes to manual review, and anything below 0.65 is treated as a new product.
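The threshold rule can be written as a small routing function. The names `MatchDecision` and `classify_match` are hypothetical, and the boundary handling (0.85 and 0.65 inclusive) is one reasonable reading of the ranges above. The cut-offs themselves would normally be tuned on confirmed matches.

```python
from enum import Enum


class MatchDecision(Enum):
    AUTO_MATCH = "auto_match"
    MANUAL_REVIEW = "manual_review"
    NEW_PRODUCT = "new_product"


def classify_match(score: float, auto: float = 0.85, review: float = 0.65) -> MatchDecision:
    """Route a similarity score in [0, 1] to one of three outcomes."""
    if score >= auto:
        return MatchDecision.AUTO_MATCH
    if score >= review:
        return MatchDecision.MANUAL_REVIEW
    return MatchDecision.NEW_PRODUCT


decisions = [classify_match(s) for s in (0.92, 0.70, 0.40)]
```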

ML approach: product name embeddings (sentence-transformers, ruBERT) plus cosine similarity. This is much more accurate, especially when stores word the same product differently. The model is trained on confirmed matches.
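The comparison step of the embedding approach is independent of the model. The sketch below assumes the vectors have already been produced elsewhere (for example by an embedding model's encode step) and shows only cosine similarity and nearest-candidate lookup. The tiny two-dimensional vectors and the names `cosine_similarity` and `best_match` are placeholders for illustration.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(
    query: Sequence[float], catalog: Dict[str, Sequence[float]]
) -> Optional[Tuple[str, float]]:
    """Return (product_id, similarity) of the closest catalog vector, or None if empty."""
    best: Optional[Tuple[str, float]] = None
    for product_id, vector in catalog.items():
        score = cosine_similarity(query, vector)
        if best is None or score > best[1]:
            best = (product_id, score)
    return best


# Toy vectors standing in for real embeddings.
catalog = {"galaxy-a55-128": [0.9, 0.1], "galaxy-a35-128": [0.1, 0.9]}
match = best_match([1.0, 0.0], catalog)
```

A linear scan like this is fine for a sketch. With a large catalog an approximate nearest-neighbour index would be the usual replacement.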

Price History

The main value is not the current price but the history of changes. Each price change is recorded as a new row, not overwritten.

-- Append-only table: one row per observed price or stock change.
CREATE TABLE price_history (
  id BIGSERIAL,
  source_offer_id BIGINT,
  price NUMERIC(12,2),
  in_stock BOOLEAN,
  recorded_at TIMESTAMPTZ DEFAULT NOW()
);
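One way to keep such a table compact is to append a row only when the price or stock status actually differs from the last recorded state. The sketch below uses an in-memory SQLite table as a stand-in for the PostgreSQL one so that it runs without a server. The `make_db` and `record_price` helpers are hypothetical, and writing only on change is an assumption about how "recorded, not overwritten" is implemented.

```python
import sqlite3
from decimal import Decimal


def make_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for price_history (SQLite types, not PostgreSQL)."""
    conn = sqlite3.connect(":memory:")
    # The price is stored as text so the decimal value is not rounded through a float.
    conn.execute(
        """CREATE TABLE price_history (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               source_offer_id INTEGER NOT NULL,
               price TEXT NOT NULL,
               in_stock INTEGER NOT NULL,
               recorded_at TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn


def record_price(conn: sqlite3.Connection, source_offer_id: int, price: str, in_stock: bool) -> bool:
    """Append a row if price or stock changed since the last record; return True if written."""
    price_text = str(Decimal(price).quantize(Decimal("0.01")))  # "32990" -> "32990.00"
    last = conn.execute(
        "SELECT price, in_stock FROM price_history "
        "WHERE source_offer_id = ? ORDER BY id DESC LIMIT 1",
        (source_offer_id,),
    ).fetchone()
    if last is not None and last[0] == price_text and bool(last[1]) == in_stock:
        return False  # nothing changed, keep the history free of duplicates
    conn.execute(
        "INSERT INTO price_history (source_offer_id, price, in_stock) VALUES (?, ?, ?)",
        (source_offer_id, price_text, int(in_stock)),
    )
    return True


conn = make_db()
written = [
    record_price(conn, 1, "32990", True),     # first observation
    record_price(conn, 1, "32990.00", True),  # same price, different formatting
    record_price(conn, 1, "31990", True),     # price drop
    record_price(conn, 1, "31990", False),    # went out of stock
]
row_count = conn.execute("SELECT COUNT(*) FROM price_history").fetchone()[0]
```

Existing rows are never updated, so the full sequence of changes stays available for the price graph.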

For time series in PostgreSQL, use TimescaleDB: the extension partitions data by time automatically and speeds up time-range queries. Alternatives for high loads are InfluxDB and ClickHouse.

A price graph is a standard product page component. Use Chart.js or Recharts and aggregate by day, filtered to the offers of the product being shown: SELECT date_trunc('day', recorded_at) AS day, min(price) FROM price_history GROUP BY 1 ORDER BY 1.

SEO Strategy

Aggregators attract organic traffic through product pages. Key queries: "[product name] buy", "[product name] price", "[product name] cheap".

  • Each canonical product page: a unique title that includes the price range
  • Structured data: Product + AggregateOffer with lowPrice, highPrice, offerCount
  • Static category pages with aggregated stats
  • Blog reviews and curated selections: a source of long-term SEO traffic
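The structured data item can be generated directly from the collected offers. Below is a minimal sketch that builds schema.org `Product` + `AggregateOffer` JSON-LD. The function name `build_product_jsonld` and the sample prices are made up, and a production page would normally add more properties (image, brand, URL).

```python
import json
from decimal import Decimal
from typing import Sequence


def build_product_jsonld(name: str, prices: Sequence[Decimal], currency: str) -> str:
    """Return a JSON-LD string describing a product and its price range across offers."""
    if not prices:
        raise ValueError("at least one offer price is required")
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "offers": {
            "@type": "AggregateOffer",
            "priceCurrency": currency,
            "lowPrice": str(min(prices)),
            "highPrice": str(max(prices)),
            "offerCount": len(prices),
        },
    }
    return json.dumps(data, ensure_ascii=False, indent=2)


# Illustrative prices only.
jsonld = build_product_jsonld(
    "Samsung Galaxy A55 128GB Blue",
    [Decimal("32990.00"), Decimal("30490.00"), Decimal("34100.00")],
    "RUB",
)
```

The resulting string is meant to be embedded in the product page inside a `script` tag of type `application/ld+json`.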

Timeline

  • MVP: feeds from 3–5 sources, manual matching, product pages, basic search — 8–12 weeks
  • Full aggregator: automatic matching (fuzzy + ML), price graphs, store cabinet, partner tracking — 20–30 weeks
  • Each new source (parsing): 3–7 working days, depending on complexity

An aggregator requires ongoing operational support: sources change structure, products need re-matching, and new stores get connected. It is not a one-off project but a platform with a support team.