Developing a Bot to Monitor New Products at Competitors
New products at competitors signal purchasing, pricing, and SEO opportunities. If a competitor launches a new product line and you learn about it a week later, you have already lost search positions and the part of the audience that has chosen another shop. The bot tracks new SKUs appearing on competitor pages and alerts the team on the next scheduled check.
How It Works
Monitoring new products differs from price monitoring: instead of specific product URLs, the bot watches catalog sections such as categories, "New Products" pages, and search results.
Configuration (catalog URL + selector) → Scraper → Snapshot → Diff → Alert
Algorithm:
- Download the competitor's category or section page (every pagination page)
- Extract the product list (URL, name, SKU)
- Compare it with the previous snapshot
- Send a notification for every new item
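The compare step is essentially a set difference on product URLs. A minimal sketch in plain PHP, assuming each snapshot is a list of items with a normalized url key; the function name diffSnapshots() and the example URLs are illustrative only:

```php
<?php

/**
 * Hypothetical helper: compare two catalog snapshots by product URL.
 *
 * @param list<array{url: string}> $previous Items from the last run
 * @param list<array{url: string}> $current  Items from this run
 * @return array{new: list<array>, gone: list<array>}
 */
function diffSnapshots(array $previous, array $current): array
{
    // Re-key both snapshots by URL so the comparison works on array keys
    $prevByUrl = array_column($previous, null, 'url');
    $currByUrl = array_column($current, null, 'url');

    return [
        // In the current snapshot only: candidates for a "new product" alert
        'new' => array_values(array_diff_key($currByUrl, $prevByUrl)),
        // In the previous snapshot only: possibly discontinued or out of stock
        'gone' => array_values(array_diff_key($prevByUrl, $currByUrl)),
    ];
}

$previous = [
    ['url' => 'https://shop.example/p/1', 'title' => 'Kettle A'],
    ['url' => 'https://shop.example/p/2', 'title' => 'Kettle B'],
];
$current = [
    ['url' => 'https://shop.example/p/2', 'title' => 'Kettle B'],
    ['url' => 'https://shop.example/p/3', 'title' => 'Kettle C'],
];

$diff = diffSnapshots($previous, $current);
echo implode(', ', array_column($diff['new'], 'title')), PHP_EOL;  // Kettle C
echo implode(', ', array_column($diff['gone'], 'title')), PHP_EOL; // Kettle A
```

The detection service does the same comparison against the database, one URL lookup at a time, rather than in memory.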
Data Schema
CREATE TABLE competitor_catalogs (
    id              BIGSERIAL PRIMARY KEY,
    competitor_id   INT REFERENCES competitors(id),
    url             TEXT NOT NULL,              -- category page URL
    scrape_config   JSONB NOT NULL,             -- CSS selectors and pagination settings
    check_interval  INTERVAL DEFAULT '6 hours',
    last_checked_at TIMESTAMP,
    is_active       BOOLEAN DEFAULT TRUE
);

-- Each product discovered in a competitor's catalog
CREATE TABLE competitor_items (
    id            BIGSERIAL PRIMARY KEY,
    catalog_id    BIGINT REFERENCES competitor_catalogs(id),
    external_url  TEXT NOT NULL,
    title         TEXT,
    external_sku  VARCHAR(255),
    price         NUMERIC(12,2),
    image_url     TEXT,
    first_seen_at TIMESTAMP DEFAULT NOW(),
    last_seen_at  TIMESTAMP DEFAULT NOW(),
    is_new        BOOLEAN DEFAULT TRUE,        -- reset after the notification is sent
    UNIQUE (catalog_id, external_url)
);

-- Partial index: only rows still waiting for a notification are indexed
CREATE INDEX idx_competitor_items_new
    ON competitor_items (catalog_id, first_seen_at)
    WHERE is_new = TRUE;
Configuration via JSON
Each catalog is configured through its scrape_config JSONB column:
{
    "pagination": {
        "type": "url_param",
        "param": "page",
        "max_pages": 20
    },
    "item_selector": ".catalog-item",
    "fields": {
        "url":   {"selector": "a.product-link", "attr": "href"},
        "title": {"selector": ".product-name", "text": true},
        "sku":   {"selector": "[data-sku]", "attr": "data-sku"},
        "price": {"selector": ".price", "text": true},
        "image": {"selector": "img.product-image", "attr": "src"}
    }
}
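CatalogScraper calls buildPageUrl() with the pagination block but does not define it. A hypothetical standalone version for the "url_param" type might look like this; it assumes page 1 is the bare catalog URL and that any existing query string (filters, sorting) must be preserved. Other pagination types would need their own branches.

```php
<?php

/**
 * Hypothetical buildPageUrl() for "type": "url_param" pagination.
 * Sets the configured query parameter (e.g. page=3) and keeps existing
 * query parameters such as filters or sorting. The fragment is dropped.
 */
function buildPageUrl(string $baseUrl, array $pagination, int $page): string
{
    $type = $pagination['type'] ?? 'url_param';
    if ($type !== 'url_param') {
        throw new InvalidArgumentException("Unsupported pagination type: {$type}");
    }

    // Assumption: page 1 is the catalog URL itself, without the parameter
    if ($page <= 1) {
        return $baseUrl;
    }

    $parts = parse_url($baseUrl);
    if ($parts === false || !isset($parts['host'])) {
        throw new InvalidArgumentException("Not an absolute URL: {$baseUrl}");
    }

    $query = [];
    parse_str($parts['query'] ?? '', $query);
    $query[$pagination['param'] ?? 'page'] = $page;

    return ($parts['scheme'] ?? 'https') . '://' . $parts['host']
        . (isset($parts['port']) ? ':' . $parts['port'] : '')
        . ($parts['path'] ?? '/')
        . '?' . http_build_query($query);
}

$pagination = ['type' => 'url_param', 'param' => 'page', 'max_pages' => 20];

echo buildPageUrl('https://shop.example/catalog/kettles', $pagination, 1), PHP_EOL;
// https://shop.example/catalog/kettles
echo buildPageUrl('https://shop.example/catalog/kettles?sort=new', $pagination, 3), PHP_EOL;
// https://shop.example/catalog/kettles?sort=new&page=3
```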
Catalog Scraper
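use Symfony\Component\DomCrawler\Crawler; // filter() also needs symfony/css-selector

class CatalogScraper
{
    // buildPageUrl() and fetchWithRetry() are helpers not shown here
    public function scrape(CompetitorCatalog $catalog): array
    {
        $config = $catalog->scrape_config; // JSONB column cast to an array on the model
        $pagination = $config['pagination'] ?? [];
        $maxPages = $pagination['max_pages'] ?? 1;
        $items = [];

        for ($page = 1; $page <= $maxPages; $page++) {
            $url = $this->buildPageUrl($catalog->url, $pagination, $page);
            $html = $this->fetchWithRetry($url);
            if (!$html) break; // Request failed even after retries

            $pageItems = $this->extractItems($html, $config);
            if (empty($pageItems)) break; // Past the last page

            // Key by URL: some sites repeat the last page for out-of-range page numbers
            $countBefore = count($items);
            foreach ($pageItems as $item) {
                $items[$item['url']] = $item;
            }
            if (count($items) === $countBefore) break; // Nothing new on this page

            // Respectful delay between pages (1.5-3 s)
            usleep(rand(1_500_000, 3_000_000));
        }

        return array_values($items);
    }

    private function extractItems(string $html, array $config): array
    {
        $crawler = new Crawler($html);
        $items = [];

        $crawler->filter($config['item_selector'])->each(function (Crawler $node) use ($config, &$items) {
            $item = [];
            foreach ($config['fields'] as $field => $fieldConfig) {
                try {
                    $el = $node->filter($fieldConfig['selector'])->first();
                    if ($el->count() === 0) continue; // Field missing on this product card

                    $value = isset($fieldConfig['attr'])
                        ? $el->attr($fieldConfig['attr'])
                        : $el->text();
                    $item[$field] = $value !== null ? trim($value) : null;
                } catch (\Exception $e) {
                    continue; // Malformed node: skip this field, keep the others
                }
            }
            // A product without a URL cannot be tracked
            if (!empty($item['url'])) {
                $items[] = $item;
            }
        });

        return $items;
    }
}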
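fetchWithRetry() is not shown either. A common pattern is a bounded loop with exponential backoff and random jitter. In this hypothetical sketch the HTTP call is passed in as a callable (in the real class it might wrap Laravel's Http client, Guzzle, or a Playwright adapter), so the retry policy can be exercised without network access. It returns null when every attempt fails, which is what the scraper's `if (!$html) break;` check expects.

```php
<?php

/**
 * Hypothetical retry wrapper. $fetch receives the URL and returns the HTML body,
 * or throws / returns null on failure (timeouts, 429, 5xx and so on).
 *
 * @param callable(string): ?string $fetch
 */
function fetchWithRetry(
    callable $fetch,
    string $url,
    int $maxAttempts = 3,
    int $baseDelayMs = 1000
): ?string {
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        try {
            $html = $fetch($url);
            if ($html !== null && $html !== '') {
                return $html;
            }
        } catch (Throwable $e) {
            // Treat an exception like an empty response and retry
        }

        if ($attempt < $maxAttempts) {
            // Exponential backoff (base, 2x base, 4x base...) plus up to 50% random jitter
            $delayMs = $baseDelayMs * (2 ** ($attempt - 1));
            $delayMs += random_int(0, intdiv($delayMs, 2));
            usleep($delayMs * 1000);
        }
    }

    return null; // All attempts failed
}

// Fake fetcher that fails twice before succeeding, to exercise the retry loop
$calls = 0;
$flaky = function (string $url) use (&$calls): ?string {
    $calls++;
    if ($calls < 3) {
        throw new RuntimeException('HTTP 503');
    }
    return "<html>page for {$url}</html>";
};

$html = fetchWithRetry($flaky, 'https://shop.example/catalog', 3, 10);
echo $html, PHP_EOL;  // <html>page for https://shop.example/catalog</html>
echo $calls, PHP_EOL; // 3
```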
New Product Detection Service
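class NewProductDetectionService
{
    public function __construct(private CatalogScraper $scraper) {}

    public function process(CompetitorCatalog $catalog): DetectionResult
    {
        $scraped = $this->scraper->scrape($catalog);

        // An empty result usually means the scrape failed (blocking, layout change);
        // stop here so stale last_seen_at values are not reported as disappearances
        if (empty($scraped)) {
            return new DetectionResult(newItems: collect(), disappeared: collect());
        }

        $newItems = [];
        foreach ($scraped as $item) {
            // normalizeUrl() and parsePrice() are helpers not shown here
            $url = $this->normalizeUrl($item['url'], $catalog->url);
            $existing = CompetitorItem::where([
                'catalog_id' => $catalog->id,
                'external_url' => $url,
            ])->first();

            if (!$existing) {
                // New product!
                $newItems[] = CompetitorItem::create([
                    'catalog_id' => $catalog->id,
                    'external_url' => $url,
                    'title' => $item['title'] ?? null,
                    'external_sku' => $item['sku'] ?? null,
                    'price' => $this->parsePrice($item['price'] ?? ''),
                    'image_url' => $item['image'] ?? null,
                    'is_new' => true,
                ]);
            } else {
                // Already known: just update the last-seen time
                $existing->update(['last_seen_at' => now()]);
            }
        }

        // Products not seen in the competitor's catalog for 3+ days.
        // Note: the same items come back on every run until they reappear;
        // store a marker (e.g. a disappeared_at column) to report each one only once
        $disappeared = CompetitorItem::where('catalog_id', $catalog->id)
            ->where('last_seen_at', '<', now()->subDays(3))
            ->get();

        $catalog->update(['last_checked_at' => now()]);

        // Collections, because the notifier calls isEmpty(), take() and pluck()
        return new DetectionResult(newItems: collect($newItems), disappeared: $disappeared);
    }
}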
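normalizeUrl() and parsePrice() are also left undefined. normalizeUrl() carries more weight than it seems: the UNIQUE (catalog_id, external_url) constraint only prevents duplicates if the same product always yields the same string, so relative links must be resolved and tracking parameters dropped. The sketch below is hypothetical: the list of ignored parameters and the rule for telling a decimal separator from a thousands separator are assumptions to tune per competitor, and "../" path segments are not resolved.

```php
<?php

/**
 * Hypothetical normalizeUrl(): resolve a scraped href against the catalog URL
 * and strip parts that change between visits but not between products.
 */
function normalizeUrl(string $href, string $baseUrl): string
{
    $base = parse_url($baseUrl);
    $scheme = $base['scheme'] ?? 'https';
    $host = $base['host'] ?? '';

    if (str_starts_with($href, '//')) {
        $href = $scheme . ':' . $href;               // protocol-relative
    } elseif (str_starts_with($href, '/')) {
        $href = $scheme . '://' . $host . $href;     // root-relative
    } elseif (!preg_match('#^https?://#i', $href)) {
        // Relative to the catalog page's directory ("../" is not resolved)
        $path = $base['path'] ?? '/';
        $dir = substr($path, 0, (int) strrpos($path, '/'));
        $href = $scheme . '://' . $host . $dir . '/' . $href;
    }

    $parts = parse_url($href);
    if ($parts === false || !isset($parts['host'])) {
        return $href; // Leave anything unparseable untouched
    }

    // Assumed list of tracking parameters; extend it per competitor
    $tracking = ['utm_source', 'utm_medium', 'utm_campaign', 'utm_term', 'utm_content', 'gclid', 'yclid', 'fbclid'];
    $query = [];
    parse_str($parts['query'] ?? '', $query);
    $query = array_diff_key($query, array_flip($tracking));
    ksort($query); // Stable order, so ?a=1&b=2 and ?b=2&a=1 map to the same URL

    return strtolower($parts['scheme'] ?? $scheme) . '://' . strtolower($parts['host'])
        . (isset($parts['port']) ? ':' . $parts['port'] : '')
        . ($parts['path'] ?? '/')
        . ($query ? '?' . http_build_query($query) : ''); // Fragment is dropped
}

/**
 * Hypothetical parsePrice(): "1 299,90 rub." -> 1299.9, "12 990 ₽" -> 12990.0.
 * Assumption: a final "," or "." followed by 1-2 digits is the decimal separator;
 * every other separator is a thousands separator.
 */
function parsePrice(string $raw): ?float
{
    // Keep only digits and separators, then trim separators left over from text like "rub."
    $clean = trim(preg_replace('/[^\d.,]/', '', $raw) ?? '', '.,');
    if ($clean === '') {
        return null;
    }
    if (preg_match('/^(.*)[.,](\d{1,2})$/', $clean, $m)) {
        $whole = preg_replace('/\D/', '', $m[1]);
        return (float) (($whole === '' ? '0' : $whole) . '.' . $m[2]);
    }
    return (float) preg_replace('/\D/', '', $clean);
}

echo normalizeUrl('/p/123?utm_source=tg&color=red#reviews', 'https://shop.example/catalog/kettles'), PHP_EOL;
// https://shop.example/p/123?color=red
var_dump(parsePrice('1 299,90 rub.')); // float(1299.9)
```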
Telegram Notifications
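class NewProductNotifier
{
    // $this->telegram is a Telegram Bot API client injected via the constructor (not shown)

    public function notify(DetectionResult $result, CompetitorCatalog $catalog): void
    {
        if ($result->newItems->isEmpty()) return;

        // HTML parse mode: product titles often contain "_", "*" or "[", which break
        // Telegram's legacy Markdown parsing, while HTML only needs escaping
        $esc = fn (?string $s): string => htmlspecialchars((string) $s, ENT_QUOTES);

        $lines = ['🆕 <b>New products at ' . $esc($catalog->competitor->name) . "</b>\n"];

        foreach ($result->newItems->take(10) as $item) {
            $price = $item->price
                ? number_format((float) $item->price, 0, '.', ' ') . ' rub.'
                : 'price not determined';
            $lines[] = '• <a href="' . $esc($item->external_url) . '">'
                . $esc($item->title ?: 'Untitled') . '</a> - ' . $price;
        }

        $remaining = $result->newItems->count() - 10;
        if ($remaining > 0) {
            $lines[] = "\n<i>...and {$remaining} more products</i>";
        }

        $this->telegram->sendMessage([
            'chat_id' => config('telegram.new_products_chat'),
            'text' => implode("\n", $lines),
            'parse_mode' => 'HTML',
            'disable_web_page_preview' => true,
        ]);

        // Reset the is_new flag only after the message has been sent
        CompetitorItem::whereIn('id', $result->newItems->pluck('id'))
            ->update(['is_new' => false]);
    }
}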
Bot Protection Bypass
Many large stores use bot protection. Common ways around it:
| Protection | Bypass method |
|---|---|
| Cloudflare Bot Management | Playwright + stealth plugin |
| Rate limiting | Random delays of 2–5 s between requests |
| IP blocking | Proxy rotation (residential proxies) |
| Cookie/session requirement | Headless browser that preserves the session |
| JS rendering | Playwright/Puppeteer instead of curl |
Schedule
// Check every 6 hours for standard catalogs
$schedule->command('competitors:scan-catalogs')->everySixHours();
// Weekly summary report
$schedule->job(new WeeklyNewProductsReportJob)->weekly()->mondays()->at('08:00');
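The schedule scans every catalog every six hours, while competitor_catalogs also stores a per-catalog check_interval. One way to respect both is to run the command more often (for example hourly) and skip catalogs that are not due yet. A hypothetical due-check, assuming the Postgres INTERVAL has already been converted to a PHP DateInterval:

```php
<?php

/**
 * Hypothetical due-check for one competitor_catalogs row.
 * A catalog that has never been checked is always due.
 */
function isCatalogDue(
    ?DateTimeImmutable $lastCheckedAt,
    DateInterval $checkInterval,
    DateTimeImmutable $now
): bool {
    if ($lastCheckedAt === null) {
        return true;
    }
    // Due once last_checked_at + check_interval is no longer in the future
    return $lastCheckedAt->add($checkInterval) <= $now;
}

$utc = new DateTimeZone('UTC');
$now = new DateTimeImmutable('2024-05-06 12:00:00', $utc);
$sixHours = new DateInterval('PT6H');

var_dump(isCatalogDue(null, $sixHours, $now));                                              // bool(true)
var_dump(isCatalogDue(new DateTimeImmutable('2024-05-06 05:00:00', $utc), $sixHours, $now)); // bool(true)
var_dump(isCatalogDue(new DateTimeImmutable('2024-05-06 09:00:00', $utc), $sixHours, $now)); // bool(false)
```

The same condition can be pushed into the query instead, e.g. WHERE is_active AND (last_checked_at IS NULL OR last_checked_at + check_interval <= NOW()), so only due rows are loaded.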
Additional Features
- Automatic addition to the purchase list: newly found products go straight into buyers' tasks
- Disappearance notifications: a product has vanished from the competitor's catalog (discontinued or out of stock)
- Price comparison: if our catalog has an equivalent item, show the price difference
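For the price comparison, once an equivalent item in our catalog has been matched (the matching itself is not covered here), the rest is arithmetic. A hypothetical helper that reports the difference relative to our price, where a negative value means the competitor is cheaper:

```php
<?php

/**
 * Hypothetical helper: compare a competitor's price with ours.
 * Positive values mean the competitor is more expensive.
 *
 * @return array{diff: float, percent: float}|null Null if a price is unknown or ours is not positive
 */
function comparePrices(?float $ourPrice, ?float $theirPrice): ?array
{
    if ($ourPrice === null || $theirPrice === null || $ourPrice <= 0.0) {
        return null;
    }
    $diff = $theirPrice - $ourPrice;
    return [
        'diff' => round($diff, 2),
        'percent' => round($diff / $ourPrice * 100, 1),
    ];
}

$cmp = comparePrices(2490.0, 2290.0);
printf("%+.2f rub. (%+.1f%%)\n", $cmp['diff'], $cmp['percent']); // -200.00 rub. (-8.0%)
```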
Timeline
- CatalogScraper + JSON configuration: 1–2 days
- NewProductDetectionService + data schema: 1 day
- Telegram notifications: 0.5 days
- Playwright adapter for JS sites: +1 day
- Proxy rotation: +0.5 days
Total: 3–4 business days; the Playwright adapter and proxy rotation add another 1.5 days if both are needed.







