Implementing Automatic Product Image Download and Upload
During a bulk product import, images are the heaviest and most labor-intensive part. The supplier sends links or file paths in the price list; the store must download, optimize, and save the files in its own storage. Doing this by hand for a catalog of 1,000+ items is unrealistic.
Where Image Links Come From
- In CSV/Excel: a column with a URL or a relative path, e.g. https://supplier.ru/images/ABC-123_1.jpg
- In XML/YML: <picture> or <image> tags
- In an API response: an array like images: [{url, sort, is_main}]
- On the supplier's FTP: files in a directory whose names match the SKU
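Relative paths from a price list have to be turned into absolute URLs before downloading. A minimal sketch, assuming https://supplier.ru as the base URL (the helper name resolveImageUrl is illustrative, not from the article):

```php
<?php
// Normalize a price-list value (absolute URL or relative path)
// into an absolute URL. The default base URL is an assumption.
function resolveImageUrl(string $raw, string $base = 'https://supplier.ru'): string
{
    $raw = trim($raw);
    if (preg_match('#^https?://#i', $raw)) {
        return $raw; // already absolute, use as-is
    }
    return rtrim($base, '/') . '/' . ltrim($raw, '/');
}

echo resolveImageUrl('https://supplier.ru/images/ABC-123_1.jpg'), "\n";
echo resolveImageUrl('images/ABC-123_2.jpg'), "\n";
```

Both calls return an absolute URL, so the download pipeline only ever sees one input shape.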
Download Pipeline Architecture
Import Job
└─> parse product data
└─> enqueue ImageDownloadJob(sku, urls[])
└─> download each URL (HTTP)
└─> validate (mime, size)
└─> optimize (resize, convert to WebP)
└─> upload to storage (S3 / local)
└─> save to product_images table
Image download runs in a separate job so it does not block the main import.
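To keep jobs small, URLs can be dispatched in chunks, so one slow or dead image delays at most its own chunk. A sketch of the chunking in plain PHP (the dispatch call is shown as a comment; buildImageJobs is an illustrative helper, not part of the article's code):

```php
<?php
// Split a product's image URLs into chunks and build one job
// payload per chunk. In Laravel each payload would be dispatched:
//   ImageDownloadJob::dispatch($productId, $chunk)->onQueue('images');
function buildImageJobs(int $productId, array $urls, int $chunkSize = 10): array
{
    $jobs = [];
    foreach (array_chunk($urls, $chunkSize) as $chunk) {
        $jobs[] = ['product_id' => $productId, 'urls' => $chunk];
    }
    return $jobs;
}

$urls = array_map(fn ($i) => "https://supplier.ru/img/{$i}.jpg", range(1, 25));
$jobs = buildImageJobs(42, $urls);
echo count($jobs), "\n";            // 3 chunks: 10 + 10 + 5
echo count($jobs[2]['urls']), "\n"; // 5
```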
Download with Retry
class ImageDownloadJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public int $tries = 3;    // retry up to 3 times
    public int $backoff = 30; // seconds between attempts
    public int $timeout = 60;

    public function __construct(
        public readonly int $productId,
        public readonly array $urls,
    ) {}

    public function handle(ImageProcessor $processor): void
    {
        foreach ($this->urls as $index => $url) {
            try {
                $tmpPath = $processor->download($url);
                $stored = $processor->processAndStore($tmpPath, $this->productId, $index);

                ProductImage::updateOrCreate(
                    ['product_id' => $this->productId, 'sort' => $index],
                    ['path' => $stored, 'is_main' => $index === 0, 'source_url' => $url]
                );
            } catch (\Exception $e) {
                // One broken image must not fail the whole batch
                Log::warning("Image download failed: {$url} — {$e->getMessage()}");
            }
        }
    }
}
Downloaded File Validation
class ImageProcessor
{
    private const ALLOWED_MIME = ['image/jpeg', 'image/png', 'image/webp', 'image/gif'];
    private const MAX_SIZE = 20 * 1024 * 1024; // 20 MB

    public function __construct(
        private readonly \GuzzleHttp\Client $client,
    ) {}

    public function download(string $url): string
    {
        $response = $this->client->get($url, ['timeout' => 30, 'stream' => true]);

        $tmpPath = tempnam(sys_get_temp_dir(), 'img_');
        $body = $response->getBody();
        $size = 0;
        $fp = fopen($tmpPath, 'wb');

        // Stream in chunks and abort early if the size limit is exceeded
        while (!$body->eof()) {
            $chunk = $body->read(8192);
            $size += strlen($chunk);
            if ($size > self::MAX_SIZE) {
                fclose($fp);
                unlink($tmpPath);
                throw new \RuntimeException("Image too large: {$url}");
            }
            fwrite($fp, $chunk);
        }
        fclose($fp);

        // Check the real MIME type of the bytes, not the URL extension
        $mime = mime_content_type($tmpPath);
        if (!in_array($mime, self::ALLOWED_MIME, true)) {
            unlink($tmpPath);
            throw new \RuntimeException("Invalid MIME type: {$mime} for {$url}");
        }

        return $tmpPath;
    }
}
Optimization and Conversion
We use intervention/image (v3) for resizing and spatie/image-optimizer for compression:
public function processAndStore(string $tmpPath, int $productId, int $sort): string
{
    $manager = new \Intervention\Image\ImageManager(
        new \Intervention\Image\Drivers\Gd\Driver()
    );
    $image = $manager->read($tmpPath);

    // Generate multiple sizes from a single source
    $variants = [
        'full'      => [1200, 1200],
        'catalog'   => [400, 400],
        'thumbnail' => [100, 100],
    ];

    $paths = [];
    foreach ($variants as $name => [$w, $h]) {
        $resized = clone $image;
        $resized->coverDown($w, $h);

        $filename = "products/{$productId}/{$sort}_{$name}.webp";
        $encoded = $resized->toWebp(quality: 85);
        Storage::disk('public')->put($filename, (string) $encoded);

        $paths[$name] = $filename;
    }

    unlink($tmpPath);

    return json_encode($paths);
}
coverDown crops the image at the center while preserving proportions, which is the standard treatment for catalog photos.
Deduplication: Don't Download Again
If the supplier sent the same URL again, don't waste traffic and time re-downloading it:
$existing = ProductImage::where([
    'product_id' => $productId,
    'source_url' => $url,
])->first();

if ($existing && Storage::exists($existing->path)) {
    continue; // already stored, skip
}
For more reliable deduplication, store a content hash (SHA-256 of the first 4 KB): the same file served from different URLs won't be downloaded twice.
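The content-hash check needs no framework at all. A sketch hashing the first 4 KB of a stored file (contentFingerprint is an illustrative name; in practice the hash would live in a column on product_images):

```php
<?php
// Fingerprint a file by hashing its first 4 KB: cheap, and in
// practice enough to recognize the same image behind two URLs.
function contentFingerprint(string $path): string
{
    $fp = fopen($path, 'rb');
    $head = fread($fp, 4096);
    fclose($fp);
    return hash('sha256', $head);
}

// Two temp files with identical content produce the same fingerprint
$a = tempnam(sys_get_temp_dir(), 'img');
$b = tempnam(sys_get_temp_dir(), 'img');
file_put_contents($a, str_repeat('pixel', 2000));
file_put_contents($b, str_repeat('pixel', 2000));
var_dump(contentFingerprint($a) === contentFingerprint($b)); // bool(true)
```

Hashing only the head means two files identical in the first 4 KB but differing later would collide; storing the file size alongside the hash closes that gap cheaply.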
Download from FTP Supplier
class FtpImageSource
{
    public function __construct(
        private readonly string $host,
        private readonly string $user,
        private readonly string $pass,
        private readonly string $baseDir,
    ) {}

    public function syncForProduct(string $sku): array
    {
        $ftp = ftp_connect($this->host);
        ftp_login($ftp, $this->user, $this->pass);

        // Match files whose name contains the product SKU
        $files = ftp_nlist($ftp, $this->baseDir);
        $matched = array_filter($files, fn ($f) => str_contains($f, $sku));

        $paths = [];
        foreach ($matched as $remotePath) {
            $tmp = tempnam(sys_get_temp_dir(), 'ftpimg_');
            ftp_get($ftp, $tmp, $remotePath, FTP_BINARY);
            $paths[] = $tmp; // path to the downloaded local file
        }

        ftp_close($ftp);

        return $paths;
    }
}
Handling 404 and Broken Links
Suppliers periodically delete or move images. Strategy:
- On 404: log it, skip, and keep the already saved image
- After 3 failed attempts: mark the source_url as dead = true
- Once a week: generate a report on dead links, offering to upload the images manually
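The "3 failed attempts" rule reduces to a counter per source URL. A framework-free sketch (the array stands in for a failed-attempts column on product_images; recordFailure is an illustrative name):

```php
<?php
// Count consecutive failures per URL; return true once the URL
// should be marked dead (3rd failure by default).
function recordFailure(array &$failures, string $url, int $limit = 3): bool
{
    $failures[$url] = ($failures[$url] ?? 0) + 1;
    return $failures[$url] >= $limit;
}

$failures = [];
$url = 'https://supplier.ru/images/ABC-123_1.jpg';
var_dump(recordFailure($failures, $url)); // bool(false)
var_dump(recordFailure($failures, $url)); // bool(false)
var_dump(recordFailure($failures, $url)); // bool(true) => set dead = true
```

A successful download should reset the counter to zero, so transient supplier outages don't permanently bury a working link.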
Parallelism and Queues
| Parameter | Value |
|---|---|
| Queue | images (separate from default) |
| Workers per queue | 4–8 |
| Task timeout | 60 sec |
| URL chunk size per Job | 10 URLs |
| Retry attempts | 3 |
With 10,000 images and 4 workers, the total download takes about 20–40 minutes (depending on the supplier's hosting speed).
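The 20–40 minute figure follows from simple arithmetic, assuming roughly 0.5–1.0 seconds per image for download, resize, and upload combined (that per-image cost is an assumption, not a measurement from the article):

```php
<?php
// Back-of-envelope throughput: images * seconds-per-image / workers.
$images = 10000;
$workers = 4;
foreach ([0.5, 1.0] as $secPerImage) {
    $minutes = $images * $secPerImage / $workers / 60;
    printf("%.1f s/img => %.0f min\n", $secPerImage, $minutes);
}
```

This prints roughly 21 and 42 minutes for the two per-image costs, bracketing the estimate above; if the supplier's hosting is slower, scale the per-image time accordingly.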
Implementation Timeline
- HTTP download, validation, WebP conversion, saving — 2 days
- Multiple sizes, deduplication, dead link monitoring — +1–2 days
- FTP source + parallel queue + progress dashboard — +1 day