Implementing Product Import from Supplier Files (CSV/Excel/XML/JSON)
Suppliers send price lists in whatever format suits them: one in Excel, another in XML, a third in CSV with a non-standard delimiter. The task is to build an import system that handles any format through a unified interface, without writing separate code for each supplier.
Unified Parser Interface
interface FileParserInterface
{
    /** @return iterable<array<string, mixed>> */
    public function parse(string $filePath): iterable;

    public function supports(string $mimeType, string $extension): bool;
}
Factory selects appropriate parser by extension or MIME:
class FileParserFactory
{
    /** @param FileParserInterface[] $parsers */
    public function __construct(private array $parsers) {}

    public function make(string $filePath): FileParserInterface
    {
        $ext = strtolower(pathinfo($filePath, PATHINFO_EXTENSION));
        // mime_content_type() returns false on failure; fall back to a generic type
        $mime = mime_content_type($filePath) ?: 'application/octet-stream';

        foreach ($this->parsers as $parser) {
            if ($parser->supports($mime, $ext)) {
                return $parser;
            }
        }

        throw new \RuntimeException("No parser for: {$ext} / {$mime}");
    }
}
CSV Parser
class CsvParser implements FileParserInterface
{
    public function __construct(
        private string $delimiter = ',',
        private string $enclosure = '"',
        private bool $hasHeader = true,
    ) {}

    public function parse(string $filePath): iterable
    {
        $handle = fopen($filePath, 'r');
        if ($handle === false) {
            throw new \RuntimeException("Cannot open file: {$filePath}");
        }

        $headers = $this->hasHeader
            ? fgetcsv($handle, 0, $this->delimiter, $this->enclosure)
            : null;

        while (($row = fgetcsv($handle, 0, $this->delimiter, $this->enclosure)) !== false) {
            if (!array_filter($row)) {
                continue; // skip empty lines
            }
            // Pad/trim ragged rows to the header count so array_combine() never fails
            yield $headers
                ? array_combine($headers, array_slice(array_pad($row, count($headers), null), 0, count($headers)))
                : $row;
        }

        fclose($handle);
    }

    public function supports(string $mimeType, string $extension): bool
    {
        return in_array($extension, ['csv', 'txt'], true)
            || str_contains($mimeType, 'csv');
    }
}
The delimiter and encoding are configured per source. For Windows-1251 files, wrap the read in a line-by-line mb_convert_encoding pass.
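The line-by-line wrapper can be sketched like this (the name utf8Lines is illustrative, not part of the codebase):

```php
/**
 * Illustrative helper: stream a file line by line, converting from a
 * legacy encoding (Windows-1251 by default) to UTF-8 on the fly.
 */
function utf8Lines(string $filePath, string $fromEncoding = 'Windows-1251'): \Generator
{
    $handle = fopen($filePath, 'r');
    if ($handle === false) {
        throw new \RuntimeException("Cannot open file: {$filePath}");
    }
    try {
        while (($line = fgets($handle)) !== false) {
            yield mb_convert_encoding($line, 'UTF-8', $fromEncoding);
        }
    } finally {
        fclose($handle);
    }
}
```

Each line is converted independently, so memory stays flat even on multi-gigabyte price lists.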
Excel Parser via PhpSpreadsheet
class ExcelParser implements FileParserInterface
{
    public function parse(string $filePath): iterable
    {
        $spreadsheet = \PhpOffice\PhpSpreadsheet\IOFactory::load($filePath);
        $sheet = $spreadsheet->getActiveSheet();
        $rows = $sheet->toArray(null, true, true, false);
        $headers = array_shift($rows);

        foreach ($rows as $row) {
            if (!array_filter($row)) {
                continue;
            }
            yield array_combine($headers, $row);
        }
    }

    public function supports(string $mimeType, string $extension): bool
    {
        return in_array($extension, ['xls', 'xlsx', 'ods'], true);
    }
}
For large Excel files (over ~100 MB), configure the reader with setReadDataOnly(true) and setLoadSheetsOnly(['Sheet1']): this cuts memory usage by a factor of 3–5.
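With PhpSpreadsheet's IOFactory that looks roughly like this (the sheet name 'Sheet1' is an example; substitute the supplier's actual sheet):

```php
use PhpOffice\PhpSpreadsheet\IOFactory;

// Build a reader for the concrete file type instead of IOFactory::load()
$reader = IOFactory::createReaderForFile($filePath);
$reader->setReadDataOnly(true);          // skip styles and formatting metadata
$reader->setLoadSheetsOnly(['Sheet1']);  // load only the one sheet we need
$spreadsheet = $reader->load($filePath);
```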
XML Parser (Streaming)
class XmlParser implements FileParserInterface
{
    public function __construct(
        private string $itemTag = 'product',
    ) {}

    public function parse(string $filePath): iterable
    {
        $reader = new \XMLReader();
        if (!$reader->open($filePath)) {
            throw new \RuntimeException("Cannot open XML file: {$filePath}");
        }

        while ($reader->read()) {
            if ($reader->nodeType === \XMLReader::ELEMENT
                && $reader->name === $this->itemTag) {
                // Materialize only the current item subtree
                $node = new \SimpleXMLElement($reader->readOuterXml());
                yield $this->nodeToArray($node);
            }
        }

        $reader->close();
    }

    private function nodeToArray(\SimpleXMLElement $node): array
    {
        $result = [];
        foreach ($node->children() as $child) {
            $key = $child->getName();
            $result[$key] = $child->count() > 0
                ? $this->nodeToArray($child)
                : (string) $child;
        }
        foreach ($node->attributes() as $k => $v) {
            $result['@' . $k] = (string) $v;
        }
        return $result;
    }

    public function supports(string $mimeType, string $extension): bool
    {
        return $extension === 'xml' || str_contains($mimeType, 'xml');
    }
}
XMLReader walks the file as a stream: it never loads the whole document into memory, only one item subtree at a time.
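To see the streaming approach end to end, here is a self-contained run on a tiny inline feed (tag names match the default itemTag):

```php
$xml = <<<XML
<catalog>
  <product><sku>A1</sku><price>10</price></product>
  <product><sku>B2</sku><price>20</price></product>
</catalog>
XML;

$file = tempnam(sys_get_temp_dir(), 'feed');
file_put_contents($file, $xml);

$reader = new \XMLReader();
$reader->open($file);
$skus = [];
while ($reader->read()) {
    if ($reader->nodeType === \XMLReader::ELEMENT && $reader->name === 'product') {
        // Materialize only the current <product> subtree
        $node = new \SimpleXMLElement($reader->readOuterXml());
        $skus[] = (string) $node->sku;
    }
}
$reader->close();
unlink($file);
// $skus is now ['A1', 'B2']
```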
JSON Parser
class JsonParser implements FileParserInterface
{
    public function __construct(
        private string $itemsPath = 'products', // dot notation: "data.items"
    ) {}

    public function parse(string $filePath): iterable
    {
        $content = file_get_contents($filePath);
        $data = json_decode($content, true, 512, JSON_THROW_ON_ERROR);

        // data_get() is Laravel's dot-notation array accessor
        $items = data_get($data, $this->itemsPath) ?? $data;

        foreach ($items as $item) {
            yield $item;
        }
    }

    public function supports(string $mimeType, string $extension): bool
    {
        return $extension === 'json' || str_contains($mimeType, 'json');
    }
}
For large JSON files, use halaxa/json-machine: it parses JSON as a stream without loading the entire file into memory.
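A sketch of what the parse() body could look like on top of json-machine (assuming the library is installed via composer require halaxa/json-machine; note it addresses items with a JSON Pointer rather than dot notation):

```php
use JsonMachine\Items;
use JsonMachine\JsonDecoder\ExtJsonDecoder;

$items = Items::fromFile($filePath, [
    'pointer' => '/products',              // JSON Pointer, not dot notation
    'decoder' => new ExtJsonDecoder(true), // decode items as assoc arrays
]);
foreach ($items as $item) {
    yield $item;
}
```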
Column Mapping
Each supplier uses its own column names, so a mapping config is stored in the database per source:
{
    "sku": "Article",
    "name": "Product Name",
    "price": "Price rub.",
    "qty": "Quantity",
    "description": "Description",
    "category": "Section"
}
Transformer applies mapping before passing to importer:
class ColumnMapper
{
    public function transform(array $row, array $mapping): array
    {
        $result = [];
        foreach ($mapping as $internalKey => $sourceKey) {
            $result[$internalKey] = $row[$sourceKey] ?? null;
        }
        return $result;
    }
}
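Applied to the sample config above, the mapping step behaves like this (the class is repeated so the snippet runs standalone; the row values are made up):

```php
class ColumnMapper
{
    public function transform(array $row, array $mapping): array
    {
        $result = [];
        foreach ($mapping as $internalKey => $sourceKey) {
            $result[$internalKey] = $row[$sourceKey] ?? null;
        }
        return $result;
    }
}

$mapping = ['sku' => 'Article', 'name' => 'Product Name', 'qty' => 'Quantity'];
$row = ['Article' => 'AB-100', 'Product Name' => 'Cordless drill']; // no Quantity column

$mapped = (new ColumnMapper())->transform($row, $mapping);
// ['sku' => 'AB-100', 'name' => 'Cordless drill', 'qty' => null]
```

Missing source columns become null, so validation downstream can decide whether the field is required.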
Import Pipeline
FileParserFactory::make($file)
└─> CsvParser / ExcelParser / XmlParser / JsonParser
└─> iterate rows
└─> ColumnMapper::transform($row, $config->mapping)
└─> ProductValidator::validate($mapped) // skip invalid
└─> ProductUpsertJob::dispatch($mapped) // queue
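Glued together, the pipeline could be a small service like this (a sketch; ProductValidator and ProductUpsertJob are the collaborators named in the diagram, and their exact signatures are assumptions):

```php
class ImportService
{
    public function __construct(
        private FileParserFactory $parserFactory,
        private ColumnMapper $mapper,
    ) {}

    /** @return int number of rows queued for upsert */
    public function import(string $filePath, array $mapping): int
    {
        $parser = $this->parserFactory->make($filePath);
        $queued = 0;

        foreach ($parser->parse($filePath) as $row) {
            $mapped = $this->mapper->transform($row, $mapping);
            if (!ProductValidator::validate($mapped)) {
                continue; // skip invalid rows (log them in a real system)
            }
            ProductUpsertJob::dispatch($mapped); // hand off to the queue
            $queued++;
        }

        return $queued;
    }
}
```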
Encoding and BOM Handling
private function detectAndConvert(string $content): string
{
    // UTF-8 BOM
    if (str_starts_with($content, "\xEF\xBB\xBF")) {
        $content = substr($content, 3);
    }

    $encoding = mb_detect_encoding($content, ['UTF-8', 'Windows-1251', 'ISO-8859-1'], true);
    if ($encoding && $encoding !== 'UTF-8') {
        $content = mb_convert_encoding($content, 'UTF-8', $encoding);
    }

    return $content;
}
Implementation Timeline
- CSV + Excel parsers, column mapping, basic pipeline — 2 days
- XML (streaming) + JSON + auto-format detection — +1 day
- Mapping config in UI + encoding handling + error handling — +1 day