Implementing Document Conversion (PDF → Image, DOCX → PDF) on the Server
The two most common server-side conversion scenarios are rendering PDF pages to images (for previews and thumbnails) and converting DOCX to PDF (for downloads in a uniform format, printing, and archiving).
PDF → Image
The standard tool is Ghostscript (gs). It is packaged for most Linux distributions and handles vector graphics and fonts well:
apt install ghostscript
Convert the first page to JPEG:
gs -dNOPAUSE -dBATCH -sDEVICE=jpeg -r150 \
-dFirstPage=1 -dLastPage=1 \
-sOutputFile=page_%02d.jpg \
input.pdf
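-r150 sets the resolution in DPI (150 is enough for previews, 300 for full quality). -dFirstPage and -dLastPage select the page range. The %02d in the output name is replaced with a zero-padded sequence number starting at 1.
An alternative is ImageMagick, which calls Ghostscript under the hood:
convert -density 150 input.pdf[0] -quality 85 page_0.jpg
[0] is the zero-based page index (0 = first page). ImageMagick 7 takes the same arguments through its magick command.
Important: many distribution packages of ImageMagick, Ubuntu's among them, disable PDF processing via Ghostscript by default for security reasons. Allow it in the policy file, /etc/ImageMagick-6/policy.xml or /etc/ImageMagick-7/policy.xml depending on the installed version: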
<!-- was: rights="none" -->
<policy domain="coder" rights="read|write" pattern="PDF" />
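Whether the installed policy actually blocks PDF can be checked before the first conversion attempt. The sketch below parses the contents of a policy.xml file with SimpleXML and reports whether any coder rule with rights="none" names PDF. The function name isPdfCoderBlocked is hypothetical, and the pattern handling is deliberately simple: it splits brace lists such as {PS,EPS,PDF} but does not interpret glob wildcards.

```php
<?php
declare(strict_types=1);

/**
 * Returns true when the given ImageMagick policy XML contains a coder rule
 * that denies PDF. Hypothetical helper; handles "PDF" and "{A,B,PDF}" only.
 */
function isPdfCoderBlocked(string $policyXml): bool
{
    $xml = simplexml_load_string($policyXml);
    if ($xml === false) {
        throw new InvalidArgumentException('policy.xml is not well-formed XML');
    }

    foreach ($xml->policy as $policy) {
        if ((string) $policy['domain'] !== 'coder') {
            continue; // resource, path and other domains are irrelevant here
        }

        // Strip optional braces and split the comma-separated coder names.
        $pattern = trim((string) $policy['pattern'], '{}');
        $names = array_map('trim', explode(',', $pattern));

        if (in_array('PDF', $names, true) && (string) $policy['rights'] === 'none') {
            return true;
        }
    }

    return false;
}
```

A call such as isPdfCoderBlocked(file_get_contents('/etc/ImageMagick-6/policy.xml')) would then tell a deployment script whether the policy edit is still needed; the path is an assumption and varies by distribution and version.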
DOCX → PDF
LibreOffice in headless mode is the most reliable free tool:
apt install libreoffice
libreoffice --headless --convert-to pdf --outdir /output /input/document.docx
A typical document converts in 2–8 seconds. LibreOffice creates a user profile on first run, so parallel calls from different processes conflict over the shared profile. The fix is a unique profile per call via --env:UserInstallation:
libreoffice --headless \
"--env:UserInstallation=file:///tmp/lo_profile_$$" \
--convert-to pdf \
--outdir /tmp/output \
/tmp/input/document.docx
$$ expands to the PID of the invoking shell, which keeps the profile path unique per call.
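The same call can be issued from PHP without passing through a shell, which removes the need for argument escaping. This is a minimal sketch, assuming PHP 7.4+ (where proc_open accepts an argument array) and an absolute POSIX-style profile path. libreOfficeArgv and runArgv are hypothetical helper names, not part of any library.

```php
<?php
declare(strict_types=1);

/** Builds the argument list for one headless conversion with its own profile. */
function libreOfficeArgv(string $docxPath, string $outputDir, string $profileDir): array
{
    return [
        'libreoffice',
        '--headless',
        // $profileDir is assumed to be absolute, e.g. /tmp/lo_profile_ab12cd34
        '--env:UserInstallation=file://' . $profileDir,
        '--convert-to', 'pdf',
        '--outdir', $outputDir,
        $docxPath,
    ];
}

/**
 * Runs a command given as an argument array and returns [exitCode, output].
 * No shell is involved, so spaces or quotes in paths need no escaping.
 */
function runArgv(array $argv): array
{
    $descriptors = [
        1 => ['pipe', 'w'],   // capture stdout
        2 => ['redirect', 1], // merge stderr into stdout (PHP 7.4+)
    ];

    $process = proc_open($argv, $descriptors, $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException('Could not start ' . $argv[0]);
    }

    $output = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    $exitCode = proc_close($process);

    return [$exitCode, (string) $output];
}

// Usage: one random profile directory per call keeps parallel runs apart.
// $profile = sys_get_temp_dir() . '/lo_profile_' . bin2hex(random_bytes(4));
// [$code, $log] = runArgv(libreOfficeArgv('/tmp/input/document.docx', '/tmp/output', $profile));
```

A random suffix takes the place of $$ here because several conversions may be started from the same PHP process.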
PHP Conversion Service
namespace App\Services;

use Illuminate\Support\Str;

class DocumentConversionService
{
    /**
     * Renders PDF pages to images with Ghostscript and returns the created file paths.
     */
    public function pdfToImages(
        string $pdfPath,
        string $outputDir,
        int $dpi = 150,
        string $format = 'jpg',
        ?int $maxPages = null
    ): array {
        @mkdir($outputDir, 0755, true);

        // Never ask Ghostscript for more pages than the document has
        $pages = $this->getPdfPageCount($pdfPath);
        $maxPages = $maxPages ? min($maxPages, $pages) : $pages;

        $outputPattern = escapeshellarg("{$outputDir}/page_%03d.{$format}");

        $cmd = implode(' ', [
            'gs',
            '-dNOPAUSE -dBATCH -dSAFER',
            '-sDEVICE=' . ($format === 'png' ? 'png16m' : 'jpeg'),
            "-r{$dpi}",
            "-dFirstPage=1 -dLastPage={$maxPages}",
            '-dJPEGQ=85',
            "-sOutputFile={$outputPattern}",
            escapeshellarg($pdfPath),
            '2>&1',
        ]);

        exec($cmd, $output, $exitCode);

        if ($exitCode !== 0) {
            throw new \RuntimeException(
                "PDF conversion failed: " . implode("\n", $output)
            );
        }

        // Collect created files
        $files = [];
        for ($i = 1; $i <= $maxPages; $i++) {
            $file = sprintf("{$outputDir}/page_%03d.{$format}", $i);
            if (file_exists($file)) {
                $files[] = $file;
            }
        }

        return $files;
    }

    /**
     * Converts a DOCX file to PDF with headless LibreOffice and returns the PDF path.
     */
    public function docxToPdf(string $docxPath, string $outputDir): string
    {
        @mkdir($outputDir, 0755, true);

        // Unique profile per call so parallel conversions do not conflict
        $profileDir = sys_get_temp_dir() . '/lo_profile_' . Str::random(8);
        @mkdir($profileDir, 0755, true);

        $cmd = implode(' ', [
            'libreoffice',
            '--headless',
            "--env:UserInstallation=" . escapeshellarg("file://{$profileDir}"),
            '--convert-to pdf',
            '--outdir ' . escapeshellarg($outputDir),
            escapeshellarg($docxPath),
            '2>&1',
        ]);

        exec($cmd, $output, $exitCode);

        // Delete temporary profile
        $this->rmdir($profileDir);

        if ($exitCode !== 0) {
            throw new \RuntimeException(
                "DOCX to PDF failed: " . implode("\n", $output)
            );
        }

        // LibreOffice names the result after the source file
        $outputFile = $outputDir . '/' . pathinfo($docxPath, PATHINFO_FILENAME) . '.pdf';

        if (!file_exists($outputFile)) {
            throw new \RuntimeException("Output PDF not found after conversion");
        }

        return $outputFile;
    }

    /**
     * Reads the page count via pdfinfo (poppler-utils).
     * Falls back to 1 if pdfinfo is unavailable or prints nothing.
     */
    public function getPdfPageCount(string $path): int
    {
        $cmd = "pdfinfo " . escapeshellarg($path) . " 2>/dev/null | grep 'Pages:' | awk '{print $2}'";
        $output = shell_exec($cmd);

        return max(1, (int) trim($output ?? '1'));
    }

    /**
     * Recursively deletes a directory, children first.
     */
    private function rmdir(string $dir): void
    {
        if (!is_dir($dir)) return;

        $items = new \RecursiveIteratorIterator(
            new \RecursiveDirectoryIterator($dir, \FilesystemIterator::SKIP_DOTS),
            \RecursiveIteratorIterator::CHILD_FIRST
        );

        foreach ($items as $item) {
            $item->isDir() ? rmdir($item->getRealPath()) : unlink($item->getRealPath());
        }

        rmdir($dir);
    }
}
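getPdfPageCount() above returns 1 whenever pdfinfo is missing or prints nothing, which can hide a broken setup. One possible alternative is to parse the pdfinfo output in PHP and return null when no page count is present, so the caller can decide whether to fail. parsePdfinfoPageCount is a hypothetical helper, and the sample text in the usage comment only mimics typical pdfinfo output.

```php
<?php
declare(strict_types=1);

/**
 * Extracts the page count from raw pdfinfo output.
 * Returns null when no "Pages:" line is present (for example, when
 * pdfinfo is not installed and the captured output is empty).
 */
function parsePdfinfoPageCount(string $pdfinfoOutput): ?int
{
    // Match a whole line such as "Pages:          12"
    if (preg_match('/^Pages:\s+(\d+)\s*$/m', $pdfinfoOutput, $matches) !== 1) {
        return null;
    }

    return (int) $matches[1];
}

// Usage with captured output:
// $raw = "Producer:       LibreOffice\nPages:          12\nPage size:      595 x 842 pts (A4)\n";
// $count = parsePdfinfoPageCount($raw)
//     ?? throw new RuntimeException('pdfinfo returned no page count');
```

The throw expression in the usage comment requires PHP 8.0 or later; on older versions an explicit null check does the same job.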
Jobs for Background Processing
// app/Jobs/ConvertDocumentJob.php
namespace App\Jobs;

use App\Models\Document; // assumes the default App\Models namespace
use App\Services\DocumentConversionService;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Storage;

class ConvertDocumentJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public int $timeout = 300;
    public int $tries = 2;

    public function __construct(
        private int $documentId,
        private string $type // 'pdf_to_images' | 'docx_to_pdf'
    ) {}

    public function handle(DocumentConversionService $converter): void
    {
        $doc = Document::findOrFail($this->documentId);
        $src = Storage::disk('documents')->path($doc->path);
        $outDir = Storage::disk('documents')->path("converted/{$doc->id}");

        match ($this->type) {
            'pdf_to_images' => $this->convertPdfToImages($converter, $doc, $src, $outDir),
            'docx_to_pdf' => $this->convertDocxToPdf($converter, $doc, $src, $outDir),
        };
    }

    private function convertPdfToImages(
        DocumentConversionService $converter,
        Document $doc,
        string $src,
        string $outDir
    ): void {
        $files = $converter->pdfToImages($src, $outDir, dpi: 150, maxPages: 20);

        // Store paths relative to the disk root
        $pages = array_map(
            fn($f) => "converted/{$doc->id}/" . basename($f),
            $files
        );

        $doc->update([
            'preview_pages' => $pages,
            // Number of rendered previews, capped at maxPages (not the PDF's total page count)
            'page_count' => count($pages),
            'status' => 'ready',
        ]);
    }

    private function convertDocxToPdf(
        DocumentConversionService $converter,
        Document $doc,
        string $src,
        string $outDir
    ): void {
        $pdfPath = $converter->docxToPdf($src, $outDir);
        $relPath = "converted/{$doc->id}/" . basename($pdfPath);

        $doc->update([
            'pdf_path' => $relPath,
            'status' => 'ready',
        ]);
    }
}
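The job's $type is a free-form string, and a match without a default arm throws \UnhandledMatchError only once the job runs on a worker. One option is a string-backed enum (PHP 8.1+), so an invalid value is rejected where the job is created. ConversionType and its outputExtension() method are hypothetical names used for illustration only.

```php
<?php
declare(strict_types=1);

enum ConversionType: string
{
    case PdfToImages = 'pdf_to_images';
    case DocxToPdf = 'docx_to_pdf';

    /** File extension the conversion is expected to produce. */
    public function outputExtension(): string
    {
        return match ($this) {
            self::PdfToImages => 'jpg',
            self::DocxToPdf => 'pdf',
        };
    }
}

// from() throws \ValueError on unknown input; tryFrom() returns null instead.
$type = ConversionType::from('docx_to_pdf');
$unknown = ConversionType::tryFrom('xlsx_to_pdf'); // null
```

The job constructor could then take a ConversionType instead of a string, and the match in handle() could switch on enum cases, which static analysis can check for exhaustiveness.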
Security When Processing Documents
User files are a potential attack vector. Mandatory measures:
- Check the MIME type via finfo_file(), not just the extension.
- Run conversion as a separate user without write access to the project root.
- Limit file size (max:51200 in validation rules, which is 50 MB because Laravel measures file size in kilobytes).
- Don't store uploaded files in a public path until the checks complete.
$request->validate([
    'file' => [
        'required',
        'file',
        'max:51200', // 50 MB (value is in kilobytes)
        'mimes:pdf,docx,doc',
    ],
]);

// Additional real MIME check; the list mirrors the extensions accepted above
$finfo = new \finfo(FILEINFO_MIME_TYPE);
$mime = $finfo->file($request->file('file')->getRealPath());
abort_unless(in_array($mime, [
    'application/pdf',
    'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
    'application/msword', // legacy .doc, allowed by the mimes rule
], true), 422);
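The same content-based check can be repeated outside the HTTP layer, for example in the queued job just before conversion, since the file on disk is what the converter will actually open. This is a minimal sketch, assuming the fileinfo extension is loaded; detectAllowedMime is a hypothetical helper name.

```php
<?php
declare(strict_types=1);

/**
 * Returns the detected MIME type if it is on the allow-list, otherwise null.
 * Detection is based on file content (libmagic), not on the file name.
 */
function detectAllowedMime(string $path, array $allowed): ?string
{
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $mime = $finfo->file($path);

    if ($mime === false || !in_array($mime, $allowed, true)) {
        return null;
    }

    return $mime;
}

// Usage:
// if (detectAllowedMime($src, ['application/pdf']) === null) {
//     throw new RuntimeException('Unexpected file type, refusing to convert');
// }
```

Depending on the libmagic version, a DOCX file may be reported as application/zip or application/octet-stream rather than the full wordprocessingml type, so the allow-list is worth testing against real uploads.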
Timeline
Configuring Ghostscript, LibreOffice, the conversion service, and the jobs takes 6–8 hours. Adding PDF page previews to the UI and a download endpoint for converted PDFs takes another 3–4 hours.







