Mobile Application Development for Document Scanning
Document scanning app — not just camera wrapper. User expects result indistinguishable from tablet scanner: straight edges, readable text, correct orientation, acceptable PDF size. When this fails — app deleted after first passport scan with finger shadow.
Document Detection and Correction
Most important — find document boundaries in frame and remove perspective distortion. On iOS chain: AVCaptureSession → VNDetectRectanglesRequest → CIPerspectiveCorrection.
VNDetectRectanglesRequest tuned for task:
let request = VNDetectRectanglesRequest { request, _ in
guard let results = request.results as? [VNRectangleObservation],
let rect = results.first else { return }
DispatchQueue.main.async {
self.highlightDetectedDocument(rect)
}
}
request.minimumAspectRatio = 0.5
request.maximumAspectRatio = 1.0
request.minimumSize = 0.3 // minimum 30% frame area
request.quadratureTolerance = 20 // tolerance for non-straight corners
After capture — CIPerspectiveCorrection with four corner points. Critical: give user manual corner adjustment when auto-detect misses. Without this mode app useless on crumpled or partially covered documents.
On Android — similar path via ML Kit Document Scanner API (appeared 2023) or custom implementation via OpenCV findContours → approxPolyDP. ML Kit Document Scanner easier to integrate but requires Google Play Services — not option for GMS-less devices.
Image Post-processing
Detection without enhancement gives mediocre result. After perspective correction apply:
-
Adaptive thresholding — for black-and-white mode (passports, contracts).
CIColorMonochrome+ custom kernel orOpenCV adaptiveThreshold. NotCIPhotoEffectNoir— gives uneven results on documents with pale text. -
Noise reduction —
CINoiseReductionwith0.02noise level and0.4sharpness parameters for typical office documents. - Background leveling — uneven lighting (hand shadow, window in frame) removed via top-hat transform in OpenCV.
Color mode — for documents with stamps and signatures. Important not to over-compress contrast — color stamps become muddy with aggressive post-processing.
PDF Assembly and OCR
Multi-page PDF assembled via PDFKit on iOS — straightforward:
let pdfDocument = PDFDocument()
for (index, image) in scannedPages.enumerated() {
let pdfPage = PDFPage(image: image)!
pdfDocument.insert(pdfPage, at: index)
}
pdfDocument.write(to: outputURL)
On Android — iText7 or PdfDocument from Android SDK.
OCR to document (searchable PDF) — separate feature. VNRecognizeTextRequest on iOS with revision3 gives acceptable quality for Russian and English. Results embedded as hidden text layer via PDFKit. For serious tasks (archives, legal documents) — Google Cloud Document AI or Tesseract with LSTM engine.
Storage and Synchronization
Documents take space. A4 PDF pages at 200 DPI — about 200–400 KB after JPEG compression at quality 85. Without compression — 2–5 MB. Storage strategy: locally in Application Support (not Documents — iCloud otherwise auto-captures all scans), sync via iCloud Drive or custom backend on user demand.
For Google Drive / Dropbox integration use official SDKs not REST directly — they handle token refresh and partial upload on connection break.
Implementation Process
Important to define immediately: need OCR, what export formats (PDF, JPEG, DOCX), require cloud storage, legal validity requirements for scans.
Development: camera with real-time detection → manual adjustment → post-processing → OCR (optional) → PDF assembly → export.
Timeline Estimates
Basic scanner with auto-detect and PDF — 3–4 weeks. With OCR, searchable PDF, cloud sync and multi-page mode — 6–9 weeks.







