Implementing Document Scanning via Mobile App Camera
Users hold their phone over a contract, the app automatically detects the edges, corrects perspective, and delivers a clean PDF. This isn't just "take a photo and crop"—it involves edge detection, homographic transformation, and post-processing. Each step has failure points.
Common Breaking Points
Edge Detection Fails on Glare and Shadows
VNDetectRectanglesRequest (iOS Vision) returns VNRectangleObservation with four corner points in normalized coordinates. The problem: on glossy paper under direct light, the algorithm confuses glare with the paper edge. Solution—before detection, apply CIFilter with CIColorControls (reduce inputSaturation) and CIHighlightShadowAdjust. This removes glare as a color artifact.
On Android, Vision API (com.google.android.gms:play-services-mlkit-document-scanner) handles shadows better but requires Google Play Services. Alternative without GMS dependency—OpenCV findContours + approxPolyDP with filtering by area and aspect ratio. Threshold minArea = 30% of frame area removes background objects.
Perspective Correction
After obtaining four points, apply perspective transform. iOS: CIPerspectiveCorrection with explicit inputTopLeft, inputTopRight, inputBottomLeft, inputBottomRight in image coordinates (not preview coordinates). Common mistake—using preview layer coordinates directly without conversion via VNImagePointForNormalizedPoint.
Android: getPerspectiveTransform + warpPerspective from OpenCV, or matrix transformation via android.graphics.Matrix.setPolyToPoly. The second option works without OpenCV but is limited to affine transformations—not suitable for heavy perspective distortion.
Flutter: use cunning_document_scanner package (wrapper over native SDKs) or implement manually via image + manual homography calculation in Dart. The latter is tedious; native channel is preferable.
Post-Processing: Readability Over Aesthetics
After straightening, process the document for readability when printing or OCR:
-
Adaptive binarization—
cv::adaptiveThresholdwith Gaussian method outperforms Otsu on documents with uneven lighting - Deskew—if the document is rotated 1–2° after transformation, Hough Lines detect text line inclination and correct it
-
Sharpness—
CISharpenLuminance(iOS) orSharpnessfilter (Android) with moderate value (0.4–0.6), not excessive
Offer color modes to the user: "Auto", "Document" (black and white), "Photo" (full color). In "Document" mode—apply binarization. In "Auto"—analyze histogram: if the document contains <5% saturated pixels, apply monochrome processing.
Multi-Page Scanning and PDF
Collect UIImage[] / Bitmap[], export via PDFKit (iOS 11+) or android.graphics.pdf.PdfDocument. On Flutter—use pdf package (pub.dev). Optimize PDF size: JPEG compression at 85% is sufficient for readability, with an A4 page taking ~150–250 KB instead of 2–4 MB for PNG.
Real-time preview: display the outline over AVCaptureVideoPreviewLayer / PreviewView via CAShapeLayer / SurfaceView. Update the outline every 3–5 frames (not every frame!)—otherwise the detector consumes CPU and preview stutters.
Our Workflow
Requirements audit → SDK selection (native Vision/ML Kit vs OpenCV) → preview integration with overlay → correction + post-processing → PDF export → testing on 10+ document types (passport, contract, receipt, book spread).
Timeline: 3–5 working days depending on platform and quality requirements. If OCR recognition integration is needed—add 2–3 days.







