Document scanning via camera in mobile app

NOVASOLUTIONS.TECHNOLOGY is engaged in the development, support and maintenance of iOS, Android, PWA mobile applications. We have extensive experience and expertise in publishing mobile applications in popular markets like Google Play, App Store, Amazon, AppGallery and others.

Implementing Document Scanning via Camera in Mobile Applications

Scanning a document with a smartphone camera looks simple but hides a dozen technical nuances. Perspective distortion, finger shadows, reflective surfaces, and hand shake during capture all have to be handled before the result reaches the user.

Detecting Document Boundaries

The first step is to find the four corners of the document in the frame. On iOS 11+ this is done by the Vision framework with VNDetectRectanglesRequest:

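import Vision

let request = VNDetectRectanglesRequest { [weak self] request, error in
    guard let self = self,
          let results = request.results as? [VNRectangleObservation],
          let rect = results.first else { return }

    // rect.topLeft, topRight, bottomLeft, bottomRight are normalized to [0, 1],
    // with the origin at the bottom-left of the image (not UIKit's top-left)
    DispatchQueue.main.async {
        // drawQuadrilateral is the app's own overlay method, not a system API
        self.overlayView.drawQuadrilateral(observation: rect,
                                           imageSize: self.previewLayer.frame.size)
    }
}
request.minimumConfidence = 0.8
request.minimumAspectRatio = 0.5   // short side / long side; filters out narrow rectangles
request.quadratureTolerance = 30   // allowed deviation of corner angles from 90°, in degrees

// Run the request per camera frame; pixelBuffer comes from AVCaptureVideoDataOutput,
// and .right is the usual orientation for the back camera held in portrait
let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .right, options: [:])
try? handler.perform([request])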

Since iOS 13, VisionKit provides VNDocumentCameraViewController, Apple's ready-made scanning interface with automatic capture, perspective correction, and multi-page scanning. For most tasks it is the optimal choice.

On Android, use the ML Kit Document Scanner API (released in beta and delivered through Google Play services) or OpenCV via the NDK for custom solutions.

Perspective Correction

After detecting the four corners, apply a homography (perspective transform) that maps the tilted document onto a rectangle, as if it had been shot from directly above. On iOS this is CIPerspectiveCorrection from Core Image:

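import CoreImage
import CoreImage.CIFilterBuiltins
import Vision

func correctPerspective(image: CIImage, observation: VNRectangleObservation) -> CIImage {
    let extent = image.extent

    // Vision's normalized coordinates and Core Image both use a bottom-left origin,
    // so a plain scale (plus the extent offset) gives CIImage pixel coordinates.
    // Assumes Vision analyzed an image with the same orientation as `image`.
    func toPixel(_ point: CGPoint) -> CGPoint {
        return CGPoint(x: extent.origin.x + point.x * extent.width,
                       y: extent.origin.y + point.y * extent.height)
    }

    // The builtin filter's corner properties are CGPoint, not CIVector
    let filter = CIFilter.perspectiveCorrection()
    filter.inputImage  = image
    filter.topLeft     = toPixel(observation.topLeft)
    filter.topRight    = toPixel(observation.topRight)
    filter.bottomLeft  = toPixel(observation.bottomLeft)
    filter.bottomRight = toPixel(observation.bottomRight)

    return filter.outputImage ?? image
}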
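CIPerspectiveCorrection does the math internally, and Core Image does not document how. To see what a four-point perspective correction actually computes, here is a minimal sketch of the standard textbook approach in plain Swift: fix the bottom-right entry of the 3×3 homography to 1 and solve the resulting 8×8 linear system built from the four corner pairs. `Homography` and `homography(from:to:)` are hypothetical names, not Apple API.

```swift
import Foundation

/// A 3x3 projective transform (homography), row-major, with h[8] fixed to 1.
struct Homography {
    let h: [Double]

    /// Maps (x, y) through the homography using homogeneous coordinates.
    func apply(_ x: Double, _ y: Double) -> (x: Double, y: Double) {
        let w = h[6] * x + h[7] * y + h[8]
        return ((h[0] * x + h[1] * y + h[2]) / w,
                (h[3] * x + h[4] * y + h[5]) / w)
    }
}

/// Solves for the homography mapping each src[i] to dst[i] from exactly four
/// point pairs: an 8x8 linear system (h33 = 1) solved by Gauss-Jordan
/// elimination with partial pivoting. Returns nil if the system is singular,
/// e.g. when all source corners lie on one line.
func homography(from src: [(Double, Double)], to dst: [(Double, Double)]) -> Homography? {
    precondition(src.count == 4 && dst.count == 4)
    // Each pair gives two rows of the augmented matrix [A | b]:
    // u * (h6*x + h7*y + 1) = h0*x + h1*y + h2, and likewise for v.
    var m = [[Double]]()
    for i in 0..<4 {
        let (x, y) = src[i]
        let (u, v) = dst[i]
        m.append([x, y, 1, 0, 0, 0, -u * x, -u * y, u])
        m.append([0, 0, 0, x, y, 1, -v * x, -v * y, v])
    }
    for col in 0..<8 {
        // Pick the row with the largest entry in this column for numerical stability.
        guard let pivot = (col..<8).max(by: { abs(m[$0][col]) < abs(m[$1][col]) }),
              abs(m[pivot][col]) > 1e-12 else { return nil }
        m.swapAt(col, pivot)
        // Eliminate this column from every other row.
        for row in 0..<8 where row != col {
            let factor = m[row][col] / m[col][col]
            for k in col...8 { m[row][k] -= factor * m[col][k] }
        }
    }
    // Only the diagonal remains in the first 8 columns, so each unknown is b / diagonal.
    var h = (0..<8).map { m[$0][8] / m[$0][$0] }
    h.append(1)
    return Homography(h: h)
}
```

To warp an image, you would usually solve for the inverse mapping (output rectangle corners to detected corners) and sample the source at each output pixel, so that every output pixel receives a value.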

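Important: Vision's normalized coordinates and Core Image both put the origin at the bottom-left, so Vision's topLeft maps to CIImage's topLeft with a plain scale, provided both work on the same image orientation. The Y axis flips only when you move into UIKit, whose origin is at the top-left (for example, when drawing the overlay). Confusing these coordinate systems, or passing Vision a differently oriented image, is one of the most common mistakes in first implementations.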
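Vision reports normalized corners with the origin at the bottom-left, the same convention as Core Image, whereas UIKit views put the origin at the top-left. A minimal sketch of both conversions follows; the function names are hypothetical, and the UIKit version assumes the view shows the whole image with no cropping or rotation. With an AVCaptureVideoPreviewLayer set to `.resizeAspectFill`, the cropped margins have to be accounted for on top of this flip.

```swift
import Foundation

/// Vision normalized point (origin bottom-left, 0...1) -> UIKit view point (origin top-left).
/// Assumes the view displays the full image without cropping or rotation.
func uiKitPoint(fromVision p: CGPoint, viewSize: CGSize) -> CGPoint {
    CGPoint(x: p.x * viewSize.width,
            y: (1 - p.y) * viewSize.height)   // flip Y: Vision's y = 1 is the top edge
}

/// Vision normalized point -> Core Image pixel coordinates.
/// No flip: Core Image also puts the origin at the bottom-left.
func coreImagePoint(fromVision p: CGPoint, extent: CGRect) -> CGPoint {
    CGPoint(x: extent.origin.x + p.x * extent.size.width,
            y: extent.origin.y + p.y * extent.size.height)
}
```

For example, a corner at normalized (0.25, 0.9) sits near the top of the page: in a 400×800 pt view it lands at (100, 80), while in a 2000×3000 px CIImage it stays at (500, 2700).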

Image Post-processing

After geometric correction, a scanned document usually needs enhancement:

Grayscale plus a contrast boost, for text recognition and archival documents:

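let grayscaleFilter = CIFilter.colorControls()  // requires import CoreImage.CIFilterBuiltins
grayscaleFilter.inputImage = image              // the perspective-corrected CIImage
grayscaleFilter.saturation = 0                  // remove color
grayscaleFilter.contrast = 1.3                  // 1.0 leaves contrast unchanged
let enhanced = grayscaleFilter.outputImage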

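Adaptive thresholding gives the "black and white" look familiar from Adobe Scan. Core Image has no built-in adaptive threshold (CIColorThreshold and CIColorThresholdOtsu apply a single global threshold), so use a CIKernel or a Metal compute shader that compares each pixel with the mean of its 15×15 pixel neighborhood.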
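The core of adaptive thresholding is simple: compare every pixel with the mean of its local window instead of one global value, so shadows and uneven lighting do not swallow the text. Below is a minimal CPU sketch in plain Swift that uses an integral image so each window mean costs O(1); a production version would run the same logic in a CIKernel or Metal compute shader. The function name and the `offset` value of 10 are assumptions for illustration; the 15×15 window follows the text.

```swift
/// Adaptive mean thresholding on an 8-bit grayscale buffer (row-major).
/// A pixel becomes white (255) if it is brighter than the mean of its
/// `window` x `window` neighborhood minus `offset`, otherwise black (0).
func adaptiveThreshold(_ gray: [UInt8], width: Int, height: Int,
                       window: Int = 15, offset: Int = 10) -> [UInt8] {
    precondition(gray.count == width * height && window % 2 == 1)
    // Integral image: integral[(y + 1) * stride + (x + 1)] = sum of gray over [0...y] x [0...x].
    let stride = width + 1
    var integral = [Int](repeating: 0, count: stride * (height + 1))
    for y in 0..<height {
        var rowSum = 0
        for x in 0..<width {
            rowSum += Int(gray[y * width + x])
            integral[(y + 1) * stride + (x + 1)] = integral[y * stride + (x + 1)] + rowSum
        }
    }
    let r = window / 2
    var out = [UInt8](repeating: 0, count: gray.count)
    for y in 0..<height {
        for x in 0..<width {
            // Clamp the window to the image borders.
            let x0 = max(x - r, 0), x1 = min(x + r, width - 1)
            let y0 = max(y - r, 0), y1 = min(y + r, height - 1)
            let sum = integral[(y1 + 1) * stride + (x1 + 1)] - integral[y0 * stride + (x1 + 1)]
                    - integral[(y1 + 1) * stride + x0] + integral[y0 * stride + x0]
            let count = (x1 - x0 + 1) * (y1 - y0 + 1)
            let pixel = Int(gray[y * width + x])
            // pixel > mean - offset, rewritten as integer arithmetic.
            out[y * width + x] = pixel * count > sum - offset * count ? 255 : 0
        }
    }
    return out
}
```

Because each pixel is judged against its own neighborhood, a smooth lighting gradient across the page stays white, while dark strokes well below their local mean turn black.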

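Document enhancement: on iOS 17+, VNGeneratePersonInstanceMaskRequest segments the people in a frame, which can help locate a hand or fingers overlapping the page, but it does not remove the shadow by itself. Shadow removal and highlight recovery are a separate step, built with GPUImage3 or a custom Metal shader (one common approach divides the image by a heavily blurred copy of itself to flatten the background).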

Multi-page Scanning and PDF

When the user scans several pages, they are combined into one document. On iOS this uses PDFDocument and PDFPage from PDFKit:

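import PDFKit
import UIKit

func createPDF(from images: [UIImage]) -> Data? {
    let pdfDocument = PDFDocument()
    for image in images {
        guard let page = PDFPage(image: image) else { continue }
        // Append at pageCount: the loop index would point past the end
        // of the document if an earlier image failed to convert
        pdfDocument.insert(page, at: pdfDocument.pageCount)
    }
    return pdfDocument.dataRepresentation()
}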

PDF size matters: A4 at 300 DPI is about 2480×3508 px. For storage and transfer, compress the page images as JPEG with quality 0.7–0.85 before they go into the PDF. For OCR, keep the original resolution.
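The pixel figure follows from the paper size: A4 is 210×297 mm, and one inch is 25.4 mm. A short sketch of the arithmetic (the helper name is hypothetical) that also shows why compression matters: one uncompressed 8-bit RGB page is roughly 26 MB.

```swift
import Foundation

/// Pixel dimensions of a page of widthMM x heightMM scanned at the given DPI.
func pixelSize(widthMM: Double, heightMM: Double, dpi: Double) -> (width: Int, height: Int) {
    let mmPerInch = 25.4
    return (Int((widthMM / mmPerInch * dpi).rounded()),
            Int((heightMM / mmPerInch * dpi).rounded()))
}

let a4 = pixelSize(widthMM: 210, heightMM: 297, dpi: 300)
// Uncompressed size of one page at 3 bytes per pixel (8-bit RGB), before JPEG compression.
let rawBytes = a4.width * a4.height * 3
print(a4, rawBytes)  // (width: 2480, height: 3508) 26099520
```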

Automatic vs Manual Capture

Auto-capture on document detection gives good UX but needs stabilization: the document should stay in the frame for more than 1.5 seconds with sufficient confidence before the shot is taken. An overly aggressive trigger is annoying: the user is still aligning the phone, and the app has already taken the picture.
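One way to implement this rule is a small state machine fed with one detection per frame: capture fires only after the corners have stayed within a small tolerance of where they were when the stable period began, at or above the confidence threshold, for the hold time. A minimal sketch, assuming normalized corners from the detector; `Quad` and `CaptureStabilizer` are hypothetical types, the 1.5 s hold comes from the text, and the 0.8 confidence and 0.02 drift tolerance are assumed tuning values.

```swift
import Foundation

/// Normalized quadrilateral corners as reported by a rectangle detector.
struct Quad {
    var topLeft, topRight, bottomRight, bottomLeft: CGPoint
    var corners: [CGPoint] { [topLeft, topRight, bottomRight, bottomLeft] }
}

/// Fires auto-capture only after the document has stayed roughly still,
/// with sufficient confidence, for `holdDuration` seconds.
struct CaptureStabilizer {
    var holdDuration: TimeInterval = 1.5
    var minConfidence: Float = 0.8
    var maxCornerDrift: CGFloat = 0.02   // allowed corner movement, in normalized units

    private var anchor: Quad?            // corners at the start of the stable period
    private var stableSince: TimeInterval?

    /// Feed one detection per frame; returns true when capture should fire.
    mutating func update(quad: Quad?, confidence: Float, timestamp: TimeInterval) -> Bool {
        guard let quad = quad, confidence >= minConfidence else {
            reset()
            return false
        }
        // Still only if every corner is within maxCornerDrift of the anchor corners.
        let isStill = anchor.map { a in
            zip(a.corners, quad.corners).allSatisfy { p, q in
                ((p.x - q.x) * (p.x - q.x) + (p.y - q.y) * (p.y - q.y)).squareRoot() <= maxCornerDrift
            }
        } ?? false
        if !isStill {
            // Document moved (or first detection): restart the hold timer.
            anchor = quad
            stableSince = timestamp
        }
        guard let start = stableSince, timestamp - start >= holdDuration else { return false }
        reset()  // avoid firing again on the very next frame
        return true
    }

    mutating func reset() {
        anchor = nil
        stableSince = nil
    }
}
```

In the app, you would feed it the corners and confidence of each VNRectangleObservation together with the frame's timestamp, and trigger the capture when `update` returns true. Measuring drift from the start of the stable period, rather than frame to frame, means a slow slide across the page still resets the timer.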

Workflow

Define the scenario: document types (passport, receipt, contract, multi-page materials), whether PDF export is needed, and whether OCR is integrated.

Implement the boundary detector and a live preview that highlights the detected document.

Perspective correction and image post-processing.

Multi-page mode, export to PDF or JPEG.

Test in real conditions: different lighting and paper types (glossy, matte, old documents).

Timeline Guidelines

Basic scanner with VNDocumentCameraViewController (iOS only): 1–2 days. A custom implementation with manual perspective correction, post-processing, and multi-page PDF: 3–5 days per platform.