OCR text recognition via camera in mobile app

NOVASOLUTIONS.TECHNOLOGY develops, supports, and maintains iOS, Android, and PWA mobile applications. We have extensive experience publishing mobile applications in popular stores such as Google Play, App Store, Amazon, AppGallery, and others.

Implementing OCR (Text Recognition) via Camera in Mobile Applications

The user points the camera at a price tag, receipt, contract, or sign, and the app recognizes the text instantly. The gap between "works in demo" and "works in production" is enormous here: real conditions include poor lighting, tilted text, handwritten elements, and several languages in one frame.

Native OCR Frameworks Without External Dependencies

iOS: Vision + VNRecognizeTextRequest

Since iOS 13, the Vision framework can recognize text on-device, with no network connection. VNRecognizeTextRequest supports two recognition levels: .fast (approximate, near-instant) and .accurate (slower, but significantly more accurate on complex fonts).

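import UIKit
import Vision

func recognizeText(in image: UIImage) {
    // Note: cgImage drops UIImage.imageOrientation; rotated photos need the orientation passed to the handler
    guard let cgImage = image.cgImage else { return }

    let request = VNRecognizeTextRequest { [weak self] request, error in
        guard error == nil,
              let observations = request.results as? [VNRecognizedTextObservation] else { return }
        // Take the best candidate for each detected line
        let text = observations.compactMap { $0.topCandidates(1).first?.string }.joined(separator: "\n")
        DispatchQueue.main.async { self?.handleRecognized(text: text) }
    }

    request.recognitionLevel = .accurate
    request.usesLanguageCorrection = true
    // Order = priority. Language availability depends on the OS version; verify with supportedRecognitionLanguages()
    request.recognitionLanguages = ["ru-RU", "en-US"]

    // perform(_:) is synchronous, so run it off the main thread
    DispatchQueue.global(qos: .userInitiated).async {
        do {
            try VNImageRequestHandler(cgImage: cgImage, options: [:]).perform([request])
        } catch {
            print("Text recognition failed: \(error)")
        }
    }
}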

usesLanguageCorrection helps with typos, but it sometimes "corrects" abbreviations and article codes (SKUs). For technical documents, it is better to disable it.

Android: ML Kit Text Recognition v2

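com.google.mlkit:text-recognition covers Latin script; Chinese, Devanagari, Japanese, and Korean are available via separate modules (text-recognition-chinese and so on). Cyrillic is not among the scripts supported by ML Kit Text Recognition v2, so Russian text on Android needs a different engine, such as a cloud OCR service. The com.google.mlkit artifacts bundle the model with the app; the Google Play services variant (com.google.android.gms:play-services-mlkit-text-recognition) downloads it on first use (~5 MB for Latin).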

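import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.text.TextRecognition
import com.google.mlkit.vision.text.latin.TextRecognizerOptions

// Latin-script recognizer; other scripts have their own options classes,
// e.g. ChineseTextRecognizerOptions.Builder().build()
val recognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)

// Second argument is the image rotation in degrees (0, 90, 180 or 270)
val image = InputImage.fromBitmap(bitmap, 0)
recognizer.process(image)
    .addOnSuccessListener { visionText ->
        val fullText = visionText.textBlocks
            .joinToString("\n") { block -> block.text }
        handleRecognized(fullText)
    }
    .addOnFailureListener { e -> handleError(e) }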

ML Kit also returns a bounding box for each text block, line, and element, which is useful for highlighting recognized areas in the UI.
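As a rough sketch of how those boxes could feed an overlay, the snippet below walks an ML Kit Text result and collects one rectangle per recognized line. HighlightRegion and collectHighlights are illustrative names, not part of ML Kit. The boundingBox property is nullable and is expressed in the coordinates of the input image, so the rectangles still have to be scaled to the preview before drawing.

```kotlin
import android.graphics.Rect
import com.google.mlkit.vision.text.Text

// Hypothetical holder for one highlighted line (not an ML Kit type).
data class HighlightRegion(val text: String, val box: Rect)

// Flattens blocks into lines and keeps only the lines that have a bounding box.
fun collectHighlights(visionText: Text): List<HighlightRegion> =
    visionText.textBlocks
        .flatMap { block -> block.lines }
        .mapNotNull { line ->
            line.boundingBox?.let { box -> HighlightRegion(line.text, box) }
        }
```

Working at line level rather than block level is a design choice made here for illustration: it tends to give tighter highlights on receipts and price tags, where one block may span several unrelated rows.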

Live Mode: Text in Real-Time From Video Stream

For a live overlay (text highlighted directly in the video stream), on iOS use AVCaptureSession with an AVCaptureVideoDataOutput and process each CMSampleBuffer it delivers:

import AVFoundation
import Vision

// AVCaptureVideoDataOutputSampleBufferDelegate method
func captureOutput(_ output: AVCaptureOutput,
                   didOutput sampleBuffer: CMSampleBuffer,
                   from connection: AVCaptureConnection) {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

    // Don't start a new request if the previous one hasn't completed
    guard !isProcessing else { return }
    isProcessing = true

    let request = VNRecognizeTextRequest { [weak self] request, _ in
        defer { self?.isProcessing = false }
        // process results...
    }
    request.recognitionLevel = .fast // for live, speed matters

    do {
        try VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([request])
    } catch {
        // Make sure a failed request doesn't block all later frames
        isProcessing = false
    }
}

The isProcessing flag is mandatory: without it, at 30 FPS requests pile up faster than they complete, and memory grows until the app crashes.

On Android, use CameraX with ImageAnalysis.Analyzer. ML Kit accepts the camera frame directly through InputImage.fromMediaImage(image, rotationDegrees), so the ImageProxy does not have to be converted to a Bitmap.
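A minimal sketch of such an analyzer is shown below, assuming the Latin-script recognizer and a caller-supplied onText callback (both the TextAnalyzer class name and the callback are hypothetical). With CameraX's default keep-only-latest backpressure strategy, the next frame is delivered only after the current ImageProxy is closed, which plays the same role as the isProcessing flag on iOS.

```kotlin
import androidx.camera.core.ExperimentalGetImage
import androidx.camera.core.ImageAnalysis
import androidx.camera.core.ImageProxy
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.text.TextRecognition
import com.google.mlkit.vision.text.latin.TextRecognizerOptions

// Hypothetical analyzer: onText is an app-supplied callback for recognized text.
class TextAnalyzer(private val onText: (String) -> Unit) : ImageAnalysis.Analyzer {

    private val recognizer =
        TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)

    @androidx.annotation.OptIn(markerClass = [ExperimentalGetImage::class])
    override fun analyze(imageProxy: ImageProxy) {
        val mediaImage = imageProxy.image
        if (mediaImage == null) {
            imageProxy.close()
            return
        }
        // Wrap the camera frame directly; the rotation value comes from CameraX.
        val input = InputImage.fromMediaImage(
            mediaImage,
            imageProxy.imageInfo.rotationDegrees
        )
        recognizer.process(input)
            .addOnSuccessListener { result -> onText(result.text) }
            // Closing the proxy releases the frame so the next one can be delivered.
            .addOnCompleteListener { imageProxy.close() }
    }
}
```

The analyzer would typically be attached with imageAnalysis.setAnalyzer(executor, TextAnalyzer { text -> ... }). Closing the proxy in the completion listener, rather than right after process(), is what prevents frames from piling up while recognition is still running.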

Post-processing: From Raw Text to Structured Data

The raw OCR result is a stream of lines. Most tasks require structuring it:

  • Receipts: extract the lines that contain prices via regex, then parse the total amount
  • Business cards: NSDataDetector (iOS) or android.util.Patterns (Android) for phone numbers, emails, and addresses
  • Passports/documents: the MRZ (machine-readable zone) follows the ICAO 9303 standard, and ready-made parsers exist
  • License plates: a separate task that is better served by a specialized model (OpenALPR, PlateRecognizer API)
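Two of these steps can be sketched in plain Kotlin, under stated assumptions. The receipt part assumes prices end a line and have exactly two decimal digits, and it treats the last line mentioning a total keyword as the total; the keyword list and all function names are hypothetical. The MRZ part computes the ICAO 9303 check digit (weights 7, 3, 1; digits keep their value, A to Z map to 10 to 35, the filler < counts as 0), which is a cheap way to reject misread fields.

```kotlin
// Matches an amount such as "199.90" or "45,50" at the end of a line.
val amountAtLineEnd = Regex("""(\d{1,9})[.,](\d{2})\s*$""")

// Returns the amount in minor units (cents, kopecks) if the line ends with a price.
fun parseAmount(line: String): Long? {
    val match = amountAtLineEnd.find(line) ?: return null
    val (whole, fraction) = match.destructured
    return whole.toLong() * 100 + fraction.toLong()
}

// Hypothetical keyword list; a real app would tune it per language and retailer.
val totalKeywords = listOf("total", "итого")

// Heuristic: the last keyword line wins, so "SUBTOTAL" does not shadow "TOTAL".
fun findTotal(ocrLines: List<String>): Long? =
    ocrLines
        .filter { line -> totalKeywords.any { line.lowercase().contains(it) } }
        .mapNotNull { parseAmount(it) }
        .lastOrNull()

// ICAO 9303 check digit for one MRZ field.
fun mrzCheckDigit(field: String): Int {
    val weights = intArrayOf(7, 3, 1)
    return field.withIndex().sumOf { (i, ch) ->
        val value = when (ch) {
            in '0'..'9' -> ch - '0'
            in 'A'..'Z' -> ch - 'A' + 10
            '<' -> 0
            else -> throw IllegalArgumentException("Invalid MRZ character: $ch")
        }
        value * weights[i % 3]
    } % 10
}
```

Keeping amounts as integers in minor units avoids floating-point rounding when line items are later summed and compared with the parsed total. A field whose computed check digit does not match the digit printed in the MRZ is a good signal to ask the user to rescan.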

For poor-quality Cyrillic text, image preprocessing sometimes helps before passing the image to OCR: contrast stretching with vImage (vImageContrastStretch_Planar8 and related functions), grayscale conversion, and a sharpening CIFilter such as CISharpenLuminance.
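To make the idea concrete, here is a platform-independent sketch of grayscale conversion followed by a linear contrast stretch on packed ARGB pixels. It only illustrates the technique and is not a reimplementation of the vImage or Core Image routines, which are far faster in a real app; all function names are hypothetical.

```kotlin
// Luminance of a packed 0xAARRGGBB pixel using the common Rec. 601 weights.
fun luminance(argb: Int): Int {
    val r = (argb shr 16) and 0xFF
    val g = (argb shr 8) and 0xFF
    val b = argb and 0xFF
    return (299 * r + 587 * g + 114 * b) / 1000
}

// Linearly maps the darkest value to 0 and the brightest to 255.
fun contrastStretch(gray: IntArray): IntArray {
    val lo = gray.minOrNull() ?: return gray
    val hi = gray.maxOrNull() ?: return gray
    if (hi == lo) return gray.copyOf() // flat image: nothing to stretch
    return IntArray(gray.size) { i -> (gray[i] - lo) * 255 / (hi - lo) }
}

// Grayscale conversion followed by the stretch, one value per input pixel.
fun preprocess(argbPixels: IntArray): IntArray =
    contrastStretch(IntArray(argbPixels.size) { i -> luminance(argbPixels[i]) })
```

A faded receipt whose pixel values occupy only a narrow band of grays gets spread over the full 0 to 255 range, which makes thin strokes easier for the recognizer to separate from the background.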

Workflow

Define the use cases: document types, languages, and whether live mode is needed or static photos are enough.

Implement image capture (camera and gallery) and preprocessing.

Integrate OCR: native Vision/ML Kit, or a cloud service (Google Vision API, AWS Textract) if complex documents need higher accuracy.

Post-process for the specific task: data structuring, regex, NER (named entity recognition).

Test on real samples under different lighting conditions.

Timeline Guidelines

Basic static text recognition via a native framework: 2–3 days. Live mode with an overlay plus data structuring for a specific document type: 1–2 weeks.