Implementing OCR (Text Recognition) via Camera in Mobile Applications
The user points the camera at a price tag, receipt, contract, or sign, and the app instantly recognizes the text. The gap between "works in a demo" and "works in production" is enormous here: real-world conditions mean poor lighting, tilted text, handwritten elements, and several languages in one frame.
Native OCR Frameworks Without External Dependencies
iOS: Vision + VNRecognizeTextRequest
Since iOS 13, the Vision framework can recognize text offline. VNRecognizeTextRequest supports two modes: .fast (approximate, instant) and .accurate (slower but significantly more accurate for complex fonts).
func recognizeText(in image: UIImage) {
    guard let cgImage = image.cgImage else { return }

    let request = VNRecognizeTextRequest { [weak self] request, error in
        guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
        let text = observations
            .compactMap { $0.topCandidates(1).first?.string }
            .joined(separator: "\n")
        DispatchQueue.main.async { self?.handleRecognized(text: text) }
    }
    request.recognitionLevel = .accurate
    request.usesLanguageCorrection = true
    request.recognitionLanguages = ["ru-RU", "en-US"] // order = priority

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try? handler.perform([request])
}
usesLanguageCorrection helps with typos, but it sometimes "corrects" abbreviations and SKU codes; for technical documents it is better disabled.
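For such documents the configuration is the opposite of the defaults shown above; a minimal sketch (result handling omitted):

```swift
import Vision

// Configuration sketch for technical documents (invoices, part lists),
// where a literal transcription matters more than dictionary words.
let request = VNRecognizeTextRequest { request, error in
    // handle results as in the example above
}
request.recognitionLevel = .accurate
request.usesLanguageCorrection = false // keep SKUs and codes verbatim
```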
Android: ML Kit Text Recognition v2
com.google.mlkit:text-recognition recognizes Latin script; Chinese, Devanagari, Japanese, and Korean come via separate modules (text-recognition-chinese and so on). The model downloads on first use (~5 MB for Latin). Note there is no dedicated on-device Cyrillic module, so for Russian text a cloud API may be required.
val recognizer = TextRecognition.getClient(
    TextRecognizerOptions.DEFAULT_OPTIONS // or ChineseTextRecognizerOptions, KoreanTextRecognizerOptions, etc.
)

val image = InputImage.fromBitmap(bitmap, 0)
recognizer.process(image)
    .addOnSuccessListener { visionText ->
        val fullText = visionText.textBlocks
            .joinToString("\n") { block -> block.text }
        handleRecognized(fullText)
    }
    .addOnFailureListener { e -> handleError(e) }
ML Kit also returns a bounding box for each text block, which is useful for highlighting recognized areas in the UI.
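On iOS, Vision exposes the same geometry through each observation's boundingBox, normalized to 0...1 with the origin at the bottom-left, so it has to be flipped before drawing over a UIKit view. A sketch:

```swift
import Vision
import UIKit

// Convert a Vision observation's normalized bounding box (origin at
// bottom-left) into a CGRect in UIKit coordinates (origin at top-left).
func uiRect(for observation: VNRecognizedTextObservation,
            in viewSize: CGSize) -> CGRect {
    let box = observation.boundingBox // normalized 0...1
    return CGRect(
        x: box.origin.x * viewSize.width,
        y: (1 - box.origin.y - box.height) * viewSize.height,
        width: box.width * viewSize.width,
        height: box.height * viewSize.height
    )
}
```

Vision also ships VNImageRectForNormalizedRect for the scaling part, but the vertical flip is still your responsibility.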
Live Mode: Text in Real-Time From Video Stream
For live overlay (text highlighted directly in video stream), on iOS use AVCaptureSession + CMSampleBuffer:
// AVCaptureVideoDataOutput delegate method
func captureOutput(_ output: AVCaptureOutput,
                   didOutput sampleBuffer: CMSampleBuffer,
                   from connection: AVCaptureConnection) {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

    // Don't start a new request if the previous one hasn't completed
    guard !isProcessing else { return }
    isProcessing = true

    let request = VNRecognizeTextRequest { [weak self] request, _ in
        defer { self?.isProcessing = false }
        // process results...
    }
    request.recognitionLevel = .fast // for live recognition, speed matters

    try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([request])
}
The isProcessing flag is mandatory: without it, at 30 FPS requests queue up faster than they complete, and memory grows until the app crashes.
On Android, use CameraX with an ImageAnalysis.Analyzer. ML Kit is optimized to consume the ImageProxy frame directly, without an intermediate Bitmap conversion.
Post-processing: From Raw Text to Structured Data
The raw OCR result is a stream of lines. Most tasks need structuring on top:
- Receipts: extract price lines via regex, parse the total amount
- Business cards: NSDataDetector (iOS) or android.util.Patterns (Android) for phones, emails, addresses
- Passports/documents: read the MRZ zone per the ICAO 9303 standard; ready-made parsers exist
- License plates: a separate task, better handled by a specialized model (OpenALPR, PlateRecognizer API)
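As an illustration of the receipt case, a minimal Swift sketch; the line format and regex here are assumptions, since real receipts vary by locale and retailer:

```swift
import Foundation

// Extract (name, price) pairs from OCR lines that end with a price,
// e.g. "Milk 3.2% 89.99". Format assumptions: the price is the last
// token, with '.' or ',' as the decimal separator.
func parseReceiptItems(_ lines: [String]) -> [(name: String, price: Double)] {
    let pattern = #"^(.+?)\s+(\d+[.,]\d{2})\s*$"#
    let regex = try! NSRegularExpression(pattern: pattern)
    return lines.compactMap { line in
        let range = NSRange(line.startIndex..., in: line)
        guard let match = regex.firstMatch(in: line, range: range),
              let nameRange = Range(match.range(at: 1), in: line),
              let priceRange = Range(match.range(at: 2), in: line),
              let price = Double(line[priceRange].replacingOccurrences(of: ",", with: "."))
        else { return nil }
        return (String(line[nameRange]), price)
    }
}
```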
For low-quality Cyrillic text, image preprocessing sometimes helps: a contrast stretch via vImageContrastStretch, grayscale conversion, or a sharpen CIFilter applied before the frame is passed to OCR.
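A Core Image preprocessing step might look like the sketch below; the filter parameters are starting points to tune against your own samples, not recommendations:

```swift
import CoreImage

// Grayscale + sharpen an image before OCR. CIColorControls with
// saturation 0 removes color; CISharpenLuminance emphasizes edges.
func preprocessForOCR(_ input: CIImage, context: CIContext) -> CGImage? {
    let gray = input.applyingFilter("CIColorControls", parameters: [
        kCIInputSaturationKey: 0.0,
        kCIInputContrastKey: 1.2 // mild contrast boost; tune per source
    ])
    let sharpened = gray.applyingFilter("CISharpenLuminance", parameters: [
        kCIInputSharpnessKey: 0.5
    ])
    return context.createCGImage(sharpened, from: sharpened.extent)
}
```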
Workflow
1. Define use cases: document types, languages, and whether live mode is needed or static photos are enough.
2. Implement image capture (camera + gallery) and preprocessing.
3. Integrate OCR: native Vision/ML Kit, or a cloud service (Google Vision API, AWS Textract) if higher accuracy on complex documents is needed.
4. Post-process for the specific task: data structuring, regex, NER.
5. Test on real samples under different lighting conditions.
Timeline Guidelines
Basic static text recognition via the native framework takes 2–3 days. Live mode with an overlay plus data structuring for a specific document type takes 1–2 weeks.