AI Camera AR Translation for Mobile App

NOVASOLUTIONS.TECHNOLOGY develops, supports, and maintains iOS, Android, and PWA mobile applications. We have extensive experience publishing mobile applications in popular marketplaces such as Google Play, the App Store, Amazon, AppGallery, and others.
Development and support of all types of mobile applications:
Information and entertainment mobile applications
News apps, games, reference guides, online catalogs, weather apps, fitness and health apps, travel apps, educational apps, social networks and messengers, quizzes, blogs and podcasts, forums, aggregators
E-commerce mobile applications
Online stores, B2B apps, marketplaces, online exchanges, cashback services, dropshipping platforms, loyalty programs, food and goods delivery, payment systems.
Business process management mobile applications
CRM systems, ERP systems, project management, sales team tools, financial management, production management, logistics and delivery management, HR management, data monitoring systems
Electronic services mobile applications
Classified ads platforms, online schools, online cinemas, electronic service platforms, cashback platforms, video hosting, thematic portals, online booking and scheduling platforms, online trading platforms

These are just some of the types of mobile applications we work with, and each of them may have its own specific features and functionality, tailored to the specific needs and goals of the client.

Latest works
  • Development of a mobile application for FEEDME
  • Development of a mobile application for XOOMER
  • Development of a mobile application for RHL
  • Development of a mobile application for ZIPPY
  • Development of a mobile application for Affhome
  • Development of a mobile application for the FLAVORS company

AI Camera Translation (AR Translation) in Mobile Apps

Google Translate's "Instant Translation" is AR translation in action: the camera sees text, and a translation appears in real time, overlaid on the image as if printed there natively. Implementing it yourself is harder than it looks: you need OCR, translation, inpainting of the background beneath the erased source text, and rendering of the new text in a matching font and size.

AR Translation Pipeline Architecture

Each camera frame passes through multiple stages:

Frame → Text Detection → OCR → Translation → Inpainting → Text Overlay → Render

Text Detection. Find text bounding boxes in the frame. On iOS: VNRecognizeTextRequest (Vision framework) with recognitionLevel = .fast for real time. On Android: ML Kit Text Recognition v2. Both run on-device, no network required. Vision returns VNRecognizedTextObservation objects whose bounding boxes are in normalized coordinates with a bottom-left origin; convert them to screen coordinates, accounting for buffer orientation.
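
The vertical flip is where first implementations usually go wrong: Vision's normalized rects have their origin at the bottom-left, while screen coordinates start at the top-left. A minimal, platform-neutral sketch of the conversion (Python for illustration; the arithmetic is identical in Swift or Kotlin, function name is ours):

```python
def vision_rect_to_screen(bbox, view_w, view_h):
    """Convert a Vision-style normalized rect (origin at bottom-left)
    to screen pixels (origin at top-left).

    bbox: (x, y, w, h), all values in [0, 1].
    """
    x, y, w, h = bbox
    return (x * view_w,              # left edge
            (1.0 - y - h) * view_h,  # flip the vertical axis
            w * view_w,
            h * view_h)

# A box occupying the bottom-left quarter of a 400x800 view:
print(vision_rect_to_screen((0.0, 0.0, 0.5, 0.5), 400, 800))
# -> (0.0, 400.0, 200.0, 400.0)
```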

OCR. VNRecognizeTextRequest with recognitionLevel = .accurate is too slow to run on every frame. Strategy: use .fast for detection and run .accurate only when the text stabilizes (user tap or a stationary phone). Stable-frame detection: compare bounding boxes between frames; if the deviation is under 5 px, the text is stable, so run accurate OCR.
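
The stability check reduces to comparing matched boxes between consecutive frames. A hypothetical sketch (function name and index-based matching are our assumptions; a real tracker would match boxes by overlap):

```python
def boxes_stable(prev, curr, tol_px=5.0):
    """Return True when every box moved less than tol_px between frames.

    prev, curr: lists of (x, y, w, h) in screen pixels, matched by index.
    A changed box count means text appeared or disappeared -> not stable.
    """
    if len(prev) != len(curr):
        return False
    for (px, py, pw, ph), (cx, cy, cw, ch) in zip(prev, curr):
        if max(abs(px - cx), abs(py - cy), abs(pw - cw), abs(ph - ch)) >= tol_px:
            return False
    return True
```

When this returns True for a few frames in a row, trigger the accurate OCR pass.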

Translation. Two options:

|         | On-device (ML Kit Translate) | Cloud API (DeepL, Google Cloud) |
|---------|------------------------------|---------------------------------|
| Latency | 10–50 ms                     | 200–800 ms                      |
| Quality | Adequate                     | High (DeepL especially)         |
| Offline | Yes (~30 MB model)           | No                              |
| Cost    | Free                         | Per request                     |

For a live camera stream: on-device only. For a "photograph, then translate" mode: a cloud API such as DeepL for better quality.

Inpainting and Text Overlay — Most Complex Part

Simple approach: draw a background-colored rectangle over the source text and write the translation on top. The result is a crude flat rectangle that doesn't blend with the image. The correct approach:

Background Color Detection. Sample the pixels around the bounding box, compute their median color, and fill the rectangle with it. Works for uniform backgrounds (a white wall, a sheet of paper).
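
A sketch of the median-color step, assuming a plain 2D pixel array (in production you would sample the camera buffer directly; all names here are illustrative):

```python
from statistics import median

def border_median_color(pixels, rect, margin=3):
    """Median RGB of the pixels in a thin band around `rect`.

    pixels: 2D list of (r, g, b) rows; rect: (x, y, w, h) in ints.
    Assumes the rect plus margin lies fully inside the image.
    """
    x, y, w, h = rect
    band = []
    for yy in range(y - margin, y + h + margin):
        for xx in range(x - margin, x + w + margin):
            inside = x <= xx < x + w and y <= yy < y + h
            if not inside:               # keep only the surrounding band
                band.append(pixels[yy][xx])
    # Per-channel median resists outliers (e.g. stray text pixels)
    return tuple(int(median(c[i] for c in band)) for i in range(3))
```

The median, unlike the mean, is not skewed by a few dark antialiased edge pixels bleeding out of the text box.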

Texture Inpainting for Complex Backgrounds. Fill the erased region with surrounding texture using a custom convolution kernel or an ML inpainting model (Core Image ships no ready-made inpainting filter, so this is custom work). Too slow for real time; use it only in static photo mode.

Font Matching. Estimate the source text size from the bounding box and pick a UIFont / TextPaint of similar size. Identifying the specific font from an OCR result remains unsolved for most cases; fall back to a system sans-serif.
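
A rough heuristic for the size estimate: start from the box height (point size roughly tracks cap height) and shrink until the estimated line width fits. Both the starting point and the glyph aspect ratio are assumptions to tune against real fonts:

```python
def fit_font_size(box_w, box_h, text, avg_glyph_aspect=0.55):
    """Pick a font size so `text` fits inside the source box.

    avg_glyph_aspect: assumed width/height ratio of an average glyph
    (a tuning constant, not a measured value).
    """
    size = box_h
    while size > 1:
        est_width = len(text) * size * avg_glyph_aspect
        if est_width <= box_w:           # estimated line fits the box
            break
        size -= 1
    return size
```

A translated string is often longer than the source, so the shrink-to-fit loop matters more than the initial guess.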

Right-to-Left (RTL) Languages. Arabic and Hebrew flow right to left: on iOS, set semanticContentAttribute = .forceRightToLeft on the UILabel; on Android, set textDirection = TEXT_DIRECTION_RTL on the TextView. When overlaying text on an image, set baseWritingDirection = .rightToLeft on the NSMutableParagraphStyle.

Stabilization and Performance

Running the full pipeline on every frame at 30 FPS is not feasible on mobile hardware. Throttle instead:

  • Text detection: every 3–5 frames
  • OCR: only on stabilization or tap
  • Translation: debounce 500 ms on text change
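
Taken together, these rules amount to a small per-frame gatekeeper. A hypothetical, platform-neutral Python sketch (class name and thresholds are illustrative, not a library API):

```python
class PipelineThrottle:
    """Decides which stages run for a given frame.

    Detection runs every `detect_every` frames; translation runs only
    after the recognized text has been unchanged for `debounce_s` seconds.
    """
    def __init__(self, detect_every=4, debounce_s=0.5):
        self.detect_every = detect_every
        self.debounce_s = debounce_s
        self.frame = 0
        self.last_text = None
        self.last_change = 0.0

    def should_detect(self):
        self.frame += 1
        return self.frame % self.detect_every == 1

    def should_translate(self, text, now):
        if text != self.last_text:
            self.last_text = text
            self.last_change = now
            return False                 # text just changed: wait it out
        return now - self.last_change >= self.debounce_s
```

Usage: call should_detect() per camera frame, and should_translate(ocr_text, timestamp) whenever OCR produces a result.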

On iPhone 12 and newer, Metal Performance Shaders and the Neural Engine accelerate the Vision pipeline. On Android, custom TensorFlow Lite models can use the GPU delegate; ML Kit's bundled models handle acceleration internally.

Cache results by OCR text hash: don't translate same text twice in session.
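
A minimal session cache along those lines, keyed by a hash of normalized text plus the language pair (the normalization choice is our assumption; aggressive lowercasing can merge strings that should stay distinct in case-sensitive contexts):

```python
import hashlib

class TranslationCache:
    """Session cache keyed by a hash of (source text, language pair)."""
    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(text, src, dst):
        # Normalize so trivial OCR jitter (whitespace, case) still hits
        raw = f"{src}->{dst}:{text.strip().lower()}"
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get(self, text, src, dst):
        return self._store.get(self._key(text, src, dst))

    def put(self, text, src, dst, translation):
        self._store[self._key(text, src, dst)] = translation
```

Check get() before calling the translator; on a cloud backend this also cuts per-request cost.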

What's Included

  • Architecture selection: on-device vs cloud, live camera vs photo mode
  • OCR + translation pipeline implementation
  • UI for language selection (with source language auto-detection)
  • Translation text overlay on image
  • Offline mode with downloadable language models (ML Kit)

Timeline: basic AR translation for static photos takes 3–5 weeks; real-time live-camera translation with on-device ML and an offline mode takes 6–10 weeks. Cost is estimated individually.