Image Recognition Bot in Mobile App

NOVASOLUTIONS.TECHNOLOGY is engaged in the development, support and maintenance of iOS, Android, PWA mobile applications. We have extensive experience and expertise in publishing mobile applications in popular markets like Google Play, App Store, Amazon, AppGallery and others.
Development and support of all types of mobile applications:
Information and entertainment mobile applications
News apps, games, reference guides, online catalogs, weather apps, fitness and health apps, travel apps, educational apps, social networks and messengers, quizzes, blogs and podcasts, forums, aggregators
E-commerce mobile applications
Online stores, B2B apps, marketplaces, online exchanges, cashback services, exchanges, dropshipping platforms, loyalty programs, food and goods delivery, payment systems.
Business process management mobile applications
CRM systems, ERP systems, project management, sales team tools, financial management, production management, logistics and delivery management, HR management, data monitoring systems
Electronic services mobile applications
Classified ads platforms, online schools, online cinemas, electronic service platforms, cashback platforms, video hosting, thematic portals, online booking and scheduling platforms, online trading platforms

These are just some of the types of mobile applications we work with, and each of them may have its own specific features and functionality, tailored to the specific needs and goals of the client.

Showing 1 of 1 servicesAll 1735 services
Image Recognition Bot in Mobile App
Medium
~3-5 business days
FAQ
Our competencies:
Development stages
Latest works
  • image_mobile-applications_feedme_467_0.webp
    Development of a mobile application for FEEDME
    756
  • image_mobile-applications_xoomer_471_0.webp
    Development of a mobile application for XOOMER
    624
  • image_mobile-applications_rhl_428_0.webp
    Development of a mobile application for RHL
    1052
  • image_mobile-applications_zippy_411_0.webp
    Development of a mobile application for ZIPPY
    947
  • image_mobile-applications_affhome_429_0.webp
    Development of a mobile application for Affhome
    862
  • image_mobile-applications_flavors_409_0.webp
    Development of a mobile application for the FLAVORS company
    445

Image Recognition Bot Implementation in Mobile Applications

User takes a photo — bot responds. Sounds simple, but between "attach photo" and "get useful answer" lies model choice, request size management, and handling cases where image doesn't contain expected content.

Vision API: What to Use

GPT-4o Vision (OpenAI). Send image base64 or URL in request, receive text response. Understands complex scenes, documents, handwriting, diagrams. Cost — depends on image size (tile-based pricing). For detailed high-resolution analysis — more expensive.

Claude 3.5 Sonnet / Haiku. Similar capability via Anthropic Messages API. Claude works well with documents and tables, shows comparable results with GPT-4o on most tasks.

Google Cloud Vision API. Specialized functions: OCR (TEXT_DETECTION), object recognition (OBJECT_LOCALIZATION), face (FACE_DETECTION), logos (LOGO_DETECTION), content safety (SAFE_SEARCH_DETECTION). Cheaper than LLM for homogeneous tasks, but no free-form text response.

ML Kit (Google) on-device. Completely on device: text recognition, barcodes, faces, objects. No network latency, no per-request cost. Accuracy lower than cloud LLM for complex scenes, but for structured tasks (QR code, barcode, document text) — sufficient.

CoreML + Vision (iOS). MobileNetV3, EfficientNet — on-device image classification. VNRecognizeTextRequest for OCR. VNDetectBarcodeRequest for QR/barcodes.

Choice depends on task:

Task Recommended Solution
Free-form question about photo GPT-4o Vision / Claude
Document OCR Google Vision API / ML Kit
Barcodes and QR codes ML Kit / CoreML (on-device)
Product classification Custom CoreML / TFLite model
Content moderation Google Vision SAFE_SEARCH

Sending Images from Mobile Application

Images are not sent directly to Vision API from mobile client — API key cannot be stored in app.

Data flow:

Mobile Client → Resize/Compress → Upload to S3/GCS → URL → Your Server → Vision API

Image is compressed on device to appropriate size before upload. GPT-4o with detail: "auto" determines needed resolution itself, but sending 12-megapixel photo without compression — wasteful and expensive.

// Android: compress image before upload
fun compressForBot(uri: Uri, maxSizePx: Int = 1024): ByteArray {
    val bitmap = MediaStore.Images.Media.getBitmap(contentResolver, uri)
    val scale = maxSizePx.toFloat() / maxOf(bitmap.width, bitmap.height)
    val scaled = if (scale < 1f) {
        Bitmap.createScaledBitmap(
            bitmap,
            (bitmap.width * scale).toInt(),
            (bitmap.height * scale).toInt(),
            true
        )
    } else bitmap
    val output = ByteArrayOutputStream()
    scaled.compress(Bitmap.CompressFormat.JPEG, 85, output)
    return output.toByteArray()
}

Use Case Scenarios

Retail bots. User photographs product — bot finds it in catalog, shows price and availability. Visual embedding search (CLIP + Qdrant) more accurate than text from OCR.

Medical bots. Photo of symptom, prescription, lab result — bot explains (doesn't diagnose). System prompt should explicitly limit answer scope and include disclaimer.

Document bots. Photo of invoice, receipt, passport — extract structured data. GPT-4o Vision + structured output via JSON Schema gives high accuracy on typical documents.

Inspection bots. Builder photographs defect — bot classifies defect type and creates task in management system.

Handling "Bad" Photos

Mandatory test cases:

  • Blurry image
  • Poor lighting
  • Off-topic photo (user sent cat instead of receipt)
  • Image with prohibited content

For the last — moderation before sending to main model. OpenAI Moderation API or Google Safe Search as first filter.

Implementation Process

Define use case scenarios for images: exactly what needs recognition.

Choose Vision API for task and budget.

Backend: image upload, Vision API call, response formation.

Mobile UI: gallery selection, camera, preview before sending.

Test in real field conditions — poor lighting, angles, partial visibility.

Timeline Estimates

Bot with basic Vision API (Google Vision or GPT-4o) — 3–5 days. With custom classification model, on-device inference and complex scenarios — 3–6 weeks.