Image Recognition Implementation in Mobile Applications
Image recognition in a mobile app is more than plugging in a ready-made API. The real task has four parts: capturing images from the right source, applying the correct preprocessing, running inference, and handling results with the UX in mind. Each part has its own pitfalls.
Image Sources and Their Quirks
Camera via AVCaptureSession (iOS) or CameraX (Android) is the most complex case. Frames arrive as CMSampleBuffer (iOS) or ImageProxy (Android) in YUV_420_888 or BGRA format, while models expect RGB float32 or uint8. Converting YUV to RGB without native code is a real source of latency. On Android, configure ImageAnalysis.Builder().setOutputImageFormat(ImageAnalysis.OUTPUT_IMAGE_FORMAT_RGBA_8888) to get the needed format without manual conversion.
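To see where that latency comes from, here is a sketch of the per-pixel BT.601 (full-range) YUV-to-RGB arithmetic a manual CPU conversion has to run for every pixel of every frame, which the RGBA_8888 output format avoids. Class and method names are illustrative, not from any SDK.

```java
public class YuvToRgb {
    // Converts one YUV pixel (full-range BT.601) to RGB.
    // Returns {r, g, b}, each clamped to [0, 255].
    public static int[] convert(int y, int u, int v) {
        float uf = u - 128f; // center chroma around zero
        float vf = v - 128f;
        int r = clamp(Math.round(y + 1.402f * vf));
        int g = clamp(Math.round(y - 0.344f * uf - 0.714f * vf));
        int b = clamp(Math.round(y + 1.772f * uf));
        return new int[] { r, g, b };
    }

    private static int clamp(int x) {
        return Math.max(0, Math.min(255, x));
    }
}
```

Multiply this by ~2 million pixels per 1080p frame at 30 fps and the cost of doing it in managed code instead of a native or GPU path becomes obvious.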
Gallery is simpler but hides an EXIF orientation trap. On iOS, UIImage accounts for orientation when displayed, but the underlying CGImage may be stored rotated. Pass that CGImage directly to the model and recognition accuracy drops for portrait-shot photos. The correct approach: wrap the image via CIImage(image: uiImage), then render it with CIContext.createCGImage with the orientation transform applied.
On Android, BitmapFactory.decodeFile ignores EXIF entirely. Read the orientation with ExifInterface and apply it with Matrix.postRotate; otherwise the model receives a rotated image.
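A minimal sketch of the lookup that sits between ExifInterface and Matrix.postRotate: mapping the raw EXIF orientation tag to rotation degrees. The integer values match the ExifInterface ORIENTATION_* constants; plain ints are used here only to keep the sketch independent of the Android SDK.

```java
public class ExifRotation {
    // Maps a raw EXIF orientation tag value to the angle for Matrix.postRotate.
    public static float degrees(int exifOrientation) {
        switch (exifOrientation) {
            case 6:  return 90f;   // ExifInterface.ORIENTATION_ROTATE_90
            case 3:  return 180f;  // ExifInterface.ORIENTATION_ROTATE_180
            case 8:  return 270f;  // ExifInterface.ORIENTATION_ROTATE_270
            default: return 0f;    // ORIENTATION_NORMAL or undefined
        }
    }
}
```

Note this ignores the mirrored EXIF orientations (2, 4, 5, 7), which additionally require a flip; real gallery photos occasionally carry them.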
Preprocessing: Critical for Accuracy
Most classifiers are trained on ImageNet with normalization mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]. Replicate this exactly at inference time: mismatched normalization silently drops accuracy by 15–30% with no code errors to point at.
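For concreteness, here is what that normalization looks like for a single RGB pixel whose channels have already been scaled to [0, 1], using the mean/std values above. The class name is illustrative; the point is that these six constants must byte-for-byte match the training pipeline.

```java
public class ImageNetNorm {
    // Per-channel statistics from ImageNet training (RGB order).
    static final float[] MEAN = { 0.485f, 0.456f, 0.406f };
    static final float[] STD  = { 0.229f, 0.224f, 0.225f };

    // Normalizes one pixel: (value - mean) / std per channel.
    public static float[] normalize(float[] rgb01) {
        float[] out = new float[3];
        for (int c = 0; c < 3; c++) {
            out[c] = (rgb01[c] - MEAN[c]) / STD[c];
        }
        return out;
    }
}
```

A pixel exactly at the dataset mean normalizes to zero on every channel, which is a quick sanity check when debugging a preprocessing pipeline.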
Resizing with preserved aspect ratio versus crop-to-fit is a fundamental choice. If the model was trained on square images with center_crop but you fit with padding, the model sees the padding as part of the image and gets confused. Match the model's training preprocessing exactly.
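The geometry behind the two strategies can be sketched as follows: center-crop discards edge pixels to get a square, while letterbox (fit with padding) keeps every pixel but introduces padding the model may never have seen during training. Names here are illustrative.

```java
public class ResizeGeometry {
    // Center-crop: returns {x, y, side}, the largest square region
    // centered in a srcW x srcH frame. Pixels outside it are discarded.
    public static int[] centerCrop(int srcW, int srcH) {
        int side = Math.min(srcW, srcH);
        return new int[] { (srcW - side) / 2, (srcH - side) / 2, side };
    }

    // Letterbox: returns {scaledW, scaledH}, the fit-inside size before
    // padding out to target x target. All pixels survive, padding appears.
    public static int[] letterboxFit(int srcW, int srcH, int target) {
        float scale = Math.min((float) target / srcW, (float) target / srcH);
        return new int[] { Math.round(srcW * scale), Math.round(srcH * scale) };
    }
}
```

For a 1920x1080 frame and a 224x224 model input, center-crop throws away 420 columns on each side, while letterbox shrinks the frame to 224x126 and fills the remaining 98 rows with padding, which is exactly what confuses a center-crop-trained model.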
Our Pipeline Approach
For iOS: VNImageRequestHandler + VNCoreMLRequest is the cleanest path. Vision handles orientation and resize automatically. For heavy models, use separate MLModelConfiguration with computeUnits = .cpuAndNeuralEngine.
For Android with ML Kit: use ImageLabeler via InputImage.fromMediaImage(mediaImage, rotationDegrees). Take the rotation from ImageProxy.imageInfo.rotationDegrees rather than calculating it manually.
For custom TFLite models on Android, use ImageClassifier from the Task Library: it handles normalization (when present in the model metadata), resizing, and format conversion.
Inference results arrive asynchronously in callbacks, so update the UI only on the main thread: LiveData on Android, @MainActor with Swift Concurrency on iOS.
Case study: a mushroom-identification app working from a photo. Model: EfficientNetV2-S converted to Core ML. Accuracy was 91% on the test set but 73% on real user photos. The problem: users shoot mushrooms from below at an angle, while the training data is overhead views. We added VNClassifyImageRequest with a confidence threshold of 0.6; when confidence is low, the app suggests reshooting, with framing instructions. Real-user accuracy rose to 84%.
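The low-confidence fallback from the case study reduces to a small gating step, sketched below. The 0.6 threshold and the reshoot suggestion come from the text; the class and method names are illustrative, not from the app's codebase.

```java
public class ConfidenceGate {
    // Threshold below which results are not shown to the user (from the case study).
    public static final float THRESHOLD = 0.6f;

    // Returns the label when the model is confident enough; returns null to
    // signal that the UI should show reshoot-and-framing instructions instead.
    public static String accept(String label, float confidence) {
        return confidence >= THRESHOLD ? label : null;
    }
}
```

The design point: a confident wrong answer is worse than asking for a better photo, especially in a domain like mushrooms where misidentification has real consequences.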
Process Overview
Requirements audit (image source, platform, accuracy, latency) → model and framework selection → preprocessing pipeline implementation → inference integration → real-data testing → threshold tuning → CI handoff.
Timeline: 1–2 weeks depending on model complexity and preprocessing availability. Cost calculated individually.