Developing Image-Based Search in a Mobile Application
A user photographs a product in a competitor's store or takes a screenshot from Instagram, then wants to find the same item in your catalog. Visual search covers this scenario. Technically, the task consists of two parts: getting a vector representation of the image (an embedding) and finding its nearest neighbors in a database.
Two implementation approaches
On-device model. On iOS, the Vision framework with VNGenerateImageFeaturePrintRequest; on Android, ML Kit Image Labeling or a custom TFLite model via the TensorFlow Lite Task Library (a short ML Kit sketch follows below). Advantage: it works offline with no network latency. Limitation: Apple's feature print works only within the iOS ecosystem and is not compatible with a server-side index.
Server-side embedding. The image is sent to the server, run through a model (CLIP, EfficientNet, ResNet), and the resulting vector is used to query the index. This is more accurate and flexible: the same index serves iOS, Android, and web.
In practice, the server variant with local pre-processing is the usual choice: the image is compressed and normalized on the device before being sent.
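For reference, the on-device option on Android mentioned above takes only a few lines with ML Kit. A minimal sketch; note that labels are coarse categories rather than embeddings, so they are useful for narrowing a query, not for true similarity search:

// Kotlin, on-device labeling with ML Kit (com.google.mlkit:image-labeling)
import android.graphics.Bitmap
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.label.ImageLabeling
import com.google.mlkit.vision.label.defaults.ImageLabelerOptions

fun labelOnDevice(bitmap: Bitmap, onResult: (List<String>) -> Unit) {
    val labeler = ImageLabeling.getClient(ImageLabelerOptions.DEFAULT_OPTIONS)
    val image = InputImage.fromBitmap(bitmap, /* rotationDegrees = */ 0)
    labeler.process(image)
        .addOnSuccessListener { labels ->
            // Keep only reasonably confident labels
            onResult(labels.filter { it.confidence > 0.7f }.map { it.text })
        }
        .addOnFailureListener { onResult(emptyList()) }
}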
Image capture and preparation
On iOS, use PHPickerViewController to pick from the gallery (it replaced UIImagePickerController for library access in iOS 14) and AVCaptureSession with AVCapturePhotoOutput for the camera. Before sending, downscale and compress the image:
import UIKit

func prepareForSearch(image: UIImage) -> Data? {
    // Scale down to 512 px on the longest side; don't upscale smaller images
    let maxDimension: CGFloat = 512
    let scale = min(maxDimension / max(image.size.width, image.size.height), 1.0)
    let newSize = CGSize(width: image.size.width * scale,
                         height: image.size.height * scale)
    UIGraphicsBeginImageContextWithOptions(newSize, false, 1.0)
    image.draw(in: CGRect(origin: .zero, size: newSize))
    let resized = UIGraphicsGetImageFromCurrentImageContext()
    UIGraphicsEndImageContext()
    return resized?.jpegData(compressionQuality: 0.85)
}
On Android, use ActivityResultContracts.TakePicture() for the camera and ActivityResultContracts.PickVisualMedia() for the gallery (the Photo Picker API, available natively from Android 13 and backported via Jetpack):
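A minimal sketch of wiring the Photo Picker up, assuming an Activity; uploadForSearch() is a placeholder for the preparation and upload steps described in this section:

// Kotlin, gallery selection via the Photo Picker (androidx.activity 1.7+)
import androidx.activity.ComponentActivity
import androidx.activity.result.PickVisualMediaRequest
import androidx.activity.result.contract.ActivityResultContracts

class VisualSearchActivity : ComponentActivity() {

    // Must be registered before the Activity reaches STARTED
    private val pickImage =
        registerForActivityResult(ActivityResultContracts.PickVisualMedia()) { uri ->
            // uri is null when the user cancels the picker
            uri?.let { uploadForSearch(it) }
        }

    fun startVisualSearch() {
        pickImage.launch(
            PickVisualMediaRequest(ActivityResultContracts.PickVisualMedia.ImageOnly)
        )
    }

    private fun uploadForSearch(uri: android.net.Uri) {
        // Downscale, compress and send the image (see searchByImage() below)
    }
}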
Server search: vector index
For nearest-neighbor search over embeddings, use Qdrant, Weaviate, or pgvector (if PostgreSQL is already in the stack). OpenAI's CLIP model gives good results for product search: trained on image-text pairs, it works in both directions, so the same index can be queried by photo and by text.
The upload request to the server (show a progress indicator in the UI while it runs):
// Android, Retrofit + OkHttp
import okhttp3.MediaType.Companion.toMediaType
import okhttp3.MultipartBody
import okhttp3.RequestBody.Companion.toRequestBody

suspend fun searchByImage(imageBytes: ByteArray): List<SearchResult> {
    val requestBody = imageBytes.toRequestBody("image/jpeg".toMediaType())
    val part = MultipartBody.Part.createFormData("image", "search.jpg", requestBody)
    return searchApi.visualSearch(part)
}
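The searchApi service used above is assumed rather than taken from a real backend; a sketch of what it and the result model might look like (endpoint path, field names, and the meaning of score are placeholders to adapt to your API contract):

// Hypothetical Retrofit service and result model for the call above
import okhttp3.MultipartBody
import retrofit2.http.Multipart
import retrofit2.http.POST
import retrofit2.http.Part

data class SearchResult(
    val productId: String,
    val title: String,
    val imageUrl: String,
    val score: Float            // cosine similarity to the query embedding
)

interface SearchApi {
    @Multipart
    @POST("v1/visual-search")
    suspend fun visualSearch(@Part image: MultipartBody.Part): List<SearchResult>
}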
Important: handle the case where the server finds no close matches (cosine distance above a threshold). Show "nothing found" honestly instead of returning irrelevant results with distant vectors.
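A sketch of that guard on the client, assuming the score field above is cosine similarity in [0..1]; the 0.75 cutoff is an arbitrary starting point to tune on real query photos:

// Kotlin: filter weak matches and show an honest empty state
const val MIN_SIMILARITY = 0.75f

sealed interface SearchUiState {
    object NothingFound : SearchUiState
    data class Results(val items: List<SearchResult>) : SearchUiState
}

fun presentResults(results: List<SearchResult>): SearchUiState {
    val confident = results.filter { it.score >= MIN_SIMILARITY }
    return if (confident.isEmpty()) SearchUiState.NothingFound
           else SearchUiState.Results(confident)
}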
Pre-processing with CoreML / TFLite on device
If offline operation is needed or response time has to be cut, embed a lightweight model. MobileNetV3 or EfficientNet-Lite give a reasonable accuracy/size trade-off. On iOS, convert the model to .mlmodel via coremltools; on Android, to .tflite. The local index can be kept in SQLite with an extension for cosine distance, or use Faiss via JNI/FFI; a sketch of the matching step follows below.
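A sketch of the local matching step: brute-force cosine similarity over catalog embeddings held in memory. This is adequate up to roughly tens of thousands of items; beyond that an ANN index such as Faiss is worth the JNI cost. Producing the query embedding from the TFLite model is assumed to happen elsewhere:

// Kotlin: nearest neighbors by cosine similarity over in-memory embeddings
import kotlin.math.sqrt

fun cosineSimilarity(a: FloatArray, b: FloatArray): Float {
    var dot = 0f; var normA = 0f; var normB = 0f
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (sqrt(normA) * sqrt(normB))
}

fun findNearest(
    query: FloatArray,
    catalog: Map<String, FloatArray>,   // productId -> embedding
    topK: Int = 10
): List<Pair<String, Float>> =
    catalog.map { (id, embedding) -> id to cosineSimilarity(query, embedding) }
        .sortedByDescending { it.second }
        .take(topK)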
Work process
Catalog audit: size, product types, search accuracy requirements.
Architecture choice: server-side embedding or hybrid (local pre-processing plus a server index).
Preparation of reference embeddings for the catalog and setup of the vector index.
UI development: photo selection, an optional crop tool, results displayed with a similarity score.
Testing with real user photos: poor lighting, odd angles, partial matches.
Timeline estimates
Integration with an existing search API takes 3–5 days. A full implementation including the server side (embedding service, vector index, catalog loading) takes 3–6 weeks, depending on catalog volume and accuracy requirements.