AI-Powered Photo Gallery Classification for Mobile Apps
Photo classification in a gallery is one of the few AI tasks where on-device processing is the norm rather than the exception: Apple Photos and Google Photos both use on-device ML. Uploading users' personal photos to a server for classification is technically unnecessary and a privacy risk.
On-Device Classification: What Works Out of the Box
On iOS, use the Vision framework with VNClassifyImageRequest. No third-party models needed — built-in classification covers 1000+ categories:
import Vision

// perform(_:) is synchronous, so call this function from a background queue
func classifyPhoto(cgImage: CGImage, completion: @escaping ([String]) -> Void) {
    let request = VNClassifyImageRequest()
    do {
        try VNImageRequestHandler(cgImage: cgImage, options: [:]).perform([request])
    } catch {
        completion([]) // report failure as "no labels" instead of never calling back
        return
    }
    let results = (request.results as? [VNClassificationObservation]) ?? []
    // Take categories with confidence > 0.5
    let labels = results
        .filter { $0.confidence > 0.5 }
        .map { $0.identifier }
    completion(labels)
}
Inference takes 5–15 ms per photo depending on the device. On iPhone 13 and later, the Neural Engine processes an entire 1000-photo gallery in ~15–20 seconds as a background task; the time beyond raw inference likely goes to image loading and decoding.
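To get a feel for what these figures mean for a large library, the arithmetic can be sketched as a linear extrapolation. This assumes throughput stays constant as the gallery grows, which is an assumption and not a measurement; `estimatedSeconds` is a hypothetical helper name.

```swift
import Foundation

/// Linear extrapolation of total processing time from a measured rate.
/// `secondsPer1000` is the wall-clock time observed for 1000 photos.
func estimatedSeconds(photoCount: Int, secondsPer1000: Double) -> Double {
    return Double(photoCount) / 1000.0 * secondsPer1000
}

// A 50,000-photo gallery at the 15–20 s per 1000 photos quoted above
let galleryCount = 50_000
let best = estimatedSeconds(photoCount: galleryCount, secondsPer1000: 15)
let worst = estimatedSeconds(photoCount: galleryCount, secondsPer1000: 20)
print("Estimated: \(best / 60) to \(worst / 60) minutes")
```

Under that assumption a 50,000-photo gallery needs roughly 12.5 to 17 minutes of background work, which is why the full pass has to be split up and made resumable.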
On Android, use ML Kit ImageLabeler with ImageLabelerOptions:
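import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.label.ImageLabeling
import com.google.mlkit.vision.label.defaults.ImageLabelerOptions

val labeler = ImageLabeling.getClient(
    ImageLabelerOptions.Builder()
        .setConfidenceThreshold(0.5f)
        .build()
)

// bitmap: an android.graphics.Bitmap loaded elsewhere; 0 is the rotation in degrees
labeler.process(InputImage.fromBitmap(bitmap, 0))
    .addOnSuccessListener { labels ->
        val categories = labels.map { it.text }
        // "Dog", "Outdoor", "Sky", "Food", etc.
    }
    .addOnFailureListener { e ->
        // Labeling failed; log and skip this photo
    }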
ML Kit supports 400+ categories on-device without network.
Processing Entire Gallery: PHFetchResult and Batching
Problem: a gallery can contain 50,000+ photos. Iterating through all of them at once blocks the main thread and drains battery.
Right approach: PHFetchResult + incremental processing on a dedicated DispatchQueue with qos: .background:
import Photos

func classifyGallery() {
    let fetchOptions = PHFetchOptions()
    fetchOptions.sortDescriptors = [NSSortDescriptor(key: "creationDate", ascending: false)]
    let allPhotos = PHAsset.fetchAssets(with: .image, options: fetchOptions)
    let batchSize = 50
    let processingQueue = DispatchQueue(label: "photo.classification", qos: .background)
    processingQueue.async {
        var offset = 0
        while offset < allPhotos.count {
            // autoreleasepool releases each batch's temporary objects before the next batch
            autoreleasepool {
                let batch = (offset..<min(offset + batchSize, allPhotos.count))
                    .map { allPhotos.object(at: $0) }
                self.processBatch(assets: batch) // app-specific, defined elsewhere
            }
            offset += batchSize
            Thread.sleep(forTimeInterval: 0.1) // Let the system breathe
        }
    }
}
The Thread.sleep(forTimeInterval: 0.1) pause between batches is critical: without it, the CPU throttles after 2–3 minutes of sustained load and speed drops 3–5x.
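The batching arithmetic can be isolated into a small pure function, which makes the loop bounds easy to check without touching the photo library. This is a minimal sketch; `batchRanges` is a hypothetical helper, not part of PhotoKit.

```swift
import Foundation

/// Splits the index space 0..<total into consecutive half-open ranges
/// of at most `batchSize` items each.
func batchRanges(total: Int, batchSize: Int) -> [Range<Int>] {
    precondition(batchSize > 0, "batchSize must be positive")
    guard total > 0 else { return [] }
    return stride(from: 0, to: total, by: batchSize).map { start in
        start..<min(start + batchSize, total)
    }
}

// 120 photos in batches of 50: two full batches and one partial batch
let ranges = batchRanges(total: 120, batchSize: 50)

// Total time spent in pauses for 50,000 photos with a 0.1 s pause per batch
let pauseSeconds = Double(batchRanges(total: 50_000, batchSize: 50).count) * 0.1
```

With a batch size of 50, a 50,000-photo gallery yields 1000 batches, so the pauses add about 100 seconds in total. That is a modest price compared with the slowdown described above.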
Storing Classification Results
Save results locally, not on a server. Use Core Data with NSPersistentContainer:
// Entity: PhotoClassification
// Attributes: assetLocalIdentifier (String), labels (Transformable: [String]), classifiedAt (Date)
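Index assetLocalIdentifier for fast lookup. Note that a Transformable attribute is stored as an opaque blob, so Core Data cannot index it or search inside it with a predicate; for fast tag search, keep the labels in a separate entity (one row per label, related to the photo) and index the label name. For 50k photos, the table weighs ~5–10 MB.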
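To illustrate the lookup that an index on labels makes cheap, here is an in-memory sketch of the same idea: a map from label to the asset identifiers that carry it. `TagIndex` and its methods are hypothetical names used for illustration only; a persistent store would answer the same query from an indexed label column.

```swift
import Foundation

/// Hypothetical in-memory inverted index: label -> set of asset identifiers.
struct TagIndex {
    private var assetsByLabel: [String: Set<String>] = [:]

    /// Records that `assetID` was classified with `labels` (case-insensitive).
    mutating func add(assetID: String, labels: [String]) {
        for label in labels {
            assetsByLabel[label.lowercased(), default: []].insert(assetID)
        }
    }

    /// Returns the assets that carry every label in `labels` (AND search).
    func assets(matchingAll labels: [String]) -> Set<String> {
        guard let first = labels.first else { return [] }
        var result = assetsByLabel[first.lowercased()] ?? []
        for label in labels.dropFirst() {
            result.formIntersection(assetsByLabel[label.lowercased()] ?? [])
        }
        return result
    }
}

var index = TagIndex()
index.add(assetID: "A1", labels: ["dog", "outdoor"])
index.add(assetID: "A2", labels: ["dog", "food"])
index.add(assetID: "A3", labels: ["sky", "outdoor"])

// Only A1 has both labels
let dogsOutdoors = index.assets(matchingAll: ["Dog", "outdoor"])
```

Each label lookup is a single dictionary access, so search cost depends on the number of matching photos and not on the size of the gallery.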
When the gallery opens, load classifications from the local DB and display them. Photos added since the last session are classified incrementally via PHPhotoLibraryChangeObserver.
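The decision about which photos need work can be sketched independently of PhotoKit. The example below assumes each asset is reduced to its identifier and modification date (these correspond to PHAsset.localIdentifier and PHAsset.modificationDate); `AssetSnapshot` and `assetsNeedingClassification` are hypothetical names.

```swift
import Foundation

/// Hypothetical snapshot of one library asset.
struct AssetSnapshot {
    let id: String
    let modifiedAt: Date
}

/// Returns identifiers that need (re)classification: assets with no stored
/// result, or assets modified after they were last classified.
func assetsNeedingClassification(
    library: [AssetSnapshot],
    classifiedAt: [String: Date]
) -> [String] {
    return library.compactMap { asset -> String? in
        guard let lastRun = classifiedAt[asset.id] else {
            return asset.id // never classified
        }
        return asset.modifiedAt > lastRun ? asset.id : nil // edited since last run
    }
}

let t0 = Date(timeIntervalSince1970: 1_000)
let t1 = Date(timeIntervalSince1970: 2_000)
let library = [
    AssetSnapshot(id: "A1", modifiedAt: t0), // classified at t1, unchanged since
    AssetSnapshot(id: "A2", modifiedAt: t1), // edited after its classification at t0
    AssetSnapshot(id: "A3", modifiedAt: t0)  // new, no stored result
]
let pending = assetsNeedingClassification(
    library: library,
    classifiedAt: ["A1": t1, "A2": t0]
)
```

Here only A2 and A3 are queued. Running the same check at launch also catches changes made while the app was not running, when no observer callback was delivered.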
Custom Categories via CoreML
The built-in Vision classifier doesn't always cover the categories you need. For custom ones (e.g., "recipe", "document screenshot", "receipt", "visa"), train a Create ML model:
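import CreateML // typically run on a Mac (playground or command-line tool)

// 5–10 examples per category are sufficient for basic accuracy
// trainingDir: URL of a folder with one subfolder per category, named after its label
let dataSource = MLImageClassifier.DataSource.labeledDirectories(at: trainingDir)
let model = try MLImageClassifier(trainingData: dataSource)
try model.write(to: modelURL) // modelURL: destination of the .mlmodel file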
With 20+ examples per category, accuracy on custom categories reaches 85–95%. The resulting Core ML model is 5–15 MB and can be delivered via Core ML Model Deployment without an app update.
Common Mistakes
Classifying on the main thread: the most frequent mistake. VNImageRequestHandler.perform executes synchronously, so always call it from a background queue.
Requesting full resolution for classification: unnecessary. PHImageManager.requestImage(for:targetSize:contentMode:options:resultHandler:) with targetSize: CGSize(width: 224, height: 224) is enough, since this is the standard input size for most classification models.
Not updating classifications on gallery changes: PHPhotoLibraryChangeObserver should trigger incremental processing only for new or modified photos.
Timelines
Basic classification with Vision and Core Data storage takes 4–6 days. A full implementation with custom categories, incremental updates, and fast tag search takes 2–3 weeks. Cost is calculated individually.







