Implementing AI Image Upscaling in a Mobile App
AI upscaling differs fundamentally from a plain bicubic stretch: a neural network synthesizes detail that was never in the original. A 512×512 photo becomes 2048×2048 with plausible skin texture, fur, and fabric weave. On mobile it can be implemented three ways: on-device via Core ML / ONNX, through a cloud API, or as a hybrid of the two.
On-device: Real-ESRGAN via Core ML (iOS)
Real-ESRGAN is one of the best-quality open models for ×4 upscaling, and Core ML conversions of it are available:
```swift
import CoreML
import UIKit
import Vision

enum UpscaleError: Error {
    case invalidInput, noOutput, conversionFailed
}

class ImageUpscaler {
    private let model: VNCoreMLModel

    init() throws {
        let config = MLModelConfiguration()
        config.computeUnits = .cpuAndNeuralEngine // prefer the Neural Engine
        let coreMLModel = try RealESRGAN(configuration: config)
        model = try VNCoreMLModel(for: coreMLModel.model)
    }

    func upscale(_ image: UIImage) async throws -> UIImage {
        guard let cgImage = image.cgImage else { throw UpscaleError.invalidInput }
        return try await withCheckedThrowingContinuation { continuation in
            let request = VNCoreMLRequest(model: model) { request, error in
                if let error = error {
                    continuation.resume(throwing: error)
                    return
                }
                guard let results = request.results as? [VNPixelBufferObservation],
                      let outputBuffer = results.first?.pixelBuffer else {
                    continuation.resume(throwing: UpscaleError.noOutput)
                    return
                }
                let ciImage = CIImage(cvPixelBuffer: outputBuffer)
                let context = CIContext()
                guard let outputCG = context.createCGImage(ciImage, from: ciImage.extent) else {
                    continuation.resume(throwing: UpscaleError.conversionFailed)
                    return
                }
                continuation.resume(returning: UIImage(cgImage: outputCG))
            }
            request.imageCropAndScaleOption = .scaleFit
            let handler = VNImageRequestHandler(cgImage: cgImage)
            do {
                try handler.perform([request])
            } catch {
                // try? here would silently leak the continuation on failure
                continuation.resume(throwing: error)
            }
        }
    }
}
```
Limitation: the converted Real-ESRGAN models expect a fixed-size input tile (usually 256×256 or 512×512). For larger images, split into tiles, upscale each, and merge with an overlap of 16–32 pixels to avoid visible seams:
```swift
func upscaleTiled(_ image: UIImage, tileSize: Int = 256, overlap: Int = 16) async throws -> UIImage {
    let tiles = splitIntoTiles(image: image, tileSize: tileSize, overlap: overlap)
    let upscaledTiles = try await withThrowingTaskGroup(of: (Int, Int, UIImage).self) { group in
        for (row, col, tile) in tiles {
            group.addTask {
                let upscaled = try await self.upscale(tile)
                return (row, col, upscaled)
            }
        }
        var results: [(Int, Int, UIImage)] = []
        for try await result in group { results.append(result) }
        return results
    }
    return mergeTiles(upscaledTiles, originalSize: image.size, scaleFactor: 4, overlap: overlap)
}
```
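splitIntoTiles and mergeTiles are left out above; the tricky part is the tile-origin math. A minimal sketch of the grid computation per axis, assuming square tiles and a uniform overlap (function name is illustrative):

```swift
import Foundation

// Compute top-left origins for overlapping tiles along one axis.
// Each tile starts `tileSize - overlap` after the previous one; the
// final tile is clamped so it stays inside the image, which can make
// it overlap its neighbor by more than `overlap`.
func tileOrigins(imageSize: Int, tileSize: Int, overlap: Int) -> [Int] {
    guard imageSize > tileSize else { return [0] }
    let step = tileSize - overlap
    var origins: [Int] = []
    var x = 0
    while x + tileSize < imageSize {
        origins.append(x)
        x += step
    }
    origins.append(imageSize - tileSize) // final clamped tile
    return origins
}

// 512 px with 256-px tiles and 16-px overlap → origins [0, 240, 256],
// i.e. 3 tiles per axis, 9 tiles for a square image
let xs = tileOrigins(imageSize: 512, tileSize: 256, overlap: 16)
```

Merging then blends each upscaled tile into the output canvas, feathering the overlapping bands (a linear alpha ramp over the overlap region is enough to hide seams).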
On an iPhone 15 Pro with the Neural Engine: 512×512 → 2048×2048 in ~800 ms. A 1024×1024 input requires tiling and takes 2–4 seconds.
On-device: ONNX Runtime on Android
Real-ESRGAN is also available in ONNX format (~15 MB for the small model):
```kotlin
import ai.onnxruntime.OrtEnvironment
import ai.onnxruntime.OrtSession
import android.content.Context
import android.graphics.Bitmap

class OnnxUpscaler(context: Context) {
    private val session: OrtSession

    init {
        val env = OrtEnvironment.getEnvironment()
        val options = OrtSession.SessionOptions().apply {
            addNnapi() // use NNAPI for hardware acceleration
        }
        val modelBytes = context.assets.open("realesrgan_x4.onnx").readBytes()
        session = env.createSession(modelBytes, options)
    }

    fun upscale(bitmap: Bitmap): Bitmap {
        // Convert Bitmap to a float tensor [1, 3, H, W], normalized to [0, 1]
        val inputTensor = bitmapToTensor(bitmap)
        val inputName = session.inputNames.first()
        session.run(mapOf(inputName to inputTensor)).use { output ->
            val outputTensor = output[0].value as Array<*>
            // Convert the tensor back to a Bitmap at 4× the input size
            return tensorToBitmap(outputTensor, bitmap.width * 4, bitmap.height * 4)
        }
    }
}
```
NNAPI on modern Android devices gives a 2–4× speedup over CPU. On a Snapdragon 8 Gen 2, a 512×512 input takes ~1.2 seconds.
Cloud APIs
Use these when on-device is too slow or a higher upscale factor is needed (×8, ×16):
Replicate — Real-ESRGAN:
```swift
let body: [String: Any] = [
    "version": "...", // Real-ESRGAN model version hash
    "input": [
        "image": "data:image/jpeg;base64,\(base64Image)",
        "scale": 4,
        "face_enhance": true // run GFPGAN for face enhancement
    ]
]
```
`face_enhance: true` runs GFPGAN on top of Real-ESRGAN. This matters for portraits: without it, faces tend to come out with artifacts.
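That body is posted to Replicate's predictions endpoint. A sketch of building the request with URLSession, where the token and version hash are placeholders and the returned prediction must then be polled until it succeeds:

```swift
import Foundation

// Build a POST request for Replicate's predictions API.
// `token` and `version` are placeholders, not real credentials.
func makePredictionRequest(token: String, version: String,
                           base64Image: String) throws -> URLRequest {
    var request = URLRequest(url: URL(string: "https://api.replicate.com/v1/predictions")!)
    request.httpMethod = "POST"
    request.setValue("Token \(token)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    let input: [String: Any] = [
        "image": "data:image/jpeg;base64,\(base64Image)",
        "scale": 4,
        "face_enhance": true
    ]
    let body: [String: Any] = ["version": version, "input": input]
    request.httpBody = try JSONSerialization.data(withJSONObject: body)
    return request
}
// The JSON response contains a "urls.get" endpoint to poll until
// "status" is "succeeded"; the result URL appears in "output".
```

Upscaling is slow enough that Replicate always answers asynchronously, so the polling loop (or a webhook) is not optional.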
Stability AI Upscale API:
```kotlin
import okhttp3.MediaType.Companion.toMediaType
import okhttp3.MultipartBody
import okhttp3.RequestBody.Companion.asRequestBody

val requestBody = MultipartBody.Builder()
    .setType(MultipartBody.FORM)
    .addFormDataPart("image", "photo.jpg", imageFile.asRequestBody("image/jpeg".toMediaType()))
    .addFormDataPart("output_format", "png")
    .build()
```
Stability AI returns a ×4 PNG via either the Creative Upscaler (Stable Diffusion-based, invents detail) or the Conservative Upscaler (stays closer to the original).
Approach selection
| Scenario | Recommendation |
|---|---|
| Fast camera photo upscale | On-device (Core ML / ONNX) with tiling |
| Portraits with face restoration | Replicate Real-ESRGAN + face_enhance |
| Document scans/text | Stability AI Conservative Upscaler |
| Old photos (×8 and above) | Cloud — Real-ESRGAN or Topaz Gigapixel API |
| Offline requirement | On-device mandatory |
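A hybrid setup needs a routing rule between these options. A minimal sketch with made-up thresholds (the real cutoffs should come from profiling on target devices):

```swift
import Foundation

enum UpscalePath { case onDevice, cloud }

// Illustrative routing for a hybrid pipeline: stay on-device when
// offline, for modest scale factors, and for small inputs; otherwise
// send the image to a cloud API. Thresholds are placeholders.
func choosePath(pixelCount: Int, scaleFactor: Int, isOnline: Bool) -> UpscalePath {
    guard isOnline else { return .onDevice }      // offline requirement wins
    if scaleFactor > 4 { return .cloud }          // ×8/×16 only in the cloud
    if pixelCount > 1024 * 1024 { return .cloud } // large inputs: tiling gets slow
    return .onDevice
}
```

Keeping this a pure function of image size, scale factor, and connectivity makes the decision trivial to unit-test and to tune per device class.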
UX: progress and comparison
For on-device upscaling, surface the tiling progress: "Processing tile 3 of 12". The user then never wonders whether the app has frozen.
After completion, show an interactive before/after slider (a DragGesture-driven view in SwiftUI, a custom View with touch handling on Android). This is the standard UI for photo-enhancement tools.
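The slider itself reduces to one piece of logic: mapping the drag location to how much of the "after" image to reveal. A sketch of that clamping, with SwiftUI specifics left as comments and an illustrative function name:

```swift
import Foundation

// Map a horizontal drag location to the fraction of the "after" image
// to reveal, clamped to [0, 1]. In SwiftUI this fraction would drive
// the width of a .clipShape / .mask over the upscaled image, updated
// from a DragGesture's location.
func revealFraction(dragX: CGFloat, viewWidth: CGFloat) -> CGFloat {
    guard viewWidth > 0 else { return 0 }
    return min(max(dragX / viewWidth, 0), 1)
}
```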
Timeline
On-device upscaling with tiling and a progress indicator: 5–8 days. Cloud upscaling with face enhancement, a before/after slider, and gallery save: 8–12 days. A hybrid with quality assessment and path selection: another 3–5 days.