AI Photo Quality Enhancement in Mobile Apps
Photos from a budget smartphone camera, or shots taken in poor lighting, suffer from noise, blur, and loss of detail. Simple "brightness +20, contrast +10" filters don't solve this. AI upscaling and denoising recover details that classical DSP simply cannot see.
What Enhancement Models Actually Do
Denoising: removes sensor noise that appears at high ISO. Models like DnCNN and FFDNet work at the image-patch level and learn to separate "useful" gradients (texture) from noise.
Upscaling (super-resolution): Real-ESRGAN, ESRGAN, and SRCNN reconstruct pixels on enlargement. Real-ESRGAN x4 turns a 512×512 image into 2048×2048, recovering hair, fabric, and text textures. Unlike bicubic interpolation, it doesn't blur; it synthesizes detail.
Exposure/HDR correction: models like Zero-DCE or EnlightenGAN brighten underexposed shots without introducing overexposure artifacts.
On mobile, all three run through an ML runtime; the question is which model and in what format.
Running Real-ESRGAN on iOS via Core ML
The original Real-ESRGAN x4 has 16.7M parameters and needs ~2 GB of RAM for full-resolution inference; it won't fit on mobile without optimization. The solution is tiled inference: slice the image into overlapping patches (tile_size=256, overlap=16), process them sequentially, and assemble the result with feather-blending at the seams, as in the sketch below.
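Here's a minimal sketch of the tiling-and-blending logic in Swift. It operates on a single-channel Float buffer for clarity, and runModel is a hypothetical stand-in for the Core ML call; a production version would pad partial edge tiles to the fixed 256×256 model input and handle RGB:

import Foundation

// Tiled inference with feather-blending. `runModel` takes a tile plus its
// width/height and returns the upscaled tile ((w*scale) × (h*scale)).
func enhanceTiled(image: [Float], width: Int, height: Int,
                  tileSize: Int = 256, overlap: Int = 16, scale: Int = 4,
                  runModel: ([Float], Int, Int) -> [Float]) -> [Float] {
    let outW = width * scale
    let outH = height * scale
    var acc = [Float](repeating: 0, count: outW * outH)   // weighted pixel sums
    var wsum = [Float](repeating: 0, count: outW * outH)  // accumulated weights
    let step = tileSize - overlap
    let ramp = overlap * scale                            // feather width in output pixels

    var y = 0
    while y < height {
        let th = min(tileSize, height - y)
        var x = 0
        while x < width {
            let tw = min(tileSize, width - x)
            // Crop one overlapping tile
            var tile = [Float](repeating: 0, count: tw * th)
            for r in 0..<th {
                for c in 0..<tw {
                    tile[r * tw + c] = image[(y + r) * width + (x + c)]
                }
            }
            let out = runModel(tile, tw, th)
            let ow = tw * scale
            let oh = th * scale
            // Accumulate with a linear weight ramp across the overlap band
            for r in 0..<oh {
                let wy = min(Float(r + 1), Float(oh - r), Float(ramp)) / Float(ramp)
                for c in 0..<ow {
                    let wx = min(Float(c + 1), Float(ow - c), Float(ramp)) / Float(ramp)
                    let w = wx * wy
                    let idx = (y * scale + r) * outW + (x * scale + c)
                    acc[idx] += out[r * ow + c] * w
                    wsum[idx] += w
                }
            }
            x += step
        }
        y += step
    }
    // Divide by the accumulated weight to normalize overlapping regions
    return zip(acc, wsum).map { pair in pair.1 > 0 ? pair.0 / pair.1 : 0 }
}

Because each pixel's weight falls off linearly toward a tile edge and the final division renormalizes the overlap, seams blend smoothly instead of showing hard edges.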
Convert the PyTorch model to Core ML via coremltools:

import coremltools as ct
import torch
from realesrgan import RealESRGAN  # pretrained model wrapper

model = RealESRGAN(device='cpu', scale=4)
model.load_weights('RealESRGAN_x4plus.pth')
model.model.eval()

example_input = torch.zeros(1, 3, 256, 256)  # one tile
traced = torch.jit.trace(model.model, example_input)

mlmodel = ct.convert(
    traced,
    inputs=[ct.ImageType(name="input", shape=(1, 3, 256, 256),
                         scale=1/255.0,  # network expects 0-1 input, not 0-255 (see note below)
                         color_layout=ct.colorlayout.RGB)],
    compute_precision=ct.precision.FLOAT16,  # FP16 for the ANE
    minimum_deployment_target=ct.target.iOS16,
)
mlmodel.save("RealESRGAN_x4_tile256.mlpackage")
FLOAT16 with an iOS 16+ target lets Core ML delegate computation to the ANE (Apple Neural Engine). On an iPhone 14, inference on one 256×256 tile takes ~80–120 ms. A 12 MP photo (4032×3024) slices into ~180 tiles processed sequentially, 15–25 seconds in total: acceptable for a one-off "enhance photo" action. On the Swift side, loading the model and running one tile looks like this:
// Load the model once and reuse it across tiles
let config = MLModelConfiguration()
config.computeUnits = .all  // ANE + GPU + CPU
guard let model = try? RealESRGAN_x4_tile256(configuration: config) else { return }

// Infer one tile; toCVPixelBuffer() is a custom CGImage-to-CVPixelBuffer helper
let pixelBuffer = tileImage.toCVPixelBuffer()!
let output = try model.prediction(input: .init(input: pixelBuffer))
let enhancedTile = output.output  // upscaled tile; convert back and feather-blend into place
Android: TFLite with ESRGAN
On Android the scheme is similar, via TensorFlow Lite. A mobile-simplified ESRGAN ships as a 3–5 MB .tflite file. Run it through the TFLite Interpreter with a GpuDelegate:
val options = Interpreter.Options().apply {
    addDelegate(GpuDelegate())
    setNumThreads(4)  // CPU threads for ops the delegate can't run
}
// loadModelFile is a helper that memory-maps the .tflite from assets
val interpreter = Interpreter(loadModelFile("esrgan_lite.tflite"), options)

// Direct buffers in native byte order; FLOAT32 = 4 bytes per value
val inputBuffer = ByteBuffer.allocateDirect(1 * 256 * 256 * 3 * 4)
    .order(ByteOrder.nativeOrder())
val outputBuffer = ByteBuffer.allocateDirect(1 * 1024 * 1024 * 3 * 4)  // x4 output: 1024×1024
    .order(ByteOrder.nativeOrder())
interpreter.run(inputBuffer, outputBuffer)
GpuDelegate gives a 3–5× speedup over the CPU on most devices with OpenGL ES 3.1+. On devices where the GPU delegate fails to initialize (some older MediaTek chips), fall back to NNAPI or the CPU and warn the user that processing will take longer.
Denoising: When Upscaling Isn't Needed
For denoising without a resolution change, use FFDNet or DRUNet. They're lighter (1–3 MB) and faster. On iOS it's convenient to run them through Vision's VNCoreMLRequest with a custom Core ML model, or to call the MLModel directly with a CVPixelBuffer input.
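A minimal sketch of the Vision route, assuming "FFDNet" is the class Xcode generates from your converted .mlpackage (the name is hypothetical):

import Vision
import CoreML

// Run a denoising Core ML model on one image via Vision
func denoise(_ image: CGImage, completion: @escaping (CVPixelBuffer?) -> Void) {
    guard let mlModel = try? FFDNet(configuration: MLModelConfiguration()).model,
          let visionModel = try? VNCoreMLModel(for: mlModel) else {
        completion(nil)
        return
    }
    let request = VNCoreMLRequest(model: visionModel) { request, _ in
        // Image-output models come back as VNPixelBufferObservation
        let observation = request.results?.first as? VNPixelBufferObservation
        completion(observation?.pixelBuffer)
    }
    request.imageCropAndScaleOption = .scaleFill  // resize input to the model's expected size
    let handler = VNImageRequestHandler(cgImage: image)
    try? handler.perform([request])
}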
One practical detail: when converting to Core ML, normalize the input (0–1 instead of 0–255) and declare it explicitly in the model's preprocessing, e.g. via the scale parameter of ct.ImageType as in the conversion script above; otherwise the model outputs a black or blown-out image. This is a common conversion mistake.
UX: Showing Progress
Tiled processing is convenient because you can show a real progress bar: "tile N of M done". The user sees the app working. And never block the main thread on iOS: the system watchdog can kill an app that stops responding, and to the user it just looks frozen.
Run all inference on a background thread (DispatchQueue.global(qos: .userInitiated), or Task.detached(priority: .userInitiated) with Swift Concurrency); update the UI strictly on the main thread.
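A sketch of that pattern with Swift Concurrency; enhanceTile and assemble are hypothetical stand-ins for the per-tile inference and the feather-blend assembly, passed in so the snippet stays self-contained:

import CoreGraphics

// Inference runs in a detached task; progress and the result land on the main actor
func enhancePhoto(tiles: [CGImage],
                  enhanceTile: @escaping (CGImage) async throws -> CGImage,
                  assemble: @escaping ([CGImage]) -> CGImage,
                  onProgress: @escaping @MainActor (Int, Int) -> Void,
                  onDone: @escaping @MainActor (CGImage) -> Void) {
    Task.detached(priority: .userInitiated) {
        var results: [CGImage] = []
        for (index, tile) in tiles.enumerated() {
            results.append(try await enhanceTile(tile))  // Core ML inference off the main thread
            await onProgress(index + 1, tiles.count)     // "tile N of M" update on the main actor
        }
        await onDone(assemble(results))                  // final feather-blended image
    }
}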
Process
Audit the requirements: do you need upscaling, denoising, or both? Pick a model for the target devices (minimum supported CPU/GPU). Convert PyTorch/ONNX to Core ML/TFLite and measure speed on real devices. Implement tiled inference with feather-blending. Integrate it into the UI with progress tracking.
Timeline Estimates
One platform with a basic model (denoising or upscaling): 2–3 weeks. Both platforms with multiple models, quality tuning, and complex feather-blending: 5–8 weeks.