Implementing AI Photo Stylization in a Mobile App
Neural Style Transfer (NST) on mobile is not simply "run it through Core ML". The main pain: a VGG-19-class model with full weights is 500+ MB, and real-time NST requires GPU acceleration. On an iPhone 12 without Metal Performance Shaders, one 512×512 frame takes 3–4 seconds to process. Users won't accept that.
Two architectural paths, and why the second is often chosen
Server-side processing via an API (Replicate, Stability AI, or a custom backend with PyTorch) is the simplest path. The photo is sent to the server and the stylized result comes back. Latency is 3 to 15 seconds depending on queue and image size. Good for one-off processing in an editor, not for a video stream.
On-device via Core ML / TFLite requires a distilled model. Instead of full NST, use Fast Neural Style Transfer (Johnson et al.) with a MobileNet backbone. Model size is 6–8 MB; inference time on the iPhone 12 Neural Engine is 80–120 ms per 512×512 frame. On Android via TFLite with the GPU delegate, metrics are comparable on Snapdragon 870+.
In practice, a hybrid approach makes sense: on-device for the real-time preview (at a reduced 256×256 resolution), server-side processing for the final 4K export.
How to prepare the model for mobile deployment
You can't take a standard PyTorch checkpoint and feed it to Core ML Tools directly. The chain:
- Train or take a ready Fast NST model in PyTorch (torchvision.models or custom)
- Trace it with torch.jit.trace (coremltools 6+ converts TorchScript directly; the older route via torch.onnx.export → ONNX graph is deprecated)
- coremltools.convert(traced_model, convert_to="mlprogram", compute_precision=ct.precision.FLOAT16) → .mlpackage
- Test on the simulator via MLModel.prediction(from:)
FLOAT16 quantization gives a 2× size reduction without noticeable quality loss. INT8 is smaller still, but texture artifacts become visible.
For Android: tf.lite.TFLiteConverter.from_keras_model(model).convert() → .tflite, then post-training quantization via tf.lite.Optimize.DEFAULT, optionally restricting weights to float16 through converter.target_spec.supported_types.
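The size arithmetic behind these numbers is simple: weight file size ≈ parameter count × bytes per weight. A quick sketch; the 1.7 M parameter count is an assumption in the ballpark of Johnson-style transform networks, not a measured figure:

```python
def model_size_mb(param_count: int, bytes_per_weight: int) -> float:
    """Approximate serialized weight size in megabytes (ignores format overhead)."""
    return param_count * bytes_per_weight / 1e6

params = 1_700_000  # assumed Johnson-style transform net; real counts vary
print(model_size_mb(params, 4))  # float32
print(model_size_mb(params, 2))  # float16: exactly half
print(model_size_mb(params, 1))  # int8: a quarter of float32
```

With these assumptions float32 lands at ~6.8 MB, matching the 6–8 MB range quoted above, and float16 halves it.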
iOS integration
import CoreML
import Vision

// Wraps the compiled Fast NST Core ML model behind a completion-based API.
class StyleTransferProcessor {
    private let model: VNCoreMLModel
    private let context = CIContext()  // reuse: creating a CIContext per frame is expensive

    init() throws {
        let mlModel = try FastNST(configuration: MLModelConfiguration()).model
        model = try VNCoreMLModel(for: mlModel)
    }

    func process(image: CGImage, completion: @escaping (CGImage?) -> Void) {
        let request = VNCoreMLRequest(model: model) { [weak self] req, _ in
            guard let self,
                  let obs = req.results?.first as? VNPixelBufferObservation else {
                completion(nil); return
            }
            let ciImage = CIImage(cvPixelBuffer: obs.pixelBuffer)
            completion(self.context.createCGImage(ciImage, from: ciImage.extent))
        }
        request.imageCropAndScaleOption = .scaleFill
        do {
            try VNImageRequestHandler(cgImage: image).perform([request])
        } catch {
            completion(nil)  // try? would swallow the error and never call back
        }
    }
}
Core ML picks the compute unit automatically (Neural Engine, GPU via Metal Performance Shaders, or CPU); no need to write shaders manually.
Memory and battery management
The most common mistake is keeping the model loaded for the entire app lifetime. On devices with 3 GB RAM (iPhone SE 2nd gen, budget Android) this triggers a jetsam kill when the app backgrounds. Rule: initialize MLModel lazily on first access, and unload it if the user hasn't touched the feature for 10 minutes.
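The load/unload policy itself is platform-agnostic. A minimal sketch of the idle-eviction logic with an injected clock for testability; all names here are hypothetical, and the real holder would wrap an MLModel or a TFLite Interpreter:

```python
import time

class LazyModelHolder:
    """Loads the model on first access, drops it after an idle timeout."""

    def __init__(self, loader, idle_timeout_s=600.0, clock=time.monotonic):
        self._loader = loader            # callable that actually loads the model
        self._clock = clock
        self._idle_timeout_s = idle_timeout_s
        self._model = None
        self._last_used = None

    def get(self):
        if self._model is None:
            self._model = self._loader()  # lazy: nothing lives in memory until first use
        self._last_used = self._clock()
        return self._model

    def evict_if_idle(self):
        """Call periodically or on a memory warning; frees the model if unused."""
        if self._model is not None and self._clock() - self._last_used > self._idle_timeout_s:
            self._model = None  # drop the reference so the weights can be reclaimed
```

On iOS the periodic check would hang off a timer or `didReceiveMemoryWarning`; on Android, `onTrimMemory`.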
Battery: NST keeps the Neural Engine under load. For the live preview, limit inference frequency to 10–15 fps via CADisplayLink with preferredFramesPerSecond. Full 30 fps real-time is feasible on an iPhone 14 Pro, but drains the battery noticeably faster.
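On iOS, CADisplayLink delivers the ticks; the gating itself is a few lines. A sketch of the rate limiter (names hypothetical) that a display-link callback would consult before running inference:

```python
class FrameGate:
    """Allows at most max_fps inferences per second; extra frames are dropped."""

    def __init__(self, max_fps: float):
        self._min_interval = 1.0 / max_fps
        self._last = None

    def should_process(self, now: float) -> bool:
        if self._last is None or now - self._last >= self._min_interval:
            self._last = now
            return True
        return False  # skip this frame; keep showing the previous stylized output

gate = FrameGate(max_fps=12)  # inside the 10-15 fps band suggested above
```

Dropped frames simply reuse the last stylized buffer, which is visually fine at preview resolution.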
The problem with large resolutions
Core ML models accept a fixed input shape. If the model was trained on 512×512 but the user loads an iPhone 14 Pro Max photo (48 MP, 8064×6048), downscale before inference and upscale the result back. A simple redraw into a smaller UIGraphicsImageRenderer context works, but loses detail.
Better: guided upsampling via MPSImageBilinearScale, or Real-ESRGAN on the result. It adds a processing step, but the final photo stays faithful even at 4K.
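The downscale step needs a target size that fits the model input while preserving aspect ratio, otherwise scaleFill-style distortion creeps in. A sketch of the sizing math, using the 8064×6048 example from above:

```python
def fit_size(width: int, height: int, max_side: int) -> tuple:
    """Scale (width, height) down so the longer side equals max_side, keeping aspect."""
    scale = max_side / max(width, height)
    if scale >= 1.0:
        return (width, height)  # already small enough; never upscale at this step
    return (round(width * scale), round(height * scale))

print(fit_size(8064, 6048, 512))  # longer side pinned to 512
```

The model then sees a 512×384 frame, and the upscaler maps the stylized output back onto the full-resolution original.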
Server path: Replicate and custom backend
If you'd rather not wrestle with Core ML, the Replicate API gives access to Neural Style Transfer models:
POST https://api.replicate.com/v1/predictions
Authorization: Token <key>

{
  "version": "<model_version_hash>",
  "input": {
    "content_image": "<base64_or_url>",
    "style_image": "<base64_or_url>",
    "output_image_size": 1024
  }
}
Poll the status via GET /v1/predictions/{id} every 2 seconds. The result is usually ready in 8–20 seconds. From a mobile client, go only through a backend proxy (never ship the API key in the client).
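The polling loop lives on that backend proxy. A sketch with the fetch and sleep injected so it can be exercised without the network; the names are hypothetical, and fetch_status would GET /v1/predictions/{id} and return the parsed JSON:

```python
import time

def wait_for_prediction(fetch_status, interval_s=2.0, timeout_s=60.0, sleep=time.sleep):
    """Polls until the prediction leaves the queue; raises on timeout."""
    waited = 0.0
    while True:
        result = fetch_status()  # expected to return a dict with a "status" key
        if result["status"] in ("succeeded", "failed", "canceled"):
            return result
        if waited >= timeout_s:
            raise TimeoutError("prediction still pending after %.0f s" % timeout_s)
        sleep(interval_s)
        waited += interval_s
```

Terminal statuses are returned as-is so the caller can distinguish a failed stylization from a timeout.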
Timeline
On-device integration of a ready model (Core ML or TFLite) takes 3–5 days. The full cycle with model selection/training, quantization, live preview, and server export takes 2–4 weeks. Cost is calculated individually after clarifying platform and quality requirements.