# On-Device ML Model Integration (Core ML) for Offline AI in iOS Apps
Core ML is not simply "run the model on the iPhone." It is a specific path from PyTorch/TensorFlow weights to calling .prediction() in a SwiftUI app, and each step has nuances that can cost a week of work if you don't know them in advance.
## Model Conversion: coremltools
Most modern models arrive as a PyTorch checkpoint or an ONNX file. Convert via coremltools (Apple's Python package):
```python
import coremltools as ct
import torch

# Suppose we have a PyTorch image-classification model
model = MyModel()
model.load_state_dict(torch.load("model.pth"))
model.eval()

# Tracing: pass an example input
example_input = torch.zeros(1, 3, 224, 224)
traced = torch.jit.trace(model, example_input)

# Convert
mlmodel = ct.convert(
    traced,
    inputs=[ct.ImageType(
        name="input_image",
        shape=(1, 3, 224, 224),
        color_layout=ct.colorlayout.RGB,
        # ImageNet normalization: bias is per-channel, but scale is a
        # single scalar, so the red-channel std (0.229) stands in for all three
        bias=[-0.485/0.229, -0.456/0.224, -0.406/0.225],
        scale=1/(255.0 * 0.229),  # baked into the model, no preprocessing needed in Swift
    )],
    outputs=[ct.TensorType(name="class_probabilities")],
    compute_precision=ct.precision.FLOAT16,  # for ANE
    minimum_deployment_target=ct.target.iOS16,
)
mlmodel.save("MyClassifier.mlpackage")
```
ct.precision.FLOAT16 together with minimum_deployment_target=iOS16 means Core ML actively uses the ANE (Apple Neural Engine). On iPhone 14 this is 4–8× faster than GPU inference, with much lower battery consumption. On iOS 15 the same model runs on the Metal GPU instead.
ct.ImageType with built-in normalization means there is no need to convert a UIImage to a normalized float array in Swift; Core ML handles it.
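The bias and scale values above come from folding the standard ImageNet preprocessing, (pixel / 255 - mean) / std, into Core ML's linear input transform, scale * pixel + bias. A quick stdlib check of that arithmetic (the variable names here are illustrative, not part of the coremltools API):

```python
# Core ML applies: output = scale * pixel + bias (bias is per channel).
# Standard ImageNet preprocessing: (pixel / 255 - mean) / std.
# Folding the two gives: scale = 1 / (255 * std), bias = -mean / std.
means = [0.485, 0.456, 0.406]
stds = [0.229, 0.224, 0.225]

bias = [-m / s for m, s in zip(means, stds)]
# ct.ImageType accepts only one scalar scale, so the red-channel std
# has to approximate all three channels:
scale = 1 / (255.0 * stds[0])

print([round(b, 3) for b in bias])  # [-2.118, -2.036, -1.804]
print(round(scale, 5))              # 0.01712
```

The per-channel mismatch introduced by the single scale (0.229 vs 0.224/0.225) is small, but it is one more reason to compare outputs against the original PyTorch model after conversion.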
## Common Conversion Problems
Dynamic shapes: torch.jit.trace hard-codes the example input's shape, so models with torch.Size([batch, seq_len, hidden]) where seq_len varies end up accepting only the traced length. Solution: declare the input with ct.RangeDim for variable sizes, or a fixed set of configurations via ct.EnumeratedShapes.
```python
# Variable sequence length via a range
flexible_shape = ct.Shape(shape=(1, ct.RangeDim(1, 512), 768))
mlmodel = ct.convert(model, inputs=[ct.TensorType(shape=flexible_shape)])

# Or a fixed set of allowed shapes (generally faster than RangeDim,
# since Core ML can specialize for each shape)
enumerated = ct.EnumeratedShapes(shapes=[(1, 128, 768), (1, 256, 768), (1, 512, 768)])
mlmodel = ct.convert(model, inputs=[ct.TensorType(shape=enumerated)])
```
Unsupported operations, e.g. custom CUDA kernels: coremltools fails conversion with an op-not-implemented error. The path forward: either rewrite the operation using standard PyTorch primitives, or register a translation for it with coremltools' @register_torch_op decorator, or ship it as a custom layer implemented in Swift (MLCustomLayer).
The "Unsupported model format" error on the x86 simulator: the simulator uses a CPU fallback, and some FLOAT16 operations are unsupported there. Validate accuracy only on a real device.
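When validating, keep the magnitude of FLOAT16 rounding in mind. Python's struct module can round-trip a value through half precision; this is a stdlib illustration of the precision loss, not anything Core ML exposes:

```python
import struct

def to_fp16(x: float) -> float:
    """Round-trip a float through IEEE 754 half precision."""
    return struct.unpack("e", struct.pack("e", x))[0]

# FP16 keeps roughly 3 significant decimal digits, so per-value relative
# error up to ~5e-4 is expected; differences against the FP32 PyTorch
# output much larger than that usually indicate a real conversion bug.
p = 0.123456789
print(abs(to_fp16(p) - p) / p)  # relative error on the order of 1e-4
```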
## Loading and Running on iOS
```swift
import CoreML
import Vision

// Load the model (once, at startup)
let config = MLModelConfiguration()
config.computeUnits = .all  // ANE + GPU + CPU

// Xcode compiles a bundled .mlpackage to .mlmodelc at build time and
// generates a wrapper class; load through the generated initializer
guard let model = try? MyClassifier(configuration: config) else {
    fatalError("Failed to load model")
}

// Inference on a background thread
DispatchQueue.global(qos: .userInitiated).async {
    do {
        // Generated CGImage convenience initializer for the image input
        let input = try MyClassifierInput(input_imageWith: cgImage)
        let output = try model.prediction(input: input)
        let probs = output.class_probabilities
        // probs is an MLMultiArray; read a value with probs[0].doubleValue
    } catch {
        print("Inference error: \(error)")
    }
}
```
Loading the model takes ~100–300 ms depending on size. Don't reload it in every viewDidLoad: load once at app startup or on first use, and keep the instance in memory while it's needed.
## Vision Framework as a Wrapper
For computer-vision tasks, VNCoreMLRequest is more convenient: Vision handles input resizing, image orientation, and coordinate transforms:
```swift
let coreMLModel = try VNCoreMLModel(for: model.model)  // .model is the underlying MLModel of the generated class
let request = VNCoreMLRequest(model: coreMLModel) { request, error in
    guard let results = request.results as? [VNClassificationObservation] else { return }
    let topResult = results.sorted { $0.confidence > $1.confidence }.first
    print("\(topResult?.identifier ?? "?") — \(topResult?.confidence ?? 0)")
}
request.imageCropAndScaleOption = .centerCrop  // or .scaleFit
let handler = VNImageRequestHandler(cgImage: inputCGImage, options: [:])
try handler.perform([request])
```
VNCoreMLRequest automatically handles input-size mismatch: pass an image of any size and Vision resizes it to what the model expects. Without Vision this would have to be done manually via vImage or CIImage.
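For a square model input, .centerCrop keeps the largest centered square of the image and scales it down, discarding the edges of the long dimension. A small sketch of that geometry (plain Python for illustration; this mirrors the documented crop behavior, not Vision's implementation):

```python
def center_crop_region(width: int, height: int):
    """Centered square that .centerCrop keeps before scaling to the model input."""
    side = min(width, height)
    x = (width - side) // 2
    y = (height - side) // 2
    return x, y, side, side

# A 4032x3024 iPhone photo loses 504 px on each horizontal edge:
print(center_crop_region(4032, 3024))  # (504, 0, 3024, 3024)
```

With .scaleFit the whole image is kept instead; the right choice is whichever matches the preprocessing the model was trained with, not whichever looks better.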
## Performance: Benchmarks
| Device | Model | computeUnits | Inference Time |
|---|---|---|---|
| iPhone 14 Pro | MobileNetV3 (5 MB FP16) | .all (ANE) | 2–4 ms |
| iPhone 14 Pro | ResNet-50 (48 MB FP16) | .all (ANE) | 8–15 ms |
| iPhone 12 | BERT-base (350 MB FP16) | .all | 180–250 ms |
| iPhone SE 2nd gen | MobileNetV3 (5 MB FP16) | .cpuOnly | 12–20 ms |
Use the Core ML instrument in Xcode Instruments to profile actual ANE/GPU/CPU usage.
## Updating the Model Without an App Update
Core ML can load a model from any file URL, not just the app bundle, which enables server-side model updates:
```swift
// Load an updated model from the Documents directory
let documentsURL = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)[0]
let downloadedModelURL = documentsURL.appendingPathComponent("updated_model.mlpackage")

if FileManager.default.fileExists(atPath: downloadedModelURL.path) {
    // A downloaded .mlpackage must first be compiled on-device;
    // contentsOf: expects the compiled .mlmodelc
    let compiledURL = try MLModel.compileModel(at: downloadedModelURL)
    // compileModel writes to a temp directory; move the result to
    // permanent storage to avoid recompiling on every launch
    let model = try MyClassifier(contentsOf: compiledURL, configuration: config)
} else {
    // Fall back to the model shipped in the bundle
}
```
Download the model over the network via URLSession, save it to Documents, and verify its SHA256 hash against a known value before use.
## Process
Get the weights → convert, tuning precision and compute units → profile on target devices → integrate into the app with fallback and error handling → optionally add remote model updates.
## Timeline Estimates
Converting an existing model plus basic iOS integration takes 1–2 weeks. A complex model with non-standard operations, multiple inputs/outputs, or remote updates requires 3–5 weeks.