On-device ML model Core ML integration for offline AI in iOS app

NOVASOLUTIONS.TECHNOLOGY develops, supports, and maintains iOS, Android, and PWA mobile applications. We have extensive experience publishing mobile applications in popular marketplaces such as Google Play, the App Store, Amazon Appstore, AppGallery, and others.
Development and support of all types of mobile applications:
Information and entertainment mobile applications
News apps, games, reference guides, online catalogs, weather apps, fitness and health apps, travel apps, educational apps, social networks and messengers, quizzes, blogs and podcasts, forums, aggregators
E-commerce mobile applications
Online stores, B2B apps, marketplaces, online exchanges, cashback services, dropshipping platforms, loyalty programs, food and goods delivery, payment systems.
Business process management mobile applications
CRM systems, ERP systems, project management, sales team tools, financial management, production management, logistics and delivery management, HR management, data monitoring systems
Electronic services mobile applications
Classified ads platforms, online schools, online cinemas, electronic service platforms, cashback platforms, video hosting, thematic portals, online booking and scheduling platforms, online trading platforms

These are just some of the types of mobile applications we work with, and each of them may have its own specific features and functionality, tailored to the specific needs and goals of the client.


On-Device ML Model Integration (Core ML) for Offline AI in iOS Apps

Core ML is not simply "run the model on an iPhone." It is a specific path from PyTorch/TensorFlow weights to calling .prediction() in a SwiftUI app, and each step has nuances that can cost a week of work if you don't know them beforehand.

Model Conversion: coremltools

Most modern models arrive as a PyTorch checkpoint or an ONNX file. Convert them with coremltools (Apple's Python package):

import coremltools as ct
import torch

# Suppose we have a PyTorch image classification model
model = MyModel()
model.load_state_dict(torch.load("model.pth"))
model.eval()

# Tracing—pass example input data
example_input = torch.zeros(1, 3, 224, 224)
traced = torch.jit.trace(model, example_input)

# Convert
mlmodel = ct.convert(
    traced,
    inputs=[ct.ImageType(
        name="input_image",
        shape=(1, 3, 224, 224),
        color_layout=ct.colorlayout.RGB,
        bias=[-0.485/0.229, -0.456/0.224, -0.406/0.225],  # ImageNet normalization
        scale=1/(255.0 * 0.229)  # single scalar scale (approximates per-channel std); baked into the model, no Swift-side preprocessing
    )],
    outputs=[ct.TensorType(name="class_probabilities")],
    compute_precision=ct.precision.FLOAT16,  # for ANE
    minimum_deployment_target=ct.target.iOS16
)

mlmodel.save("MyClassifier.mlpackage")

ct.precision.FLOAT16 plus minimum_deployment_target=iOS16 lets Core ML actively use the ANE (Apple Neural Engine). On an iPhone 14 this is 4–8× faster than GPU inference, with much lower battery consumption. On iOS 15 the same model runs on the Metal GPU.
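
FP16 halves the weight size but also reduces precision, which is why accuracy should be re-validated after conversion. A minimal illustration of the rounding involved, using Python's stdlib half-precision packing (not part of the coremltools pipeline, just a demonstration):

```python
import struct

def to_fp16(x: float) -> float:
    """Round-trip a float through IEEE 754 half precision,
    i.e. what an FP16 weight actually stores."""
    return struct.unpack("e", struct.pack("e", x))[0]

print(to_fp16(0.5))  # 0.5 — powers of two survive exactly
print(to_fp16(0.1))  # 0.0999755859375 — only ~3 decimal digits of precision
# FP16 also caps out at 65504, so unnormalized activations can overflow
```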

ct.ImageType with built-in normalization means there is no need to convert a UIImage into a normalized float array in Swift; Core ML does it for you.

Common Conversion Problems

Dynamic shapes: models with torch.Size([batch, seq_len, hidden]) where seq_len is not fixed break torch.jit.trace. The solution is ct.RangeDim for variable dimensions, or a fixed set of input shapes via ct.EnumeratedShapes.

# Variable sequence length
flexible_shape = ct.Shape(shape=(1, ct.RangeDim(1, 512), 768))
mlmodel = ct.convert(model, inputs=[ct.TensorType(shape=flexible_shape)])

Unsupported operations, e.g. custom CUDA kernels: coremltools throws NotImplementedError. Two ways out: either rewrite the operation using standard PyTorch primitives, or register it as a composite of MIL ops / a custom layer implemented in Swift.
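
As a hypothetical illustration of the "rewrite with primitives" route: suppose the unsupported kernel is a fused hard-swish. It decomposes into add/clamp/mul/div, which every converter supports (pure-Python sketch of the math; in a real model you would express this with torch ops before tracing):

```python
def relu6(x: float) -> float:
    # clamp(x, 0, 6) — a primitive present in every converter
    return min(max(x, 0.0), 6.0)

def hard_swish(x: float) -> float:
    # fused kernel rewritten as x * relu6(x + 3) / 6
    return x * relu6(x + 3.0) / 6.0

print(hard_swish(4.0))  # 4.0 — acts as identity once x + 3 >= 6
print(hard_swish(1.0))  # 0.6666666666666666
```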

The Unsupported model format error on the x86 simulator: the simulator uses a CPU fallback, and some FLOAT16 operations are unsupported there. Validate accuracy only on a real device.

Loading and Running on iOS

import CoreML
import Vision

// Load model (once at startup)
let config = MLModelConfiguration()
config.computeUnits = .all  // ANE + GPU + CPU

// Xcode compiles the bundled .mlpackage at build time and generates a typed class
guard let model = try? MyClassifier(configuration: config) else {
    fatalError("Failed to load model")
}

// Inference on background thread
DispatchQueue.global(qos: .userInitiated).async {
    do {
        let input = MyClassifierInput(input_image: cgImage)
        let output = try model.prediction(input: input)
        let probs = output.class_probabilities
        // probs is an MLMultiArray; read values via probs[0].doubleValue
    } catch {
        print("Inference error: \(error)")
    }
}

Model loading takes ~100–300 ms depending on size. Don't reload it in every viewDidLoad; load once at app startup or on first use, and keep the instance in memory for as long as it's needed.

Vision Framework as Wrapper

For computer vision tasks, VNCoreMLRequest is more convenient: Vision handles input resizing, image orientation, and coordinate transforms:

let coreMLModel = try VNCoreMLModel(for: model.model)  // .model—MLModel from generated class

let request = VNCoreMLRequest(model: coreMLModel) { request, error in
    guard let results = request.results as? [VNClassificationObservation] else { return }
    let topResult = results.sorted { $0.confidence > $1.confidence }.first
    print("\(topResult?.identifier ?? "?") — \(topResult?.confidence ?? 0)")
}
request.imageCropAndScaleOption = .centerCrop  // or .scaleFit

let handler = VNImageRequestHandler(cgImage: inputCGImage, options: [:])
try handler.perform([request])

VNCoreMLRequest automatically handles input size mismatches: pass an image of any size and Vision resizes it to the model's expected input. Without Vision, you would have to do this manually via vImage or CIImage.
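
What .centerCrop does under the hood is easy to sketch: scale so the image covers the target, then sample the central region. A pure-Python sketch of the arithmetic (the sizes are hypothetical; Vision does this internally):

```python
def center_crop_rect(src_w: int, src_h: int, dst_w: int, dst_h: int):
    """Source rect (x, y, w, h) that a centerCrop-style scaler samples:
    scale so the image covers the target, then crop the center."""
    scale = max(dst_w / src_w, dst_h / src_h)  # cover, not fit
    crop_w = dst_w / scale
    crop_h = dst_h / scale
    x = (src_w - crop_w) / 2
    y = (src_h - crop_h) / 2
    return round(x), round(y), round(crop_w), round(crop_h)

# 4032x3024 photo into a 224x224 model input: sample the central 3024x3024 square
print(center_crop_rect(4032, 3024, 224, 224))  # (504, 0, 3024, 3024)
```

With .scaleFit the whole image is kept and padded instead, which changes what the model sees; pick the option that matches how the model was trained.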

Performance: Benchmarks

Device               Model                      computeUnits   Inference time
iPhone 14 Pro        MobileNetV3 (5 MB FP16)    .all (ANE)     2–4 ms
iPhone 14 Pro        ResNet-50 (48 MB FP16)     .all (ANE)     8–15 ms
iPhone 12            BERT-base (350 MB FP16)    .all           180–250 ms
iPhone SE (2nd gen)  MobileNetV3 (5 MB FP16)    .cpuOnly       12–20 ms

Use the Core ML instrument in Xcode Instruments to profile actual ANE/GPU/CPU usage.

Updating the Model Without an App Update

Core ML can load a model from any file URL, not just the app bundle, which makes server-side model updates possible. A downloaded model must first be compiled on device:

// Load an updated model from the documents directory
let documentsURL = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)[0]
let downloadedModelURL = documentsURL.appendingPathComponent("updated_model.mlpackage")

if FileManager.default.fileExists(atPath: downloadedModelURL.path) {
    // A downloaded .mlpackage must be compiled on device before loading
    let compiledURL = try MLModel.compileModel(at: downloadedModelURL)
    let model = try MyClassifier(contentsOf: compiledURL, configuration: config)
} else {
    // Fall back to the model bundled with the app
}

Download the model over the network via URLSession, save it to Documents, and verify its SHA-256 hash against a known value before use.
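
The hash check itself is language-agnostic; here is a minimal sketch of the verification logic in Python's stdlib (on device you would do the same with CryptoKit's SHA256; note an .mlpackage is a directory, so in practice you hash the archive you download, and the file names here are hypothetical):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so a large model archive never sits fully in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_model(path: str, expected_hex: str) -> bool:
    # expected_hex ships with the download manifest (hypothetical scheme)
    return sha256_of(path) == expected_hex
```

Swap in the downloaded model only after verification succeeds; otherwise keep using the bundled fallback.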

Process

Get weights → convert with precision and compute units tuning → profile on target devices → integrate in app with fallback and error handling → optionally: remote model update.

Timeline Estimates

Converting an existing model plus basic iOS integration takes 1–2 weeks. A complex model with non-standard operations, multiple inputs/outputs, or remote updates requires 3–5 weeks.