Machine Learning Development (Core ML) in iOS Applications
Moving a model from a Python environment into mobile production immediately surfaces format incompatibilities, inference latency problems, and the lack of an update mechanism that doesn't require an App Store release. Core ML solves these problems natively, provided it is integrated correctly into your app architecture.
Common Time Wasters During Core ML Integration
The most frequent mistake: converting a model without considering the target hardware. coremltools lets you specify minimum_deployment_target and the compute units: cpuOnly, cpuAndGPU, or cpuAndNeuralEngine. If you don't allow cpuAndNeuralEngine on A12+ devices, the model never reaches the Neural Engine and runs on the CPU, which is 5–10x slower for convolutional networks.
Second issue: input format. Core ML expects a CVPixelBuffer with a specific kCVPixelFormatType. If your app receives a UIImage from the camera via AVCapturePhotoOutput, an intermediate conversion is needed: UIImage → CIImage → CVPixelBuffer. Doing this on the main thread guarantees dropped frames. The entire capture-to-inference pipeline must run on a DispatchQueue with QoS .userInteractive, or go through the Vision framework, which manages buffers automatically.
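The conversion and queue setup can be sketched as follows. This is a minimal illustration, not a drop-in implementation; the helper function name, the 32BGRA pixel format, and the target dimensions are assumptions that should match your model's declared input.

```swift
import UIKit
import CoreImage
import CoreVideo

// A dedicated queue keeps buffer conversion and inference off the main thread.
let inferenceQueue = DispatchQueue(label: "ml.inference", qos: .userInteractive)

// Hypothetical helper: UIImage -> CVPixelBuffer, sized for the model input.
func makePixelBuffer(from image: UIImage, width: Int, height: Int) -> CVPixelBuffer? {
    var buffer: CVPixelBuffer?
    let attrs = [kCVPixelBufferCGImageCompatibilityKey: true,
                 kCVPixelBufferCGBitmapContextCompatibilityKey: true] as CFDictionary
    guard CVPixelBufferCreate(kCFAllocatorDefault, width, height,
                              kCVPixelFormatType_32BGRA, attrs, &buffer) == kCVReturnSuccess,
          let pixelBuffer = buffer,
          let cgImage = image.cgImage else { return nil }
    // Render the image into the buffer; reuse a single CIContext in production.
    CIContext().render(CIImage(cgImage: cgImage), to: pixelBuffer)
    return pixelBuffer
}
```

In a real pipeline you would call this inside `inferenceQueue.async { ... }` from the capture callback, then feed the buffer to the model.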
Vision + Core ML is the right combination for most tasks: VNCoreMLRequest handles scaling, normalization, and buffer management. For sequences of inferences on a video stream, use VNSequenceRequestHandler, which caches state between frames.
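A minimal sketch of that setup, assuming a bundled compiled model named "Classifier" (the model name and classifier-style output are assumptions):

```swift
import Vision
import CoreML

// Wrap the Core ML model for use with Vision.
let mlModel = try MLModel(contentsOf: Bundle.main.url(forResource: "Classifier",
                                                      withExtension: "mlmodelc")!)
let vnModel = try VNCoreMLModel(for: mlModel)

let request = VNCoreMLRequest(model: vnModel) { request, _ in
    guard let results = request.results as? [VNClassificationObservation],
          let top = results.first else { return }
    print("\(top.identifier): \(top.confidence)")
}
// Vision handles resizing/cropping to the model's expected input.
request.imageCropAndScaleOption = .centerCrop

// For video streams, reuse one handler so state is cached between frames.
let sequenceHandler = VNSequenceRequestHandler()
// In the capture callback: try sequenceHandler.perform([request], on: pixelBuffer)
```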
Core ML Integration Process
Start with an audit of the source model: format (ONNX, TensorFlow SavedModel, PyTorch TorchScript), weight size, and operation count. Use coremltools 7.x for conversion. For quantized models, use ct.optimize.coreml with LinearQuantizer or PalettizationConfig. 8-bit quantization cuts model size by 4x with no noticeable accuracy loss on most classifiers.
A real example: a fintech client needed on-device document forgery detection. The original TFLite model (MobileNetV3, 12 MB) took 280 ms per inference on an iPhone 12. After conversion to .mlpackage with computeUnits = .cpuAndNeuralEngine and Float16 compression: 34 ms on the same device. Inference was wrapped in an MLModelConfiguration with allowLowPrecisionAccumulationOnGPU = true.
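The load-time configuration from that case looks roughly like this (the model resource name is hypothetical):

```swift
import CoreML

// Pin compute units and allow Float16 accumulation on GPU fallback paths.
let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine
config.allowLowPrecisionAccumulationOnGPU = true

let url = Bundle.main.url(forResource: "ForgeryDetector", withExtension: "mlmodelc")!
let model = try MLModel(contentsOf: url, configuration: config)
```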
For model updates without App Store releases, set up delivery via CloudKit or custom S3-compatible storage. MLModel(contentsOf:) accepts local URLs: the model downloads in the background, is verified via SHA-256, and is swapped atomically through FileManager.replaceItem. The old version is kept as a fallback.
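A sketch of the verify-and-swap step, assuming the server delivers a single .mlmodel file and the expected hash comes from an update manifest (function and parameter names are hypothetical):

```swift
import Foundation
import CryptoKit
import CoreML

enum ModelUpdateError: Error { case hashMismatch }

// Verify the downloaded model, compile it, and swap it in atomically.
func installModel(downloaded: URL, installed: URL,
                  expectedSHA256: String) throws -> MLModel {
    let data = try Data(contentsOf: downloaded)
    let digest = SHA256.hash(data: data).map { String(format: "%02x", $0) }.joined()
    guard digest == expectedSHA256 else { throw ModelUpdateError.hashMismatch }

    // Core ML executes compiled models (.mlmodelc); compile before installing.
    let compiledURL = try MLModel.compileModel(at: downloaded)

    // Atomic swap; replaceItemAt can keep a backup of the old version.
    _ = try FileManager.default.replaceItemAt(installed, withItemAt: compiledURL)
    return try MLModel(contentsOf: installed)
}
```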
Architecturally, isolate the ML layer in a separate module (a Swift Package) behind an MLInferenceService protocol. This enables mock implementations in tests and reuse across multiple targets.
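One possible shape for that protocol boundary; the method signature, input feature name "image", and output name "classLabelProbs" are assumptions that depend on the concrete model:

```swift
import CoreML
import CoreVideo

// Hypothetical protocol isolating the ML layer in its own Swift Package.
protocol MLInferenceService {
    func predict(_ pixelBuffer: CVPixelBuffer) throws -> [String: Double]
}

// Production implementation wraps MLModel; a mock conforming to the same
// protocol returns canned outputs in unit tests.
final class CoreMLInferenceService: MLInferenceService {
    private let model: MLModel
    init(model: MLModel) { self.model = model }

    func predict(_ pixelBuffer: CVPixelBuffer) throws -> [String: Double] {
        let input = try MLDictionaryFeatureProvider(
            dictionary: ["image": MLFeatureValue(pixelBuffer: pixelBuffer)])
        let output = try model.prediction(from: input)
        // Assumes a classifier-style probability dictionary output.
        return output.featureValue(for: "classLabelProbs")?
            .dictionaryValue as? [String: Double] ?? [:]
    }
}
```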
What's Included
- Source model audit and conversion path selection
- Conversion to .mlmodel/.mlpackage via coremltools
- Optimization: quantization, pruning, compute unit selection
- Integration via Vision or the direct MLModel API
- OTA model update mechanism (CloudKit / S3)
- Unit tests for inference with reference inputs/outputs
- Profiling via Xcode Instruments (Core ML Instrument)
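The inference unit tests listed above can be sketched as a regression check against a reference output exported from the source framework. The resource name "Classifier", the feature names, and the reference value are all placeholders:

```swift
import XCTest
import CoreML

final class InferenceTests: XCTestCase {
    func testReferenceInput() throws {
        let bundle = Bundle(for: Self.self)
        let modelURL = try XCTUnwrap(bundle.url(forResource: "Classifier",
                                                withExtension: "mlmodelc"))
        let model = try MLModel(contentsOf: modelURL)

        // Reference tensor; in a real test, fill it with the same values
        // that produced the reference output in the Python environment.
        let tensor = try MLMultiArray(shape: [1, 3, 224, 224], dataType: .float32)
        let input = try MLDictionaryFeatureProvider(dictionary: ["input": tensor])
        let output = try model.prediction(from: input)

        let referenceScore = 0.9731  // illustrative; exported from the source model
        let score = try XCTUnwrap(output.featureValue(for: "score")?.doubleValue)
        XCTAssertEqual(score, referenceScore, accuracy: 1e-3)
    }
}
```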
Timeline
Integrating a pre-converted model into an existing app: 3–5 business days. Converting, optimizing, and setting up OTA updates from scratch: 1–2 weeks. Cost is calculated individually after a review of the requirements and the source model.