Machine Learning Development (PyTorch Mobile) in Mobile Applications
PyTorch Mobile is less common than TFLite and Core ML, but it wins in certain scenarios: when the data science team works in PyTorch and doesn't want to spend time on format conversion, or when the model uses non-standard operations that TFLite doesn't support.
TorchScript: The Critical Pre-Deployment Requirement
PyTorch Mobile only works with TorchScript models; neither eager mode nor torch.fx is suitable. Convert via torch.jit.trace or torch.jit.script. The difference is fundamental: trace records the execution path for specific example inputs and can't handle data-dependent branching, while script compiles the code statically and correctly handles if/for, but requires type annotations.
If your model contains if x.shape[0] > 1:, trace silently records only one branch. In production this manifests as incorrect results on batches of certain sizes rather than a crash, which makes it hard to catch.
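To make the difference concrete, here is a minimal sketch (the Head module and tensor shapes are hypothetical) showing how trace bakes in a single branch while script preserves the control flow:
import torch
class Head(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if x.shape[0] > 1:  # data-dependent branch
            return x.mean(dim=0, keepdim=True)
        return x
model = Head().eval()
traced = torch.jit.trace(model, torch.randn(1, 8))  # records only the batch-of-1 path
scripted = torch.jit.script(model)  # compiles both branches
batch = torch.randn(4, 8)
print(traced(batch).shape)  # torch.Size([4, 8]) - the other branch was never recorded
print(scripted(batch).shape)  # torch.Size([1, 8]) - correct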
After conversion, optimize via optimize_for_mobile:
from torch.utils.mobile_optimizer import optimize_for_mobile
scripted = torch.jit.script(model)
optimized = optimize_for_mobile(scripted)  # fuses Conv+BN, prepacks weights, strips dropout
optimized._save_for_lite_interpreter("model.ptl")  # Lite Interpreter format, not torch.jit.save
.ptl (the Lite Interpreter format) is not the same as .pt. On mobile, use the Lite Interpreter: it supports a reduced set of PyTorch operations but has a smaller binary size.
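Before shipping, it can help to sanity-check the .ptl file on a desktop. PyTorch exposes a lite-interpreter loader under torch.jit.mobile; note the leading underscore, it is not a stable public API, and the input shape below is a placeholder:
import torch
from torch.jit.mobile import _load_for_lite_interpreter  # semi-private helper in recent PyTorch releases
lite_module = _load_for_lite_interpreter("model.ptl")
out = lite_module(torch.randn(1, 3, 224, 224))  # placeholder input shape; verify outputs match the original model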
Quantization and Performance
Post-training static quantization for mobile:
model.eval()  # static quantization requires eval mode
model.qconfig = torch.quantization.get_default_qconfig('qnnpack')  # for ARM
torch.quantization.prepare(model, inplace=True)
for batch in calibration_loader:  # placeholder loader: run a representative calibration dataset through the model
    model(batch)
torch.quantization.convert(model, inplace=True)
qnnpack is the backend for ARM processors (what you need for Android and iOS). fbgemm targets x86 and doesn't work on mobile. A common mistake: a developer quantizes with fbgemm on their laptop and then wonders why the model doesn't speed up on the phone.
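A quick way to catch this on a workstation is to check which quantized engines the local PyTorch build exposes and pin the one you actually target (a small sketch, separate from the pipeline above):
import torch
print(torch.backends.quantized.supported_engines)  # e.g. ['none', 'fbgemm', 'qnnpack']
torch.backends.quantized.engine = 'qnnpack'  # pin the ARM-oriented backend before converting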
In practice, INT8 gives 2–3x gains on ARM for Linear and Conv2d operations. On an iPhone, PyTorch Mobile does not leverage the Neural Engine directly, unlike Core ML. If you need the Neural Engine on iOS, the correct path is conversion through coremltools, not PyTorch Mobile.
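If the Neural Engine matters, a rough conversion sketch with coremltools looks like the following; example_input, the tensor name, and the .mlpackage path are placeholders, and the exact options depend on the coremltools version you pin:
import torch
import coremltools as ct
traced = torch.jit.trace(model, example_input)  # coremltools consumes TorchScript
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="input", shape=example_input.shape)],
    convert_to="mlprogram",
    compute_units=ct.ComputeUnit.ALL,  # lets Core ML schedule work onto the Neural Engine
)
mlmodel.save("Model.mlpackage")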
Integration on Android and iOS
Android. Dependencies: org.pytorch:pytorch_android_lite (Lite Interpreter) and org.pytorch:pytorch_android_torchvision_lite for TensorImageUtils. Inference:
val module = LiteModuleLoader.load(assetFilePath("model.ptl"))  // assetFilePath: your helper that copies the asset to a readable file path
val inputTensor = TensorImageUtils.bitmapToFloat32Tensor(bitmap, mean, std)  // mean/std: the normalization constants the model was trained with
val output = module.forward(IValue.from(inputTensor)).toTensor()
Run image preprocessing (normalization, resize) on a background thread via Executors.newSingleThreadExecutor(), not on the main thread.
iOS. CocoaPod LibTorch-Lite. The library exposes a C++ API, so from Swift you work through an Objective-C++ wrapper such as TorchModule (the class used in the official demo apps):
let module = TorchModule(fileAtPath: modelPath)  // Objective-C++ wrapper around the C++ mobile Module
let result = module.predict(image: &tensorData)  // predict(image:) is whatever method your wrapper exposes
All inference runs on DispatchQueue.global(qos: .userInitiated).
Example: an NLP task, BERT-lite for review classification in a corporate app. The DS team worked in PyTorch and didn't want to reconvert to TFLite (non-standard attention block). We used TorchScript + INT8 quantization (qnnpack): model size went from 23 MB to 6 MB, and inference on a Pixel 6 took 45 ms for 128 tokens. Sufficient for real-time analytics.
Timeline
Converting and integrating an existing PyTorch model into an Android or iOS app: 1–2 weeks, including TorchScript debugging and on-device testing. Cost is calculated individually.