TensorFlow Lite Mobile ML Development

NOVASOLUTIONS.TECHNOLOGY develops, supports, and maintains iOS, Android, and PWA mobile applications. We have extensive experience publishing mobile applications on popular marketplaces such as Google Play, the App Store, Amazon Appstore, AppGallery, and others.
Development and support of all types of mobile applications:
Information and entertainment mobile applications
News apps, games, reference guides, online catalogs, weather apps, fitness and health apps, travel apps, educational apps, social networks and messengers, quizzes, blogs and podcasts, forums, aggregators
E-commerce mobile applications
Online stores, B2B apps, marketplaces, online exchanges, cashback services, dropshipping platforms, loyalty programs, food and goods delivery, payment systems.
Business process management mobile applications
CRM systems, ERP systems, project management, sales team tools, financial management, production management, logistics and delivery management, HR management, data monitoring systems
Electronic services mobile applications
Classified ads platforms, online schools, online cinemas, electronic service platforms, cashback platforms, video hosting, thematic portals, online booking and scheduling platforms, online trading platforms

These are just some of the types of mobile applications we work with, and each of them may have its own specific features and functionality, tailored to the specific needs and goals of the client.

Latest works
  • Development of a mobile application for FEEDME
  • Development of a mobile application for XOOMER
  • Development of a mobile application for RHL
  • Development of a mobile application for ZIPPY
  • Development of a mobile application for Affhome
  • Development of a mobile application for the FLAVORS company

Machine Learning Development (TensorFlow Lite) in Mobile Applications

TensorFlow Lite is one of the main on-device ML runtimes for mobile, alongside Core ML and ONNX Runtime. Its strength lies in control: you choose the delegate, the quantization level, and the model loading method. Its weakness is that same flexibility: the wrong delegate or an unoptimized model can yield worse performance than a cloud API.

Delegates: Where Most Mistakes Happen

TfLiteGpuDelegateV2 on Android gives real gains only with batch inference or heavy convolutional models (EfficientDet, MobileNet SSD). On light models (MobileNetV2 with 224×224 input), the GPU delegate is often slower than the CPU because of memory-to-GPU transfer overhead. We profiled this on a Xiaomi Redmi Note 11: CPU 78 ms, GPU 112 ms. Takeaway: always measure on target devices, not flagships.
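The measurement loop itself is simple to get right and worth standardizing before device testing. A minimal sketch using TFLite's Python interpreter (the `tflite_runner` part assumes TensorFlow is installed and `model_path` is a placeholder for a real .tflite file; on Android, the official TFLite benchmark tool plays this role):

```python
import time
import statistics

def benchmark_ms(run_once, warmup=5, runs=50):
    """Median wall-clock latency of run_once() in milliseconds.

    Warmup iterations are discarded so one-time costs (delegate init,
    tensor allocation) do not skew the measurement.
    """
    for _ in range(warmup):
        run_once()
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - t0) * 1000.0)
    return statistics.median(samples)

def tflite_runner(model_path):
    """Build a zero-arg inference callable (requires TensorFlow installed)."""
    import numpy as np
    import tensorflow as tf
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    dummy = np.zeros(inp["shape"], dtype=inp["dtype"])
    def run_once():
        interpreter.set_tensor(inp["index"], dummy)
        interpreter.invoke()
    return run_once

# Usage (hypothetical model file):
# print(benchmark_ms(tflite_runner("mobilenet_v2.tflite")))
```

We report the median rather than the mean because thermal throttling and scheduler noise produce long-tail outliers on mobile hardware.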

The NNAPI delegate (NnApiDelegate) theoretically uses hardware accelerators (DSP, NPU), but operation support is very uneven. If the model contains non-standard ops (e.g., a custom squeeze-excitation block), NNAPI silently falls back to the CPU. Always verify which operations actually run on the accelerator, for example by inspecting NNAPI messages in logcat and comparing latency with the delegate disabled, rather than trusting that the delegate is active.

On iOS, TFLite offers CoreMLDelegate, a wrapper over Core ML. On iOS 12 and later, CoreMLDelegate automatically leverages the Neural Engine for supported layers; unsupported layers fall back to TFLite's CPU interpreter. Mixed execution works, but latency is unpredictable without profiling.

Model Optimization Before Deployment

Quantization is mandatory for mobile. Three options:

  • Post-training dynamic range quantization — simplest, weights compressed to INT8, activations remain float. Model size shrinks ~4x, CPU speed improves 20–40%.
  • Post-training integer quantization — both weights and activations in INT8, requires calibration dataset. Needed for NNAPI and Edge TPU.
  • Quantization-aware training (QAT) — best INT8 accuracy but requires model retraining.

tf.lite.TFLiteConverter with optimizations = [tf.lite.Optimize.DEFAULT] covers the first two; full integer quantization additionally requires a representative_dataset for calibration. QAT is configured via tfmot.quantization.keras.quantize_model.
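As a sketch, the two post-training paths look like this in the converter API (the saved-model directory and input shape are placeholders; the random calibration data stands in for ~100 real preprocessed samples, which you need for correct activation ranges):

```python
import numpy as np

def representative_dataset(num_samples=100, input_shape=(1, 224, 224, 3)):
    """Yield calibration batches for full integer quantization.

    Random data is a placeholder; in practice, feed real samples
    preprocessed exactly as at inference time.
    """
    for _ in range(num_samples):
        yield [np.random.rand(*input_shape).astype(np.float32)]

def convert(saved_model_dir, full_integer=False):
    """Dynamic range (default) or full integer post-training quantization."""
    import tensorflow as tf  # requires TensorFlow installed
    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    if full_integer:
        converter.representative_dataset = representative_dataset
        converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
        converter.inference_input_type = tf.int8   # needed for NNAPI / Edge TPU
        converter.inference_output_type = tf.int8
    return converter.convert()  # bytes of the .tflite flatbuffer
```

Without the `full_integer` branch, this produces dynamic range quantization; with it, weights and activations are both INT8, which is what the NNAPI and Edge TPU paths require.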

Real example: a plant recognition app (EfficientNetB0 classifier, 29 MB float32). After full integer quantization: 7.4 MB; inference with NNAPI on a Pixel 7: 18 ms versus 95 ms for float32 on the CPU. On a Snapdragon 778G, NNAPI fell back to the CPU due to the unsupported LEAKY_RELU op, so we shipped an explicit CPU interpreter fallback for such devices.

App Integration

On Android, use org.tensorflow:tensorflow-lite + org.tensorflow:tensorflow-lite-gpu via Gradle. For Task Library support (ImageClassifier, ObjectDetector), add org.tensorflow:tensorflow-lite-task-vision. The Task Library handles image preprocessing (resize, normalization), eliminating significant boilerplate.

On iOS, use the CocoaPods pod 'TensorFlowLiteSwift' or Swift Package Manager (TFLite 2.13+). Wrap inference in an actor for thread safety:

actor TFLiteInferenceService {
    private let interpreter: Interpreter
    init(modelPath: String) throws {
        interpreter = try Interpreter(modelPath: modelPath)
        try interpreter.allocateTensors()
    }
    func classify(pixelBuffer: CVPixelBuffer) throws -> [Float] { ... }
}

Load models from the app bundle, or download them over HTTPS with SHA-256 verification. For OTA updates, publish the model URL via Firebase Remote Config and download it with URLSession.downloadTask in the background.
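The verification step is the same on every platform; a minimal Python sketch of the logic (function name is illustrative; on iOS you would use CryptoKit's SHA256, on Android java.security.MessageDigest, with `expected_hex` delivered in the Remote Config payload alongside the model URL):

```python
import hashlib
import hmac

def verify_model(model_bytes: bytes, expected_hex: str) -> bool:
    """Check a downloaded model against its published SHA-256 digest.

    Returns True only if the digest matches; reject and re-download
    the model otherwise. hmac.compare_digest avoids timing leaks.
    """
    actual = hashlib.sha256(model_bytes).hexdigest()
    return hmac.compare_digest(actual, expected_hex.lower())
```

Verify before swapping the model into the interpreter, and keep the previous verified model on disk so a failed download never leaves the app without a working model.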

Timeline

Integrating a ready-made TFLite model with delegate selection and basic optimization takes about 1 week. A full cycle with conversion, quantization, testing on target devices, and OTA updates takes 2–3 weeks. Cost is calculated individually after reviewing the model and requirements.