AI & ML in Mobile Apps: CoreML, TFLite & LLM

NOVASOLUTIONS.TECHNOLOGY develops, supports, and maintains iOS, Android, and PWA mobile applications. We have extensive experience and expertise in publishing mobile applications in popular marketplaces such as Google Play, the App Store, Amazon, AppGallery, and others.
Development and support of all types of mobile applications:
Information and entertainment mobile applications
News apps, games, reference guides, online catalogs, weather apps, fitness and health apps, travel apps, educational apps, social networks and messengers, quizzes, blogs and podcasts, forums, aggregators
E-commerce mobile applications
Online stores, B2B apps, marketplaces, online exchanges, cashback services, dropshipping platforms, loyalty programs, food and goods delivery, payment systems
Business process management mobile applications
CRM systems, ERP systems, project management, sales team tools, financial management, production management, logistics and delivery management, HR management, data monitoring systems
Electronic services mobile applications
Classified ads platforms, online schools, online cinemas, electronic service platforms, cashback platforms, video hosting, thematic portals, online booking and scheduling platforms, online trading platforms

These are just some of the types of mobile applications we work with, and each can have its own features and functionality tailored to the client's needs and goals.

Latest works
  • Development of a mobile application for FEEDME
  • Development of a mobile application for XOOMER
  • Development of a mobile application for RHL
  • Development of a mobile application for ZIPPY
  • Development of a mobile application for Affhome
  • Development of a mobile application for the FLAVORS company

AI and ML in Mobile Applications: CoreML, TFLite and On-Device Models

The difference between "an application with AI" and "an application that calls OpenAI" is fundamental. The first works without internet, doesn't send user data to third-party servers, and responds in 50 milliseconds. The second depends on network latency and subscription plans. The right choice is determined at the architecture stage.

On-Device Inference: When and How

CoreML is Apple's native framework for running ML models on the device. It supports the Neural Engine (A11 Bionic and later), with GPU and CPU as fallbacks. Models are converted to the .mlmodel format via coremltools from PyTorch, ONNX, or TensorFlow. Conversion is not always trivial: custom layers require implementing MLCustomLayer, and INT8 quantization can noticeably reduce accuracy on specific data.

TensorFlow Lite is a cross-platform alternative for Android and Flutter. On Android it uses NNAPI (Neural Networks API) for hardware acceleration — from Android 10 onward NNAPI is reasonably stable; on earlier versions it is better to use the GPU delegate explicitly via GpuDelegate. A typical mistake: the model was trained on data normalized to the [0,1] range, but the application feeds it raw [0,255] input — inference runs without an error yet produces meaningless results.
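The normalization mismatch above is easy to guard against in the preprocessing layer. A minimal pure-Python sketch (a real pipeline would use numpy or the framework's own preprocessing; the function name and flag are illustrative):

```python
def preprocess(pixels, expects_unit_range=True):
    """Convert raw 8-bit pixel values (0-255) to the range the model
    was trained on. A silent mismatch here is the classic TFLite bug:
    inference succeeds, but the outputs are garbage."""
    if expects_unit_range:
        return [p / 255.0 for p in pixels]  # match training-time [0, 1] scaling
    return [float(p) for p in pixels]       # model trained on raw [0, 255] input
```

Encoding the expected input range next to the model (e.g. in its metadata) instead of hard-coding it in the app makes this mistake much harder to reintroduce.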

For image classification, object detection, and segmentation there are ready-made, mobile-optimized models. YOLOv8 in CoreML format runs detection on a 640×640 frame in 15-20 ms on the iPhone 14's Neural Engine. MobileNetV3 on TFLite with the GPU delegate classifies in about 8 ms on a Pixel 7.

On-Device LLMs: Phi-3, Gemma, and What to Expect

Running small language models on the device became practical in 2024. Apple Intelligence uses its own on-device models, offloading heavier queries to Private Cloud Compute, but other paths are available to third-party developers.

llama.cpp with the Metal backend on iOS is a working approach for phi-3-mini (3.8B parameters, 4-bit quantization, ~2.3GB). Inference: 15-25 tokens/second on an iPhone 15 Pro. For Swift integration, use the llama.swift Swift Package or a wrapper over the C interface llama.h. We don't bundle the model with the application — it is downloaded on first launch and stored in Application Support.
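The download-on-first-launch logic is platform-independent; here is a minimal Python sketch of the idea (the `fetch` callable and file names are hypothetical stand-ins for an actual HTTP download — on iOS this would live in Swift and write to Application Support):

```python
import os

def ensure_model(cache_dir, filename, fetch):
    """Return the local path to the model file, downloading it on the
    first launch instead of shipping it inside the app bundle.
    `fetch` is a hypothetical callable returning the model bytes."""
    path = os.path.join(cache_dir, filename)
    if not os.path.exists(path):              # first launch: no cached copy
        os.makedirs(cache_dir, exist_ok=True)
        data = fetch()                        # e.g. ~2.3 GB for phi-3-mini Q4
        with open(path, "wb") as f:
            f.write(data)
    return path                               # subsequent launches: cache hit
```

A production version would also verify a checksum and support resumable downloads, since interrupting a multi-gigabyte transfer on mobile networks is the common case, not the exception.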

On Android, the equivalent is Google AI Edge (formerly the MediaPipe LLM Inference API) with Gemma-2B support. It runs via the GPU delegate; on the Pixel 8 Pro's Tensor G3, about 20 tokens/second.

The limitations are real: models above ~4B parameters are slow on 2024-2025 mobile hardware. On complex reasoning tasks, an on-device LLM loses to GPT-4o in quality. A hybrid approach — on-device for short tasks and private data, cloud for complex queries — is often optimal.
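A hybrid setup needs an explicit routing policy. A minimal sketch, assuming a simple length-based heuristic (the function name, flag, and threshold are illustrative; real routers can also classify the task type):

```python
def route(prompt, contains_private_data, on_device_limit=512):
    """Decide where a request runs in a hybrid on-device/cloud setup.
    Privacy always wins; otherwise short prompts stay local."""
    if contains_private_data:
        return "on-device"                # data never leaves the phone
    if len(prompt) <= on_device_limit:
        return "on-device"                # short task: local model is fast enough
    return "cloud"                        # complex query: quality wins
```

The key design point is that the privacy check comes first: no prompt flagged as private is ever sent off-device, regardless of length.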

OpenAI API Integration and Other Cloud Models

For scenarios where cloud inference is acceptable, integrating OpenAI, Anthropic, or Google Gemini boils down to an HTTP client plus SSE streaming. In Swift, AsyncThrowingStream is convenient for streamed responses; in Kotlin, Flow.
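The SSE side is the same in every language: each event arrives as a `data: {...}` line, and `data: [DONE]` terminates the stream. A minimal Python parser over already-received lines (a real client would read them incrementally from the HTTP response body):

```python
import json

def parse_sse_chunks(lines):
    """Yield decoded JSON payloads from OpenAI-style SSE lines,
    stopping at the 'data: [DONE]' sentinel."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue                          # skip blanks / comments / keep-alives
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return                            # end of stream
        yield json.loads(payload)
```

The same parsing logic maps directly onto an AsyncThrowingStream producer in Swift or a Flow builder in Kotlin.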

Critical: API keys are never stored in the application bundle. Even obfuscated keys can be extracted from an IPA in 10 minutes with strings or Frida. The correct architecture is mobile application → your own backend → OpenAI API. The backend enforces rate limiting, logs requests, and keeps the key off the device.
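Rate limiting on that backend is typically a token bucket per user. A self-contained sketch (class and parameter names are illustrative, not a specific framework's API; the injectable clock exists only to make it testable):

```python
import time

class TokenBucket:
    """Per-user rate limiter for the backend proxy that holds the real
    API key: allows bursts up to `capacity` requests, refilled at
    `rate` tokens per second."""

    def __init__(self, capacity, rate, now=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.now = now
        self.last = now()

    def allow(self):
        t = self.now()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False                          # reject: client should back off
```

In production this state would live in something shared like Redis, keyed by user ID, so that it survives restarts and works across backend replicas.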

Typical Project Pipeline

We start by choosing the inference architecture: latency, privacy, model size, and target devices. We prototype the model in Python, evaluate accuracy on target data, then convert and test on the device — at this point it often becomes clear that the mobile version needs additional distillation or quantization.
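One useful gate in that convert-and-test step is an automatic check that quantization did not cost too much accuracy. A minimal sketch, assuming predictions from both model variants on the same evaluation set (function names and the 2% threshold are illustrative):

```python
def accuracy(preds, labels):
    """Fraction of predictions matching the ground-truth labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def quantization_ok(float_preds, quant_preds, labels, max_drop=0.02):
    """Fail the conversion pipeline if INT8 quantization costs more
    than `max_drop` absolute accuracy versus the float model."""
    drop = accuracy(float_preds, labels) - accuracy(quant_preds, labels)
    return drop <= max_drop
```

Running this on the *target* data distribution matters: quantization losses are often invisible on a generic benchmark but pronounced on a client's specific domain.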

Integration into the application: the model is wrapped in a service layer that hides framework details. This makes it possible to swap CoreML for TFLite, or on-device inference for a cloud model, without rewriting business logic.
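The service-layer idea can be sketched language-independently; here in Python with an abstract interface (names are illustrative — in Swift this would be a protocol, in Kotlin an interface):

```python
from abc import ABC, abstractmethod

class Classifier(ABC):
    """Service-layer boundary: business logic depends only on this
    interface, never on CoreML/TFLite types directly."""

    @abstractmethod
    def predict(self, image_bytes: bytes) -> str: ...

class FakeClassifier(Classifier):
    """Stand-in backend for tests; a real app would wrap a CoreML,
    TFLite, or cloud client behind the same interface."""

    def predict(self, image_bytes: bytes) -> str:
        return "cat"

def tag_photo(model: Classifier, image: bytes) -> str:
    # Business logic stays unaware of which inference backend is plugged in.
    return f"#{model.predict(image)}"
```

Besides making the backend swappable, this boundary is what lets the inference layer be faked in unit tests without loading any model at all.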

Timelines: integrating a ready-made CoreML/TFLite model into an existing application — 1-2 weeks. Developing a custom model with mobile optimization — from 6 weeks. An on-device LLM chat with personalization — 4-8 weeks.