AI & ML in Mobile Apps: CoreML, TFLite & LLM

NOVASOLUTIONS.TECHNOLOGY develops, supports, and maintains iOS, Android, and PWA mobile applications. We have extensive experience and expertise in publishing mobile applications in popular marketplaces such as Google Play, the App Store, Amazon, AppGallery, and others.
Development and support of all types of mobile applications:
Information and entertainment mobile applications
News apps, games, reference guides, online catalogs, weather apps, fitness and health apps, travel apps, educational apps, social networks and messengers, quizzes, blogs and podcasts, forums, aggregators
E-commerce mobile applications
Online stores, B2B apps, marketplaces, online exchanges, cashback services, dropshipping platforms, loyalty programs, food and goods delivery, payment systems
Business process management mobile applications
CRM systems, ERP systems, project management, sales team tools, financial management, production management, logistics and delivery management, HR management, data monitoring systems
Electronic services mobile applications
Classified ads platforms, online schools, online cinemas, electronic service platforms, cashback platforms, video hosting, thematic portals, online booking and scheduling platforms, online trading platforms

These are just some of the types of mobile applications we work with, and each can have its own features and functionality tailored to the client's needs and goals.

Latest works
  • Development of a mobile application for FEEDME
  • Development of a mobile application for XOOMER
  • Development of a mobile application for RHL
  • Development of a mobile application for ZIPPY
  • Development of a mobile application for Affhome
  • Development of a mobile application for the FLAVORS company

AI and ML in Mobile Applications: CoreML, TFLite and On-Device Models

The difference between "an application with AI" and "an application that calls OpenAI" is fundamental. The first works without internet, doesn't send user data to third-party servers, and responds in 50 milliseconds. The second depends on network latency and subscription plans. The right choice is determined at the architecture stage.

On-Device Inference: When and How

CoreML is Apple's native framework for running ML models on the device. It supports the Neural Engine (A11 Bionic and later), with GPU and CPU as fallbacks. Models are converted to the .mlmodel format via coremltools from PyTorch, ONNX, or TensorFlow. Conversion is not always trivial: custom layers require implementing MLCustomLayer, and INT8 quantization can noticeably reduce accuracy on specific data.

TensorFlow Lite is a cross-platform alternative for Android and Flutter. On Android it uses NNAPI (Neural Networks API) for hardware acceleration — from Android 10 onward NNAPI is reasonably stable; on earlier versions it is better to use the GPU delegate explicitly via GpuDelegate. A typical mistake: the model was trained on data normalized to the [0,1] range, but the application feeds it raw [0,255] input — inference runs without an error yet produces meaningless results.
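The normalization mismatch above is easy to guard against in the preprocessing layer. A minimal pure-Python sketch (a real pipeline would use numpy or the framework's own preprocessing; the function name and flag are illustrative):

```python
def preprocess(pixels, expects_unit_range=True):
    """Convert raw 8-bit pixel values (0-255) to the range the model
    was trained on. A silent mismatch here is the classic TFLite bug:
    inference succeeds, but the outputs are garbage."""
    if expects_unit_range:
        return [p / 255.0 for p in pixels]  # match training-time [0, 1] scaling
    return [float(p) for p in pixels]       # model trained on raw [0, 255] input
```

Encoding the expected input range next to the model (e.g. in its metadata) instead of hard-coding it in the app makes this mistake much harder to reintroduce.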

For image classification, object detection, and segmentation there are ready-made, mobile-optimized models. YOLOv8 in CoreML format runs detection on a 640×640 frame in 15-20 ms on the iPhone 14's Neural Engine. MobileNetV3 on TFLite with the GPU delegate classifies in about 8 ms on a Pixel 7.

On-Device LLMs: Phi-3, Gemma, and What to Expect

Running small language models on the device became practical in 2024. Apple Intelligence uses its own on-device models, offloading heavier queries to Private Cloud Compute, but other paths are available to third-party developers.

llama.cpp with the Metal backend on iOS is a working approach for phi-3-mini (3.8B parameters, 4-bit quantization, ~2.3GB). Inference: 15-25 tokens/second on an iPhone 15 Pro. For Swift integration, use the llama.swift Swift Package or a wrapper over the C interface llama.h. We don't bundle the model with the application — it is downloaded on first launch and stored in Application Support.
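The download-on-first-launch logic is platform-independent; here is a minimal Python sketch of the idea (the `fetch` callable and file names are hypothetical stand-ins for an actual HTTP download — on iOS this would live in Swift and write to Application Support):

```python
import os

def ensure_model(cache_dir, filename, fetch):
    """Return the local path to the model file, downloading it on the
    first launch instead of shipping it inside the app bundle.
    `fetch` is a hypothetical callable returning the model bytes."""
    path = os.path.join(cache_dir, filename)
    if not os.path.exists(path):              # first launch: no cached copy
        os.makedirs(cache_dir, exist_ok=True)
        data = fetch()                        # e.g. ~2.3 GB for phi-3-mini Q4
        with open(path, "wb") as f:
            f.write(data)
    return path                               # subsequent launches: cache hit
```

A production version would also verify a checksum and support resumable downloads, since interrupting a multi-gigabyte transfer on mobile networks is the common case, not the exception.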

On Android, the equivalent is Google AI Edge (formerly the MediaPipe LLM Inference API) with Gemma-2B support. It runs via the GPU delegate; on the Pixel 8 Pro's Tensor G3, about 20 tokens/second.

The limitations are real: models above ~4B parameters are slow on 2024-2025 mobile hardware. On complex reasoning tasks, an on-device LLM loses to GPT-4o in quality. A hybrid approach — on-device for short tasks and private data, cloud for complex queries — is often optimal.
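A hybrid setup needs an explicit routing policy. A minimal sketch, assuming a simple length-based heuristic (the function name, flag, and threshold are illustrative; real routers can also classify the task type):

```python
def route(prompt, contains_private_data, on_device_limit=512):
    """Decide where a request runs in a hybrid on-device/cloud setup.
    Privacy always wins; otherwise short prompts stay local."""
    if contains_private_data:
        return "on-device"                # data never leaves the phone
    if len(prompt) <= on_device_limit:
        return "on-device"                # short task: local model is fast enough
    return "cloud"                        # complex query: quality wins
```

The key design point is that the privacy check comes first: no prompt flagged as private is ever sent off-device, regardless of length.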

OpenAI API Integration and Other Cloud Models

For scenarios where cloud inference is acceptable, integrating OpenAI, Anthropic, or Google Gemini boils down to an HTTP client plus SSE streaming. In Swift, AsyncThrowingStream is convenient for streamed responses; in Kotlin, Flow.
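The SSE side is the same in every language: each event arrives as a `data: {...}` line, and `data: [DONE]` terminates the stream. A minimal Python parser over already-received lines (a real client would read them incrementally from the HTTP response body):

```python
import json

def parse_sse_chunks(lines):
    """Yield decoded JSON payloads from OpenAI-style SSE lines,
    stopping at the 'data: [DONE]' sentinel."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue                          # skip blanks / comments / keep-alives
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return                            # end of stream
        yield json.loads(payload)
```

The same parsing logic maps directly onto an AsyncThrowingStream producer in Swift or a Flow builder in Kotlin.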

Critical: API keys are never stored in the application bundle. Even obfuscated keys can be extracted from an IPA in 10 minutes with strings or Frida. The correct architecture is mobile application → your own backend → OpenAI API. The backend enforces rate limiting, logs requests, and keeps the key off the device.
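Rate limiting on that backend is typically a token bucket per user. A self-contained sketch (class and parameter names are illustrative, not a specific framework's API; the injectable clock exists only to make it testable):

```python
import time

class TokenBucket:
    """Per-user rate limiter for the backend proxy that holds the real
    API key: allows bursts up to `capacity` requests, refilled at
    `rate` tokens per second."""

    def __init__(self, capacity, rate, now=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.now = now
        self.last = now()

    def allow(self):
        t = self.now()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False                          # reject: client should back off
```

In production this state would live in something shared like Redis, keyed by user ID, so that it survives restarts and works across backend replicas.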

Typical Project Pipeline

We start by choosing the inference architecture: latency, privacy, model size, and target devices. We prototype the model in Python, evaluate accuracy on target data, then convert and test on the device — at this point it often becomes clear that the mobile version needs additional distillation or quantization.
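One useful gate in that convert-and-test step is an automatic check that quantization did not cost too much accuracy. A minimal sketch, assuming predictions from both model variants on the same evaluation set (function names and the 2% threshold are illustrative):

```python
def accuracy(preds, labels):
    """Fraction of predictions matching the ground-truth labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def quantization_ok(float_preds, quant_preds, labels, max_drop=0.02):
    """Fail the conversion pipeline if INT8 quantization costs more
    than `max_drop` absolute accuracy versus the float model."""
    drop = accuracy(float_preds, labels) - accuracy(quant_preds, labels)
    return drop <= max_drop
```

Running this on the *target* data distribution matters: quantization losses are often invisible on a generic benchmark but pronounced on a client's specific domain.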

Integration into the application: the model is wrapped in a service layer that hides framework details. This makes it possible to swap CoreML for TFLite, or on-device inference for a cloud model, without rewriting business logic.
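The service-layer idea can be sketched language-independently; here in Python with an abstract interface (names are illustrative — in Swift this would be a protocol, in Kotlin an interface):

```python
from abc import ABC, abstractmethod

class Classifier(ABC):
    """Service-layer boundary: business logic depends only on this
    interface, never on CoreML/TFLite types directly."""

    @abstractmethod
    def predict(self, image_bytes: bytes) -> str: ...

class FakeClassifier(Classifier):
    """Stand-in backend for tests; a real app would wrap a CoreML,
    TFLite, or cloud client behind the same interface."""

    def predict(self, image_bytes: bytes) -> str:
        return "cat"

def tag_photo(model: Classifier, image: bytes) -> str:
    # Business logic stays unaware of which inference backend is plugged in.
    return f"#{model.predict(image)}"
```

Besides making the backend swappable, this boundary is what lets the inference layer be faked in unit tests without loading any model at all.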

Timelines: integrating a ready-made CoreML/TFLite model into an existing application — 1-2 weeks. Developing a custom model with mobile optimization — from 6 weeks. An on-device LLM chat with personalization — 4-8 weeks.