On-device ML model TensorFlow Lite for offline AI in Android app

NOVASOLUTIONS.TECHNOLOGY is engaged in the development, support and maintenance of iOS, Android, PWA mobile applications. We have extensive experience and expertise in publishing mobile applications in popular markets like Google Play, App Store, Amazon, AppGallery and others.

On-Device ML Model Integration (TensorFlow Lite) for Offline AI in Android Apps

TensorFlow Lite is the de facto standard for running ML models on Android. But "add the .tflite file to assets" is only the beginning. A real integration also means choosing an acceleration delegate, managing buffer memory, handling device incompatibilities, and testing numerical accuracy.

Converting Model to TFLite

From PyTorch via ONNX:

# PyTorch → ONNX (MyModel stands for your own nn.Module and must be importable here)
python -c "
import torch
model = MyModel(); model.eval()
torch.onnx.export(model, torch.zeros(1,3,224,224), 'model.onnx',
    opset_version=17, input_names=['input'], output_names=['output'])
"

# ONNX → TensorFlow SavedModel via onnx-tf, then SavedModel → TFLite
pip install onnx-tf tensorflow
onnx-tf convert -i model.onnx -o model_tf
tflite_convert --saved_model_dir=model_tf --output_file=model.tflite

Or directly from TensorFlow SavedModel:

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("model_tf")
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enable post-training quantization
converter.target_spec.supported_types = [tf.float16]  # store weights as FP16 (suits the GPU delegate)
tflite_model = converter.convert()
with open("model_fp16.tflite", "wb") as f:
    f.write(tflite_model)
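To get a feel for what FP16 storage does to individual weights, the rounding can be reproduced with the Python standard library alone. This is a minimal sketch, not part of the conversion pipeline: the weight values are made up for illustration, and the helper name to_fp16_and_back is an assumption.

```python
import struct


def to_fp16_and_back(x: float) -> float:
    """Round a Python float to IEEE 754 half precision and convert it back."""
    # struct's "e" format code packs a value as a 16-bit half-precision float.
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Hypothetical weight values, chosen only for illustration.
weights = [0.1, -0.3333, 1.0, 0.001234, 12.5]

relative_errors = []
for w in weights:
    stored = to_fp16_and_back(w)
    rel_err = abs(stored - w) / abs(w)
    relative_errors.append(rel_err)
    print(f"{w:>10} -> {stored:<24} relative error {rel_err:.2e}")
```

Half precision keeps 11 significant bits, so for values in its normal range the relative rounding error of a single weight stays at or below 2^-11 (about 0.05%). The effect on the model's final accuracy still has to be measured on real data.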

Acceleration Delegates: Which to Choose

Delegate	Requirements	Speedup vs CPU	Constraints
GPU Delegate	OpenGL ES 3.1 / OpenCL	3–7×	Not all ops, FP32/FP16
NNAPI	Android 8.1+, NPU/DSP	2–10×	Chip-dependent, unstable on older ROMs
Hexagon (QC)	Snapdragon with DSP	3–8×	Qualcomm only
CPU (XNNPACK)	Always	baseline	—

// GPU delegate: the most widely available accelerator
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.gpu.CompatibilityList
import org.tensorflow.lite.gpu.GpuDelegate
import org.tensorflow.lite.support.common.FileUtil

val compatList = CompatibilityList()
val options = Interpreter.Options()

if (compatList.isDelegateSupportedOnThisDevice) {
    // Use the GPU options recommended for this device
    val delegateOptions = compatList.bestOptionsForThisDevice
    options.addDelegate(GpuDelegate(delegateOptions))
} else {
    // Fallback: NNAPI where it works, otherwise CPU with XNNPACK
    options.setUseNNAPI(true)
    options.setUseXNNPACK(true)
}
options.setNumThreads(4)  // CPU threads for ops that are not delegated

val interpreter = Interpreter(
    FileUtil.loadMappedFile(context, "model_fp16.tflite"),
    options
)

NNAPI is unstable in practice: on some devices it gives a 5× speedup, on others it fails with NNAPIDelegate: Failed to invoke the model due to incompatible operations. A try/catch with a CPU fallback is mandatory:

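// Assumes `interpreter` is declared elsewhere as: var interpreter: Interpreter? = null
try {
    options.setUseNNAPI(true)
    interpreter = Interpreter(modelBuffer, options)
    // Test run to verify that the delegate can actually execute the model
    interpreter?.run(testInput, testOutput)
} catch (e: Exception) {
    Log.w("ML", "NNAPI failed, falling back to CPU: ${e.message}")
    interpreter?.close()  // release the failed NNAPI interpreter, if it was created
    options.setUseNNAPI(false)
    interpreter = Interpreter(modelBuffer, options)
}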
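The same pattern extends to a longer chain such as GPU, then NNAPI, then CPU: try each backend in order and keep the first one that survives a real test inference. The sketch below shows only the control flow in a language-agnostic way. The function first_working and the stand-in factories are hypothetical and do not call any TFLite API.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def first_working(
    candidates: Sequence[Tuple[str, Callable[[], T]]],
    smoke_test: Callable[[T], object],
) -> Tuple[str, T]:
    """Return the first (name, backend) whose construction and smoke test both succeed."""
    errors: List[str] = []
    for name, factory in candidates:
        try:
            backend = factory()   # may raise, e.g. on unsupported operations
            smoke_test(backend)   # one real inference on a known input
            return name, backend
        except Exception as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("no backend worked: " + "; ".join(errors))


def make_nnapi():
    # Simulated failure, standing in for a delegate that rejects the model.
    raise RuntimeError("incompatible operations")


def make_cpu():
    # Stand-in for a CPU interpreter: any callable will do for the sketch.
    return lambda x: x * 2


name, backend = first_working(
    [("nnapi", make_nnapi), ("cpu", make_cpu)],
    smoke_test=lambda run: run(1),
)
print(f"selected backend: {name}")
```

Ordering the candidates from fastest to most reliable keeps the CPU path as the last resort, and the collected error messages are useful to log per device model.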

Buffer Management: ByteBuffer vs TensorBuffer

Managing a ByteBuffer directly is faster but verbose. TensorBuffer from org.tensorflow.lite.support is more convenient:

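// Via TFLite Support Library (recommended)
import org.tensorflow.lite.DataType
import org.tensorflow.lite.support.common.ops.NormalizeOp
import org.tensorflow.lite.support.image.ImageProcessor
import org.tensorflow.lite.support.image.TensorImage
import org.tensorflow.lite.support.image.ops.ResizeOp
import org.tensorflow.lite.support.tensorbuffer.TensorBuffer

val imageProcessor = ImageProcessor.Builder()
    .add(ResizeOp(224, 224, ResizeOp.ResizeMethod.BILINEAR))
    .add(NormalizeOp(127.5f, 127.5f))  // (pixel - 127.5) / 127.5 maps [0, 255] to [-1, 1]
    .build()

val tensorImage = TensorImage(DataType.FLOAT32)
tensorImage.load(bitmap)
val processedImage = imageProcessor.process(tensorImage)

// Run inference; the output shape [1, 1000] must match the model
val outputBuffer = TensorBuffer.createFixedSize(intArrayOf(1, 1000), DataType.FLOAT32)
interpreter.run(processedImage.buffer, outputBuffer.buffer)

// Result: index of the most probable class, or -1 if the output is empty
val probabilities = outputBuffer.floatArray
val topIndex = probabilities.indices.maxByOrNull { probabilities[it] } ?: -1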
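The arithmetic behind the normalization step and the final class selection is small enough to check by hand. The following sketch mirrors it in plain Python; the function names and the score list are illustrative assumptions, not part of the TFLite Support Library.

```python
def normalize(pixel: int, mean: float = 127.5, std: float = 127.5) -> float:
    """(pixel - mean) / std, the per-value arithmetic of a normalization step."""
    return (pixel - mean) / std


def top_k(probabilities, k=3):
    """Return the k (index, score) pairs with the highest scores, best first."""
    ranked = sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# The ends of the 8-bit range land on the ends of [-1, 1].
print([normalize(p) for p in (0, 127, 255)])

# Made-up scores for a four-class model.
scores = [0.05, 0.70, 0.10, 0.15]
print(top_k(scores, k=2))
```

Returning the top few classes with their scores, rather than a single index, makes it easier to apply a confidence threshold before showing a result to the user.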

ResizeOp on the CPU is surprisingly slow for large images: Full HD → 224×224 takes 20–40 ms. Alternatives are to pre-resize with Bitmap.createScaledBitmap(), to use RenderScript (deprecated), or to request a smaller output size from Camera2 in the first place.

CameraX Integration

val imageAnalyzer = ImageAnalysis.Builder()
    .setTargetResolution(Size(640, 480))
    .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)  // drop stale frames instead of queueing
    .build()
    .also {
        it.setAnalyzer(cameraExecutor) { imageProxy ->
            try {
                val bitmap = imageProxy.toBitmap()
                runInference(bitmap)
            } finally {
                imageProxy.close()  // CRITICAL: otherwise CameraX stops delivering frames
            }
        }
    }

Calling imageProxy.close() in a finally block is not optional. If a frame is not closed, ImageAnalysis stops delivering new frames within seconds. This is a typical bug that surfaces only in long-running tests.
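The keep-only-latest policy itself can be pictured as a single-slot buffer: a new frame overwrites any frame the analyzer has not picked up yet. The sketch below is a simplified, single-threaded model of that drop policy, not CameraX code. The class LatestOnlySlot and the "analyzer is free every third frame" schedule are assumptions made for the illustration.

```python
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class LatestOnlySlot(Generic[T]):
    """Single-slot buffer: a new item overwrites any item not yet consumed.

    Not thread-safe; a real pipeline would guard the slot with a lock.
    """

    def __init__(self) -> None:
        self._item: Optional[T] = None
        self.dropped = 0

    def put(self, item: T) -> None:
        if self._item is not None:
            self.dropped += 1  # the previous frame was never analyzed
        self._item = item

    def take(self) -> Optional[T]:
        item, self._item = self._item, None
        return item


slot: LatestOnlySlot[int] = LatestOnlySlot()
analyzed: List[Optional[int]] = []
for frame_id in range(10):      # the camera produces 10 frames
    slot.put(frame_id)
    if frame_id % 3 == 2:       # the analyzer is free only every third frame
        analyzed.append(slot.take())

print(analyzed, slot.dropped)
```

With a slow analyzer the model always sees the freshest frame, at the cost of skipping the ones in between, which is usually the right trade-off for live camera inference.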

Numerical Accuracy After Conversion

After conversion and quantization, always check model accuracy on a test set. FP16 quantization usually costs less than 1% accuracy, INT8 about 1–3%. If the loss is larger, the quantization calibration dataset may be too small, or the model may have layers that are sensitive to quantization.
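Why the calibration range matters can be seen from the affine mapping that INT8 quantization uses, real_value = (int8_value - zero_point) * scale. The sketch below is a simplified per-tensor version in plain Python; the range [-1.0, 3.0] and the sample values are invented for illustration, and the helper names are assumptions.

```python
def quantize_params(x_min: float, x_max: float, q_min: int = -128, q_max: int = 127):
    """Derive scale and zero point so that [x_min, x_max] maps onto [q_min, q_max]."""
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)  # the range must contain 0
    scale = (x_max - x_min) / (q_max - q_min)
    zero_point = round(q_min - x_min / scale)
    return scale, zero_point


def quantize(x: float, scale: float, zero_point: int, q_min: int = -128, q_max: int = 127) -> int:
    q = round(x / scale) + zero_point
    return max(q_min, min(q_max, q))  # values outside the calibrated range are clamped


def dequantize(q: int, scale: float, zero_point: int) -> float:
    return (q - zero_point) * scale


# Pretend calibration observed activations between -1.0 and 3.0.
scale, zero_point = quantize_params(-1.0, 3.0)

values = [-1.0, -0.2, 0.0, 0.5, 3.0]
errors = [abs(dequantize(quantize(v, scale, zero_point), scale, zero_point) - v) for v in values]
print(f"scale={scale:.6f} zero_point={zero_point} max error={max(errors):.6f}")

# A value the calibration data never showed is clamped to the top of the range.
outlier = dequantize(quantize(10.0, scale, zero_point), scale, zero_point)
print(f"10.0 is reconstructed as {outlier:.4f}")
```

Inside the calibrated range the round-trip error stays within half a quantization step. An input outside it is clamped, which is one way an unrepresentative calibration set can turn into a large accuracy drop.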

To verify, compare the outputs of the original PyTorch model and the TFLite model on the same inputs:

# Check that the outputs match
import numpy as np
import torch

with torch.no_grad():
    original_out = pytorch_model(test_input).numpy()
# run_tflite is your own helper that feeds the same input to the TFLite interpreter
tflite_out = run_tflite(interpreter, test_input)
print(f"Max difference: {np.max(np.abs(original_out - tflite_out))}")
# Typical: < 0.01 for FP16, < 0.05 for INT8
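The maximum difference alone does not say whether any prediction actually changed, so it can help to track top-1 agreement alongside it. The following sketch uses plain lists and invented output values; both helper functions are hypothetical names, not part of any library.

```python
def max_abs_diff(reference, converted):
    """Largest element-wise absolute difference across all samples."""
    return max(
        abs(r - c)
        for ref_row, conv_row in zip(reference, converted)
        for r, c in zip(ref_row, conv_row)
    )


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def top1_agreement(reference, converted):
    """Fraction of samples where both models pick the same class."""
    matches = sum(argmax(r) == argmax(c) for r, c in zip(reference, converted))
    return matches / len(reference)


# Made-up outputs for three samples of a three-class model.
reference_out = [[0.10, 0.80, 0.10], [0.45, 0.40, 0.15], [0.20, 0.30, 0.50]]
converted_out = [[0.11, 0.79, 0.10], [0.41, 0.44, 0.15], [0.21, 0.30, 0.49]]

worst = max_abs_diff(reference_out, converted_out)
agreement = top1_agreement(reference_out, converted_out)
print(f"max difference {worst:.3f}, top-1 agreement {agreement:.2f}")
```

In this invented data a difference of only 0.04 is enough to flip the second sample, because its two leading classes were close to begin with. Samples like that are worth inspecting individually.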

Model Placement

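Ship the .tflite file in assets/. On first run, either copy it to filesDir or map it as a MappedByteBuffer directly from assets for zero-copy loading. For openFd() to work, the asset must be stored uncompressed in the APK (for example via a noCompress rule for the tflite extension in the Gradle configuration):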

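import android.content.Context
import java.io.FileInputStream
import java.nio.MappedByteBuffer
import java.nio.channels.FileChannel

fun loadModelFile(context: Context, filename: String): MappedByteBuffer =
    context.assets.openFd(filename).use { fileDescriptor ->
        // The mapping stays valid after the stream and descriptor are closed
        FileInputStream(fileDescriptor.fileDescriptor).use { inputStream ->
            inputStream.channel.map(
                FileChannel.MapMode.READ_ONLY,
                fileDescriptor.startOffset,
                fileDescriptor.declaredLength
            )
        }
    }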

With a MappedByteBuffer, the OS does not copy the file into RAM on load but maps it directly and pages it in on demand. For large models (50–200 MB) the difference is significant.
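The same idea, mapping a byte range of a larger file instead of reading it, can be tried on a desktop with Python's mmap module. This is an analogy under stated assumptions rather than Android code: the temporary file plays the role of the APK, the padding plays the role of the asset's start offset, and the payload is placeholder bytes, not a real model.

```python
import mmap
import os
import tempfile

# Python requires the mapping offset to be a multiple of this value.
granularity = mmap.ALLOCATIONGRANULARITY

# Placeholder payload standing in for the model bytes.
model_bytes = b"MODL" + bytes(range(256)) * 4

# Build a stand-in "archive": padding first, then the payload at a known offset.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\0" * granularity)
    f.write(model_bytes)
    path = f.name

with open(path, "rb") as f:
    # Map only the payload region, read-only, without calling read() on the file.
    mapped = mmap.mmap(f.fileno(), length=len(model_bytes),
                       access=mmap.ACCESS_READ, offset=granularity)
    header = mapped[:4]
    mapped_size = len(mapped)
    mapped.close()

os.remove(path)
print(header, mapped_size)
```

Only the pages that are actually touched get loaded, which is why mapping helps most with large files.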

Process

Convert from source format → evaluate delegates on target devices → integrate with fallback logic → test numerical accuracy → profile via Android Profiler + TFLite Model Benchmark Tool.
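For the profiling step, the measurement discipline matters as much as the tool: discard warm-up runs and report a median and a high percentile rather than a single timing. The sketch below shows that discipline with the standard library only. The benchmark helper and fake_inference are hypothetical stand-ins and do not replace the TFLite Model Benchmark Tool or on-device measurement.

```python
import statistics
import time


def benchmark(run_once, warmup: int = 5, runs: int = 50):
    """Time run_once() and return (median_ms, p95_ms), discarding warm-up runs."""
    for _ in range(warmup):
        run_once()  # lets caches and lazy initialization settle
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    p95 = samples[min(len(samples) - 1, int(0.95 * len(samples)))]
    return statistics.median(samples), p95


def fake_inference():
    # Placeholder workload standing in for interpreter.run(...).
    sum(i * i for i in range(10_000))


median_ms, p95_ms = benchmark(fake_inference)
print(f"median {median_ms:.3f} ms, p95 {p95_ms:.3f} ms")
```

Comparing the median and the 95th percentile per delegate and per device makes thermal throttling and occasional slow frames visible, which an average tends to hide.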

Timeline Estimates

Basic TFLite model integration on Android takes 1–2 weeks. Adding multi-delegate logic, a CameraX pipeline, and testing on a device fleet brings it to 3–5 weeks.