ML model conversion to TensorFlow Lite format for Android

NOVASOLUTIONS.TECHNOLOGY develops, supports, and maintains iOS, Android, and PWA mobile applications. We have extensive experience and expertise in publishing mobile applications in popular marketplaces such as Google Play, the App Store, Amazon Appstore, AppGallery, and others.
Development and support of all types of mobile applications:
Information and entertainment mobile applications
News apps, games, reference guides, online catalogs, weather apps, fitness and health apps, travel apps, educational apps, social networks and messengers, quizzes, blogs and podcasts, forums, aggregators
E-commerce mobile applications
Online stores, B2B apps, marketplaces, online exchanges, cashback services, dropshipping platforms, loyalty programs, food and goods delivery, payment systems.
Business process management mobile applications
CRM systems, ERP systems, project management, sales team tools, financial management, production management, logistics and delivery management, HR management, data monitoring systems
Electronic services mobile applications
Classified ads platforms, online schools, online cinemas, electronic service platforms, cashback platforms, video hosting, thematic portals, online booking and scheduling platforms, online trading platforms

These are just some of the types of mobile applications we work with, and each of them may have its own specific features and functionality, tailored to the specific needs and goals of the client.


ML Model Conversion to TensorFlow Lite Format for Android

Converting a model to TFLite isn't just weight conversion. It means selecting a quantization format, optimizing the graph, choosing operation sets compatible with the target Android versions, and verifying that the numerical results match the original model. Each step has specific pitfalls.

Conversion Paths

From a TensorFlow SavedModel, conversion is direct. From PyTorch, it goes through the ONNX intermediate format. From JAX, through TensorFlow export.

import tensorflow as tf

# Path 1: TF SavedModel → TFLite (most reliable)
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir/")
tflite_model = converter.convert()

# Path 2: Keras model → TFLite
converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
tflite_model = converter.convert()

# Path 3: PyTorch → ONNX → TF SavedModel → TFLite
# First export with torch.onnx.export, then convert the ONNX file
# to a TF SavedModel (here with onnx2tf; onnx-tf is an alternative):
import subprocess
subprocess.run(["onnx2tf", "-i", "model.onnx", "-o", "model_tf"], check=True)
converter = tf.lite.TFLiteConverter.from_saved_model("model_tf/")

The ONNX path introduces additional potential incompatibilities; use it only when a direct path is unavailable.

Quantization During Conversion

# FP16 — minimal degradation, 2x smaller model, GPU delegate acceleration
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_fp16 = converter.convert()

# Dynamic INT8 — weights int8, activations float32. No calibration dataset needed.
converter2 = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir/")
converter2.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_dynamic_int8 = converter2.convert()

# Full INT8 — both weights and activations. Requires calibration dataset. Needed for Hexagon DSP.
def representative_dataset():
    dataset = load_calibration_data()  # 100-500 examples
    for sample in dataset:
        yield [sample[np.newaxis, :].astype(np.float32)]

converter3 = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir/")
converter3.optimizations = [tf.lite.Optimize.DEFAULT]
converter3.representative_dataset = representative_dataset
converter3.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter3.inference_input_type = tf.uint8   # or tf.int8
converter3.inference_output_type = tf.uint8
tflite_full_int8 = converter3.convert()

Full INT8 with inference_input_type = tf.uint8 means input data is passed as raw uint8 (0–255), with no float32 normalization in Java/Kotlin. This removes a preprocessing step, but requires careful alignment of the quantization parameters (scale and zero point) with the preprocessing the model was trained with.
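That alignment reduces to the affine quantization formula real = scale * (q - zero_point), where scale and zero_point come from the converted model (interpreter.get_input_details()). A minimal NumPy sketch with illustrative parameters:

```python
# Sketch: affine quantization as used by full-INT8 TFLite models.
# The scale/zero_point values are illustrative; real ones come from
# interpreter.get_input_details()[0]['quantization'].
import numpy as np

def quantize(x, scale, zero_point, dtype=np.uint8):
    info = np.iinfo(dtype)
    q = np.round(x / scale) + zero_point
    return np.clip(q, info.min, info.max).astype(dtype)

def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

# Example: a model trained on inputs normalized to [0, 1].
# With scale = 1/255 and zero_point = 0, raw pixel bytes ARE the quantized values:
scale, zero_point = 1.0 / 255.0, 0
pixels = np.array([0, 128, 255], dtype=np.uint8)
print(dequantize(pixels, scale, zero_point))  # ≈ [0.0, 0.502, 1.0]
```

When the model's input scale/zero_point match this mapping, no float preprocessing is needed on the Android side.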

Unsupported Operations

Not all TF/PyTorch ops exist in TFLite builtin ops. Check:

# Check which operations in the model have no TFLite builtin equivalent
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS  # fallback to TF operations
]

SELECT_TF_OPS links in a subset of TF operations, which increases the TFLite runtime binary (~5 MB) and slows some ops. When possible, rewrite the model to avoid SELECT_TF_OPS: builtin-only models stay compatible with the NNAPI and Hexagon delegates.
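To see which ops a converted model actually contains (and whether any select/flex ops slipped in), TFLite's model analyzer helps; the tiny Keras model below is illustrative:

```python
# Sketch: convert a small model with builtins only and inspect the resulting ops.
# The toy Keras model is illustrative; a real model goes through the same calls.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(2, activation="relu"),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]
tflite_model = converter.convert()  # fails here if an op has no builtin equivalent

# Print every operator in the converted model
tf.lite.experimental.Analyzer.analyze(model_content=tflite_model)
```

If conversion succeeds with TFLITE_BUILTINS alone, the model needs no SELECT_TF_OPS fallback.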

Custom operation via C++:

// Custom operator registration
static TfLiteRegistration* GetMyCustomOpRegistration() {
    static TfLiteRegistration reg = {
        /*init=*/nullptr,
        /*free=*/nullptr,
        /*prepare=*/nullptr,
        /*invoke=*/[](TfLiteContext* ctx, TfLiteNode* node) -> TfLiteStatus {
            // Inference implementation
            return kTfLiteOk;
        },
        /*profiling_string=*/nullptr,
        /*builtin_code=*/0,
        /*custom_name=*/"MyCustomOp",
        /*version=*/1
    };
    return &reg;
}

// In Android NDK code, register on the op resolver before building the interpreter:
resolver.AddCustom("MyCustomOp", GetMyCustomOpRegistration());

The custom op is then exposed to Kotlin via JNI. Non-trivial, but sometimes the only path.

Numerical Accuracy Verification

# Compare PyTorch/TF and TFLite outputs on identical inputs
import numpy as np

# TF original
tf_output = tf_model(test_input).numpy()

# TFLite
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

interpreter.set_tensor(input_details[0]['index'], test_input)
interpreter.invoke()
tflite_output = interpreter.get_tensor(output_details[0]['index'])

print(f"Max abs diff: {np.max(np.abs(tf_output - tflite_output))}")
print(f"MSE: {np.mean((tf_output - tflite_output)**2)}")
# Typical thresholds: FP32 < 1e-5, FP16 < 1e-2, INT8 < 0.05

If the difference exceeds these thresholds, the issue is usually input normalization, incorrect quantization parameters, or an operation implemented with a different algorithm in TFLite.
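The rule of thumb above can be wrapped in a small comparison helper (thresholds and arrays are illustrative):

```python
# Sketch: compare original and converted outputs against per-precision thresholds.
import numpy as np

TOLERANCES = {"fp32": 1e-5, "fp16": 1e-2, "int8": 0.05}  # rule-of-thumb max abs diff

def outputs_match(reference, converted, precision):
    max_abs = float(np.max(np.abs(reference - converted)))
    mse = float(np.mean((reference - converted) ** 2))
    ok = max_abs <= TOLERANCES[precision]
    print(f"max_abs={max_abs:.6f} mse={mse:.8f} -> {'OK' if ok else 'FAIL'}")
    return ok

ref = np.array([0.10, 0.85, 0.05], dtype=np.float32)
print(outputs_match(ref, ref + 0.001, "fp16"))  # True: within FP16 tolerance
print(outputs_match(ref, ref + 0.1, "fp16"))    # False: exceeds FP16 tolerance
```

Run it over a batch of representative inputs, not a single sample, before shipping a quantized model.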

Object Detection Specifics

YOLO, SSD, and EfficientDet models contain NMS (Non-Maximum Suppression) postprocessing. TFLite's builtin op set has no NMS operation (unlike Core ML's Detection Output). Options:

  1. Remove NMS from model, implement in Java/Kotlin post-inference
  2. Use the TFLite Task Library, whose ready-made ObjectDetector API includes NMS
// TFLite Task Library: ObjectDetector (includes NMS)
val options = ObjectDetector.ObjectDetectorOptions.builder()
    .setScoreThreshold(0.5f)
    .setMaxResults(20)
    .build()

val detector = ObjectDetector.createFromFileAndOptions(context, "detector.tflite", options)

val image = TensorImage.fromBitmap(inputBitmap)
val results: List<Detection> = detector.detect(image)

for (detection in results) {
    val box = detection.boundingBox  // RectF
    val label = detection.categories.first().label
    val score = detection.categories.first().score
}

The Task Library supports only specific model signatures: the model must follow the TFLite Model Metadata format.
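For option 1, the NMS stripped from the model has to be reimplemented after inference. A minimal greedy NMS sketch, shown in NumPy for clarity (the Kotlin port is mechanical):

```python
# Sketch: greedy Non-Maximum Suppression over [x1, y1, x2, y2] boxes.
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection of box i with all remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_threshold]  # drop boxes overlapping the kept one
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], dtype=np.float32)
scores = np.array([0.9, 0.8, 0.7], dtype=np.float32)
print(nms(boxes, scores))  # [0, 2]; box 1 is suppressed by box 0
```

Apply score thresholding before NMS to keep the candidate set small on-device.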

Model Metadata for the Interpreter and Task Library

from tflite_support.metadata_writers import image_classifier
from tflite_support.metadata_writers import writer_utils

# Create classifier metadata
writer = image_classifier.MetadataWriter.create_for_inference(
    writer_utils.load_file("model.tflite"),
    input_norm_mean=[0.0],
    input_norm_std=[255.0],
    label_file_paths=["labels.txt"]
)

tflite_with_metadata = writer.populate()
writer_utils.save_file(tflite_with_metadata, "model_with_metadata.tflite")

Without metadata, the TFLite Task Library cannot use the model, and with the plain Interpreter you must implement input normalization and output-to-label mapping by hand. With metadata, both are handled automatically.

Benchmark on Devices

# ADB: push the benchmark binary and model, then run on the device
adb push benchmark_model /data/local/tmp/
adb shell chmod +x /data/local/tmp/benchmark_model
adb push model.tflite /data/local/tmp/
adb shell /data/local/tmp/benchmark_model \
    --graph=/data/local/tmp/model.tflite \
    --use_gpu=true \
    --num_threads=4 \
    --num_runs=50
# Output: average latency, min/max, warmup time

The TFLite Benchmark Tool is Google's official tool and gives honest numbers, with no JVM overhead or Android UI in the way. Use it to compare delegates (GPU vs NNAPI vs CPU).

Process

Conversion path selection → conversion at the required quantization level → accuracy verification → metadata addition → testing across a device range with the Benchmark Tool → Android app integration.

Timeline Estimates

Direct TF/Keras model conversion with verification takes 3–7 business days. Conversion via ONNX with custom operations, metadata addition, and full testing takes 2–4 weeks.