ML Model Conversion to TensorFlow Lite Format for Android
TFLite conversion isn't just weight conversion. It's quantization format selection, graph optimization, choosing operation sets compatible with target Android versions, and verifying that numerical results match the original model. Each step has specific pitfalls.
Conversion Paths
From TensorFlow SavedModel — direct path. From PyTorch — via ONNX intermediate format. From JAX — via TensorFlow export.
# Path 1: TF SavedModel → TFLite (most reliable)
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir/")
tflite_model = converter.convert()
# Path 2: Keras model → TFLite
converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
tflite_model = converter.convert()
# Path 3: PyTorch → ONNX → TF SavedModel → TFLite
# First export from PyTorch: torch.onnx.export(model, example_input, "model.onnx", opset_version=17)
# Then convert ONNX → TF SavedModel with onnx-tf
# (note: tf2onnx goes the other direction, TF → ONNX)
import subprocess
subprocess.run(["onnx-tf", "convert",
                "-i", "model.onnx",
                "-o", "model_tf"], check=True)
converter = tf.lite.TFLiteConverter.from_saved_model("model_tf/")
The ONNX path introduces additional potential incompatibilities; use it only when no direct path is available.
Quantization During Conversion
# FP16 — minimal degradation, 2x smaller model, GPU delegate acceleration
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_fp16 = converter.convert()
# Dynamic INT8 — weights int8, activations float32. No calibration dataset needed.
converter2 = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir/")
converter2.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_dynamic_int8 = converter2.convert()
# Full INT8 — both weights and activations. Requires calibration dataset. Needed for Hexagon DSP.
def representative_dataset():
    dataset = load_calibration_data()  # 100-500 representative examples
    for sample in dataset:
        yield [sample[np.newaxis, :].astype(np.float32)]
converter3 = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir/")
converter3.optimizations = [tf.lite.Optimize.DEFAULT]
converter3.representative_dataset = representative_dataset
converter3.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter3.inference_input_type = tf.uint8 # or tf.int8
converter3.inference_output_type = tf.uint8
tflite_full_int8 = converter3.convert()
With full INT8 and inference_input_type = tf.uint8, input data is passed as raw uint8 (0–255), with no normalization to float32 on the Java/Kotlin side. This removes a preprocessing step, but requires careful alignment with the model's quantization parameters.
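Aligning quantization parameters means applying q = round(x / scale) + zero_point with the scale and zero_point the interpreter reports in input_details[0]['quantization']. A minimal sketch; the scale/zero_point values below are illustrative, assuming inputs normalized to [0, 1]:

```python
import numpy as np

# scale and zero_point normally come from
# interpreter.get_input_details()[0]['quantization']; the values here
# are illustrative for a model trained on inputs in [0, 1]
scale, zero_point = 1.0 / 255.0, 0

def quantize_input(x_float):
    # float32 → uint8, mirroring q = round(x / scale) + zero_point
    q = np.round(x_float / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)

def dequantize_output(q_out, out_scale, out_zero_point):
    # inverse mapping for a quantized output tensor
    return (q_out.astype(np.float32) - out_zero_point) * out_scale
```

The output tensor gets the inverse treatment with its own scale/zero_point from output_details.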
Unsupported Operations
Not all TF/PyTorch operations have TFLite builtin equivalents. If conversion fails, the error message lists the unsupported ops; a fallback can be enabled:
# Allow fallback to full TF ops for operations missing from TFLite builtins
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS  # fallback to TensorFlow operations
]
SELECT_TF_OPS pulls in a subset of TF operations, which grows the TFLite runtime binary (~5 MB) and slows some ops. Where possible, rewrite the model to avoid SELECT_TF_OPS: staying within builtins keeps NNAPI and Hexagon delegates usable.
Custom operation via C++:
// Custom operator registration (C++, NDK side)
TfLiteRegistration* GetMyCustomOpRegistration() {
  static TfLiteRegistration reg = {};
  reg.prepare = [](TfLiteContext* ctx, TfLiteNode* node) -> TfLiteStatus {
    // Shape validation, output tensor allocation
    return kTfLiteOk;
  };
  reg.invoke = [](TfLiteContext* ctx, TfLiteNode* node) -> TfLiteStatus {
    // Inference implementation
    return kTfLiteOk;
  };
  reg.custom_name = "MyCustomOp";
  return &reg;
}
// Register with the op resolver before building the interpreter:
tflite::ops::builtin::BuiltinOpResolver resolver;
resolver.AddCustom("MyCustomOp", GetMyCustomOpRegistration());
The model is then invoked from Kotlin via JNI. Non-trivial, but sometimes the only path.
Numerical Accuracy Verification
# Compare PyTorch/TF and TFLite outputs on identical inputs
import numpy as np
# TF original
tf_output = tf_model(test_input).numpy()
# TFLite
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
interpreter.set_tensor(input_details[0]['index'], test_input)
interpreter.invoke()
tflite_output = interpreter.get_tensor(output_details[0]['index'])
print(f"Max abs diff: {np.max(np.abs(tf_output - tflite_output))}")
print(f"MSE: {np.mean((tf_output - tflite_output)**2)}")
# FP32: < 1e-5, FP16: < 1e-2, INT8: < 0.05
If the difference exceeds these thresholds, look for issues in input normalization, incorrect quantization parameters, or an operation implemented with a different algorithm in TFLite.
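These checks can be wrapped in a small helper (a hypothetical utility, not part of TFLite) that applies the per-precision tolerances above:

```python
import numpy as np

# Illustrative tolerance table matching the rules of thumb above
TOLERANCES = {"fp32": 1e-5, "fp16": 1e-2, "int8": 0.05}

def outputs_match(reference, converted, precision="fp32"):
    # Summarize divergence between original-framework and TFLite outputs
    diff = np.abs(np.asarray(reference) - np.asarray(converted))
    return {
        "max_abs_diff": float(diff.max()),
        "mse": float(np.mean(diff ** 2)),
        "ok": bool(diff.max() < TOLERANCES[precision]),
    }
```

Run it on several inputs, including edge cases (all-zero input, saturated pixels), not just one sample.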
Object Detection Specifics
YOLO, SSD, and EfficientDet contain NMS (Non-Maximum Suppression) postprocessing. TFLite has no general builtin NMS op (unlike Core ML's Detection Output; SSD export pipelines rely on the custom TFLite_Detection_PostProcess op). Options:
- Remove NMS from model, implement in Java/Kotlin post-inference
- Use TFLite Task Library, which contains a ready ObjectDetection API with NMS
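For the first option, the NMS step removed from the graph is reimplemented post-inference. A minimal greedy NMS sketch in NumPy (boxes as [x1, y1, x2, y2]), which a Kotlin port would mirror:

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    # Greedy non-maximum suppression. boxes: (N, 4) as [x1, y1, x2, y2].
    order = scores.argsort()[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of box i with all remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        # Drop boxes that overlap the kept box too strongly
        order = rest[iou <= iou_threshold]
    return keep
```

Apply it per class after score thresholding; class-agnostic NMS over all boxes at once changes results for overlapping objects of different classes.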
// TFLite Task Library: ObjectDetector (includes NMS)
val options = ObjectDetector.ObjectDetectorOptions.builder()
    .setScoreThreshold(0.5f)
    .setMaxResults(20)
    .build()
val detector = ObjectDetector.createFromFileAndOptions(context, "detector.tflite", options)
val image = TensorImage.fromBitmap(inputBitmap)
val results: List<Detection> = detector.detect(image)
for (detection in results) {
    val box = detection.boundingBox  // RectF
    val label = detection.categories.first().label
    val score = detection.categories.first().score
}
Task Library supports only specific model signatures — model must follow TFLite Model Metadata format.
Model Metadata for Interpreter and Task Library
from tflite_support.metadata_writers import image_classifier
from tflite_support.metadata_writers import writer_utils

# Create classifier metadata
writer = image_classifier.MetadataWriter.create_for_inference(
    writer_utils.load_file("model.tflite"),
    input_norm_mean=[0.0],
    input_norm_std=[255.0],
    label_file_paths=["labels.txt"]
)
tflite_with_metadata = writer.populate()
writer_utils.save_file(tflite_with_metadata, "model_with_metadata.tflite")
Without metadata, the TFLite Task Library has no auto-normalization and no output-to-label mapping, and generally cannot load the model at all. With metadata, both are handled automatically.
Benchmark on Devices
# ADB: run the TFLite Benchmark Tool directly on device
# (benchmark_model is a prebuilt binary from the TFLite repo; push it alongside the model)
adb push benchmark_model /data/local/tmp/
adb shell chmod +x /data/local/tmp/benchmark_model
adb push model.tflite /data/local/tmp/
adb shell /data/local/tmp/benchmark_model \
  --graph=/data/local/tmp/model.tflite \
  --use_gpu=true \
  --num_threads=4 \
  --num_runs=50
# Output: average latency, min/max, warmup time
The TFLite Benchmark Tool is an official Google tool; it gives realistic numbers free of JVM overhead and Android UI interference. Use it for delegate comparison (GPU vs NNAPI vs CPU).
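The tool's summary line can be scraped in CI scripts. A parsing sketch; the log format below matches recent benchmark_model builds, but treat the exact wording as an assumption:

```python
import re

def parse_avg_inference_us(log_text):
    # benchmark_model prints a summary line like:
    # "Inference timings in us: Init: 4200, First inference: 31000,
    #  Warmup (avg): 25000, Inference (avg): 19500.4"
    m = re.search(r"Inference \(avg\): ([\d.]+)", log_text)
    return float(m.group(1)) if m else None
```

Track the parsed value per device and delegate to catch latency regressions between model revisions.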
Process
Conversion path selection → conversion with needed quantization level → accuracy verification → metadata addition → testing on device range via Benchmark Tool → Android app integration.
Timeline Estimates
Direct TF/Keras model conversion with verification — 3–7 days. Conversion via ONNX, custom operations, metadata addition, full testing — 2–4 weeks.