Deploy a TensorFlow Model to a Mobile or an Embedded Device

--

If you want to deploy your TensorFlow model to a mobile or embedded device, a large model may take too long to download and may use too much RAM and CPU, all of which will make your app unresponsive, heat the device, and drain its battery. To avoid this, you need to build a mobile-friendly, lightweight, and efficient model, without sacrificing too much of its accuracy.

Before deploying a TensorFlow model to a mobile device, I suggest you learn how to deploy a machine learning model to a web application. It will help you understand things better before getting into deploying a TensorFlow model to a mobile or embedded device.

The TFLite library provides several tools to help you deploy your TensorFlow model to mobile and embedded devices, with three main objectives:

  • Reduce the model size to shorten download time and reduce RAM usage.
  • Reduce the number of computations needed for each prediction to minimize latency, battery usage, and heating.
  • Adapt the model to device-specific constraints.

Train and Deploy a TensorFlow Model to a Mobile Device

When you deploy a machine learning model to a mobile device, you need to reduce the model size. TFLite’s model converter can take a saved model and compress it to a much lighter format based on FlatBuffers. This is an efficient, cross-platform serialization library initially created by Google, and FlatBuffer files can be loaded straight into RAM without any preprocessing: this reduces the loading time and memory footprint.

Once the model is loaded into a mobile or embedded device, the TFLite interpreter will execute it to make predictions.
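
Here is a rough sketch of what that looks like in Python with tf.lite.Interpreter, assuming a hypothetical model.tflite file (the next snippet shows how to produce one) and a dummy input just to illustrate the call sequence:

    import numpy as np
    import tensorflow as tf  # on tiny devices you would use the lighter tflite_runtime package

    # Load the FlatBuffer model and allocate its tensors.
    # "model.tflite" is a placeholder path used for illustration.
    interpreter = tf.lite.Interpreter(model_path="model.tflite")
    interpreter.allocate_tensors()

    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # Build a dummy input matching the model's expected shape and dtype.
    dummy_input = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])

    # Run one prediction.
    interpreter.set_tensor(input_details[0]["index"], dummy_input)
    interpreter.invoke()
    prediction = interpreter.get_tensor(output_details[0]["index"])
    print(prediction)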

Here is how you can convert a saved model to a FlatBuffer and save it to a .tflite file.
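
A minimal sketch, assuming TensorFlow 2.x and a hypothetical SavedModel directory called my_saved_model (produced by model.save() or tf.saved_model.save()):

    import tensorflow as tf

    # "my_saved_model" is a placeholder for your SavedModel directory.
    converter = tf.lite.TFLiteConverter.from_saved_model("my_saved_model")
    tflite_model = converter.convert()  # returns the FlatBuffer as a bytes object

    # Write the converted model to a .tflite file.
    with open("model.tflite", "wb") as f:
        f.write(tflite_model)

The resulting model.tflite file is what you ship with your app and load with the interpreter shown above.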

How Deploying a TensorFlow Model to Mobile Works

While you deploy a TensorFlow model to a mobile device, the converter optimizes the model, both to shrink it and to reduce its latency. It prunes all the operations that are not needed to make predictions (such as training operations), and it optimizes computations whenever possible; for example, 3*a + 4*a + 5*a will be converted to (3+4+5)*a. It also tries to fuse operations whenever possible.

For example, Batch Normalization layers end up folded into the previous layer’s addition and multiplication operations, whenever possible. To get a good idea of how much TFLite can optimize a model, download one of the pretrained TFLite models, unzip the archive, then open the excellent Netron graph visualization tool and upload the .pb file to view the original model. It’s a big, elaborate graph. Next, open the optimized .tflite model and marvel at its beauty.
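
If you prefer to stay in Python, recent TensorFlow releases (2.7 and later) also ship an experimental analyzer that prints the operations contained in a converted model; this is just a convenience sketch, with model.tflite again standing in for your converted file:

    import tensorflow as tf

    # Print the subgraphs and operators contained in the converted FlatBuffer.
    # tf.lite.experimental.Analyzer is experimental and may change between releases.
    tf.lite.experimental.Analyzer.analyze(model_path="model.tflite")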

Another Way to Reduce the Model Size

Another way you can reduce the model size while you deploy a TensorFlow model to a mobile or embedded device (other than only using smaller neural network architectures) is by using smaller bit-widths: for example, if you use half-floats (16 bits) rather than regular floats (32 bits), the model size will shrink by a factor of 2, at the cost of a (generally small) accuracy drop. Moreover, training will be faster, and you will use roughly half the amount of GPU RAM.
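
TFLite’s post-training float16 option covers the storage side of this: the converter keeps the weights as 16-bit floats in the .tflite file. A minimal sketch, reusing the hypothetical my_saved_model directory from above:

    import tensorflow as tf

    converter = tf.lite.TFLiteConverter.from_saved_model("my_saved_model")  # placeholder path
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.target_spec.supported_types = [tf.float16]  # store weights as half-floats
    tflite_fp16_model = converter.convert()

    with open("model_fp16.tflite", "wb") as f:
        f.write(tflite_fp16_model)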

TFLite’s converter can go further than that, by quantizing the model weights down to fixed-point, 8-bit integers! This leads to a fourfold size reduction compared to using 32-bit floats.
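
This weight-only (post-training dynamic-range) quantization is requested through the converter’s optimizations flag; a sketch under the same placeholder-path assumption:

    import tensorflow as tf

    converter = tf.lite.TFLiteConverter.from_saved_model("my_saved_model")  # placeholder path
    # With Optimize.DEFAULT and no further settings, the converter quantizes
    # the weights to 8-bit integers (dynamic-range quantization).
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    tflite_int8_weights_model = converter.convert()

    with open("model_int8_weights.tflite", "wb") as f:
        f.write(tflite_int8_weights_model)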

The simplest approach is called post-training quantization: it just quantizes the weights after training, using a fairly basic but efficient symmetrical quantization technique. It finds the maximum absolute weight value, m, and then maps the floating-point range –m to +m to the fixed-point (integer) range –127 to +127.
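
To make that mapping concrete, here is a tiny NumPy illustration of the symmetrical scheme (purely illustrative, not TFLite’s actual implementation):

    import numpy as np

    def quantize_symmetric(weights):
        """Map float weights in [-m, +m] to integers in [-127, +127]."""
        m = np.max(np.abs(weights))      # maximum absolute weight value
        scale = m / 127.0                # size of one integer step
        q = np.round(weights / scale).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        """Recover approximate float weights from the quantized integers."""
        return q.astype(np.float32) * scale

    weights = np.array([-0.81, 0.03, 0.52, 0.25], dtype=np.float32)
    q, scale = quantize_symmetric(weights)
    print(q)                    # e.g. [-127    5   82   39]
    print(dequantize(q, scale)) # close to the original weights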
