Third Eye in your hand!

--

Multi-class object detection on a mobile video stream, using Deep Learning ConvNets, to assist the blind or to signal an incoming threat without radar.

How do robots ‘see’ the world? Yeah, using cameras.

But how do they identify objects? That requires Computer Vision algorithms, or the more adept Deep Learning techniques, applied to the camera's video input to pick out the distinguishing features of the object under surveillance.

This technique can be very useful for assisting the blind and the elderly if deployed on their handy mobile. The app detects objects through the smartphone's camera, identifies them and reports back audibly to the user, helping the blind navigate and perform daily tasks with greater ease. The same technique can be used in defense, to locate enemy aircraft invading home skies.


Defense mostly relies on RADAR technology to track targets and estimate their distance from the time taken by the signal to bounce back off the target, much like SONAR and LIDAR. But visual technology is the cheapest of the three and is far better at classifying objects.

Courtesy: Image obtained from http://isee.robots.place/

This blog aims to demonstrate two of the many object detection applications.

a) Assist Blind: Detect household objects (Online Video Stream from mobile)

b) Aid Defense: Locate Jet Fighters, flying the skies (Offline Video Mode)

Thus, this post enables you to detect objects of your choice in both offline and online mode, i.e. from recorded videos as well as the mobile's live camera.

The source code of this project can be found on GitHub here.

To detect an object of your choice, we need to follow these steps:

  1. Data Generation: Gather images of similar objects.
  2. Image Annotation: Label the objects with bounding boxes.
  3. API Installation: Install TensorFlow Object Detection API.
  4. Train & Validate Model: Using annotated images.
  5. Freeze the Model: To enable mobile deployment.
  6. Deploy and Run: In mobile or virtual environment.

Data Generation

Let's assume the object you want to detect is a flying fighter jet. Find and save images of fighter jets from https://images.google.com/ and download some videos of jets flying the skies, to make up the offline video input.

Use Lossless Cut to cut out the relevant portions of the videos and MP4Joiner to join the clips without loss. The video thus created becomes the test data. Now extract some frames from the created video using Video to JPG Converter. The extracted frames, along with the images saved from Google, are batch-processed with IrfanView to make the filenames consistent and the image dimensions similar; these become the train data.
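If you prefer a scripted alternative to the GUI converter, a few lines of OpenCV can do the same frame extraction; the file names and the sampling rate below are only illustrative assumptions.

import cv2

# Minimal sketch: sample roughly one frame per second from a 30 fps clip
cap = cv2.VideoCapture('jets.mp4')          # assumed input video
count, saved = 0, 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if count % 30 == 0:
        cv2.imwrite('frames/jet_%04d.jpg' % saved, frame)
        saved += 1
    count += 1
cap.release()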

Sample Training Data

Image Annotation

By now, we have the train and test images. But the exact location and type of the objects in the images have to be explicitly labelled. The bounding boxes can be drawn using Labelbox or the LabelImg software, and the output is saved as an XML file corresponding to each image.

For multi-class classification, give different label names to the different objects in the image. This information is saved in the generated XML files. Add all the categories to label_map.pbtxt in the \data folder and modify the NUM_CLASSES variable in the code accordingly.
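For reference, a minimal label_map.pbtxt for the two classes used in this post could look like the following; the class names are assumptions and must match the labels used during annotation.

item {
  id: 1
  name: 'chair'
}
item {
  id: 2
  name: 'fighter_jet'
}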

The LabelImg software being used to annotate objects in the image

For the purpose of this blog, I downloaded around 125 random images of chairs and took 75 images of chairs with my mobile camera. Around 100 images of fighter jets were also downloaded from multiple sources. The whole data-set of 300 images was manually annotated to specify object location and type.

Now we need to convert the generated XML files to a format suitable for training. Download the project from here and use FoodDetection.ipynb to convert the generated XML files to CSV. Then generate TFRecord files, using code adapted from this raccoon detector, to optimize the data feed. The train & test data are handled separately in the code. Modify the train folder name in the TFRecord generator .py file if you wish to train other data-sets.
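The XML-to-CSV step boils down to walking the Pascal VOC files written by the annotation tool and flattening each bounding box into one row; a minimal sketch, assuming the images and XML files sit in images/train and images/test, looks roughly like this.

import glob
import xml.etree.ElementTree as ET
import pandas as pd

def xml_to_csv(path):
    # One row per annotated object: filename, image size, class name and box corners
    rows = []
    for xml_file in glob.glob(path + '/*.xml'):
        root = ET.parse(xml_file).getroot()
        for obj in root.findall('object'):
            box = obj.find('bndbox')
            rows.append((root.find('filename').text,
                         int(root.find('size/width').text),
                         int(root.find('size/height').text),
                         obj.find('name').text,
                         int(box.find('xmin').text),
                         int(box.find('ymin').text),
                         int(box.find('xmax').text),
                         int(box.find('ymax').text)))
    columns = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    return pd.DataFrame(rows, columns=columns)

for split in ['train', 'test']:
    xml_to_csv('images/' + split).to_csv('data/%s_labels.csv' % split, index=False)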

TFRecord is TensorFlow's binary storage format. It reduces the training time of your model, as binary data takes up less space and disk reads are more efficient.

ipython notebook FoodDetection.ipynb
python generate_tfrecord.py
mv test.record data
mv train.record data

API Installation

We will use the MobileNet model as the neural network architecture and Single Shot Detection (SSD) to locate the bounding boxes. The MobileNet-SSD architecture is designed for use in mobile applications.

We use MobileNet instead of VGGNet as the base network, as it is efficient on mobile & embedded devices. The base network does feature extraction, while SSD does classification & localization, i.e. drawing the bounding boxes (Courtesy)

To install TensorFlow Object Detection API, download and unzip TensorFlow Models from the repository here and execute the commands below.

cd models/research/
pip install protobuf-compiler
protoc object_detection/protos/*.proto --python_out=.
set PYTHONPATH=<cwd>\models\research;<cwd>\models\research\slim
cd ../../

Train & Validate Model

Download the pre-trained MobileNet-SSD model from here and retrain it on your dataset to replace the classes as you desire. Re-training from a pre-trained checkpoint reduces the training time.

Once the environment variable is set, execute the train.py file with a config parameter. Let it train till MAX_STEPS=20,000 or until the loss stabilizes.
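The custom config passed below is just the stock ssd_mobilenet_v1 sample config with a handful of fields edited; roughly the following, where the checkpoint and record paths are assumptions for this project layout.

model {
  ssd {
    num_classes: 2            # chair + fighter jet in this example
    ...
  }
}
train_config {
  fine_tune_checkpoint: "ssd_mobilenet_v1_coco/model.ckpt"   # pre-trained weights
  num_steps: 20000
  ...
}
train_input_reader {
  tf_record_input_reader { input_path: "data/train.record" }
  label_map_path: "data/label_map.pbtxt"
}
eval_input_reader {
  tf_record_input_reader { input_path: "data/test.record" }
  label_map_path: "data/label_map.pbtxt"
}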

python train.py --logtostderr --train_dir=data\ --pipeline_config_path=data\ssd_mobilenet_v1_custom.config
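To keep an eye on the loss while training runs, TensorBoard can be pointed at the same directory (assuming the checkpoints and event files land in the data folder):

tensorboard --logdir=data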

Freeze the Model

To serve a model in production, we only need the graph and its weights. We don't need the metadata saved in the .meta file, which is mostly useful for retraining the model. TF has a built-in helper script to extract what is needed for inference and create a frozen graph_def.

To export the graph for inference, use the latest checkpoint file number stored inside the "data" folder. The frozen model file, frozen_inference_graph.pb, is generated inside the output directory, to be deployed on mobile.

rm -rf object_detection_graph

python models/research/object_detection/export_inference_graph.py --input_type image_tensor --pipeline_config_path data/ssd_mobilenet_v1_custom.config --trained_checkpoint_prefix data/model.ckpt-19945 --output_directory object_detection_graph

Detection on Offline Videos

To do object detection on a recorded video, modify test.py as sketched below and execute it.
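The full test.py lives in the project repo; a minimal sketch of the same idea, assuming the standard tensor names produced by the Object Detection API exporter and an input clip named jets_test.mp4, looks roughly like this.

import cv2
import numpy as np
import tensorflow as tf

PATH_TO_GRAPH = 'object_detection_graph/frozen_inference_graph.pb'

# Load the frozen graph exported in the previous step
graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_GRAPH, 'rb') as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')

cap = cv2.VideoCapture('jets_test.mp4')      # assumed offline test video
with tf.Session(graph=graph) as sess:
    image_tensor = graph.get_tensor_by_name('image_tensor:0')
    boxes = graph.get_tensor_by_name('detection_boxes:0')
    scores = graph.get_tensor_by_name('detection_scores:0')
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        b, s = sess.run([boxes, scores],
                        feed_dict={image_tensor: np.expand_dims(rgb, axis=0)})
        # Draw every detection above a 50% confidence threshold
        h, w = frame.shape[:2]
        for box, score in zip(b[0], s[0]):
            if score < 0.5:
                continue
            ymin, xmin, ymax, xmax = box
            cv2.rectangle(frame, (int(xmin * w), int(ymin * h)),
                          (int(xmax * w), int(ymax * h)), (0, 255, 0), 2)
        cv2.imshow('detections', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
cap.release()
cv2.destroyAllWindows()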

Put the annotated images of chairs in the train and test directories inside the "chairs" directory
Location and probability of chair objects are accurately detected when frames are fed offline

Deploy and Run

Download TensorFlow Android examples from here. Using Android Studio, open the project in this path and follow the steps below.

Update the tensorflow/WORKSPACE file in the root directory with the API level and the locations of the SDK and NDK.

android_sdk_repository(
    name = "androidsdk",
    api_level = 23,
    build_tools_version = "28.0.3",
    path = "C:\Users\Anand\AppData\Local\Android\Sdk",
)

android_ndk_repository(
    name = "androidndk",
    path = "C:\Users\Anand\AppData\Local\Android\Sdk\ndk-bundle",
    api_level = 19,
)

Set "def nativeBuildSystem" in build.gradle to 'none' using Android Studio

Download the quantized MobileNet-SSD TF Lite model from here and unzip mobilenet_ssd.tflite to the assets folder: tensorflow/examples/android/assets/

Copy the frozen_inference_graph.pb generated in the previous step and the label_map.pbtxt from the \data folder to the "assets" folder above. Edit the label file to reflect the classes to be identified.

Update the variables TF_OD_API_MODEL_FILE and TF_OD_API_LABELS_FILE in DetectorActivity.java to the above filenames, prefixed with "file:///android_asset/".

Build the bundle as an APK file using Android Studio.

Locate the APK file and install it on your Android mobile. Launch the TF-Detect app to start object detection. The camera turns on and detects objects in real time.

Object Detection Output: Quick Preview

The source code of this project can be found on GitHub here.

Conclusion

The above system, trained on multiple household categories, can help the visually impaired navigate inside the home using their personal mobile phones. Users can interact with the app via voice, and the app can talk back to the user with Google Text-to-Speech. Once the object location and type are detected, the distance to the obstacle can be estimated by:

a) Triangle Similarity technique: in constrained environments (see the sketch after this list)

b) Stereo Camera or Ultrasonic Sensor: to estimate depth

c) 3D Depth Sensing Camera: on mobile or wearable gadget
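As a rough illustration of option (a), if the real-world width of a detected object class and the camera's focal length in pixels are known, the distance follows from the width of the bounding box; the numbers below are purely illustrative.

def distance_to_object(real_width_cm, focal_length_px, box_width_px):
    # Triangle similarity: distance = (real width x focal length) / perceived width
    return (real_width_cm * focal_length_px) / box_width_px

# e.g. a 45 cm wide chair seen as a 180 px wide box, with a ~800 px focal length
print(distance_to_object(45, 800, 180))   # -> 200.0 cm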

Object detection is used in video surveillance, people counting, self-driving cars and face detection, and also in defense, as demonstrated in the video.

