GSoC 2021

Google Summer of Code 2021: Final Work Product

Organisation: Center for Research in Open Source Software, UC Santa Cruz

Mentors: Aaron and Pavlo

Project: Path Finding Algorithm using OpenCV and ML

Final Work: Code

Aim

The project aims to detect course markers to help an autonomous vehicle find an optimal trajectory to complete the circuit. The idea is to develop an algorithm that assists a controller, first demonstrated on an autonomous car navigating a closed course in minimal time. An embedded device (Raspberry Pi) takes live video input and runs the algorithms on it.

Brief Solution

The main focus was to detect the course markers (cones) with the help of deep learning algorithms. For a baseline comparison, the plan was to detect the cones using a classical computer vision pipeline.
Both algorithms are then deployed on the Raspberry Pi. They run on video input, find the detected markers' locations, and use those locations to compute information about each marker (distance, angle, etc.) relative to the vehicle.
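
As an illustration of that last step, here is a minimal sketch of how distance and bearing could be estimated from a detected bounding box using a pinhole-camera model. The focal length, cone height, and image width below are hypothetical placeholders, not values from the project.

```python
import math

# Hypothetical camera/scene constants (placeholders, not values from the project).
FOCAL_LENGTH_PX = 500.0  # focal length in pixels at the detector's input resolution
CONE_HEIGHT_M = 0.45     # real-world cone height in metres
IMAGE_WIDTH_PX = 320     # width of the frames fed to the detector

def marker_geometry(box):
    """Estimate distance and bearing of a cone from its bounding box.

    box: (x_min, y_min, x_max, y_max) in pixels.
    """
    x_min, y_min, x_max, y_max = box
    # Pinhole model: distance = focal_length * real_height / pixel_height.
    distance_m = FOCAL_LENGTH_PX * CONE_HEIGHT_M / (y_max - y_min)
    # Bearing of the box centre relative to the optical axis.
    x_center = (x_min + x_max) / 2.0
    angle_deg = math.degrees(math.atan2(x_center - IMAGE_WIDTH_PX / 2.0,
                                        FOCAL_LENGTH_PX))
    return distance_m, angle_deg

print(marker_geometry((140, 100, 180, 190)))  # -> (2.5, 0.0)
```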

Progress and Work Done

Data Preparation

The initial step was to annotate the data for training the object detection algorithms. I used the great open-source tool labelImg for this task. Using this tool, I annotated data for both the TensorFlow models (in Pascal VOC format) and the YOLO models (in YOLO format).
For training a YOLO model, the .txt files generated by labelImg are enough.
For training a TensorFlow model, the input must be in the TFRecord format. I explained how to generate the TFRecords from the labelImg XML annotations in Data-Preparation.
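
For reference, converting a Pascal VOC XML annotation into a YOLO-format .txt label looks roughly like the sketch below; the file names and class list are hypothetical and would need to match the project's labels.

```python
import xml.etree.ElementTree as ET

CLASSES = ["green_cone", "blue_cone"]  # hypothetical label names

def voc_to_yolo(xml_path, txt_path):
    """Convert one Pascal VOC XML annotation to a YOLO-format label file."""
    root = ET.parse(xml_path).getroot()
    w = float(root.find("size/width").text)
    h = float(root.find("size/height").text)
    lines = []
    for obj in root.iter("object"):
        cls_id = CLASSES.index(obj.find("name").text)
        b = obj.find("bndbox")
        xmin, ymin = float(b.find("xmin").text), float(b.find("ymin").text)
        xmax, ymax = float(b.find("xmax").text), float(b.find("ymax").text)
        # YOLO format: class x_center y_center width height (all normalised).
        x_c, y_c = (xmin + xmax) / 2 / w, (ymin + ymax) / 2 / h
        bw, bh = (xmax - xmin) / w, (ymax - ymin) / h
        lines.append(f"{cls_id} {x_c:.6f} {y_c:.6f} {bw:.6f} {bh:.6f}")
    with open(txt_path, "w") as f:
        f.write("\n".join(lines))

voc_to_yolo("frame_0001.xml", "frame_0001.txt")
```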

Deep Learning Algorithms

SSD-MobileNet

I trained several SSD-MobileNet models from the TensorFlow 2 Model Zoo. Among the TF2 models, the SSD-MobileNet family is designed to run on edge devices such as the Raspberry Pi. I ran the models on videos we created using green and blue cones, and I have saved the result videos in a folder.
The SSD MobileNet V2 FPNLite 320x320 model gives the best detection speed, around 30-35 FPS on the CPU.
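
Running one of these models on video follows the standard TF2 object-detection pattern; below is a minimal sketch, assuming the model was exported as a SavedModel (the paths are placeholders).

```python
import cv2
import numpy as np
import tensorflow as tf

# Placeholder paths: the exported SavedModel directory and a test video.
detect_fn = tf.saved_model.load("exported_model/saved_model")

cap = cv2.VideoCapture("cones_test.mp4")
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # The exported TF2 detection model expects a uint8 batch [1, H, W, 3] in RGB.
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    detections = detect_fn(tf.convert_to_tensor(rgb[np.newaxis, ...]))
    boxes = detections["detection_boxes"][0].numpy()   # normalised [ymin, xmin, ymax, xmax]
    scores = detections["detection_scores"][0].numpy()
    h, w = frame.shape[:2]
    for box, score in zip(boxes, scores):
        if score < 0.5:
            continue
        ymin, xmin, ymax, xmax = box
        cv2.rectangle(frame, (int(xmin * w), int(ymin * h)),
                      (int(xmax * w), int(ymax * h)), (0, 255, 0), 2)
    cv2.imshow("detections", frame)
    if cv2.waitKey(1) == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```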

YOLOv5

I trained the YOLOv5 model on Google Colab using the same dataset with various hyperparameters and image sizes. YOLOv5 is built with PyTorch, unlike the SSD-MobileNet models, which are built on TensorFlow. YOLOv5 detects the cones more accurately than any other model I tried, and it is also significantly easier and faster to train on Google Colab. The detection speed is around 30 FPS on average (slightly lower than the SSD-MobileNet model). The detection speed depends on the image size the model was trained on; I used two different image sizes, 256 x 320 and 320 x 416. I have saved the result videos in a folder.
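
Loading the trained weights for a quick PyTorch-side sanity check can be done through torch.hub; a minimal sketch, with the weight-file and image names as placeholders:

```python
import torch

# Load custom-trained YOLOv5 weights via the ultralytics hub entry point.
model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")
model.conf = 0.5  # confidence threshold

results = model("test_frame.jpg")  # accepts paths, arrays, or PIL images
results.print()                    # summary of detections
boxes = results.xyxy[0]            # tensor: [xmin, ymin, xmax, ymax, conf, class]
for *xyxy, conf, cls in boxes.tolist():
    print(f"cone class {int(cls)} at {xyxy} (conf {conf:.2f})")
```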

Converting the trained weights into TFLite format

Converting the PyTorch weights into the TFLite format was the most challenging part of the project. According to the official YOLOv5 documentation, the PyTorch weights should first be converted into the ONNX format, then from ONNX into CoreML, and finally into TFLite. I tried this on my local computer, but the resulting TFLite model did not perform as expected. So I used TensorFlow to convert the model instead: the model's architecture is first rebuilt in TensorFlow with the trained weights loaded, and that model is then converted into a TFLite model. I created a tutorial/instructions on Google Colab to convert the model, and I included a way to test the converted weights on images/videos on Colab.
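
Once the network has been rebuilt as a Keras model with the trained weights loaded, the final step is a standard tf.lite.TFLiteConverter call. A minimal sketch, where build_yolov5_keras_model is a hypothetical stand-in for the architecture-rebuilding code:

```python
import tensorflow as tf

# Hypothetical helper that rebuilds the YOLOv5 architecture in Keras
# and loads the trained weights into it.
model = build_yolov5_keras_model(weights="best_weights.h5")

converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

with open("yolov5_cones.tflite", "wb") as f:
    f.write(tflite_model)
```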

YOLOv5 conversion and quantization for TFLite

For running inference on the Coral Edge TPU, plain TFLite weights are not enough for the best performance. We need quantized TFLite weights (i.e., an INT8-quantized model). The INT8 model is a compressed form of the original weights (8-bit quantization approximates the floating-point values). After creating the quantized weights, the next step is to compile them with the Edge TPU compiler. This can be done directly on the Coral Edge device, but I have included the step in the Colab notebook: the Edge TPU compiler is installed on Colab first, and then the weights can be compiled easily.
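
Full-integer quantization needs a representative dataset so the converter can calibrate activation ranges. A minimal sketch, assuming a folder of calibration images (the paths and the 320x320 input size are placeholders):

```python
import glob
import cv2
import numpy as np
import tensorflow as tf

def representative_dataset():
    """Yield preprocessed frames so the converter can calibrate ranges."""
    for path in glob.glob("calibration_images/*.jpg")[:100]:
        img = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
        img = cv2.resize(img, (320, 320)).astype(np.float32) / 255.0
        yield [img[np.newaxis, ...]]

converter = tf.lite.TFLiteConverter.from_keras_model(model)  # model from the previous step
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force full INT8 quantization so the Edge TPU compiler can map the ops.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

with open("yolov5_cones_int8.tflite", "wb") as f:
    f.write(converter.convert())
```

The quantized file is then passed to the Edge TPU compiler (edgetpu_compiler yolov5_cones_int8.tflite), which produces the _edgetpu.tflite variant used on the Coral device.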

Inferencing on the Raspberry Pi

I implemented the code for running YOLO model inference on the Raspberry Pi. First, the PyTorch weights have to be converted into TFLite weights using the method above. The speed of the algorithm on the Raspberry Pi is around 1.5 to 2 FPS. I implemented separate scripts for running inference on each of the following (a minimal sketch of the webcam case follows the list).

  1. Image
  2. Video
  3. Webcam
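
A minimal sketch of the webcam script, assuming the float TFLite model from the conversion step; the raw YOLO output still needs confidence filtering and non-maximum suppression, which is omitted here.

```python
import cv2
import numpy as np
import tflite_runtime.interpreter as tflite  # lightweight runtime for the Pi

interpreter = tflite.Interpreter(model_path="yolov5_cones.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
_, height, width, _ = inp["shape"]

cap = cv2.VideoCapture(0)  # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    resized = cv2.resize(rgb, (width, height)).astype(np.float32) / 255.0
    interpreter.set_tensor(inp["index"], resized[np.newaxis, ...])
    interpreter.invoke()
    predictions = interpreter.get_tensor(out["index"])[0]
    # predictions holds raw YOLO outputs; confidence filtering and NMS
    # would follow here before drawing boxes on the frame.
    cv2.imshow("tflite inference", frame)
    if cv2.waitKey(1) == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```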

Future Work

The next goal is to run the algorithm's inference on the Coral Edge TPU for better performance. I have already implemented some parts of the script, but a few minor bugs in the code still need to be fixed.
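
For reference, pointing the TFLite interpreter at the Edge TPU is mainly a matter of loading the Edge TPU delegate with the compiled model; a minimal sketch, with the model file name as a placeholder:

```python
import tflite_runtime.interpreter as tflite

# The _edgetpu.tflite file is produced by the Edge TPU compiler.
interpreter = tflite.Interpreter(
    model_path="yolov5_cones_int8_edgetpu.tflite",
    experimental_delegates=[tflite.load_delegate("libedgetpu.so.1")],
)
interpreter.allocate_tensors()
# From here, the inference loop is the same as on the Raspberry Pi CPU.
```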

Acknowledgements

I want to thank my mentors, Aaron and Pavlo, for their guidance and support throughout these past few weeks. I would also like to thank CROSS, UC Santa Cruz, and Google for giving me this fantastic opportunity to work on this project.

Written on August 18, 2021