Computer Vision applications in Self-Driving Cars

--

An updated version of this article has been published on https://www.thinkautonomous.ai/blog/?p=computer-vision-applications-in-self-driving-cars

This article is a refresh of my previous article, AI…And the Vehicle went Autonomous, where I described approaches to finding lane lines and obstacles on the road.

I would like to use my recent experience in the field to add to what I previously wrote about AI in self-driving vehicles and also discuss other, more advanced, applications of computer vision in autonomous vehicles.

Before we start, I invite you to join the Think Autonomous Mailing List and learn every day about self-driving cars, Computer Vision, and Artificial Intelligence.

AI…And the vehicle went Autonomous

I wrote this first article when I was learning about self-driving cars through Udacity's nanodegree program.

In this article, I mentioned 3 major Perception problems to solve using Computer Vision.

  • Lane Line Detection
  • Obstacle & Road Signs/Lights Detection
  • Steering Angle Computation

For these problems, I used traditional Computer Vision, Machine Learning, and Deep Learning, respectively.

Computer Vision for lane lines detection

Traditional Computer Vision was used to find lane lines on the road, even curvy ones. This approach is all about using the OpenCV library and studying pixel colors to extract an appropriate shape.
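
To give an idea, here is a minimal sketch of such a pipeline with OpenCV (edge detection, a region of interest, and a Hough transform; all threshold values are illustrative assumptions, not tuned parameters):

```python
import cv2
import numpy as np

def detect_lane_lines(image):
    """Classic pipeline: grayscale -> edges -> region mask -> Hough lines."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    edges = cv2.Canny(blurred, 50, 150)

    # Keep only a trapezoidal region in front of the vehicle
    h, w = edges.shape
    mask = np.zeros_like(edges)
    polygon = np.array([[(0, h), (w // 2 - 50, h // 2 + 50),
                         (w // 2 + 50, h // 2 + 50), (w, h)]], dtype=np.int32)
    cv2.fillPoly(mask, polygon, 255)
    masked = cv2.bitwise_and(edges, mask)

    # Probabilistic Hough transform returns line segments (x1, y1, x2, y2)
    return cv2.HoughLinesP(masked, rho=2, theta=np.pi / 180, threshold=50,
                           minLineLength=40, maxLineGap=100)
```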

Machine Learning for vehicle detection

Machine Learning was used as a classifier with a sliding window to find cars and obstacles. The SVM classifier often had just two classes: car and not car.
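
For reference, here is a minimal sketch of that classifier approach, using HOG features from scikit-image and a linear SVM from scikit-learn (window size, step, and feature parameters are illustrative assumptions):

```python
from skimage.feature import hog
from sklearn.svm import LinearSVC

def extract_features(patch):
    """HOG features on a 64x64 grayscale patch."""
    return hog(patch, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)

# Trained beforehand on labeled patches: 1 = car, 0 = not car
# clf = LinearSVC().fit([extract_features(p) for p in patches], labels)
clf = LinearSVC()

def sliding_window_detect(gray_image, step=16, size=64):
    """Slide a window over the image and classify each patch."""
    detections = []
    h, w = gray_image.shape
    for y in range(0, h - size, step):
        for x in range(0, w - size, step):
            patch = gray_image[y:y + size, x:x + size]
            if clf.predict([extract_features(patch)])[0] == 1:
                detections.append((x, y, x + size, y + size))
    return detections
```
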
Deep Learning for steering angle computation

The steering angle was determined through imitation learning and an end-to-end Deep Learning approach.

Since then, I have worked for over a year at an autonomous shuttle startup on Computer Vision, and I got the opportunity to try many of the things I had learned.

I first had the opportunity to try traditional computer vision for lane line detection and realized the algorithms were very slow and not really robust. Robustness can be improved using a Kalman Filter, but speed is another matter.

Machine Learning turned out to give poor results for obstacle detection, especially when detecting more than one class. The sliding window technique is also far too slow compared to recent algorithms.

Finally, end-to-end Deep Learning can be an amazing idea but is not good enough yet. Reinforcement Learning seems even more promising, but it is still in experimental research.

Modern Approaches

Computer Vision, Machine Learning, and Deep Learning are generally good solutions for Perception problems.

Lately, Deep Learning using Convolutional Neural Networks has outperformed every other technique for lane line and obstacle detection; so much so that it is hardly worth trying anything else.

My former article mentioned Convolutional Neural Networks (CNNs) as being the state of the art way to solve Computer Vision problems.

CNNs are a specific architecture of Neural Networks that can learn specific shapes (like cars, pedestrians, …) using convolutions. They need more data than classical Machine Learning algorithms, and training is harder, but the results are much better.

Deep Learning architectures today can be set for specific purposes such as bounding box detection or lane line coefficient regression. Unlike with the “classifier approach”, there is no need to use a sliding window or a histogram to output the desired result. Neural networks can be set to output whatever we want.

Vehicle Detection

When detecting obstacles, the result is generally more than simply outputting car or not car. We need the bounding box coordinates (x1, y1, x2, y2) that we previously got from the sliding window. We need a confidence score (to threshold out low-confidence detections) and the class (car, pedestrian, …) that the SVM algorithm provided.

In Deep Learning, we can simply have an architecture outputting one neuron per desired number at the end of the neural network.

The convolutional layers are there for automatic feature learning (size, color, shape), while the last layers are there for the output. They learn to generate the bounding box coordinates and the other relevant values we need. All of this is done in a single neural network.

CNN architecture designed for obstacle detection

In the traditional Machine Learning way, there would be a single output neuron taking a 0 or 1 value depending on whether there is a car in a specific window. Learning to output bounding boxes directly allows for much faster computation and more precise results. Algorithms like YOLO are today considered state of the art: they can run at really high frequency (over 50 FPS) and rarely miss obstacles.
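
To make this concrete, here is a minimal sketch of a single-network detector with one output neuron per desired value, written in Keras (a toy single-object version with made-up layer sizes and class list; not a real YOLO implementation):

```python
from tensorflow.keras import layers, Model

NUM_CLASSES = 3  # e.g. car, pedestrian, cyclist (illustrative)

inputs = layers.Input(shape=(224, 224, 3))

# Convolutional layers: automatic feature learning (size, color, shape)
x = layers.Conv2D(32, 3, activation="relu")(inputs)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.MaxPooling2D()(x)
x = layers.Flatten()(x)
x = layers.Dense(128, activation="relu")(x)

# Last layers: one neuron per desired number
boxes = layers.Dense(4, name="bbox")(x)  # x1, y1, x2, y2
confidence = layers.Dense(1, activation="sigmoid", name="conf")(x)  # score
classes = layers.Dense(NUM_CLASSES, activation="softmax", name="cls")(x)

model = Model(inputs, [boxes, confidence, classes])
model.compile(optimizer="adam",
              loss={"bbox": "mse", "conf": "binary_crossentropy",
                    "cls": "categorical_crossentropy"})
```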

Lane Lines Detection

For lane lines detection, Deep Learning can be used in the exact same way.

The role is to regress the lane line equation coefficients. Lane lines can be approximated with first, second, or third-order polynomial equations. A first-order equation would simply be ax + b (a straight line), while higher-order ones allow for curves.

1D vs 2D lane lines

In a CNN, the convolutional layers learn features, while the last layers learn lane line coefficients (a, b and c).

This may seem simple: set a few convolutional layers, set a few dense layers, and give the output layer only 3 neurons for the a, b, and c coefficients.
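
Here is a sketch of that naive regression head in Keras (the layer sizes and input shape are illustrative assumptions):

```python
from tensorflow.keras import layers, Sequential

# Naive lane coefficient regressor: image in, (a, b, c) out
lane_model = Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(3, name="coefficients"),  # a, b, c of a*x**2 + b*x + c
])
lane_model.compile(optimizer="adam", loss="mse")
```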

In reality, this is harder than it looks. Datasets do not always provide lane line coefficients, and we might also want to detect the type of line (dashed, solid, …) as well as whether the line belongs to the ego vehicle's lane or to an adjacent one. There are multiple features we may want, and a single neural network may be really hard to train and even harder to generalize.

A popular approach to solving this problem is segmentation. In segmentation, the goal is to assign a class to each pixel of an image.

One color per class

In this approach, each lane corresponds to a class (ego left, ego right, …) and the goal of the neural network is to generate an image with these colors only.

Example output image

In this type of architecture, the neural network works in two parts: the first part learns the features, the second part generates the output, just like for bounding box detection.

U-Net architecture for lane lines detection

Since the output is a simple black image with a few colors, it is then very easy to use Machine Learning and a linear regression (or a polynomial one) on the points of each color to find the lane lines.
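
A minimal sketch of that post-processing step, assuming the network outputs a per-pixel class map (the class ids and polynomial order are illustrative assumptions):

```python
import numpy as np

def fit_lane_from_mask(class_map, lane_class, order=2):
    """Fit a polynomial x = f(y) through the pixels of one lane class."""
    ys, xs = np.nonzero(class_map == lane_class)
    if len(xs) < order + 1:
        return None  # not enough points for a stable fit
    # Fitting x as a function of y behaves better for near-vertical lanes
    return np.polyfit(ys, xs, deg=order)  # e.g. [a, b, c] for order 2

# Example: recover the ego-left lane (assuming it is class id 1)
# coeffs = fit_lane_from_mask(prediction, lane_class=1)
# xs_on_lane = np.polyval(coeffs, ys_of_interest)
```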

These approaches generally outperform the traditional ones and can be 10 times faster. In my tests, I got 5 FPS with the Computer Vision approach and about 50 FPS with the Deep Learning approach.

Other Uses for Deep Learning in Self-Driving Cars — Tracking

I recently released an article that got a lot of attention: Computer Vision for Tracking.

In this article, I mention a technique to track obstacles through time using a camera, Deep Learning, and Artificial Intelligence algorithms such as Kalman Filters and the Hungarian Algorithm.

Tracking through time using Computer Vision

Here, bounding boxes do not change color from frame 1 to frame 2, as they would in a classic YOLO approach. The car on the right keeps its black bounding box from frame 1 to frame 2 because of the association. Objects of the same color don't end up with the same box color either.

It can be very difficult for neural networks to learn to get this result on their own. That is why we use Bayesian filtering and an association algorithm; the tracking article mentioned above explains this in more detail.

In this approach, Deep Learning is used for bounding box detection, and the result is immediately passed to the other algorithms, which decide whether the vehicle is the same as in the previous frame or not. To make that decision, convolutional features can also be used, since matching depends on what the object looks like as well.
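
A minimal sketch of the association step, using SciPy's implementation of the Hungarian Algorithm on an IoU cost matrix (the IoU threshold is an illustrative assumption):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(tracks, detections, min_iou=0.3):
    """Match previous-frame tracks to current detections by maximizing IoU."""
    cost = np.array([[1.0 - iou(t, d) for d in detections] for t in tracks])
    rows, cols = linear_sum_assignment(cost)  # Hungarian Algorithm
    # Keep only matches with enough overlap; the rest are new or lost objects
    return [(r, c) for r, c in zip(rows, cols) if 1.0 - cost[r, c] >= min_iou]
```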

Using time this way can then allow for tracking and behavioral prediction.

Other Uses for Deep Learning in Self-Driving Cars — 3D Bounding Boxes

Bounding boxes are great for localizing obstacles. However, a 2D localization in pixel coordinates may not be very useful. What would be preferable is the 3D position, with x, y, z directly.

Moving from 2D to 3D bounding boxes allows us to understand a vehicle's exact position and distance from us, as well as its precise orientation and direction.

It turns out that this is quite difficult to get from pixel coordinates:

  • Getting x and y means we can recover the distance of the car relative to our camera, and the exact lateral and longitudinal distances, from bounding box coordinates.
  • Getting z means we also need a height for the obstacle, which can only be estimated from the object class.
2D vs 3D Bounding Boxes (source)

The related paper discusses an approach to estimate 3D Bounding Boxes using Deep Learning and geometry.

Neural Network architecture for 3D estimation

In this approach, Deep Learning is again used for feature learning (dimensions, angle, confidences). Geometry is then used to translate that information into the 3D world.
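
As a simplified illustration of the geometric step (not the paper's full method), here is how a pinhole camera model relates a 2D box to a 3D position, assuming we know the object class's typical height and the camera's focal length (all values below are illustrative assumptions):

```python
def estimate_depth(box_height_px, real_height_m, focal_px):
    """Pinhole model: depth Z = f * H / h, with f and h in pixels."""
    return focal_px * real_height_m / box_height_px

def lateral_offset(u_px, depth_m, cx_px, focal_px):
    """Back-project pixel column u into a lateral offset X = (u - cx) * Z / f."""
    return (u_px - cx_px) * depth_m / focal_px

# Example with made-up numbers: a car whose box is 60 px tall, seen by a
# camera with f = 1000 px, assuming a real car height of 1.5 m:
# Z = estimate_depth(60, 1.5, 1000)  # -> 25.0 meters ahead
```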

Having 3D bounding boxes allows for 3D matching with 3D sensors such as LiDARs. It gives a better understanding of a vehicle's orientation, and therefore helps anticipate its behavior. 2D bounding boxes are often what people are shown when learning self-driving car technology, but 3D bounding boxes are much more relevant to the problem.

Freespace Detection

Freespace detection is quite famous in the self-driving car world, yet a lot of people still wonder what it is used for. I haven't had the chance to use it, since our developments did not prioritize freespace, but I believe I have a good idea of its purpose.

The architecture is similar to the segmentation approach used for the lane line detection problem.

SegNet

The encoder-decoder approach is similar to the U-Net one: the encoder applies convolutions and learns features, while the decoder recreates the full-resolution feature map.
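
A minimal encoder-decoder sketch in Keras (depths, sizes, and classes are illustrative assumptions; real SegNet or U-Net models are deeper and use pooling indices or skip connections):

```python
from tensorflow.keras import layers, Sequential

NUM_CLASSES = 2  # free space vs. not free space (illustrative)

freespace_model = Sequential([
    layers.Input(shape=(160, 320, 3)),
    # Encoder: convolutions learn features while the resolution shrinks
    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),
    # Decoder: transposed convolutions recreate a full-resolution map
    layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu"),
    layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu"),
    # One channel per class, softmax over classes at each pixel
    layers.Conv2D(NUM_CLASSES, 1, activation="softmax"),
])
```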

Freespace can be pretty useful for knowing how to maneuver around a slow vehicle and change lanes. It can also be used when lane lines are unavailable and need to be recreated, or when there are obstacles in the way and the car needs to stop. Freespace can therefore serve as redundancy for the obstacle detection algorithms.

Tesla’s Autopilot freespace software

Conclusion

We have discussed multiple ways to use Computer Vision and Deep Learning in a self-driving car. The purpose is always the same: finding obstacles and lanes, and estimating velocities, directions, and positions.

Deep Learning easily outperforms a lot of readily available techniques like traditional Computer Vision and Machine Learning. These techniques are still needed to complement Deep Learning in the final task.

Understanding how to reimplement state-of-the-art research papers is therefore essential. Approaches are always changing, and it can be challenging if you don't know how to adapt to new techniques frequently.

In the end, cameras are capable of doing a lot of things. Deep Learning is almost never alone: there are always associated algorithms that make the solution more robust or adapt the neural network's result to our desired output.

We could imagine more solutions, like distance estimation using bounding box sizes and classes, or mixing tracking through time with 3D bounding boxes to make a 3D obstacle tracker using a monocular camera. The use of time is an important factor, as we may want to predict other vehicles' future behaviors.

In the end, Deep Learning is the go-to solution in self-driving cars when we are using a camera.

Before I conclude, I would like to invite you to the private mailing list. Every day, I send an email where I share my experience on AI and autonomous technology. Starting tomorrow, you will receive life-changing career advice, technical content like this article, and discounts for my courses. Subscribe here; I hope to see you in tomorrow's email.

Jeremy Cohen.


