Fast AI course by Jeremy Howard

--

Sitting through two hours of video to understand a subject as critical as Deep Learning and Artificial Intelligence can feel like a chore. This blog is the first part of a seven-lecture series on Fast AI by Jeremy Howard, a former President of Kaggle, co-founder of Fast AI, and someone highly venerated in the community.

The only prerequisites are about one year of general coding experience and high school math. Basic knowledge of Python is a plus.

Introduction

The first lecture captures the classic Image classification problem using the FastAI library. Currently, Fast AI supports the following applications:

  • Computer Vision (Image Classification)
  • Natural language Text (Sentiment Analysis)
  • Tabular Data (Predict Supermarket Sales)
  • Collaborative filtering (Recommendation Engine)

Computer Vision deals with the problem of making a computer understand and make sense of the images it sees. It has a wide range of applications, from autonomous cars and auto-captioning of images to classifying images into different categories.

Natural Language Text deals with the problem of making a computer understand text (in various languages) and make decisions from it. The field is booming because of the wide variety of applications, such as sentiment analysis, call analysis, and review analysis.

Both of the above applications deal with unstructured data. Fast AI can also handle the orthodox tabular format of data, such as that generated in the e-commerce industry, and help drive businesses more effectively.

Collaborative filtering is the area that takes care of recommendation, and it has quietly benefited many industries. Netflix, Amazon, Flipkart, and every other site that recommends you something based on the choices you make on their website has collaborative filtering working underneath.

We’ll look at each of these applications in detail over the coming series.

Image Classification

The course starts with the problem of classifying cats vs dogs across 37 different pet-breed categories (the Oxford-IIIT Pet dataset). Classification where the classes are visually very similar, such as breeds of the same animal, is called fine-grained classification. The main difference among image classification datasets is how they store the category labels (in a CSV file, in the file names themselves, in the folder structure, or as a list), and Fast AI has plenty of functions to deal with each case.
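For instance, fastai v1 ships a factory method for each labelling scheme. As a rough sketch (the paths and file names here are hypothetical):

# labels stored in a CSV file alongside the images
data = ImageDataBunch.from_csv(path, csv_labels='labels.csv', size=224)
# labels encoded as the folder each image sits in
data = ImageDataBunch.from_folder(path, size=224)
# labels embedded in the file names, extracted with a regular expression
data = ImageDataBunch.from_name_re(path, fnames, r'/([^/]+)_\d+.jpg$', size=224)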

1. Dog vs Cat

This course is taught using Jupyter notebooks, but you are free to use any editor of your choice, such as PyCharm or Spyder. There are various options for setting up a GPU; choose and set up yours accordingly. I used Google Colab, which is a free GPU service from Google. Also, FastAI sits on top of PyTorch (a popular deep learning library alongside TensorFlow), but for this course we do not need hands-on PyTorch experience. To get the code used in this course, check out the FastAI GitHub repo. I will mostly be explaining the bits and pieces of the FastAI library and the theory behind the scenes.

These days GPUs are the standard for deep learning, especially when it comes to images. But GPUs process images in fixed-size batches; they cannot infer a common size on their own, so we explicitly need to provide the shape to which every image will be resized.

A square image size of 224*224 (obtained by cropping and resizing) is extremely common and accepted by most architectures. Later in the series, we’ll see how to use rectangular image sizes. In FastAI, everything you’re going to model is an ImageDataBunch object. The data bunch object consists of a variety of datasets, including training, validation, and (optionally) testing datasets.

These datasets need to be normalized (using the normalize method) so that all the data is on the same scale. In the case of images, normalization means bringing the pixel values of the three channels (red, green, blue) to the same mean and standard deviation (std) across all the images. If the data is not normalized, it becomes difficult for the model to train well. So, if you’re having trouble training a model, one thing to check is whether you’ve normalized the data.

The models in FastAI are designed in such a way that they end up producing a 7*7 feature map, and that’s why the optimal input size is 224. We’ll learn about this later. Once the data is loaded into the databunch object, the data.show_batch() command can be used to look at the images in the data, and the number of unique classes can be read from the attribute ‘c’ of the databunch object.
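Putting the whole data pipeline together for the pets problem, a minimal sketch along the lines of the lesson notebook (fastai v1; untar_data downloads and extracts the Oxford-IIIT Pet dataset):

from fastai.vision import *
path = untar_data(URLs.PETS)                       # download and extract the dataset
fnames = get_image_files(path/'images')
pat = r'/([^/]+)_\d+.jpg$'                         # breed name precedes the trailing digits
data = ImageDataBunch.from_name_re(path/'images', fnames, pat,
                                   ds_tfms=get_transforms(), size=224, bs=64)
data = data.normalize(imagenet_stats)              # per-channel mean/std normalization
data.show_batch(rows=3, figsize=(7,6))             # eyeball a few labelled images
print(data.c)                                      # number of unique classes (37 here)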

A Learner is a general concept in Fast AI for a model that learns from the data. Just as a databunch is a general concept for data, a Learner is a general concept for models. There are sub-classes (consider them different kinds of models) of a Learner, and the particular subclass used for image classification is the ConvLearner, which creates a CNN for us.

model = ConvLearner(data, models.resnet34, metrics=error_rate)
  • data — the databunch object (the batch size, bs=64 above, is set when the databunch is created, not here)
  • models.resnet34 — ResNet34 (pretrained model)
  • error_rate — the error of the model on the validation set

The first time we run this command, it downloads resnet34’s pre-trained weights. The resnet34 model has already been trained on over a million images from the ImageNet dataset and knows how to classify images into a thousand categories. Reusing such a model is called Transfer Learning. One advantage of transfer learning is that even if we don’t have much data, the model can still train really well, because it has already been trained on some form of similar data.

This lets the model train in 1/100th of the original time. This method of learning also requires far fewer training images and still classifies unseen images correctly.

To make sure that the model doesn’t overfit, we use a validation set. Remember that the ImageDataBunch object already holds a validation set, and the model evaluates the error_rate metric on its predictions for that set.

If you try training for more epochs, you’ll notice that we start to overfit, which means that our model is learning to recognize the specific images in the training set, rather than generalizing. One way to fix this is to effectively create more data, through data augmentation. This refers to randomly changing the images in ways that shouldn’t impact their interpretation, such as horizontal flipping, zooming, and rotating.
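In fastai v1 these augmentations are bundled in get_transforms(); the values below are the library defaults and can be tuned per dataset (a sketch, not an exhaustive list of arguments):

tfms = get_transforms(do_flip=True, max_rotate=10.0, max_zoom=1.1, max_lighting=0.2)
# pass ds_tfms=tfms when building the ImageDataBunch; the transforms are then
# applied randomly, on the fly, to the training images only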

After the model is created, FastAI uses the fit_one_cycle(n) method instead of the generic fit method. The fit method is the “normal” way of training a neural net with a constant learning rate, whilst fit_one_cycle uses something called the 1cycle policy, which varies the learning rate over the course of training to achieve better results. Here n denotes the number of epochs (complete passes over the data).
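For example (4 epochs here is simply the value used in the lesson; any small number works):

model.fit_one_cycle(4)   # 4 passes over the data with the 1cycle learning-rate schedule

Once the model is fitted on the training set, the weights (and other info about the model) can be saved using the following commands and retrieved later.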

model.save('stage-1')   # save the model's weights
model.load('stage-1')   # load them back later

Analyzing Results

After the model is built, we can interpret the model using the Classification Interpretation object.

interp = ClassificationInterpretation.from_learner(model)
interp.plot_top_losses(9, figsize=(15,11))
interp.plot_confusion_matrix(figsize=(12,12), dpi=60)
interp.most_confused(min_val=2)

For more info about the functions, we can always use doc(function_name).

Unfreezing, Fine-tuning and Differential Learning Rate

Making the model better

So far we have used Transfer Learning and run the pre-trained model with its existing weights. Next, we’ll see how to train this model further on our own dataset.

It’s a two-stage process. First, we run the model as-is: only the newly added final layers are trained while the rest of the network stays frozen, which is unlikely to overfit and already gives good results. But to really get the best out of the model, we unfreeze it and fine-tune the whole network. The following command tells the model to train all of its layers.

model.unfreeze() 
2. Fine Tuning Pretrained Network

Different layers of a CNN represent different levels of semantic complexity, and we might want to train different layers by different amounts. The paper by Matthew Zeiler and Rob Fergus, “Visualizing and Understanding Convolutional Networks”, shows how to visualize the layers of a pre-trained convolutional neural network and inspect what each one responds to. Earlier, the (pre-trained) model was updating all the layers at the same speed. But certain parameters barely need to change for any image; for example, information about the edges and corners of an image is gathered in the first few layers of the model. FastAI therefore supports differential learning rates: instead of using the same learning rate for all the layers, we can pass a slice to the fit_one_cycle() method and give the layers their own learning rates depending on the specifics of the data.

model.fit_one_cycle(5, max_lr=slice(1e-6, 1e-4))   # 5 epochs, differential learning rates

This will take the first value of the slice (1e-6) as the learning rate for the first layer and the second value (1e-4) as the learning rate for the last layer, distributing intermediate values across the layers in between. How cool is that!

One method to find a good learning rate is the Learning Rate Finder, which trains briefly while steadily increasing the learning rate and records the loss. The code below runs the finder and plots the result, from which a suitable learning rate can be read off.

model.lr_find()
model.recorder.plot()

As a rule of thumb, after unfreezing the model, use the fit_one_cycle() method with the max_lr parameter set to a slice: make the second value 10 times smaller than the default learning rate, and take the first value from the learning rate finder plot.
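Put together, the unfreeze-and-fine-tune recipe looks like this (the slice endpoints are illustrative; take the first from your own learning rate finder plot):

model.unfreeze()                                   # make all layers trainable again
model.fit_one_cycle(2, max_lr=slice(1e-6, 1e-4))   # differential learning rates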

This way we can train a large model like resnet34 (or resnet50) and use the same for other classification tasks.

Some Tips

  • If you run out of GPU memory while running the model, just make the batch size smaller when creating the databunch. Training works just fine; it only takes a little longer.
  • help(function_name) — pops up the entire doc about a function. So don't forget to make use of this if you get stuck somewhere.
  • RUN the code! Don’t just fall into the habit of watching lectures or reading theory. It might seem involved and confusing at first, but push yourself to get your hands dirty and run the code.
  • Apply the skills at your workplace or your passion project.
  • Understand that, “You CAN do deep learning”
  • The FastAI community is very responsive and growing day by day. You can check out their forum and post your queries & questions.
  • Check out the official fast.ai site for their resources and updates.

Projects

Check out the following projects people have developed over time using FastAI.

Final word from Jeremy Howard — “Pick One Project, Do it Really Well, Make it Fantastic”

I hope you liked it. Share your thoughts & comments in the section below!

--

I am a Data Science Engineer based in Bangalore. Optimistic and enthusiastic about Deep Learning | Feed more data