My Trials and Tribulations with Machine Learning: Part One

Convolutional Neural Networks & Image Classification

--

This is the first part of a machine learning series: the subtopics I’ve learned, projects where I’ve used machine learning, the mistakes I’ve made along the way — and how you can learn machine learning, too.

Picture this:

A nineteen-year-old college student starting her second summer internship at Sandia National Laboratories in Albuquerque, New Mexico. (Habitually, I now say, “Don’t worry, there’s no humidity — it’s a dry heat!”)

I finished my tasks for my first project and was ready for another challenge. After talking to the right managers and describing my interests and skills, I was handed a behemoth: a Python-based Machine Learning project primarily for the Department of Energy.

Whoa. For someone without much experience in programming at the time, this was a daunting project. I had some experience in Python, but only dreamed of working on a mind-bending task such as this.

“Python + Machine Learning” with Python logo.
Python is often the preferred language for machine learning tasks: it offers wonderful libraries, frameworks and tools that make ML easier and leaner.

My mentor for this project was not only incredibly bright, but patient. Despite being excited to jump right in, before I had a chance to review any code that was already written, my mentor sat me down to explain exactly what machine learning was, how it works, and why we wanted to use it.

Before I continue, I want to put out a disclaimer: this technology was (and currently is, to my knowledge) used by the United States Department of Energy and Department of Defense. Although it’s needless to say, I will say it anyway — I will not go into specific details of what I worked on.

Getting Started — What Is Machine Learning?

My mentor asked me to explain what I thought machine learning was. Simple enough, right?

As I explained what I knew about machine learning, I realized I knew almost nothing beyond general, super high level concepts. I knew machine learning:

  • Automates analytical model building
  • Learns and improves through experience
  • Builds understanding without explicit instruction
  • ???
  • … HAL 9000?
Gif from 2001: A Space Odyssey: “Open the pod bay doors, HAL.”
I’m sorry Dave, I’m afraid I can’t do that.

Starting with Image Classification

My mentor gave me a wonderful example. He showed me a picture of a cat, and asked me if I knew what it was.

Shutterstock image of a typical tabby cat, sitting.
Cat photo. Thanks, Shutterstock!

“It’s a picture of a cat,” I bravely stated.

“How do you know this is a picture of a cat?”

“Well, it has a tail, two ears shaped like triangles, cat eyes, it’s fluffy…”

“Right, but have you ever seen this image before? How did you know to categorize this image as a cat?” While I had not seen the exact image my mentor showed me, I had seen photos like it, and logically deduced that the image was of a cat.

“One day, when you were young, you were shown a picture of a cat,” my mentor continued. “You were made aware of the properties and characteristics of a cat. Let me show you another picture, and I want you to tell me what it is:”

Shutterstock-esque profile image of a Sphynx cat, sitting.
Fine Art America

“That’s a cat, too,” I said, knowing exactly where he was going with this. I explained that I knew the first photo was a photo of a cat partly due to its fluffiness, but this one definitely wasn’t fluffy. I had to rely on my abundant knowledge of cats to know that this was also a cat.

Cartoon drawing of a cat, on looseleaf paper using a fine point Sharpie.
A simple drawing by Yours Truly. I would see this and say this is also a cat.
Another looseleaf/Sharpie drawing of a cat, but more manic/frantic and less accurate.
Another one!
Even more manic/frantic drawing of a cat on looseleaf with Sharpie, but this time without a face.
Aaaaaaand another!
Last drawing of cat, colored in and more organized in a way we perceive cats to be shaped. No face.
I promise there’s a point to these, not just for me to show off my poor artistry.

Without me telling you (and assuming you are able to see them), would you be able to deduce that all of these are images of cats?

How do we, as humans, understand that these are all illustrations of cats? Cats come in all shapes, colors and sizes; some cats are missing limbs or ears; some cats are conjoined twins.

We were conditioned to recognize “cat” in all of these forms; each new exposure to a “cat” widened our definition of one, making us “smarter.”

And this — all of this — was my introduction to machine learning (specifically image classification by training a Convolutional Neural Network).

Thanks, Towards Data Science! A great illustration of how a CNN classifies.

What is a Convolutional Neural Network (CNN)?

As I read in a Towards Data Science article, Convolutional Neural Networks have these characteristics:

  • Convolutional layers
  • ReLU layers
  • Pooling layers
  • A fully connected layer

Let’s break these down.

Convolutional Layers

Convolutional layers are the major building blocks of Convolutional Neural Networks. CNNs are well suited to processing images because their 2D convolutional layers operate directly on the grid of pixels.

A convolution is the simple application of a filter to an input that results in an activation. Repeated application of the same filter to an input results in a map of activations called a feature map, indicating the locations and strength of a detected feature in an input, such as an image.

A feature map efficiently summarizing a 2D image.

A “filter” passes over the input image, scanning a few pixels at a time and building a feature map that records where, and how strongly, the pattern the filter detects appears in the image.
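To make that concrete, here is a minimal sketch in plain NumPy (no ML framework; the tiny image and the edge-detecting filter are invented for illustration) of sliding a 3×3 filter over an image to produce a feature map:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and record
    the filter's response at each position -- that grid is the feature map."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + kh, x:x + kw]
            out[y, x] = np.sum(patch * kernel)  # elementwise multiply, then sum
    return out

# A made-up 4x4 "image": bright on the left, dark on the right
image = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
], dtype=float)

# A vertical-edge filter: responds when left pixels are brighter than right
vertical_edge = np.array([
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
], dtype=float)

feature_map = convolve2d(image, vertical_edge)
print(feature_map)  # strong (value 3) responses: the edge is present everywhere here
```

A real CNN learns its filter values during training rather than having them hand-written, and applies many filters per layer, but the sliding-and-summing mechanics are the same.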

ReLU Layers

The ReLU (Rectified Linear Unit) layer is essentially an activation function, and it is common across many kinds of neural networks — not just CNNs.

The rectified linear activation function is a piecewise linear function that will output the input directly if it is positive; otherwise, it will output zero.

Mathematically, ReLU is defined as y = max(0, x).

Graph representation of the function.
Screenshot of my laptop when I put in y = max(0, x) from Desmos.

Because of the simplicity of this activation function, it is easy to use and understand, cheap to run and efficient to train.
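That simplicity shows in code, too: the whole function is one line of NumPy (a sketch, not any particular framework’s implementation):

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: pass positive values through, zero out the rest."""
    return np.maximum(0, x)

out = relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0]))
print(out)  # negatives become 0.0; non-negative values pass through unchanged
```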

Here are more examples of activation functions:

For more information on these, check out Machine Learning from Scratch

Pooling Layers

Pooling layers provide an approach to downsampling feature maps by summarizing the presence of features in patches of the feature map.

An issue we haven’t yet discussed is that feature maps are sensitive to the location of features in the input. Small movements in the position of a feature in the input image, such as from cropping or zooming in, will result in a different feature map.

That is where downsampling comes into play: a lower-resolution version of the input is created that still contains the large or important structural elements, without the fine detail that may not be as useful to the task.

The first cartoon drawing of the cat from earlier, and a second image of the same drawing but slightly enlarged and rotated.
Same image of the same cat, but one image is rotated and slightly enlarged.

Downsampling can be achieved with solely convolutional layers by changing the stride of the convolution across the image, as shown by Machine Learning Mastery, but a more efficient and common approach is to use a pooling layer.

Pooling reduces the amount of information in each feature map obtained from the convolutional layer while keeping the most important information; this shortens training time and helps control overfitting.
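Here is a quick NumPy sketch of max pooling, one common pooling strategy (the feature-map values are invented for illustration): each non-overlapping 2×2 patch collapses to its strongest activation, halving each dimension.

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Downsample by keeping only the strongest activation in each
    non-overlapping size x size patch of the feature map."""
    h, w = feature_map.shape
    pooled = np.zeros((h // size, w // size))
    for y in range(pooled.shape[0]):
        for x in range(pooled.shape[1]):
            patch = feature_map[y*size:(y+1)*size, x*size:(x+1)*size]
            pooled[y, x] = patch.max()
    return pooled

fm = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 2],
    [2, 2, 3, 4],
], dtype=float)

pooled = max_pool(fm)
print(pooled)
# [[4. 2.]
#  [2. 5.]]
```

Because only the patch maximum survives, a feature shifted by a pixel or two often lands in the same patch and produces the same pooled output, which is exactly the robustness to small movements discussed above.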

A Fully Connected Layer

This is the string that ties it all together. A fully connected layer takes the output of the convolution/pooling layers and predicts the best label to describe the image.

From MissingLink.ai:

  • Fully connected input layer (flatten): takes the output of the previous layers, “flattens” it and turns it into a single vector that can be an input for the next stage.
  • The first fully connected layer: takes the inputs from the feature analysis and applies weights to predict the correct label.
  • Fully connected output layer: gives the final probabilities for each label.

A gif showing a program classifying an image of a dog.
A great example from Becoming Human.
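Tying the stages together, here is a hedged NumPy sketch of that flatten / weights / probabilities pipeline. Everything here is made up for illustration: the pooled values, the choice of three labels, and the random (untrained) weights; a real network would learn the weights during training.

```python
import numpy as np

def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)

# Pretend this 2x2 map came out of the convolution/pooling stages
pooled = np.array([[4.0, 2.0],
                   [2.0, 5.0]])

flat = pooled.flatten()                     # flatten: 2x2 map -> vector of 4
weights = rng.normal(size=(3, flat.size))   # 3 hypothetical labels, e.g. cat/dog/other
biases = np.zeros(3)

scores = weights @ flat + biases            # fully connected layer: weighted sum
probs = softmax(scores)                     # output layer: probability per label
print(probs, probs.sum())                   # the probabilities sum to 1
```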

Maybe next time, my mentor will ask me to categorize a dog!

I hope you enjoyed learning about Convolutional Neural Networks and image classification with me. If you like what you read or have any questions, comments or would like to collaborate with me on an article (or speak with me about job opportunities at your workplace), feel free to tweet me @mackied0g or connect with me on LinkedIn — don’t be shy, I love feedback and collaboration. Be sure to come back to check out my Part 2 of my Machine Learning series: Reinforcement Learning with AWS!


