Constructing a CNN network for Dogs and Cats dataset


We’ve got the data, but we can’t exactly just stuff raw images right through our convolutional neural network. First, we need all of the images to be the same size, and then we will probably want to grayscale them. Also, the labels of “cat” and “dog” aren’t useful to the network; we want them to be one-hot arrays.

Besides TensorFlow, you will need numpy (pip install numpy) and tqdm (pip install tqdm).

We will be using the GPU version of TensorFlow along with tflearn.


To install the CPU version of TensorFlow, just do pip install tensorflow. To install the GPU version (pip install tensorflow-gpu), you also need to get all of the dependencies, like CUDA and cuDNN, set up first.

First, we’ll get our imports and constants for preprocessing:

import cv2                  # working with, mainly resizing, images
import numpy as np          # dealing with arrays
import os                   # dealing with directories
from random import shuffle  # mixing up our currently ordered data, which might otherwise lead our network astray in training
from tqdm import tqdm       # a nice pretty percentage bar for tasks. Thanks to viewer Daniel Bühler for this suggestion

TRAIN_DIR = 'X:/Kaggle_Data/dogs_vs_cats/train/train'
TEST_DIR = 'X:/Kaggle_Data/dogs_vs_cats/test/test'
IMG_SIZE = 50
LR = 1e-3

MODEL_NAME = 'dogsvscats-{}-{}.model'.format(LR, '2conv-basic') # just so we remember which saved model is which, sizes must match
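With LR = 1e-3, that format string works out to:

print(MODEL_NAME)  # dogsvscats-0.001-2conv-basic.model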

Now, our first order of business is to convert the images and labels to array information that we can pass through our network. To do this, we’ll need a helper function to convert the image name to an array.

Our images are named like “cat.1.jpg” or “dog.3.jpg” and so on, so we can just split out the dog/cat part, and then convert it to a one-hot array like so:

def label_img(img):
    word_label = img.split('.')[-3]
    # conversion to one-hot array [cat,dog]
    # [much cat, no dog]
    if word_label == 'cat': return [1,0]
    # [no cat, very doggo]
    elif word_label == 'dog': return [0,1]
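
For example, a filename like 'cat.1.jpg' splits into ['cat', '1', 'jpg'], so the [-3] index picks out the word label:

label_img('cat.1.jpg')   # returns [1, 0]
label_img('dog.42.jpg')  # returns [0, 1]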

Now, we can build another function to fully process the training images and their labels into arrays:

def create_train_data():
    training_data = []
    for img in tqdm(os.listdir(TRAIN_DIR)):
        label = label_img(img)
        path = os.path.join(TRAIN_DIR,img)
        img = cv2.imread(path,cv2.IMREAD_GRAYSCALE)
        img = cv2.resize(img, (IMG_SIZE,IMG_SIZE))
        training_data.append([np.array(img),np.array(label)])
    shuffle(training_data)
    np.save('train_data.npy', training_data)
    return training_data

The tqdm module was introduced to me by one of my viewers; it’s a really nice, pretty way to measure where you are in a process. Rather than printing things out at intervals and so on, it gives you a progress bar. Super neat.
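
If you haven’t seen it in action, wrapping any iterable is all it takes; a trivial sketch:

from tqdm import tqdm
import time

for i in tqdm(range(100)):   # tqdm wraps any iterable and draws a live progress bar
    time.sleep(0.01)         # stand-in for real work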

Anyway, the above function converts each image and its label into array data for us.

When we’ve gone through all of the images, we shuffle them, then save. Shuffle modifies a variable in place, so there’s no need to re-define it here.

With this function, we will both save and return the array data. This way, if we just change the neural network’s structure, and not something about the images, like the image size, then we can just load the array file and save some processing time. While we’re here, we might as well also make a function to process the testing data. This is the actual competition test data, NOT the data that we’ll use to check the accuracy of our algorithm as we test. This data has no labels.

def process_test_data():
    testing_data = []
    for img in tqdm(os.listdir(TEST_DIR)):
        path = os.path.join(TEST_DIR,img)
        img_num = img.split('.')[0]
        img = cv2.imread(path,cv2.IMREAD_GRAYSCALE)
        img = cv2.resize(img, (IMG_SIZE,IMG_SIZE))
        testing_data.append([np.array(img), img_num])

    shuffle(testing_data)
    np.save('test_data.npy', testing_data)
    return testing_data

Now, we can build the training data:

train_data = create_train_data()
# If you have already created the dataset:
#train_data = np.load('train_data.npy')
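
If you go the loading route on a newer NumPy (>= 1.16.3), note that this file holds object arrays (each entry pairs an image array with a label), so np.load has to be told to allow pickles:

train_data = np.load('train_data.npy', allow_pickle=True)  # newer NumPy refuses object arrays by default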

Next, we’re ready to define our neural network:

import tflearn
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.estimator import regression

convnet = input_data(shape=[None, IMG_SIZE, IMG_SIZE, 1], name='input')

convnet = conv_2d(convnet, 32, 5, activation='relu')
convnet = max_pool_2d(convnet, 5)

convnet = conv_2d(convnet, 64, 5, activation='relu')
convnet = max_pool_2d(convnet, 5)

convnet = fully_connected(convnet, 1024, activation='relu')
convnet = dropout(convnet, 0.8)

convnet = fully_connected(convnet, 2, activation='softmax')
convnet = regression(convnet, optimizer='adam', learning_rate=LR, loss='categorical_crossentropy', name='targets')

model = tflearn.DNN(convnet, tensorboard_dir='log')
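
Since we pointed tensorboard_dir at 'log', TFLearn will write training summaries there as the model fits; you can watch the loss and accuracy curves live by running tensorboard --logdir=log in a separate terminal (assuming TensorBoard came along with your TensorFlow install).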

What we have here is a nice, 2-layer convolutional neural network, with a fully connected layer, and then the output layer. It’s been debated whether or not a fully connected layer is of any use here. I’ll leave it in anyway.

This exact convnet was good enough for recognizing handwritten digits at 28x28. Let’s see how it does with cats and dogs at 50x50 resolution.
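
As a quick sanity check on those sizes: tflearn’s conv_2d defaults to 'same' padding with stride 1, and max_pool_2d defaults to stride = kernel size, so each pool above divides the spatial dimensions by 5. Tracing the shapes by hand through the network we just defined:

# input:              (None, 50, 50, 1)
# conv_2d(.., 32, 5): (None, 50, 50, 32)   'same' padding keeps 50x50
# max_pool_2d(.., 5): (None, 10, 10, 32)   50 / 5 = 10
# conv_2d(.., 64, 5): (None, 10, 10, 64)
# max_pool_2d(.., 5): (None, 2, 2, 64)     10 / 5 = 2
# fully_connected:    (None, 1024)         flattens the 2*2*64 = 256 features first
# fully_connected:    (None, 2)            softmax over [cat, dog]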

Now, it won’t always be the case that you’re training the network fresh every time. Maybe first you just want to see how 3 epochs trains, and after 3, maybe you’re done, or maybe you want to try about 5 epochs. We want to save our model after every session, and reload it if we have a saved version, so I will add this:

if os.path.exists('{}.meta'.format(MODEL_NAME)):
    model.load(MODEL_NAME)
    print('model loaded!')

Now, let’s split out our training and testing data:

train = train_data[:-500]
test = train_data[-500:]

Now, the training data and testing data are both labeled datasets. The training data is what we’ll fit the neural network with, and the test data is what we’re going to use to validate the results. The test data will be “out of sample,” meaning the testing data will only be used to test the accuracy of the network, not to train it.

We also have “test” images that we downloaded. THOSE images are not labeled at all, and those are what we’ll submit to Kaggle for the competition.

Next, we’re going to create our data arrays. For some reason, typical numpy indexing like array[:,0] and array[:,1] did NOT work for me here; train is a plain Python list of [image, label] pairs rather than a proper 2D array, so I do this instead to separate my features and labels:

X = np.array([i[0] for i in train]).reshape(-1,IMG_SIZE,IMG_SIZE,1)
Y = [i[1] for i in train]

test_x = np.array([i[0] for i in test]).reshape(-1,IMG_SIZE,IMG_SIZE,1)
test_y = [i[1] for i in test]
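
As a sanity check on that reshape: with 25,000 training images and the last 500 held out for validation, the shapes should come out like this:

print(X.shape)        # (24500, 50, 50, 1) -> samples, height, width, channels
print(test_x.shape)   # (500, 50, 50, 1)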

Now we fit for 3 epochs:

model.fit({'input': X}, {'targets': Y}, n_epoch=3, validation_set=({'input': test_x}, {'targets': test_y}),
    snapshot_step=500, show_metric=True, run_id=MODEL_NAME)

Training Step: 1148 | total loss: 11.71334 | time: 4.061s
| Adam | epoch: 003 | loss: 11.71334 - acc: 0.4913 -- iter: 24448/24500
Training Step: 1149 | total loss: 11.72928 | time: 5.074s
| Adam | epoch: 003 | loss: 11.72928 - acc: 0.4906 | val_loss: 11.88134 - val_acc: 0.4840 -- iter: 24500/24500
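
We said we’d be saving our model after every session, so once the fit finishes, write it out under MODEL_NAME (TFLearn’s DNN object provides save() to match the load() above):

model.save(MODEL_NAME)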
After 3 epochs we’re sitting at about 49% accuracy, no better than a coin flip, so let’s try something bigger.

Increasing the size of the network

First, we need to reset the graph instance, since we’re doing this in a continuous environment:

import tensorflow as tf
tf.reset_default_graph()

convnet = input_data(shape=[None, IMG_SIZE, IMG_SIZE, 1], name='input')

convnet = conv_2d(convnet, 32, 5, activation='relu')
convnet = max_pool_2d(convnet, 5)

convnet = conv_2d(convnet, 64, 5, activation='relu')
convnet = max_pool_2d(convnet, 5)

convnet = conv_2d(convnet, 128, 5, activation='relu')
convnet = max_pool_2d(convnet, 5)

convnet = conv_2d(convnet, 64, 5, activation='relu')
convnet = max_pool_2d(convnet, 5)

convnet = conv_2d(convnet, 32, 5, activation='relu')
convnet = max_pool_2d(convnet, 5)

convnet = fully_connected(convnet, 1024, activation='relu')
convnet = dropout(convnet, 0.8)

convnet = fully_connected(convnet, 2, activation='softmax')
convnet = regression(convnet, optimizer='adam', learning_rate=LR, loss='categorical_crossentropy', name='targets')

model = tflearn.DNN(convnet, tensorboard_dir='log')



if os.path.exists('{}.meta'.format(MODEL_NAME)):
    model.load(MODEL_NAME)
    print('model loaded!')

train = train_data[:-500]
test = train_data[-500:]

X = np.array([i[0] for i in train]).reshape(-1,IMG_SIZE,IMG_SIZE,1)
Y = [i[1] for i in train]

test_x = np.array([i[0] for i in test]).reshape(-1,IMG_SIZE,IMG_SIZE,1)
test_y = [i[1] for i in test]

model.fit({'input': X}, {'targets': Y}, n_epoch=3, validation_set=({'input': test_x}, {'targets': test_y}),
    snapshot_step=500, show_metric=True, run_id=MODEL_NAME)

Training Step: 4978 | total loss: 0.31290 | time: 4.031s
| Adam | epoch: 010 | loss: 0.31290 - acc: 0.8641 -- iter: 24448/24500
Training Step: 4979 | total loss: 0.30547 | time: 5.044s
| Adam | epoch: 010 | loss: 0.30547 - acc: 0.8683 | val_loss: 0.57259 - val_acc: 0.7980 -- iter: 24500/24500
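
With validation accuracy now up around 80%, much better than the earlier coin flip, we can peek at predictions on the unlabeled competition images. A minimal sketch, assuming you’ve run process_test_data() from earlier; the argmax just inverts our one-hot [cat, dog] convention:

test_data = process_test_data()
# test_data = np.load('test_data.npy', allow_pickle=True)  # if already processed

for img_arr, img_num in test_data[:5]:        # peek at the first few test images
    data = img_arr.reshape(IMG_SIZE, IMG_SIZE, 1)
    prediction = model.predict([data])[0]     # [cat probability, dog probability]
    label = 'dog' if np.argmax(prediction) == 1 else 'cat'
    print(img_num, label, prediction)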
