How Dataset Size and RAM Choke Your Deep Learning for Computer Vision

--

A simple way to protect your computer’s RAM from overloading and ensure your DNN training succeeds on a huge image dataset.

photo by author

Background

When dealing with large image datasets, computer memory can easily be overloaded. Many people have no idea how large an image dataset can be. The MNIST dataset, although each handwritten digit is only 28x28 pixels, is composed of a training set of 60,000 examples and a test set of 10,000 examples. The downloaded dataset doesn’t require much hard drive capacity, but once we read it into a NumPy array it takes up far more memory (RAM). For bigger datasets, instead of an array output, an “out of memory” error appears on the screen. What is worse, with the development of Data Science, the datasets we use for research keep growing. The COCO dataset, the Cityscapes dataset, etc. need far more of both hard drive capacity and memory.
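To get a feel for the numbers, here is a quick back-of-the-envelope sketch (my own illustration; the image counts and target sizes are assumptions for demonstration) of how much RAM a dataset needs once it is decoded into a single NumPy array:

import numpy as np

def array_size_gb(n_images, height, width, channels, dtype=np.float32):
    # Rough RAM needed to hold a decoded image dataset as one NumPy array
    total_bytes = n_images * height * width * channels * np.dtype(dtype).itemsize
    return total_bytes / 1024**3

# MNIST: 60,000 grayscale 28x28 images still fit comfortably
print(array_size_gb(60_000, 28, 28, 1))      # ~0.18 GB as float32

# A COCO-sized set resized to 512x512 RGB quickly outgrows typical RAM
print(array_size_gb(118_000, 512, 512, 3))   # ~345 GB as float32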

It seems that we have to buy better, more expensive hardware to cope with limited computer memory. Otherwise, we can’t process these huge image datasets.


On the other hand, deep learning algorithms require a great many computations, which can also exhaust computer memory. The classification, detection, and segmentation algorithms of Computer Vision with DNNs handle enormous data volumes, and the more training data, the better our results. Yet even though we have access to big datasets like Pascal VOC, COCO, and Cityscapes, which are open and free for everyone, our poor RAM doesn’t allow us to process that much data. Either the dataset size or the RAM chokes my deep learning like a force choke👹.

Star Wars movie scene

Our own approach

For this case, we can read images from a specific directory, but not all at once, only a certain number of images at a time. The idea is similar to the batch size in Keras/TensorFlow. The difference is that with the batch_size setting all images are loaded into memory in advance, which causes the unexpected memory load. Instead, we scan the image dataset and save only the image names/paths in a list/dictionary, so that we can call up the images when we want to read them into the DNN for training. The image data that have already been trained on can be deleted right away. This way, only one subset of the dataset exists in memory at each stage of training.

Flow chart of reading and processing a subset of a dataset
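A minimal sketch of the idea (my own, not the exact code used later in this post): only the file names live in a list, and each subset is decoded, used, and deleted before the next one is loaded. The directory path, the 224x224 target size, and the subset size of 1,000 are assumptions for illustration; load_img/img_to_array are the standard Keras image helpers.

import os
import gc
import numpy as np
from keras.preprocessing.image import load_img, img_to_array

image_dir = "VOCdevkit/VOC2012/JPEGImages"      # assumed directory layout
image_names = sorted(os.listdir(image_dir))     # only names stay in memory, not pixels

subset_size = 1000
for start in range(0, len(image_names), subset_size):
    names = image_names[start:start + subset_size]
    # Decode only this subset into an array
    X_subset = np.asarray([
        img_to_array(load_img(os.path.join(image_dir, n), target_size=(224, 224)))
        for n in names
    ])
    # ... train the model on X_subset here ...
    del X_subset                                # drop the subset before loading the next one
    gc.collect()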

With this method, we not only reduce the memory load of DNN training at each step but can also improve the model’s performance. The reasoning is similar to why we prefer a small batch_size for DNN training: it tends to give better generalization. The number of subsets of the original dataset can be regarded as a new hyperparameter that is worth further study. But for the current objective, the introduced method lets us break the limitation imposed by the dataset size or the RAM.


It’s exciting: we don’t need to upgrade our old equipment or give up on deep learning for computer vision😬 👻 🔮.

Code example

The following code performs image segmentation on Pascal VOC 2012, which contains 2,913 segmentation images. The model is built with Keras. We divide the 2,913 images into 20 subsets. Each subset is trained for 5 epochs with the same hyperparameters (optimizer, learning rate, batch size, etc.). The function “getSegmentationArr” is a custom data preprocessing step for semantic segmentation with Fully Convolutional Networks (FCN). gc.collect() comes from the Garbage Collector interface and releases unreferenced memory. The plan is to train on 1,000 images at a time, and after each round a new subset is adopted for the next round of training. The model is updated after every subset and epoch. The starting image index of each subset is incremented by 100, so the last subset also contains 1,000 images, starting at index 1900 and ending at index 2900. That gives us 20 training loops that make full use of the image dataset. Sorry🙃 that I just dropped the last 13 images from training; I think it’s okay now that we have gotten a good model.

import gc
import numpy as np
from sklearn.utils import shuffle
from keras import optimizers
from keras.callbacks import ModelCheckpoint, EarlyStopping

# Save the best model after each epoch and stop early if val_loss stops improving
checkpoint = ModelCheckpoint("FCN_8.h5", monitor='val_acc', verbose=1, save_best_only=True,
                             save_weights_only=False, mode='auto', period=1)
early = EarlyStopping(monitor='val_loss', min_delta=0, patience=3, verbose=1, mode='auto')

for round in range(20):
    # Take a 1,000-image subset; the start index moves forward by 100 each round
    X_batch = X[0 + round*100 : 1000 + round*100]
    Y = []
    for seg in segmentations[0 + round*100 : 1000 + round*100]:
        Y.append(getSegmentationArr(dir_seg + seg, nClasses, output_width, output_height))
    Y = np.asarray(Y)

    # 90/10 train/validation split within the subset
    train_rate = 0.9
    index_train = np.random.choice(X_batch.shape[0], int(X_batch.shape[0]*train_rate), replace=False)
    index_test = list(set(range(X_batch.shape[0])) - set(index_train))
    X_batch, Y = shuffle(X_batch, Y)
    X_train, y_train = X_batch[index_train], Y[index_train]
    X_test, y_test = X_batch[index_test], Y[index_test]

    adam = optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, amsgrad=False)
    model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
    hist1 = model.fit(X_train, y_train, validation_data=(X_test, y_test),
                      batch_size=32, epochs=5, callbacks=[checkpoint, early], verbose=1)

    # Free the current subset before loading the next one
    if round != 19:
        del X_batch, Y, index_test, index_train, X_train, y_train, X_test, y_test
        gc.collect()

After the 20 loops, we get the final FCN segmentation model, which achieves a validation accuracy of 0.9224 and a mean IoU of 0.737. For a simple FCN, that’s not bad.
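The mean IoU is reported above without showing how it is computed; for completeness, here is a minimal sketch of the usual per-class intersection-over-union average (my own helper, assuming labels and predictions of shape (N, H, W, nClasses) as produced by the pipeline above):

import numpy as np

def mean_iou(y_true, y_pred, n_classes):
    # y_true, y_pred: integer class maps of shape (N, H, W)
    ious = []
    for c in range(n_classes):
        intersection = np.logical_and(y_true == c, y_pred == c).sum()
        union = np.logical_or(y_true == c, y_pred == c).sum()
        if union > 0:                # skip classes absent from both maps
            ious.append(intersection / union)
    return np.mean(ious)

# pred = model.predict(X_test)                                   # (N, H, W, nClasses)
# miou = mean_iou(y_test.argmax(-1), pred.argmax(-1), nClasses)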

FCN model output compared with ground truth

References

Deng, Li. “The mnist database of handwritten digit images for machine learning research [best of the web].” IEEE Signal Processing Magazine 29.6 (2012): 141-142.

Cordts, Marius, et al. “The cityscapes dataset for semantic urban scene understanding.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.

Lin, Tsung-Yi, et al. “Microsoft coco: Common objects in context.” European conference on computer vision. Springer, Cham, 2014.

Everingham, Mark, et al. “The pascal visual object classes (voc) challenge.” International journal of computer vision 88.2 (2010): 303–338.

Long, Jonathan, Evan Shelhamer, and Trevor Darrell. “Fully convolutional networks for semantic segmentation.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.

