Deep Learning Class Project Journal — Day 1

Building a Generative Neural Network that can fill the missing center of an Image

--

I am taking Montreal University’s deep learning graduate course, given by one of the pioneers of deep learning, Aaron Courville.

The class project consists of building a generative neural network that can fill the missing center of an image.
Images have a 64x64 resolution and the missing center is 32x32, which is quite a lot (a quarter of the pixels). The dataset also contains sentences, attached to each image, describing the scene in the center.

Training set: ~82K images and sentences

[Figures: example inputs (images with the center removed), targets (full images), and the text caption for the first image on the left]
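To make the setup concrete, here is a minimal sketch of how an input can be built from a full image by blanking its 32x32 center. This is my own illustration, not the official course preprocessing (which may fill the hole differently, e.g. with a constant grey value):

```python
# Minimal sketch: build a "corrupted" input by removing the 32x32 center of a
# 64x64 RGB image. Zeroing the hole is an assumption for illustration only.
import numpy as np

def mask_center(img):
    """img: (64, 64, 3) array; returns a copy with the 32x32 center blanked."""
    corrupted = img.copy()
    corrupted[16:48, 16:48, :] = 0   # the 32x32 region in the middle
    return corrupted

# Example usage on a random placeholder image.
example = mask_center(np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8))
```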

If you want to read the full project description, here’s the website.

What to do?

Since I am new to the field, I will not start by implementing papers I don’t fully understand. Instead, I will take this opportunity to work iteratively and keep it simple.

First, I am assuming that text captions won’t bring much to the table. I suspect that even with word embeddings, because of sparsity, the added value of text over pixel context will be negligible. Thus, I will prioritize models that use images only.

One type of generative network we learned about during the class is the auto-encoder. I think auto-encoders make sense in this context, because we can think of the inputs as “corrupted” images.
For my first day, I am going to explore this family of models.

How to do it?

Although it would be fun to code this from scratch, the main goal of this project is to try many variants and get quick results. It was suggested to use Theano, but it is still too much boilerplate for me. I needed a higher abstraction, and I found the amazing Keras library.

Keras has a higher-level functional API and allows you to use Theano or TensorFlow as a backend, so you get powerful GPU optimisations for “free”. If you want to use Theano, as I did, you have to change the “backend” attribute of $HOME/.keras/keras.json.
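After editing keras.json, a quick one-liner confirms which backend is active:

```python
# Print the backend Keras picked up from $HOME/.keras/keras.json.
from keras import backend as K
print(K.backend())   # expect 'theano' here (or 'tensorflow' by default)
```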

Keras’s blog is also a great resource; I adapted their code to implement just what I needed.

First model: Fully connected deep auto-encoder

Hyperparameters (a rough Keras sketch follows the list):

  • Number of layers: 4 (following an under-complete scheme)
  • Batch size: 250
  • Epochs: 60
  • Optimization method: Adam
  • Activation functions: ReLU for hidden layers and sigmoid for output
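Here is roughly what that first model looks like in Keras. The layer widths are my own guesses (not the exact ones I used), and I am assuming pixels scaled to [0, 1] with a binary cross-entropy loss, as in the Keras blog post:

```python
# Rough sketch of the fully connected auto-encoder (hidden sizes are assumed,
# written against the Keras 2-style API).
from keras.layers import Input, Dense
from keras.models import Model

inp = Input(shape=(64 * 64 * 3,))                  # flattened 64x64 RGB input with masked center
h = Dense(2048, activation='relu')(inp)            # under-complete: layers get narrower
h = Dense(512, activation='relu')(h)               # bottleneck
h = Dense(2048, activation='relu')(h)
out = Dense(64 * 64 * 3, activation='sigmoid')(h)  # reconstruct the full image

autoencoder = Model(inp, out)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
# autoencoder.fit(x_corrupted, x_full, batch_size=250, epochs=60,
#                 validation_data=(x_val_corrupted, x_val_full))
```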

Results after ~1.5h:

[Image: sample outputs of the fully connected auto-encoder]

At least it gets the colors OK… I realized one thing while waiting: my auto-encoder’s output layer has 64x64x3 units, but the task we have to perform is to generate the middle part only (32x32x3). The loss is biased because it is calculated on complete images. I decided to change my model to output only the middle part. I don’t think we can call it an auto-encoder anymore, since the input is not included in the target.
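Concretely, the only change is the output layer (again a sketch with assumed hidden sizes): the network now predicts 32x32x3 values and is trained against the center crop of the full image:

```python
# Sketch of the second model: same encoder, but the output now covers only the
# 32x32 center (hidden sizes are still assumptions).
from keras.layers import Input, Dense
from keras.models import Model

inp = Input(shape=(64 * 64 * 3,))
h = Dense(2048, activation='relu')(inp)
h = Dense(512, activation='relu')(h)
h = Dense(2048, activation='relu')(h)
center = Dense(32 * 32 * 3, activation='sigmoid')(h)  # only the missing middle

model = Model(inp, center)
model.compile(optimizer='adam', loss='binary_crossentropy')
# Targets would be the flattened center crop of each full image, e.g.:
# y_train = full_images[:, 16:48, 16:48, :].reshape(len(full_images), -1)
```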

Results of the second model:

[Image: sample outputs of the second model (“weird auto-encoder”)]

Close to the borders of the middle part, the colors look a tiny bit more accurate (qualitatively).

The training loss of the last model was around 0.62 (validation loss was close), and it was reached in about 10 epochs. I clearly need better tracking of accuracy and loss; I will include this in my next models.
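A basic version of this is already available in Keras: fit() returns a History object with per-epoch metrics. The snippet below just continues from the model sketch above, with placeholder data names:

```python
# Basic loss tracking with the History object returned by fit().
# 'model', 'x_train', 'y_train', 'x_val', 'y_val' are the model and data
# placeholders from the sketches above.
history = model.fit(x_train, y_train, batch_size=250, epochs=60,
                    validation_data=(x_val, y_val))
print(history.history['loss'])      # training loss per epoch
print(history.history['val_loss'])  # validation loss per epoch
```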

Conclusion

My auto-encoder implementation seems to capture brightness and colors, but the images are very blurry and not that great. I will move on to models better suited for images, replacing fully connected layers with convolution and pooling layers. See you tomorrow.

Github source code

Specs:

  • GTX 1070 8GB RAM
  • Windows 10
