The Intersection of Art and AI: Identifying and Generating Famous Works using CNNs and GANs


By Jillian Brady and Ellie Packard

Introduction

The increasing interaction between technology and art over the past few decades has changed the landscape of art drastically. Art expresses the artist’s skill, nuance, and creativity in a way that cannot be mechanically reproduced. But in the era of Photoshop and AR, the lines between art and tech, and between real and fake, have become blurred. Using neural networks and deep learning, our project explores whether and how machine learning can be used to understand art.

Data

We will use the Kaggle dataset Painter by Numbers to conduct our research. This dataset includes tens of thousands of images of art, which are labelled with information including artist, date, and genre.

Can a machine learning model identify works of art by Pablo Picasso?

To explore this question, we use a subset of Painter by Numbers containing 500 works by Picasso and 1,600 works by artists other than Picasso. We also rely on Keras, PIL, pandas, NumPy, and scikit-learn, along with the standard os, io, and zipfile modules.

Picasso Image on Left, Non-Picasso Image on Right
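As a rough sketch of how such a subset can be assembled, the labels shipped with Painter by Numbers can be filtered with pandas. The file and column names below are assumptions about the dataset’s metadata CSV, and the 1,600-work sample of other artists is our choice.

import pandas as pd

# Assumed metadata file and column names from the Painter by Numbers release
info = pd.read_csv('train_info.csv')
picasso = info[info['artist'] == 'Pablo Picasso']
others = info[info['artist'] != 'Pablo Picasso'].sample(n=1600, random_state=0)
print(len(picasso), 'works by Picasso,', len(others), 'works by other artists')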

What is a convolutional neural network?

A convolutional neural network (CNN) is a neural network architecture that accounts for spatial adjacency. This type of neural network addresses the problem that feed-forward networks face when classifying images. Convolutional layers slide learned filters over every spatial location of the input, producing feature maps that let the network detect specific patterns wherever they appear. This makes CNNs highly effective at classifying images.
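As a toy illustration of our own (not part of the project code), the snippet below slides a 3x3 vertical-edge filter over a small grayscale image; the resulting feature map responds strongly only where that pattern occurs, which is the kind of location-independent pattern detection a convolutional layer learns.

import numpy as np

image = np.zeros((6, 6))
image[:, 3:] = 1.0                # bright right half: a vertical edge at column 3
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])   # classic vertical-edge detector

feature_map = np.zeros((4, 4))    # (6 - 3 + 1) x (6 - 3 + 1) valid positions
for i in range(4):
    for j in range(4):
        feature_map[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)
print(feature_map)                # large-magnitude responses only around the edge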

1. Creating a CNN

We will use Keras to build a convolutional network that classifies images as works by Picasso or works not by Picasso. Given the small dataset at hand, the model is designed to be relatively small, with few layers and filters, and it incorporates dropout; these measures help combat overfitting. Overall, the network contains 3 convolutional layers and 2 fully connected layers, and it employs max pooling, flattening, and dropout.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential()
model.add(Conv2D(filters=32, activation='relu', kernel_size=3, strides=(3, 3), input_shape=(200, 200, 3)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(filters=32, activation='relu', kernel_size=3, strides=(3, 3)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(filters=64, activation='relu', kernel_size=3, strides=(3, 3)))
model.add(MaxPooling2D(pool_size=(1, 1)))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

2. Data Pre-Processing & Data Augmentation

Images must be resized to the network’s input size. In this case, the input size is 200 x 200 (x 3 channels).

import os, io
import numpy as np
from PIL import Image

# yes_path is the path to the directory of images that are by Picasso
# no_path is the path to the directory of images that are not by Picasso
for item in os.listdir(yes_path):
    with open(yes_path + '/' + item, 'rb') as file:
        im = Image.open(io.BytesIO(file.read()))
        imResize = im.resize((200, 200), Image.ANTIALIAS)
        if np.array(imResize).shape == (200, 200, 3):
            imResize.save(yes_path + '/' + item[:-4] + ' resized.jpg', 'JPEG', quality=90)
for item in os.listdir(no_path):
    with open(no_path + '/' + item, 'rb') as file:
        im = Image.open(io.BytesIO(file.read()))
        imResize = im.resize((200, 200), Image.ANTIALIAS)
        if np.array(imResize).shape == (200, 200, 3):
            imResize.save(no_path + '/' + item[:-4] + ' resized.jpg', 'JPEG', quality=90)

Data augmentation allows us to get good results despite having a smaller dataset. With only 500 works by Picasso in the dataset, including 3 versions of each work (a resized or “squished” version, a cropped version, and a rotated version) gives the model more samples to learn from. After this process, the dataset includes 1,500 images by Picasso and 1,600 images that are not. This helps combat overfitting, although it isn’t a perfect solution, since the augmented images remain strongly correlated with the originals.

# Augmentation 1: crop each Picasso image to a square and resize it
for item in os.listdir(yes_path):
    with open(yes_path + '/' + item, 'rb') as file:
        im = Image.open(io.BytesIO(file.read()))
        width, height = im.size
        short = height if width > height else width
        imCrop = im.crop((0, 0, short, short))
        imResize = imCrop.resize((200, 200), Image.ANTIALIAS)
        if np.array(imResize).shape == (200, 200, 3):
            imResize.save(yes_path + '/' + item[:-4] + 'resized1.jpg', 'JPEG', quality=90)

# Augmentation 2: crop each Picasso image to a square from the other side, then rotate it
for item in os.listdir(yes_path):
    with open(yes_path + '/' + item, 'rb') as file:
        im = Image.open(io.BytesIO(file.read()))
        width, height = im.size
        if width > height:
            imCrop = im.crop((width - height, 0, width, height))
        elif height > width:
            imCrop = im.crop((0, height - width, width, height))
        else:
            imCrop = im
        imCrop = imCrop.rotate(90)
        imResize = imCrop.resize((200, 200), Image.ANTIALIAS)
        if np.array(imResize).shape == (200, 200, 3):
            imResize.save(yes_path + '/' + item[:-4] + 'resized2.jpg', 'JPEG', quality=90)

The data must then be:

  • Labeled (0 if not by Picasso, and 1 if by Picasso)
  • Split into training and test subsets (with 155 samples in the test set and 2,945 samples in the train set)
  • Put into the correct format for a Keras convolutional neural network
from sklearn.model_selection import train_test_split
from keras import backend as K

# Add images to X and labels to y
X = []
y = []
for f in os.listdir(yes_path):
    with open(yes_path + '/' + f, 'rb') as file:
        im = Image.open(io.BytesIO(file.read()))
        X.append(np.array(im))
        y.append(1)
for f in os.listdir(no_path):
    with open(no_path + '/' + f, 'rb') as file:
        im = Image.open(io.BytesIO(file.read()))
        X.append(np.array(im))
        y.append(0)

# Split X and y into train and test sets
X = np.array(X)
y = np.array(y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.05, random_state=40)

# Format data for the CNN
img_rows, img_cols = 200, 200
if K.image_data_format() == 'channels_first':
    X_train = X_train.reshape(X_train.shape[0], 3, img_rows, img_cols)
    X_test = X_test.reshape(X_test.shape[0], 3, img_rows, img_cols)
    input_shape = (3, img_rows, img_cols)
else:
    X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 3)
    X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 3)
    input_shape = (img_rows, img_cols, 3)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

3. Training the CNN

The training data is then used to train the convolutional neural network.

model.fit(X_train, y_train,
          batch_size=16,
          epochs=5,
          verbose=1,
          validation_data=(X_test, y_test))

CNN Results

The model predicts whether a work of art is by Picasso with 76% accuracy.

In its inaccurate predictions, there is a balance between false negatives (43%) and false positives (57%). This is notable given the smaller number of Picasso images in the training and test datasets, and it demonstrates that the model does not simply predict “Not Picasso” because that answer is more likely to be correct.
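For reference, the false negative / false positive breakdown can be computed from the test set along these lines (a sketch; the 0.5 decision threshold on the sigmoid output is our assumption):

from sklearn.metrics import confusion_matrix

# Threshold the sigmoid outputs at 0.5 to get hard predictions
y_pred = (model.predict(X_test) > 0.5).astype(int).ravel()
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print('accuracy:', (tp + tn) / len(y_test))
print('false negatives:', fn, 'false positives:', fp)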

The model’s 76% accuracy is based on the 155 samples in the test set. However, on a new test dataset of 5 works by Picasso, which the model had never seen before and which were not products of data augmentation, the model correctly predicted that all 5 works were by Picasso, for 100% accuracy.

New Unseen Images in Test Dataset

Conversely, using a new test dataset consisting of 15 works that are not by Picasso, the model correctly predicted that 12 works were not by Picasso but incorrectly predicted that 3 works were by Picasso, showing only 80% accuracy.
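Scoring such new, previously unseen images can be done with a loop like the one below (a sketch; new_path and the 0.5 threshold are our assumptions, and the preprocessing mirrors the training pipeline above):

# new_path is a directory of images the model has never seen
names, new_images = [], []
for item in os.listdir(new_path):
    with open(new_path + '/' + item, 'rb') as file:
        im = Image.open(io.BytesIO(file.read())).resize((200, 200), Image.ANTIALIAS)
        arr = np.array(im)
        if arr.shape == (200, 200, 3):
            names.append(item)
            new_images.append(arr.astype('float32') / 255)
predictions = model.predict(np.array(new_images))
for name, p in zip(names, predictions):
    print(name, 'Picasso' if p[0] > 0.5 else 'Not Picasso')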

CNN Analysis

The results show that machine learning can begin to understand and recognize the unique qualities of an artist’s work. This is especially interesting given that counterfeit detection algorithms are an incredibly valuable application of machine learning in the art industry today. However, it is also clear that machine learning and technology have their limitations in determining whether a work of art is truly by a particular artist.

What is a generative adversarial network?

Invented by Ian Goodfellow in 2014, a generative adversarial network (GAN) is composed of two distinct networks: a generator and a discriminator. The generator takes a random vector as input and produces an output datapoint (in this case an image) that attempts to mimic datapoints from a given dataset. The discriminator takes these images as input and determines whether they come from the real dataset or were created by the generator; its output is a single scalar value reflecting how likely it is that the image is real rather than fake. The discriminator’s success at distinguishing real from generated images is used to calculate the training loss and backpropagate through both the generator and the discriminator to improve the entire GAN. When trained on a significant amount of unlabeled data via unsupervised learning, GANs can learn the underlying patterns and structures and generate entirely new images that could plausibly belong to the given dataset.
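To make the generator/discriminator interplay concrete, here is a minimal Keras training-step sketch of our own (not the StyleGAN code; the toy layer sizes and image shape are assumptions for illustration):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Reshape, Flatten

latent_dim, img_shape = 100, (28, 28, 1)   # toy sizes for illustration

generator = Sequential([
    Dense(256, activation='relu', input_dim=latent_dim),
    Dense(int(np.prod(img_shape)), activation='tanh'),
    Reshape(img_shape),
])
discriminator = Sequential([
    Flatten(input_shape=img_shape),
    Dense(256, activation='relu'),
    Dense(1, activation='sigmoid'),        # scalar score: probability the image is real
])
discriminator.compile(loss='binary_crossentropy', optimizer='adam')

# Combined model: freeze the discriminator so only the generator learns to fool it
discriminator.trainable = False
gan = Sequential([generator, discriminator])
gan.compile(loss='binary_crossentropy', optimizer='adam')

def train_step(real_images, batch_size=32):
    z = np.random.normal(size=(batch_size, latent_dim))
    fake_images = generator.predict(z)
    # 1) Train the discriminator on real (label 1) and generated (label 0) images
    discriminator.train_on_batch(real_images, np.ones((batch_size, 1)))
    discriminator.train_on_batch(fake_images, np.zeros((batch_size, 1)))
    # 2) Train the generator (through the frozen discriminator) to be scored as "real"
    gan.train_on_batch(np.random.normal(size=(batch_size, latent_dim)),
                       np.ones((batch_size, 1)))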

What is StyleGAN and how does it work?

Nvidia, a major U.S. technology company, developed a powerful GAN called StyleGAN that is used specifically to generate high-quality, realistic faces and released it as an open source tool earlier this year. StyleGAN gradually generates artificial images from very low to higher resolutions through progressive layer growing and modifies the input of each level separately, which allows improvements in different image attributes (from coarse features to finer details) without affecting other levels. Unlike a traditional generator architecture, this model also incorporates a new mapping network that helps address gaps in the image dataset by allowing sampling from a uniform distribution and then warping the distribution. The mapping network’s output is then fed into multiple layers of the generator network.
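As a purely conceptual sketch (ours, greatly simplified from Nvidia’s implementation, with made-up layer sizes and assuming the TensorFlow 2 Keras API), the structure looks roughly like this: a mapping network warps the sampled latent z into an intermediate vector w, and each resolution block of the synthesis network consumes w separately while progressively upsampling the image.

import tensorflow as tf

latent_dim = 512

# Mapping network: z -> w (warps the sampled latent distribution)
mapping = tf.keras.Sequential(
    [tf.keras.layers.Dense(latent_dim, activation='relu') for _ in range(8)])

def style_block(x, w, filters):
    # Each resolution level receives the style vector w on its own,
    # so coarse and fine attributes can be adjusted independently
    style = tf.keras.layers.Dense(filters)(w)
    x = tf.keras.layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    x = x * (1.0 + style[:, None, None, :])        # crude stand-in for style modulation
    return tf.keras.layers.UpSampling2D()(x)       # grow to the next resolution

z = tf.keras.Input((latent_dim,))
w = mapping(z)
x = tf.keras.layers.Reshape((4, 4, 64))(tf.keras.layers.Dense(4 * 4 * 64)(w))
for filters in (64, 32, 16):                       # 4x4 -> 8x8 -> 16x16 -> 32x32
    x = style_block(x, w, filters)
rgb = tf.keras.layers.Conv2D(3, 1, activation='tanh')(x)
toy_generator = tf.keras.Model(z, rgb)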

Use of Transfer Learning in the Model

Given a query image, a traditional GAN could feed a random latent vector into its generator to produce a random image, and then compare that image to the query image. The loss function, the pixel-by-pixel difference between the randomly generated image and the query image, could be used to backpropagate gradients through the generator and then update the latent vector with gradient descent; the generator itself remains constant in this process. The goal is to optimize a latent vector that eventually produces an image resembling the given query image.

However, this does not always work, since the algorithm sometimes gets stuck in a poor local minimum and stops improving the latent vector, leading to an output image that does not resemble the query image at all. To address this, pretrained image classifiers are incorporated into the StyleGAN encoder to serve as a lens through which to compare the generated and query images. The image classifier used here is a VGG-16 network pretrained to classify ImageNet images. The classifier extracts features from the generated output image and the query image, and the algorithm then backpropagates from the loss computed between those extracted features instead of between individual pixels. This process requires a significant amount of time, but it is a much better approach to transforming one image into another. Using this model, we will generate entirely new images that seek to mimic famous works of art.
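A stripped-down version of that optimization loop might look like the following (our sketch, written against the TensorFlow 2 Keras API rather than the encoder’s actual TF1 code; the generator interface, image size, latent size, and chosen VGG layer are assumptions):

import tensorflow as tf

vgg = tf.keras.applications.VGG16(include_top=False, input_shape=(224, 224, 3))
feature_extractor = tf.keras.Model(vgg.input, vgg.get_layer('block3_conv3').output)

def encode_query_image(generator, query_image, steps=200, lr=0.01):
    # generator: maps a (1, 512) latent vector to a (1, 224, 224, 3) image; its weights stay fixed
    latent = tf.Variable(tf.random.normal((1, 512)))      # the only variable being optimized
    optimizer = tf.keras.optimizers.Adam(lr)
    target_features = feature_extractor(query_image)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            generated = generator(latent)
            # Loss is computed between extracted features, not raw pixels
            loss = tf.reduce_mean(tf.square(feature_extractor(generated) - target_features))
        grads = tape.gradient(loss, [latent])
        optimizer.apply_gradients(zip(grads, [latent]))
    return latent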

1. Cloning the StyleGAN encoder repository

We will use the Python notebook provided by Arxiv Insights as the basis for our exploration. We open this notebook in Google Colab and enable GPU acceleration. This notebook uses a StyleGAN encoder provided by Peter Baylies. We clone his Github repository and change the current directory into this repo folder.

!rm -rf sample_data
!git clone https://github.com/pbaylies/stylegan-encoder
cd stylegan-encoder/

2. Formatting the input images

We import the necessary modules and extract the zipped folder of training images (roughly 11,000 images provided by the Kaggle dataset) to a folder on the Desktop.

import os
import zipfile
import shutil

path = "/Users/Ellie/Desktop/train_1.zip"
directory_to_extract_to = "/Users/Ellie/Desktop/ANN/final_data"
with zipfile.ZipFile(path, 'r') as zip_ref:
    zip_ref.extractall(directory_to_extract_to)

We copy and paste the labels for all of the images in the Kaggle dataset into a text file. We then collect only the labels that describe images in the smaller subset we use and write them into another text file.

label_list = []
with open('all_labels.txt', 'r') as file:
    for line in file:
        word_list = line.split()
        image_name = word_list[0]
        if image_name in os.listdir("/Users/Ellie/Desktop/ANN/final_data"):
            label_list.append(line)
with open('train_labels.txt', 'a') as file:
    for elem in label_list:
        file.write(elem)

We then use the image labels to determine which images are portraits, and then move all of the portraits into a separate folder. We also set up the folder structure for our images of famous artworks.

with open('train_labels.txt', 'r') as file:
    for line in file:
        word_list = line.split()
        name = word_list[0]
        genre = word_list[-2]
        if genre == "portrait":
            shutil.move("/Users/Ellie/Desktop/ANN/final_data/" + name,
                        "/Users/Ellie/Desktop/ANN/portrait_imgs/" + name)
!rm -rf aligned_images raw_images
!mkdir aligned_images raw_images

We move all of the images in portrait_imgs to the raw_images folder within the stylegan-encoder directory. We then import TensorFlow (which must be version 1.15) and use a script included in the repo that looks for faces in the images in the raw_images folder, crops them out, aligns them by centering the nose and making the eyes horizontal, rescales the resulting images, and saves them in the aligned_images folder.

import tensorflow as tf
!python align_images.py raw_images/ aligned_images/ --output_size=1048

We choose different subsets of 4 to 6 portraits with common themes (e.g., women, men, children) and save them to different folders to run the model on. For the first round, in which we generate portraits of women, we upload a subset of six female portraits to the aligned_images folder in Google Colab by compressing the images into a zip file, uploading it to Google Drive, and unzipping it within Google Colab into the aligned_images directory. We repeat this step for each subset during different model runs.

from google.colab import drive
drive.mount('/content/drive')
!unzip -uq "/content/drive/My Drive/ladies.zip" -d "/content/stylegan-encoder/aligned_images"

3. Encoding faces into StyleGAN latent space

As we discussed earlier, using pretrained image classifiers leads to a better solution but requires much more time to find the optimal latent code that will produce an output image similar to the original query image. In order to address this, we use a pretrained residual network that gives an initial estimate of the latent space vector in the StyleGAN network.

import gdown
!gdown https://drive.google.com/uc?id=1aT59NFy9-bNyXjDuZOTMl0qX0jmZc6Zb
!mkdir data
!mv finetuned_resnet.h5 data
!rm -rf generated_images latent_representations

After estimating the initial latent codes with the pretrained ResNet, we run gradient descent to optimize the latent representations. The encoding arguments include the learning rate, decay rate, number of iterations, and L1 penalty, all of which can be adjusted to improve the output images.

!python encode_images.py --batch_size=2 --output_video=True --load_resnet='data/finetuned_resnet.h5' --lr=0.01 --decay_rate=0.8 --iterations=200 --use_l1_penalty=0.3 aligned_images/ generated_images/ latent_representations/
print("\n************ Latent code optimization finished! ***************")

4. Displaying the results of the encoding

We load Nvidia’s StyleGAN network into memory for sampling, including the generator, discriminator, and averaged generator networks.

import dnnlib, pickle
import dnnlib.tflib as tflib

tflib.init_tf()
synthesis_kwargs = dict(output_transform=dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True), minibatch_size=1)
model_dir = 'cache/'
model_path = [model_dir + f for f in os.listdir(model_dir) if 'stylegan-ffhq' in f][0]
print("Loading StyleGAN model from %s..." % model_path)
with dnnlib.util.open_url(model_path) as f:
    generator_network, discriminator_network, averaged_generator_network = pickle.load(f)
print("StyleGAN loaded & ready for sampling!")

We then use the generator network to generate the output images from the optimal latent representations and plot them.

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

def generate_images(generator, latent_vector, z=True):
    batch_size = latent_vector.shape[0]
    if z:  # Start from z: run the full generator network
        return generator.run(latent_vector.reshape((batch_size, 512)), None, randomize_noise=False, **synthesis_kwargs)
    else:  # Start from w: skip the mapping network
        return generator.components.synthesis.run(latent_vector.reshape((batch_size, 18, 512)), randomize_noise=False, **synthesis_kwargs)

for f in sorted(os.listdir('latent_representations')):
    w = np.load('latent_representations/' + f).reshape((1, 18, -1))
    img = generate_images(averaged_generator_network, w, z=False)[0]
    plt.imshow(img)
    plt.axis('off')
    plt.title("Generated image from %s" % f)
    plt.show()

GAN Results & Analysis

Women’s Faces

Original Portraits
GAN Generated Portraits

We used the preset encoding argument values (lr=0.01, decay_rate=0.8, iterations=200, use_l1_penalty=0.3) for the women subset and produced artificially generated images that mimic the original portraits well. The model accurately copied facial expressions and general features, but it missed smaller details such as jewelry and clothing and simplified complicated hairstyles.

Men’s Faces

Original Portraits
GAN Generated Portraits

The model again produced fairly accurate copies of the male subset of portraits. Although the figures’ heads are angled slightly to the left or right, hiding portions of their faces, the network correctly reproduced their key facial features. The network missed some clothing details, and with this set of images it is especially apparent that facial hair was difficult to detect and mimic. The two images on the right reveal this challenge, since the generated portrait depicts the same man without facial hair.

Children’s Faces

Original Portraits
GAN Generated Portraits

The network performed especially well when generating images that resembled the subset of children’s portraits. The generated images include more clothing details and the outlines of people in the background, and they manage to capture the piercing blue eyes of the girl in the right panel.

Examining biases in the GAN model

There were very few portraits of people of color in the art images dataset (which is another issue entirely, involving the lack of racial diversity in the Eurocentric art world), but we decided to run the model on a subset of portraits we could find depicting people of color. Using the preset encoding argument values, the model produced images that did not mimic the original query images at all and looked like blotches of random colors. We adjusted the encoding argument values by increasing the number of iterations from 200 to 1000, decreasing the learning rate from 0.01 to 0.001, and decreasing the L1 penalty from 0.3 to 0.1. The resulting output mimicked the original portraits slightly better.

Original Portraits
GAN Generated Portraits

However, we noticed that the generated images had smaller noses and thinner lips, and that dark hair and headpieces were replaced with blonde or red hair. The StyleGAN network was trained on faces from a combination of a large Flickr dataset (FFHQ) and a celebrity dataset (CelebA-HQ). The celebrity face dataset may have introduced bias into the network, since Hollywood is dominated by white celebrities: researchers at the University of Southern California studied the top 100 films of 2014 and found that almost three-quarters of all characters were white. We hypothesize that this bias in the training data caused the model to generate images with Caucasian features, despite the original portraits depicting people of color.

This conclusion has implications for future use of GANs and, more broadly, for the field of artificial intelligence and deep learning. It is important that we recognize even minor biases in the datasets we train these advanced models on, since networks can encode and magnify those biases. A study conducted by MIT researcher Joy Buolamwini found that the most advanced facial recognition systems, developed by companies including Microsoft and IBM, had error rates up to 34% higher for dark-skinned females than for light-skinned males when classifying 1,000 faces as male or female. Such biases in facial recognition and generation tools can create inequalities in individuals’ use of emerging AI technologies, and they should be addressed early, before they become too difficult to identify within the models and correct.

What would Picasso’s abstract figures have looked like in person?

Bringing it back to our earlier exploration of Picasso’s famous works, we apply our GAN to visualize what Picasso’s models may have looked like in reality, based on their abstract representations. We run the model on a subset of Picasso paintings and adjust the encoding argument values, using a learning rate of 0.001 and an L1 penalty of 0.1 for 500 iterations.
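Concretely, this run reuses the earlier encoding command with the adjusted values (keeping the preset decay rate, which is an assumption on our part):

!python encode_images.py --batch_size=2 --output_video=True --load_resnet='data/finetuned_resnet.h5' --lr=0.001 --decay_rate=0.8 --iterations=500 --use_l1_penalty=0.1 aligned_images/ generated_images/ latent_representations/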

Original Picasso Images
GAN Generated Images

The artificially generated images accurately mimic the color palettes and some facial features of Picasso’s figures. For the left figure, the model was able to identify the large nose, hairstyle, and stern facial expression. The generated image of the middle figure reflects the same soft smile as the original portrait, but does not include the details of the hairstyle or hand placement. The model attempts to mimic the thick lines surrounding the eyes of the figure on the right as heavier makeup, which was an interesting outcome. This fun visualization of Picasso’s abstract pieces helps bring the figures to life and reflects abundant opportunities for further GAN applications in the art industry.

