Super-resolution imaging using deep learning algorithms

Published in Becoming Human: Artificial Intelligence Magazine, Aug 4, 2021

Transform your low-resolution images to high-resolution images

Do you want high-quality images without purchasing an expensive system? Or is poor image quality holding back the training of your network? This article details the top 5 deep learning-based algorithms that one should know to increase image resolution.

From the picture above, you will have guessed it by now: one image has better perceptual quality than the others. Before introducing the algorithms that enhance image resolution, let us look at the other ways to solve this problem.

There are two approaches to increasing the resolution: hardware-based and algorithm-based.

A hardware-based approach requires either decreasing the pixel size or increasing the sensor size. Shrinking the pixels below a threshold value means less light falls on each pixel, which makes shot noise more prominent relative to the signal.
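To put a number on this, here is a minimal sketch assuming an ideal, shot-noise-limited pixel, where photon arrivals follow Poisson statistics and the signal-to-noise ratio is the square root of the collected photon count. The photon flux and the function name are hypothetical, chosen only for illustration.

```python
import math

def shot_noise_snr(pixel_pitch_um, photons_per_um2=100.0):
    """SNR of an ideal, shot-noise-limited square pixel.

    Assumption: photon arrivals are Poisson distributed, so for a mean
    count N the noise is sqrt(N) and SNR = N / sqrt(N) = sqrt(N).
    photons_per_um2 is a made-up flux used only for illustration.
    """
    photons = photons_per_um2 * pixel_pitch_um ** 2  # light scales with pixel area
    return math.sqrt(photons)

for pitch in (4.0, 2.0, 1.0):
    print(f"{pitch:.0f} um pixel -> SNR {shot_noise_snr(pitch):.0f}")
```

Under these assumptions, halving the pixel pitch quarters the collected light and halves the SNR.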

Increasing the sensor's size increases the system capacitance, which slows the charge transfer rate. Furthermore, hardware-based approaches are expensive for large-scale imaging devices, and hence algorithmic approaches are favoured.

I have gone through several deep learning research papers and distilled them down to the top 5 algorithms you should know for Single Image Super Resolution.

SRCNN:

Super Resolution Convolutional Neural Network. In this pioneering work by Chao Dong et al., a convolutional neural network (CNN) is used. SRCNN can be viewed in two parts: upsampling and refining. First, the image is upsampled using bicubic interpolation. Then, the resulting image is passed to the CNN, which refines it further by learning the features. Since the upsampling is already done by bicubic interpolation, the CNN can more easily learn to generate a high-quality image.

On the other hand, training SRCNN is computationally expensive because all the operations are performed in the high-dimensional (high-resolution) space. Moreover, pre-defined upsampling often introduces noise and blurring.
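A rough way to see this cost is to count the multiply-accumulate operations (MACs) of a single convolution layer on the upsampled grid versus the original low-resolution grid. The layer sizes, image size, and scale factor below are hypothetical and are not the actual SRCNN configuration.

```python
def conv_macs(height, width, kernel, in_ch, out_ch):
    """Multiply-accumulates for one stride-1, same-padded conv layer."""
    return height * width * kernel * kernel * in_ch * out_ch

scale = 4                      # upscaling factor (assumed)
lr_h, lr_w = 64, 64            # low-resolution input size (assumed)
hr_h, hr_w = lr_h * scale, lr_w * scale

# Same hypothetical 3x3, 32 -> 32 channel layer on both grids.
macs_hr = conv_macs(hr_h, hr_w, 3, 32, 32)   # after bicubic upsampling
macs_lr = conv_macs(lr_h, lr_w, 3, 32, 32)   # directly on the low-resolution image

print(macs_hr // macs_lr)  # ratio equals scale ** 2
```

For a fixed layer, the work grows with the square of the scale factor once the image has been upsampled first.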

FSRCNN:

Fast Super Resolution Convolutional Neural Network. As the name suggests, Chao Dong et al. improved on their previous work, SRCNN, by removing the upsampling pre-processing step (i.e., bicubic interpolation) and introducing a deconvolution layer at the end of the network, so that the mapping is learned directly from the low-resolution (LR) image. Since most of the computation is performed in the low-dimensional space, FSRCNN is found to be 40 times faster than SRCNN. FSRCNN enabled real-time video super-resolution.
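A deconvolution (transposed convolution) layer can be pictured as each low-resolution sample stamping a scaled copy of a kernel into a longer output. Below is a minimal 1-D sketch in plain Python. The kernel values are hand-picked for illustration, whereas in a network such as FSRCNN they would be learned during training.

```python
def transposed_conv1d(signal, kernel, stride):
    """1-D transposed convolution (no padding).

    Output length is (len(signal) - 1) * stride + len(kernel).
    """
    out = [0.0] * ((len(signal) - 1) * stride + len(kernel))
    for i, value in enumerate(signal):
        for j, weight in enumerate(kernel):
            # Each input sample adds a scaled copy of the kernel.
            out[i * stride + j] += value * weight
    return out

lr_signal = [1.0, 3.0, 2.0]
# Hand-picked triangular kernel: with stride 2 it linearly interpolates.
kernel = [0.5, 1.0, 0.5]
print(transposed_conv1d(lr_signal, kernel, stride=2))
# -> [0.5, 1.0, 2.0, 3.0, 2.5, 2.0, 1.0]
```

With this particular kernel the layer reproduces the input samples and fills in their midpoints (the first and last values are border effects). A learned kernel is free to do something better suited to the data than a fixed interpolation.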

SRGAN and SRResNet:

Super-resolution GAN and super-resolution ResNet. SR algorithms improved as deep neural network architectures advanced. In SRResNet, the authors adopted the then state-of-the-art network, ResNet, for the SR model and achieved promising results.

Before SRGAN, most SR techniques minimised the mean squared error (MSE), or equivalently maximised the peak signal-to-noise ratio (PSNR), which often misses high-frequency details and produces overly smooth textures. Since PSNR is derived from a pixel-wise loss, it does not always lead to a photo-realistic image. SRGAN introduced a GAN-based architecture in which the generator is based on ResNet and the discriminator is a standard discriminator. The perceptual loss, a combination of adversarial loss and content loss, helps to produce photo-realistic images.
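For reference, PSNR is computed directly from the pixel-wise MSE, which is why maximising one amounts to minimising the other. A minimal sketch, assuming 8-bit images (peak value 255) stored as flat lists of pixel values; the toy pixel values are made up:

```python
import math

def mse(reference, estimate):
    """Mean squared error between two equally sized pixel sequences."""
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    error = mse(reference, estimate)
    if error == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / error)

hr     = [52, 55, 61, 59, 79, 61, 76, 61]   # hypothetical ground-truth pixels
blurry = [54, 56, 60, 62, 70, 66, 70, 63]   # hypothetical smoothed estimate
print(round(psnr(hr, blurry), 2))
```

Every pixel error is weighted equally here, so an estimate that averages over plausible textures can score well while still looking smooth.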

EDSR:

Enhanced Deep Residual Networks. This work is motivated by the earlier SRResNet. In SRResNet, the authors employ the ResNet architecture for the SR problem without any modifications. ResNet was designed for higher-level vision problems, whereas SR is a low-level vision problem. In EDSR, the authors propose removing the batch normalization (BN) layers from the SRResNet architecture, as normalizing the features removes range flexibility from the network. Moreover, the baseline model without BN saved 40% of memory during training compared to SRResNet.
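One way to read the range-flexibility argument: batch normalization rescales features to zero mean and unit variance, so information carried by their absolute range is discarded. The toy sketch below uses plain Python, made-up feature values, and no learnable scale or shift, so it is only an illustration of the idea and not the EDSR code.

```python
import math

def batch_norm(features, eps=1e-5):
    """Normalize to zero mean and unit variance (no learnable scale/shift)."""
    mean = sum(features) / len(features)
    var = sum((f - mean) ** 2 for f in features) / len(features)
    return [(f - mean) / math.sqrt(var + eps) for f in features]

dim_patch    = [1.0, 2.0, 3.0]        # hypothetical low-range features
bright_patch = [100.0, 200.0, 300.0]  # same pattern, 100x the range

print(batch_norm(dim_patch))
print(batch_norm(bright_patch))  # nearly identical to the line above
```

After normalization the two feature sets are almost indistinguishable, even though their original ranges differ by a factor of 100.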

ESRGAN:

Enhanced Super-Resolution GAN. In this work, the authors improve on the previous GAN-based SR work, SRGAN. Three improvements are proposed: in the generator, the discriminator, and the loss function. BN layers are removed from the generator, as this helps to improve generalization and reduce computational complexity. Instead of the basic discriminator structure, they use a relativistic discriminator (RaD). RaD estimates the probability that a real image is more realistic than a fake one, whereas a standard discriminator estimates the probability that a given image is real or fake. A generator trained with RaD benefits from both generated and real images through the adversarial loss. The last modification concerns the perceptual loss: rather than constraining the features after the activation layer, ESRGAN proposes to constrain them before the activation layer, because the activated features are sparse and provide only weak supervision.
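The difference between the two discriminators can be sketched in a few lines. Here `real_logits` and `fake_logits` stand for the raw (pre-sigmoid) discriminator outputs on a batch of real and generated images. The numbers and function names are made up, and the code assumes the relativistic average formulation, so treat it as an illustration and not as the reference implementation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mean(values):
    return sum(values) / len(values)

def standard_d(logit):
    """Standard discriminator: probability that one image is real."""
    return sigmoid(logit)

def relativistic_d(logits, other_logits):
    """Relativistic average discriminator: probability that each image in
    `logits` is more realistic than the average image in `other_logits`."""
    baseline = mean(other_logits)
    return [sigmoid(c - baseline) for c in logits]

real_logits = [2.0, 1.5, 2.5]    # hypothetical outputs for real images
fake_logits = [-1.0, 0.0, 1.0]   # hypothetical outputs for generated images

print(relativistic_d(real_logits, fake_logits))  # real vs. average fake
print(relativistic_d(fake_logits, real_logits))  # fake vs. average real
```

Both terms depend on real and generated images at once, which is how the generator ends up receiving a training signal from both.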

The methods above are supervised approaches to the super-resolution problem. They rely on paired datasets of low-resolution and high-resolution images, created by downsampling the high-resolution images. In the real world, a low-resolution image may suffer from other types of degradation as well. Moreover, supervised methods may simply learn the inverse of the downsampling interpolation. Therefore, unsupervised methods are being developed as well.
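As a concrete picture of how such pairs are built, the sketch below downsamples a tiny high-resolution "image" (a nested list of made-up pixel values) by block averaging. Block averaging is used here only because it is easy to write in plain Python; it is an assumption, not necessarily the interpolation used in the papers above.

```python
def downsample(image, factor):
    """Downsample a 2-D list by averaging each factor x factor block."""
    rows, cols = len(image), len(image[0])
    if rows % factor or cols % factor:
        raise ValueError("image size must be divisible by factor")
    low_res = []
    for r in range(0, rows, factor):
        row = []
        for c in range(0, cols, factor):
            block = [image[r + dr][c + dc]
                     for dr in range(factor) for dc in range(factor)]
            row.append(sum(block) / len(block))
        low_res.append(row)
    return low_res

hr_image = [
    [10, 20, 30, 40],
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [50, 60, 70, 80],
]
lr_image = downsample(hr_image, factor=2)
training_pair = (lr_image, hr_image)   # (network input, target)
print(lr_image)  # -> [[15.0, 35.0], [55.0, 75.0]]
```

A network trained only on pairs like this sees a single, known degradation, which is why it may struggle on real low-resolution images that were degraded differently.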

I am Priya Dwivedi, currently working at the ACDC group, UNSW, to accelerate the growth of the photovoltaic industry and thereby reduce global carbon emissions.