It’s NeRF or Nothin’!

An introduction to Neural Radiance Fields and their applications

--

What was done before NeRF?

Before NeRF, deep learning was mostly applied to 2D data, which is what we call Euclidean data. Common applications of deep learning on 2D data include classification, regression-based prediction and segmentation.

An insightful video on the evolution of Deep Learning and Pytorch by Meta.

However, deep learning research has since expanded to 3D data, which is non-Euclidean in nature. The goal in this case is to create a 3D model out of Euclidean (2D) data by training over a given number of epochs. This leads us to NeRF!

What is NeRF?

A NeRF (Neural Radiance Field) is essentially a neural network trained to generate novel 3D views or a 3D representation of an object from a limited set of 2D images. It takes input images of a scene and interpolates between them in order to generate a complete scene.

An overview of the NeRF representation and a differentiable rendering approach, as illustrated in this paper.

Delving into the math behind this and how it works:

NeRF is a neural network that takes in a 5D coordinate, consisting of a spatial location (x, y, z) and a viewing direction (θ, φ), and outputs a 4D value: an RGB color and an opacity (density). Classic volume rendering techniques are then used to composite these colors and densities into an image.
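In the notation of the original paper, this 5D-to-4D mapping is a function with learnable weights Θ that are optimized per scene:

```latex
F_\Theta : (x,\, y,\, z,\, \theta,\, \phi) \;\longmapsto\; (\mathbf{c},\, \sigma)
```

where c = (r, g, b) is the emitted color at that point as seen from that direction, and σ is the volume density (opacity).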

Here, I shall be explaining this using code written in PyTorch3D.

The following steps must be followed in order to create and train a simple Neural Radiance Field:

  • Initialize the implicit renderer
  • Define the neural radiance field model
  • Fit the radiance field

Initialize the implicit renderer:

An implicit function f can be defined as:

f : ℝ³ → ℝ

This basically maps 3D points to a scalar value.

An implicit surface S can then be defined as the level set:

S = { x ∈ ℝ³ : f(x) = τ }

Thus, an implicit surface S is the set of all 3D points x whose value f(x) equals a chosen threshold τ.

Thus, for each point we evaluate f(x) to determine whether it lies on the surface or not. Then, we use a coloring function c(x), where

c : ℝ³ → ℝ³

assigns an RGB color to each 3D point. Thus, the color accumulated along a ray r(t) = o + t·d by volume rendering is

C(r) = ∫ T(t) σ(r(t)) c(r(t)) dt, with T(t) = exp(−∫₀ᵗ σ(r(s)) ds)

which upon approximation (numerical quadrature over the sampled ray points), we get

Ĉ(r) = Σᵢ Tᵢ (1 − exp(−σᵢ δᵢ)) cᵢ

and here,

Tᵢ = exp(−Σⱼ₌₁ⁱ⁻¹ σⱼ δⱼ), where δᵢ is the distance between adjacent samples,

to assign colors to the individual ray points. After this, the raymarching step is executed, in which all the f(x) and c(x) values are collapsed into a single color for the pixel of the given ray emitted from the camera.
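The raymarching step described above, where the densities and colors sampled along a ray are collapsed into a single pixel color, can be sketched in plain NumPy (a toy emission-absorption raymarcher over one ray; the function and variable names here are illustrative, not the PyTorch3D API):

```python
import numpy as np

def emission_absorption(sigmas, colors, deltas):
    """Collapse per-point densities and colors along one ray into a single
    RGB color via the discrete emission-absorption quadrature:
    C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i,
    with transmittance T_i = exp(-sum_{j<i} sigma_j * delta_j)."""
    alphas = 1.0 - np.exp(-sigmas * deltas)        # opacity of each segment
    trans = np.exp(-np.cumsum(sigmas * deltas))    # transmittance *after* each segment
    T = np.concatenate([[1.0], trans[:-1]])        # transmittance *before* segment i
    weights = T * alphas                           # contribution of each sample
    return (weights[:, None] * colors).sum(axis=0), weights.sum()

# Three samples along a ray: an empty point, then a dense red point
# that occludes the green one behind it.
sigmas = np.array([0.0, 5.0, 5.0])
colors = np.array([[0.0, 0.0, 1.0],   # blue, but zero density: no contribution
                   [1.0, 0.0, 0.0],   # red, high density: dominates the pixel
                   [0.0, 1.0, 0.0]])  # green, mostly occluded by the red sample
deltas = np.array([0.5, 0.5, 0.5])
rgb, opacity = emission_absorption(sigmas, colors, deltas)
```

Note how the red sample receives almost all of the weight: the zero-density point in front of it contributes nothing, and the green point behind it is attenuated by the accumulated transmittance.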

Thus, the renderer consists of a raysampler and a raymarcher. The raysampler emits rays from image pixels and samples points along them. Common raysamplers in PyTorch3D include NDCGridRaysampler() and MonteCarloRaysampler(). The raymarcher then takes the densities and colors sampled along each ray and renders the ray into a color and an opacity value for its source pixel. An example of a raymarcher is EmissionAbsorptionRaymarcher().
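As a rough sketch of what a Monte Carlo raysampler does (plain NumPy, not the PyTorch3D API; the camera model here is a deliberately simplified pinhole at the origin): emit randomly directed rays and place points at uniform depths along each one.

```python
import numpy as np

def monte_carlo_raysample(n_rays, n_pts_per_ray, min_depth, max_depth, rng):
    """Toy raysampler: random unit ray directions from a camera at the
    origin, with n_pts_per_ray 3D points spaced uniformly in depth
    between min_depth and max_depth along each ray."""
    origins = np.zeros((n_rays, 3))                      # pinhole camera at origin
    dirs = rng.normal(size=(n_rays, 3))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)  # normalize to unit length
    depths = np.linspace(min_depth, max_depth, n_pts_per_ray)
    # points[i, j] = origin_i + depth_j * dir_i  ->  shape (n_rays, n_pts, 3)
    points = origins[:, None, :] + depths[None, :, None] * dirs[:, None, :]
    return points, depths

rng = np.random.default_rng(0)
points, depths = monte_carlo_raysample(4, 8, 0.1, 3.0, rng)
```

These 3D ray points are exactly what gets fed to the radiance field model, and the per-point outputs are what the raymarcher then collapses back into pixel colors.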

Define the neural radiance field model:

The Neural Radiance Field module specifies a continuous field of colors and opacities over the 3D domain of the scene.

(The above is a simple neural radiance field which has been used in the following example.)

Also, we must define a function to perform the NeRF forward pass. It receives as input a set of tensors that parameterize a bundle of rendering rays. This ray bundle is first converted into a set of 3D ray points in the world coordinates of the scene. Each point is then passed through a harmonic embedding and fed into the color and opacity branches of the NeRF model, which label it with a 3D RGB vector and a 1D opacity scalar in the range [0, 1], giving our 4D output.
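The harmonic embedding mentioned above can be written as the positional encoding from the original paper, γ(p) = (sin(2⁰πp), cos(2⁰πp), …, sin(2ᴸ⁻¹πp), cos(2ᴸ⁻¹πp)), applied to each coordinate. A NumPy sketch (the function name is mine, not PyTorch3D's):

```python
import numpy as np

def harmonic_embedding(x, n_harmonics=6):
    """Map each coordinate p to (sin(2^k * pi * p), cos(2^k * pi * p))
    for k = 0..n_harmonics-1, so an input of dimension D becomes
    dimension 2 * n_harmonics * D. This lets the MLP represent
    high-frequency variation in color and opacity."""
    freqs = (2.0 ** np.arange(n_harmonics)) * np.pi   # 2^k * pi
    angles = x[..., None] * freqs                     # shape (..., D, n_harmonics)
    emb = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return emb.reshape(*x.shape[:-1], -1)

pt = np.array([[0.5, -0.25, 1.0]])    # one 3D ray point
emb = harmonic_embedding(pt)          # shape (1, 36) for 6 harmonics
```

Without an embedding of this kind, a plain MLP on raw (x, y, z) coordinates tends to produce overly smooth reconstructions.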

Following this, we must run an optimization loop that fits the radiance field to our observations.

Fit the radiance field:

Then, we fit the radiance field by rendering it from the viewpoints of the cameras and comparing the results with the observed target images and target silhouettes, as specified in the code.
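As a minimal illustration of this fitting step, here is a hypothetical one-parameter version in plain NumPy (the real loop optimizes all the MLP weights with an optimizer such as Adam over rendered colors and silhouettes): gradient descent on a single density σ so that the rendered opacity α = 1 − exp(−σδ) matches a target silhouette value.

```python
import numpy as np

# Toy fit: the rendered opacity of one segment is alpha = 1 - exp(-sigma*delta).
# We descend the squared error against a target silhouette value, using
# d(loss)/d(sigma) = 2 * (alpha - target) * delta * exp(-sigma * delta).
target, delta = 0.8, 0.5
sigma, lr = 0.1, 1.0
for _ in range(500):
    alpha = 1.0 - np.exp(-sigma * delta)
    grad = 2.0 * (alpha - target) * delta * np.exp(-sigma * delta)
    sigma -= lr * grad
alpha = 1.0 - np.exp(-sigma * delta)   # converges toward the target opacity
```

The same principle drives the full training loop: render, compare against the targets, and backpropagate the error into the field's parameters.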

Training runs for 3,000 iterations by default:

A sample output generated during model training

and the final output looks something like this:

The final output as generated by the PyTorch3D code for a simple NeRF on the cow renders data.

Some drawbacks and newer techniques

The original model was slow to train and could only handle static scenes. It is also inflexible, in the sense that a model trained on one particular scene cannot be reused for another. However, several methods have since improved on the original NeRF concept, such as NSVF, Plenoxels and KiloNeRF.

Thank you for making it this far! Please feel free to follow me on LinkedIn, GitHub and Medium, and stay tuned for future content!
