Probability Distribution Functions in Neural Networks

--

Introduction

“Neural networks are computing systems with interconnected nodes that work much like neurons in the human brain.” — SAS

A neural network is a densely connected system of nodes that takes numbers as input and produces numbers as output. If we look closely at a dense neural network, we can see how the neurons are connected, as in the image below.

If we zoom in further, we can see precisely what each neuron does. A neuron can be seen as a box that takes in a number and outputs another, computed number.

The limitation of this kind of neural network is that each neuron has to output a single concrete number. It can’t suggest multiple candidate values, each with its own degree of confidence. A good analogy can be found in physics: the current neural network architecture resembles a classical mechanical process with a single definite outcome, whereas we would like to consider multiple possible outcomes at once. In quantum mechanics, for example, the wave function gives the probability of finding a particle at a specific point. This concept is illustrated in the image below.


Approach

I’m going to try to mimic this concept in neural networks by allowing a neuron to output a probability distribution that indicates which values the neuron is most confident about.

TensorFlow is an excellent library for building custom neural network layers, so I will use it for this project. With the Keras API we can create a custom layer that is easily integrated into a neural network architecture of our choice. I will call this layer PDNN (probability distribution function neural network).

We can define the custom layer in TensorFlow using the following structure:

class PDNN(tf.keras.layers.Layer):
    def __init__(self, num_outputs):
        ...
    def build(self, input_shape):
        ...
    def call(self, inputs):
        ...


We have a class with three main functions:

  1. __init__, where you can do all input-independent initialization
  2. build, where you know the shapes of the input tensors and can do the rest of the initialization
  3. call, where you do the forward computation
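To make this lifecycle concrete, here is a minimal sketch of a custom layer that follows the same three-method structure. The layer name ScaleLayer and its single weight are purely illustrative and are not part of the PDNN implementation:

import tensorflow as tf

class ScaleLayer(tf.keras.layers.Layer):
    def __init__(self, num_outputs):
        # Input-independent setup: just store the configuration.
        super().__init__()
        self.num_outputs = num_outputs

    def build(self, input_shape):
        # The input shape is now known, so the weights can be created.
        self.scale = self.add_weight(
            shape=[int(input_shape[-1]), self.num_outputs],
            initializer="glorot_uniform",
            trainable=True,
            name="scale",
        )

    def call(self, inputs):
        # Forward computation: a simple matrix multiplication.
        return tf.matmul(inputs, self.scale)

Calling ScaleLayer(4) on a tensor of shape (2, 3) triggers build with that input shape and returns a tensor of shape (2, 4).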

The wave function is complex-valued. To find the probability density, we need to compute the squared modulus of the wave function. Therefore, to mimic this behaviour in our layer, we must first create a complex-valued part and then take its squared modulus. With that in mind, I’ll first create a layer that outputs the complex values.
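As a quick refresher, for a complex value z = a + bi the squared modulus is |z|² = a² + b². In TensorFlow, tf.abs applied to a complex tensor returns the modulus |z|, so squaring it gives the probability density (which is exactly what the model below does with a Lambda layer). A tiny sketch:

import tensorflow as tf

z = tf.complex(3.0, 4.0)     # z = 3 + 4i
modulus = tf.abs(z)          # |z| = 5.0
density = tf.abs(z) ** 2     # |z|^2 = 25.0 = 3^2 + 4^2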

The PDNN layer can be found in this GitHub repository.

import math
import tensorflow as tf
from tensorflow.keras import layers

class PDNN(layers.Layer):
    def __init__(self, num_outputs, PDFS):
        super(PDNN, self).__init__()
        self.num_outputs = num_outputs  # number of neurons in the layer
        self.PDFS = PDFS                # number of wave components summed per neuron

    def build(self, input_shape):
        # w and b turn each input into a per-component frequency: freq = w*x + b.
        self.w = self.add_weight(
            shape=[self.PDFS, self.num_outputs, 1],
            name="w",
            trainable=True,
        )
        self.b = self.add_weight(
            shape=[self.PDFS, self.num_outputs, 1],
            name="b",
            trainable=True,
        )
        # Mixing coefficients used to combine the PDFS components.
        self.m = self.add_weight(
            shape=[1, self.PDFS, self.num_outputs, 1],
            name="m",
            trainable=True,
        )

    def call(self, input_tensor):
        pi = tf.constant(math.pi)
        e = tf.constant(math.e)
        # Discretized grid of 999 points (0.1 to 99.9) on which the wave function is evaluated.
        space = tf.constant([value / 10 for value in range(1, 1000)], dtype=tf.float32)
        space = tf.reshape(space, [1, 999])
        space = tf.tile(space, [self.num_outputs, 1])
        space = tf.reshape(space, [1, self.num_outputs, 999])
        space = tf.tile(space, [self.PDFS, 1, 1])

        # Broadcast the input (batch size 1 assumed) to every component and apply w*x + b.
        input_tensor = tf.reshape(input_tensor, [1, self.num_outputs])
        input_tensor = tf.tile(input_tensor, [self.PDFS, 1])
        input_tensor = tf.reshape(input_tensor, [self.PDFS, self.num_outputs, 1])
        input_tensor = self.w * input_tensor + self.b

        # Complex exponential: cos(x*space) + i*sin(x*space) = e^(i*x*space).
        pdf = tf.complex(tf.math.cos(input_tensor * space), tf.math.sin(input_tensor * space))
        pdf = tf.reshape(pdf, [1, self.PDFS, self.num_outputs, 999])
        # Weight each component and sum them into a single wave function per neuron.
        pdf = pdf * tf.complex(self.m, self.m)
        pdf = tf.math.reduce_sum(pdf, axis=1, keepdims=True)
        pdf = tf.reshape(pdf, [1, self.num_outputs, 999])
        # Return the modulus; squaring it later in the model gives the probability density.
        pdf = tf.abs(pdf)
        return pdf
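As a quick sanity check, we can call the layer on a dummy input (the 7 outputs and 4 components below are just example values):

layer = PDNN(num_outputs=7, PDFS=4)
out = layer(tf.random.uniform([1, 7]))
print(out.shape)  # (1, 7, 999): one 999-point amplitude curve per output neuron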

We can now build a basic architecture using our new custom layer.

from tensorflow.keras import layers as L
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# n_pdfs (the number of wave components per PDNN layer) is assumed to be defined above.
inp = L.Input(shape=(7,))

# Each branch turns the 7 inputs into 7 probability densities, keeps the indices of the
# 5 most probable grid points (divided by 10 to bring them back to the scale of the
# value grid), and passes them through a Dense layer.
x = PDNN(7, n_pdfs)(inp)
x = L.Lambda(lambda x: tf.math.pow(x,2))(x)
x = L.Lambda(lambda x: tf.math.top_k(x,k=5)[1]/10)(x)
x = L.Flatten()(x)
x = L.Dense(32, activation='relu')(x)

y = PDNN(7, n_pdfs)(inp)
y = L.Lambda(lambda x: tf.math.pow(x,2))(y)
y = L.Lambda(lambda x: tf.math.top_k(x,k=5)[1]/10)(y)
y = L.Flatten()(y)
y = L.Dense(64, activation='relu')(y)

z = PDNN(7, n_pdfs)(inp)
z = L.Lambda(lambda x: tf.math.pow(x,2))(z)
z = L.Lambda(lambda x: tf.math.top_k(x,k=5)[1]/10)(z)
z = L.Flatten()(z)
z = L.Dense(64, activation='relu')(z)

w = PDNN(7, n_pdfs)(inp)
w = L.Lambda(lambda x: tf.math.pow(x,2))(w)
w = L.Lambda(lambda x: tf.math.top_k(x,k=5)[1]/10)(w)
w = L.Flatten()(w)
w = L.Dense(64, activation='relu')(w)

m = PDNN(7, n_pdfs)(inp)
m = L.Lambda(lambda x: tf.math.pow(x,2))(m)
m = L.Lambda(lambda x: tf.math.top_k(x,k=5)[1]/10)(m)
m = L.Flatten()(m)
m = L.Dense(64, activation='relu')(m)

n = PDNN(7, n_pdfs)(inp)
n = L.Lambda(lambda x: tf.math.pow(x,2))(n)
n = L.Lambda(lambda x: tf.math.top_k(x,k=5)[1]/10)(n)
n = L.Flatten()(n)
n = L.Dense(64, activation='relu')(n)

x = L.concatenate([x,y,z,w, m, n])
x = L.Dense(64, activation='relu')(x)
x = L.Dense(32, activation='relu')(x)
out = L.Dense(1)(x)
model = Model(inp, out)
model.compile(optimizer=Adam(learning_rate=0.001), loss='mae')
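The second Lambda step is the least obvious one, so here is a small standalone illustration with made-up numbers: tf.math.top_k returns a (values, indices) pair, taking element [1] keeps the integer grid indices of the k highest-density points, and dividing by 10 puts them back on roughly the 0.1 to 99.9 scale used by the PDNN grid.

density = tf.constant([[0.1, 0.7, 0.2, 0.9, 0.05]])  # toy density over 5 grid points
indices = tf.math.top_k(density, k=2)[1]             # [[3, 1]]: positions of the two highest densities
values = indices / 10                                # [[0.3, 0.1]]: rescaled to grid units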

This model looks like this:

Now let’s try to train our model on a basic time-series dataset and see how it compares to a classic statistical model such as ARIMA. The model architecture, as well as the data it was trained on, can be found in this notebook. Before checking the results, we can take a peek at the wave functions generated by the PDNN layers.

Looks great!

After 100 epochs, the model managed to get an RMSE score of 12.07. Let’s take a look at the predictions.

After fitting the ARIMA model, we got an RMSE of 14.96! That means our model outperformed the classical baseline by a noticeable margin.
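The exact dataset, ARIMA order, and training settings live in the linked notebook; purely as an illustration of how such an RMSE comparison can be set up (the variable names, train/test split, and ARIMA order below are placeholder assumptions, not the notebook’s actual settings), one might write:

import numpy as np
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.arima.model import ARIMA

# train/test are 1-D numpy arrays holding the time series (placeholder split).
arima = ARIMA(train, order=(5, 1, 0)).fit()
arima_pred = arima.forecast(steps=len(test))
arima_rmse = np.sqrt(mean_squared_error(test, arima_pred))

# X_test/y_test are the windowed features and targets fed to the PDNN model.
nn_pred = model.predict(X_test)
nn_rmse = np.sqrt(mean_squared_error(y_test, nn_pred))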

Conclusion

The PDNN layer seems to have considerable potential for improving the performance of neural networks, as it considers multiple possible outcomes at once instead of having to commit to a single one. This is supported by its results compared to the ARIMA model in our time-series forecasting example.

