Neural networks 101
Epoch 2
In the previous article we saw the basic mathematical representation of neurons. In this article we shall derive the equations for gradient descent. Let’s get started.
The Equations
We have the following equations:
For Forward Pass
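For a single neuron with input vector `x`, weights `w`, and bias `b`, and assuming a sigmoid activation with the binary cross-entropy loss (the loss that the derivation later in this article works out), the forward pass can be written as:

$$
\begin{aligned}
z &= w^{T}x + b \\
\hat{y} &= \sigma(z) = \frac{1}{1 + e^{-z}} \\
L(\hat{y}, y) &= -\bigl(y\log\hat{y} + (1-y)\log(1-\hat{y})\bigr)
\end{aligned}
$$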
For Back Propagation
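For back propagation, the gradients and the parameter updates (with $\lambda$, the learning rate, as discussed below) take the form:

$$
\begin{aligned}
dw &= \frac{\partial L}{\partial w}, \qquad db = \frac{\partial L}{\partial b} \\
w &:= w - \lambda\, dw, \qquad b := b - \lambda\, db
\end{aligned}
$$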
During forward pass, the following steps take place:
- The weight vector is multiplied with the input value `x`, and the bias `b` is then added. Note that the weights and the input values are vectors of a certain dimension, say `n`, so the weight vector is transposed before the multiplication so that the result is a single number (equivalently, the operation can be seen as the dot product of `w` and `x`).
- The sigmoid function is applied to the calculated value so that the value returned by the neuron lies between 0 and 1. This output is taken as the predicted value.
- The loss function measures the error between the predicted value and the actual label of the particular training example. (A short numeric sketch of these steps follows this list.)
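A minimal numeric sketch of these forward-pass steps for a single training example, assuming a sigmoid activation and the binary cross-entropy loss (the specific numbers are illustrative only):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# One training example with n = 3 features (illustrative values)
x = np.array([0.5, -1.2, 3.0])   # input vector
y = 1.0                          # actual label
w = np.array([0.1, 0.4, -0.2])   # weight vector
b = 0.0                          # bias

z = np.dot(w.T, x) + b                                      # w·x + b, a scalar
y_hat = sigmoid(z)                                          # predicted value in (0, 1)
loss = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))   # cross-entropy loss
```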
During back propagation, the following steps take place:
- The derivatives `dw` and `db` are calculated for each dimension of the vector using the first two equations.
- The weights and bias are then updated according to the third and fourth equations. The `lambda` variable in those equations is the learning rate, which defines how big a 'step' the model takes in the direction of the decreasing slope. (The sketch after this list continues the earlier example with one such update.)
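Continuing that sketch, one gradient-descent step on the same single example might look like this (`lr` stands in for the learning rate $\lambda$; the gradient expressions are the ones derived in the next section):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])   # input vector (one example)
y = 1.0                          # actual label
w = np.array([0.1, 0.4, -0.2])   # current weights
b = 0.0                          # current bias
lr = 0.01                        # learning rate (lambda in the equations)

# Forward pass
y_hat = sigmoid(np.dot(w.T, x) + b)

# Gradients for this single example (see the derivation below)
dz = y_hat - y                   # dL/dz
dw = dz * x                      # dL/dw, one entry per input dimension
db = dz                          # dL/db

# One step of gradient descent
w = w - lr * dw
b = b - lr * db
```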
This is just:
- One step of gradient descent
- On one training example
- Using only one neuron
Yupp, it already looks complex in its most basic form.
Deriving the partial derivatives
The back propagation step involves calculating partial derivatives. Computing them requires applying the chain rule in reverse order (hence the name back propagation): the derivative of the loss with respect to the weights is built up from the derivative of the loss with respect to the prediction, the derivative of the prediction with respect to `z`, and the derivative of `z` with respect to the weights. The same decomposition works for the bias. Solving each factor and multiplying them together gives compact expressions for `dw` and `db` in terms of a single intermediate quantity, `dz`.
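A sketch of that derivation, under the assumption of a sigmoid activation and the binary cross-entropy loss:

$$
\begin{aligned}
\frac{\partial L}{\partial w}
&= \frac{\partial L}{\partial \hat{y}}\cdot\frac{\partial \hat{y}}{\partial z}\cdot\frac{\partial z}{\partial w}
= \left(-\frac{y}{\hat{y}} + \frac{1-y}{1-\hat{y}}\right)\cdot \hat{y}(1-\hat{y})\cdot x
= (\hat{y}-y)\,x \\
\frac{\partial L}{\partial b}
&= \frac{\partial L}{\partial \hat{y}}\cdot\frac{\partial \hat{y}}{\partial z}\cdot\frac{\partial z}{\partial b}
= (\hat{y}-y)\cdot 1
= \hat{y}-y
\end{aligned}
$$

Writing $dz = \hat{y} - y$ for the common factor, the two gradients reduce to:

$$
dw = dz \cdot x, \qquad db = dz
$$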
Now that we have the algorithm and have derived the equations for gradient descent, we can write simple Python code to perform a single step of gradient descent. However, since these equations calculate the gradient for just one training example, we need a way to 'average out' the gradients over all the training examples. This is where the cost function comes in.
Cost Function
The cost function is defined as the average of the loss function over all the training examples.
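With $m$ training examples, and using a superscript $(i)$ to index the $i$-th example, this can be written as:

$$
J(w, b) = \frac{1}{m}\sum_{i=1}^{m} L\bigl(\hat{y}^{(i)}, y^{(i)}\bigr)
$$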
Using the cost function adds only one additional step to the existing algorithm: we average the gradients calculated for each training example.
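In the same notation, the averaged gradients used for the update become:

$$
dw = \frac{1}{m}\sum_{i=1}^{m} dz^{(i)}\,x^{(i)}, \qquad db = \frac{1}{m}\sum_{i=1}^{m} dz^{(i)}
$$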
Now with this we have all the necessary equations to be able to implement one step of gradient descent.
Python pseudocode
Using our knowledge of back propagation, we can now write simple pseudocode to perform gradient descent.
```python
for i in range(no_of_epochs):
    z = np.dot(w.T, X) + b       # forward pass over all m examples at once
    y = sigmoid(z)               # predicted values
    dz = y - Y                   # error term for each example
    dw = np.dot(X, dz.T) / m     # gradient w.r.t. w, averaged over the examples
    db = np.sum(dz) / m          # gradient w.r.t. b, averaged over the examples
    w = w - a * dw               # update the parameters with step size a
    b = b - a * db
```
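For completeness, here is a minimal setup that would make the loop above runnable; the shapes, random data, and hyperparameter values are illustrative assumptions, not part of the original algorithm:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

n, m = 3, 100                                    # feature dimension, number of examples
X = np.random.randn(n, m)                        # training inputs, one column per example
Y = (np.random.rand(1, m) > 0.5).astype(float)   # binary labels, shape (1, m)
w = np.zeros((n, 1))                             # initial weights, shape (n, 1)
b = 0.0                                          # initial bias
a = 0.01                                         # learning rate
no_of_epochs = 1000
```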
With each epoch, the parameters `w` and `b` are adjusted (or learned) by performing gradient descent with a step size of `a`. If we were to calculate the cost at each epoch, we would notice that it decreases from one epoch to the next (provided the learning rate is not too large). Hence, as we increase the number of epochs, the model parameters are adjusted a larger number of times.
And that’s it! In just 8 lines of code we have implemented the gradient descent algorithm in Python (although it is not very useful yet, since it deals with only a single neuron).
Conclusion
With this, we have reached the end of the second article. In this article we saw how the partial derivatives for gradient descent are derived. We also extended our algorithm to accommodate more than just one training example. In the upcoming articles, we will continue to improve upon our Python code and extend it to handle more complex neural networks.
Until next time…
About Preetham
Hi, I'm Preetham, a student pursuing a Bachelor's degree in Artificial Intelligence and Machine Learning. As an aspiring data scientist, I'm passionate about exploring the cutting-edge of machine learning and AI research. I believe that the most effective way to learn is by teaching others, which is why I decided to start this blog series.