In this article, I will touch upon the Jacobian matrix and how it is used in the backpropagation step of deep learning.
What is a Jacobian Matrix?
Let’s consider the following functions, f1, f2, and f3, each depending on the variables x1 and x2.
Now, the partial derivatives of f1, f2, and f3 with respect to x1 and x2 are:
If we arrange these derivatives in a specific way, we get our Jacobian matrix.
The Jacobian matrix above is built from the functions f1, f2, f3 and the variables x1, x2. However, the number of functions and variables can be much higher.
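Arranged this way, each row of the Jacobian holds the partial derivatives of one function with respect to every variable. For the three functions and two variables above (and, in general, for a function f : Rⁿ → Rᵐ) the matrix looks like this:

```latex
J =
\begin{bmatrix}
\dfrac{\partial f_1}{\partial x_1} & \dfrac{\partial f_1}{\partial x_2} \\[6pt]
\dfrac{\partial f_2}{\partial x_1} & \dfrac{\partial f_2}{\partial x_2} \\[6pt]
\dfrac{\partial f_3}{\partial x_1} & \dfrac{\partial f_3}{\partial x_2}
\end{bmatrix},
\qquad
J_{ij} = \frac{\partial f_i}{\partial x_j}
\quad \text{for } f : \mathbb{R}^n \to \mathbb{R}^m .
```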
Chain Rule of Multivariate Calculus
Here, the partial derivatives are Jacobian matrices. This chain rule also generalizes to arbitrarily deep compositions: we can even find the partial derivative of f(g(h(x))) with respect to x, where f, g, and h are functions.
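Written with Jacobians, the chain rule for such a composition is just a product of matrices, each evaluated at the appropriate intermediate value:

```latex
J_{f \circ g \circ h}(x) \;=\; J_f\bigl(g(h(x))\bigr)\, J_g\bigl(h(x)\bigr)\, J_h(x).
```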
Let’s take an example and work it out using the chain rule.
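As one possible illustration (the specific functions here are my own choice), take g(x1, x2) = (x1·x2, x1 + x2) and f(u, v) = u + v². Multiplying the two Jacobians gives the gradient of the composition:

```latex
J_g(x_1, x_2) =
\begin{bmatrix}
x_2 & x_1 \\
1 & 1
\end{bmatrix},
\qquad
J_f(u, v) =
\begin{bmatrix}
1 & 2v
\end{bmatrix},

J_{f \circ g}(x_1, x_2)
= J_f\bigl(g(x_1, x_2)\bigr)\, J_g(x_1, x_2)
= \begin{bmatrix} 1 & 2(x_1 + x_2) \end{bmatrix}
\begin{bmatrix} x_2 & x_1 \\ 1 & 1 \end{bmatrix}
= \begin{bmatrix} x_2 + 2(x_1 + x_2) & x_1 + 2(x_1 + x_2) \end{bmatrix}.
```

Differentiating f(g(x1, x2)) = x1·x2 + (x1 + x2)² directly gives the same two partial derivatives, so the chain rule checks out.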
Use of the Jacobian in Neural Networks
Let’s consider the following neural network:
The neural network training process consists of two main steps: forward propagation and backward propagation.
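Assuming a two-layer network with weight matrices W1 and W2, biases b1 and b2, and an activation function σ (notation I am choosing here to match the W2 used below), forward propagation computes the prediction layer by layer:

```latex
z_1 = W_1 x + b_1, \qquad a_1 = \sigma(z_1), \qquad
z_2 = W_2 a_1 + b_2, \qquad \hat{y} = \sigma(z_2), \qquad
L = \operatorname{loss}(\hat{y}, y).
```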
During backward propagation, we update the weight and bias values by calculating the partial derivative of the loss function with respect to each weight or bias. Instead of calculating these partial derivatives for each weight and bias separately, we use Jacobians. This approach increases the efficiency of the code used for training the neural network.
To update the weights using gradient descent during backpropagation, we calculate the partial derivative of the loss with respect to each weight. In particular, to update any single weight in the W2 matrix, we find the partial derivative of the loss with respect to that particular weight.
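In symbols, each entry of W2 is updated with the usual gradient-descent rule, where η is the learning rate (my notation):

```latex
W_2^{(i,j)} \;\leftarrow\; W_2^{(i,j)} \;-\; \eta \,\frac{\partial L}{\partial W_2^{(i,j)}}.
```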
Instead of calculating each gradient individually, we can use the Jacobian matrix to compute all the partial derivatives of the loss with respect to the weight values in the W2 matrix simultaneously.
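Here is a minimal NumPy sketch of that idea, assuming a two-layer network with sigmoid activations and a squared-error loss; the layer sizes and variable names are illustrative assumptions, not taken from the article:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed toy dimensions: 3 inputs -> 4 hidden units -> 2 outputs
rng = np.random.default_rng(0)
x  = rng.normal(size=(3, 1))                    # input column vector
y  = rng.normal(size=(2, 1))                    # target
W1 = rng.normal(size=(4, 3)); b1 = np.zeros((4, 1))
W2 = rng.normal(size=(2, 4)); b2 = np.zeros((2, 1))

# Forward propagation
z1 = W1 @ x + b1
a1 = sigmoid(z1)
z2 = W2 @ a1 + b2
y_hat = sigmoid(z2)
loss = 0.5 * np.sum((y_hat - y) ** 2)           # squared-error loss

# Backward propagation, written with whole matrices.
# delta2 holds dL/dz2 for every output unit at once.
delta2 = (y_hat - y) * y_hat * (1 - y_hat)      # shape (2, 1)

# All partial derivatives dL/dW2[i, j] in one outer product,
# instead of a loop over the individual weights of W2.
dL_dW2 = delta2 @ a1.T                          # same shape as W2: (2, 4)

# Gradient-descent update for the whole W2 matrix at once.
learning_rate = 0.1
W2 -= learning_rate * dL_dW2
```

The single outer product delta2 @ a1.T is exactly the "compute all the partial derivatives simultaneously" step described above.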
Outro
Have a nice day!