Extending micrograd: numpy acceleration
Intuition: a gradient is the vector of all partial derivatives of a scalar-valued function; a Jacobian is the matrix of all partial derivatives of a vector-valued function, with one row per output component (each row is the gradient of that component).
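Jacobian matrix:
The Jacobian matrix of the function $f: \mathbb{R}^n \to \mathbb{R}^m$ is the $m \times n$ matrix of its first-order partial derivatives, $J_{ij} = \partial f_i / \partial x_j$.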
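When applying the chain rule during backpropagation, the Jacobians of successive operations are multiplied together. Reverse mode carries this product out from the output side as a sequence of vector-Jacobian products, so the full Jacobian matrices never have to be built explicitly.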
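To make the chain rule concrete, here is a minimal NumPy sketch. The function `loss = sum(tanh(W @ x))` and the names `W`, `x`, `grad_full`, and `grad_vjp` are illustrative assumptions, not part of micrograd. It computes the gradient with respect to `x` twice: once by multiplying explicit Jacobians, and once the way reverse mode would, by pushing an upstream gradient backward through each operation.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))   # hypothetical weight matrix
x = rng.standard_normal(4)        # hypothetical input vector

# Forward pass: x -> y = W @ x -> z = tanh(y) -> loss = sum(z)
y = W @ x
z = np.tanh(y)
loss = z.sum()

# Explicit Jacobian of each step.
J_y_x = W                        # dy/dx, shape (3, 4)
J_z_y = np.diag(1.0 - z ** 2)    # dz/dy, shape (3, 3); tanh acts per element
J_loss_z = np.ones((1, 3))       # dloss/dz, shape (1, 3)

# Chain rule as a product of Jacobians.
grad_full = (J_loss_z @ J_z_y @ J_y_x).ravel()

# The same result as vector-Jacobian products, without forming J_z_y.
g = np.ones(3)                   # dloss/dz
g = g * (1.0 - z ** 2)           # dloss/dy: elementwise, since J_z_y is diagonal
grad_vjp = g @ W                 # dloss/dx

print(np.allclose(grad_full, grad_vjp))
```

The second version is the one that scales: for an elementwise operation the Jacobian is diagonal, so the matrix product collapses to an elementwise multiply.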
Forward broadcast = implicit copy
Backward unbroadcast = sum over copied axes
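The two rules above can be sketched in NumPy as follows. The helper name `unbroadcast` and its signature are assumptions for illustration, not an existing micrograd or NumPy function. It takes a gradient with the shape of the broadcast output and sums it back down to the shape of the original operand.

```python
import numpy as np

def unbroadcast(grad, shape):
    """Reduce `grad` (shaped like the broadcast output) back to `shape`.

    Hypothetical helper: sums over every axis that broadcasting copied.
    """
    # Broadcasting prepends axes on the left; sum those away first.
    while grad.ndim > len(shape):
        grad = grad.sum(axis=0)
    # Axes of size 1 in the operand were stretched; sum them, keeping the axis.
    for axis, size in enumerate(shape):
        if size == 1 and grad.shape[axis] != 1:
            grad = grad.sum(axis=axis, keepdims=True)
    return grad

a = np.ones((2, 3))
b = np.array([10.0, 20.0, 30.0])   # shape (3,), implicitly copied across 2 rows
out = a + b                        # forward broadcast, shape (2, 3)

upstream = np.ones_like(out)       # gradient flowing back into `out`
grad_a = unbroadcast(upstream, a.shape)   # shape (2, 3), unchanged
grad_b = unbroadcast(upstream, b.shape)   # shape (3,), each entry summed over 2 rows
```

Each element of `b` contributed to two output rows, so its gradient is the sum of the two upstream entries it fed.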
Video: PyTorch Autograd Explained - In-depth Tutorial
References
In-place operations & Multithreaded Autograd
Example implementation of reverse-mode autodiff