Extending micrograd: numpy acceleration

intuition: a gradient are all the partial derivatives of a scalar function arranged into a vector; Jacobians are all the partial derivatives of vector functions (gradients i.e., vectors) packed into matrices.

jacobian matrix:

The Jacobian matrix of the function is defined as

When applying chain rule during backpropagation, we multiply Jacobian matrices together to compute gradients for vector functions element-wise.

Forward broadcast = implicit copy

Backward unbroadcast = sum over copied axes

Video: PyTorch Autograd Explained - In-depth Tutorial

References

In-place operations & Multithreaded Autograd Example implementation of reverse-mode autodiff