Multivariable Chain Rule

Formula

\[ J_{f\circ g}(x)=J_f(g(x))\,J_g(x) \]

Parameters

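\(g:\mathbb{R}^n\to\mathbb{R}^m\): inner function, differentiable at \(x\)
\(f:\mathbb{R}^m\to\mathbb{R}^p\): outer function, differentiable at \(g(x)\)
\(J_g(x)\in\mathbb{R}^{m\times n}\), \(J_f(g(x))\in\mathbb{R}^{p\times m}\): Jacobians, where entry \((i,j)\) is the partial derivative of output \(i\) with respect to input \(j\)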

What it means

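The Jacobian of a composition is the matrix product of the individual Jacobians: the outer function's Jacobian, evaluated at \(g(x)\), multiplies the inner function's Jacobian, evaluated at \(x\), from the left.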
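
Written entry by entry, the matrix product is the familiar sum over intermediate variables. Here \(y=g(x)\) is a name introduced only for this illustration, and \(f_i\), \(g_k\) denote component functions:

\[ \frac{\partial (f\circ g)_i}{\partial x_j}(x)=\sum_{k=1}^{m}\frac{\partial f_i}{\partial y_k}(g(x))\,\frac{\partial g_k}{\partial x_j}(x),\qquad i=1,\dots,p,\quad j=1,\dots,n \]

The sum over \(k\) is exactly the row-times-column rule for the \((i,j)\) entry of \(J_f(g(x))\,J_g(x)\).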

What it's used for

Backpropagation and autodiff.
Differentiating layered vector-valued systems.

Key properties

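Matrix dimensions determine the valid multiplication order: \(J_f\) is \(p\times m\) and \(J_g\) is \(m\times n\), so \(J_f J_g\) is a \(p\times n\) matrix, while \(J_g J_f\) is defined only when \(n=p\).
The scalar chain rule \((f\circ g)'(x)=f'(g(x))\,g'(x)\) is the special case \(n=m=p=1\).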
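
Both properties can be checked with a small sketch. The helper `matmul` and the example functions (\(g(x)=x^2\), \(f(y)=\sin y\)) and matrices are assumptions made for illustration, not part of the original text:

```python
import math

def matmul(A, B):
    """Multiply dense matrices stored as lists of rows; reject mismatched shapes."""
    if any(len(row) != len(B) for row in A):
        raise ValueError("inner dimensions do not match")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Scalar special case (n = m = p = 1): the Jacobians are 1x1 matrices.
x = 1.5
Jg = [[2 * x]]                  # g(x) = x^2, so g'(x) = 2x
Jf = [[math.cos(x ** 2)]]       # f(y) = sin(y), evaluated at y = g(x)
J = matmul(Jf, Jg)
scalar = math.cos(x ** 2) * 2 * x   # ordinary scalar chain rule

# Dimension check with p = 1, m = 3, n = 2 (illustrative numbers).
Jf_row = [[1.0, 1.0, 3.0]]                      # 1 x 3
Jg_mat = [[2.0, 1.0], [1.0, 1.0], [2.0, 0.0]]   # 3 x 2
product = matmul(Jf_row, Jg_mat)                # valid: 1 x 2 result

try:
    matmul(Jg_mat, Jf_row)      # reversed order: (3 x 2)(1 x 3) is undefined
    reversed_ok = True
except ValueError:
    reversed_ok = False
```

In this sketch the reversed product is rejected purely on shape grounds, which is the sense in which the dimensions dictate the order.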

Common gotchas

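Row/column convention mismatches can transpose the formula: if derivatives are stored in the transposed layout (for example, gradients as column vectors), the rule reads \(J_g(x)^\top J_f(g(x))^\top\).
Composition order and multiplication order are easy to reverse: \(f\circ g\) applies \(g\) first, yet \(J_g\) is the rightmost factor, and \(J_f\) must be evaluated at \(g(x)\), not at \(x\).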
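
The transposition gotcha can be seen numerically. The following is a minimal sketch with made-up Jacobian values for a scalar-valued \(f\) (\(p=1\)); `matmul` and `transpose` are helpers defined here for illustration:

```python
def matmul(A, B):
    """Multiply dense matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    """Swap rows and columns."""
    return [list(col) for col in zip(*A)]

# Illustrative Jacobians: f: R^3 -> R (1 x 3), g: R^2 -> R^3 (3 x 2).
Jf = [[1.0, 1.0, 3.0]]
Jg = [[2.0, 1.0], [1.0, 1.0], [2.0, 0.0]]

# Row (Jacobian) convention: derivative of f o g is the 1 x 2 matrix Jf Jg.
row_form = matmul(Jf, Jg)

# Column (gradient) convention: the same numbers as a 2 x 1 column, Jg^T Jf^T.
col_form = matmul(transpose(Jg), transpose(Jf))
```

Both forms hold the same two numbers; only the layout and the order of the factors differ, which is why mixing conventions silently produces shape errors or transposed results.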

Example

In a neural network, backpropagation carries the loss gradient backward through the layers by repeated Jacobian products, computed implicitly as vector-Jacobian products rather than as explicit matrices.
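
A minimal sketch of this idea, assuming a tiny hypothetical network \(\mathbb{R}^2\to\mathbb{R}^3\to\mathbb{R}^1\) with a linear layer, an elementwise tanh, and a second linear layer. The weights `W1`, `W2` and the function names are invented for illustration; no layer's full Jacobian is ever multiplied against another:

```python
import math

def matvec(W, v):
    """Matrix times column vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def vecmat(u, W):
    """Row vector times matrix (u^T W): the vector-Jacobian product of a linear layer."""
    return [sum(u[i] * W[i][j] for i in range(len(W))) for j in range(len(W[0]))]

# Hypothetical weights.
W1 = [[0.5, -1.0], [1.5, 0.25], [-0.75, 2.0]]   # 3 x 2
W2 = [[1.0, -2.0, 0.5]]                          # 1 x 3

def forward(x):
    a = matvec(W1, x)                # linear layer, Jacobian W1
    h = [math.tanh(v) for v in a]    # elementwise tanh, diagonal Jacobian
    y = matvec(W2, h)                # linear layer, Jacobian W2
    return a, y

def backward(a, upstream):
    """Pull the row vector `upstream` (dL/dy) back to dL/dx, one layer at a time."""
    u = vecmat(upstream, W2)                                       # through W2
    u = [ui * (1.0 - math.tanh(ai) ** 2) for ui, ai in zip(u, a)]  # through tanh
    return vecmat(u, W1)                                           # through W1

x0 = [0.3, -0.2]
a0, y0 = forward(x0)
grad = backward(a0, [1.0])   # gradient of the scalar output y[0] with respect to x
```

Each backward step multiplies a single row vector by one layer's Jacobian, so the work per layer is a matrix-vector product rather than a matrix-matrix product.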

How to Compute (Pseudocode)

Input: functions g: R^n -> R^m, f: R^m -> R^p, point x
Output: Jacobian of f o g at x

y <- g(x)
Jg <- Jacobian of g at x
Jf <- Jacobian of f at y
return Jf * Jg
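
The pseudocode above can be sketched in plain Python. This version assumes the Jacobians are approximated by central finite differences (a stand-in chosen for self-containment, not something the pseudocode prescribes); the example functions `g` and `f` and all helper names are hypothetical:

```python
def matmul(A, B):
    """Multiply dense matrices stored as lists of rows."""
    if any(len(row) != len(B) for row in A):
        raise ValueError("inner dimensions do not match")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def numerical_jacobian(func, x, h=1e-6):
    """Central-difference Jacobian: entry (i, j) approximates d func_i / d x_j."""
    rows = len(func(x))
    J = [[0.0] * len(x) for _ in range(rows)]
    for j in range(len(x)):
        xp = list(x); xp[j] += h
        xm = list(x); xm[j] -= h
        fp, fm = func(xp), func(xm)
        for i in range(rows):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

def chain_rule_jacobian(f, g, x):
    y = g(x)                        # y <- g(x)
    Jg = numerical_jacobian(g, x)   # m x n, evaluated at x
    Jf = numerical_jacobian(f, y)   # p x m, evaluated at y = g(x)
    return matmul(Jf, Jg)           # p x n

# Hypothetical example with n = 2, m = 3, p = 2.
def g(x):
    return [x[0] * x[1], x[0] + x[1], x[0] ** 2]

def f(y):
    return [y[0] + y[1] * y[2], y[0] * y[2]]

x0 = [1.0, 2.0]
J_chain = chain_rule_jacobian(f, g, x0)
# Cross-check: differentiate the composition directly.
J_direct = numerical_jacobian(lambda x: f(g(x)), x0)
```

For this example the exact Jacobian at `x0` is `[[9, 2], [6, 1]]`, and both `J_chain` and `J_direct` should agree with it up to finite-difference error.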

Complexity

Time: The cost of evaluating both Jacobians plus the cost of multiplying them; the dense, naive matrix product takes \(O(pmn)\) operations once the Jacobians are available
Space: Depends on whether full Jacobians are materialized; up to \(O(pm + mn + pn)\) when \(J_f\), \(J_g\), and their product are all stored
Assumptions: Full-Jacobian workflow shown; autodiff often uses Jacobian-vector/vector-Jacobian products instead of explicit Jacobian matrices
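
To illustrate the assumption above, here is a small sketch of a Jacobian-vector product computed two ways. The matrices and the vector `v` are made-up values with \(p=2\), \(m=3\), \(n=2\), and the explicit Jacobians stand in for whatever an autodiff system would apply implicitly:

```python
def matvec(A, v):
    """Matrix times column vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def matmul(A, B):
    """Multiply dense matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

Jf = [[1.0, 1.0, 3.0], [1.0, 0.0, 2.0]]     # p x m = 2 x 3
Jg = [[2.0, 1.0], [1.0, 1.0], [2.0, 0.0]]   # m x n = 3 x 2
v = [1.0, -1.0]                             # direction in input space

# Full-Jacobian workflow: form the p x n product, then apply it to v.
jvp_explicit = matvec(matmul(Jf, Jg), v)

# Product-free workflow: two matrix-vector products, no p x n matrix stored.
jvp_chained = matvec(Jf, matvec(Jg, v))
```

Both routes give the same vector, but the second needs only \(O(mn + pm)\) multiplications and never stores the \(p\times n\) product.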