Multivariable Chain Rule

Formula

\[ J_{f\circ g}(x)=J_f(g(x))\,J_g(x) \]

Parameters

  • \(g:\mathbb{R}^n\to\mathbb{R}^m\)
  • \(f:\mathbb{R}^m\to\mathbb{R}^p\)
  • \(J_f,J_g\): Jacobians

What it means

The Jacobian of a composition is the matrix product of the individual Jacobians, evaluated at the right points and multiplied in the correct order: the outer Jacobian \(J_f\), evaluated at \(g(x)\), sits on the left.
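A quick numerical sanity check of the formula, using NumPy. The maps \(g:\mathbb{R}^2\to\mathbb{R}^3\) and \(f:\mathbb{R}^3\to\mathbb{R}^2\) below are illustrative choices, not from the text; the chain-rule product is compared against a central-difference Jacobian of the composition:

```python
import numpy as np

# Illustrative maps (assumptions): g: R^2 -> R^3, f: R^3 -> R^2
def g(x):
    return np.array([x[0] * x[1], np.sin(x[0]), x[1] ** 2])

def Jg(x):
    # 3x2 analytic Jacobian of g
    return np.array([[x[1],         x[0]],
                     [np.cos(x[0]), 0.0],
                     [0.0,          2 * x[1]]])

def f(y):
    return np.array([y[0] + y[1] * y[2], np.exp(y[0])])

def Jf(y):
    # 2x3 analytic Jacobian of f
    return np.array([[1.0,          y[2], y[1]],
                     [np.exp(y[0]), 0.0,  0.0]])

x = np.array([0.7, -1.3])
chain = Jf(g(x)) @ Jg(x)  # 2x2 Jacobian of f o g, via the chain rule

# Central-difference Jacobian of the composition, column by column
eps = 1e-6
fd = np.column_stack([
    (f(g(x + eps * e)) - f(g(x - eps * e))) / (2 * eps)
    for e in np.eye(2)
])
print(np.allclose(chain, fd, atol=1e-5))  # True
```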

What it's used for

  • Backpropagation and autodiff.
  • Differentiating layered vector-valued systems.

Key properties

  • Matrix dimensions determine the valid multiplication order.
  • Scalar chain rule is a special case.
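The dimension bookkeeping from the first property can be made concrete; the sizes below are arbitrary placeholders:

```python
import numpy as np

n, m, p = 4, 3, 2            # illustrative dimensions (assumption)
Jg = np.ones((m, n))         # Jacobian of g at x:    m x n
Jf = np.ones((p, m))         # Jacobian of f at g(x): p x m

print((Jf @ Jg).shape)       # (2, 4), i.e. p x n -- the only order that conforms
# The reversed product Jg @ Jf would raise a shape-mismatch ValueError.
# With n = m = p = 1, the 1x1 product reduces to the scalar chain rule.
```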

Common gotchas

  • Row/column convention mismatches can transpose the formula.
  • Composition order and multiplication order are easy to reverse.

Example

In a neural network, the loss gradient propagates backward through the layers by repeated Jacobian products; in practice these are computed implicitly as vector-Jacobian products rather than as explicit Jacobian matrices.
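The vector-Jacobian trick rests on associativity: multiplying an upstream gradient row vector through each Jacobian in turn gives the same result as forming the full product first, but never materializes the \(p\times n\) matrix. A sketch with random stand-in Jacobians (the layer widths are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p = 1000, 800, 10            # illustrative layer widths (assumption)
Jg = rng.standard_normal((m, n))   # Jacobian of the inner layer
Jf = rng.standard_normal((p, m))   # Jacobian of the outer layer
v = rng.standard_normal(p)         # upstream gradient, e.g. dLoss/dOutput

full = v @ (Jf @ Jg)               # forms the p x n product matrix first
vjp = (v @ Jf) @ Jg                # two vector-matrix products only
print(np.allclose(full, vjp))      # True
```

The parenthesization on the `vjp` line is exactly what reverse-mode autodiff exploits: cost falls from a matrix-matrix multiply to two matrix-vector multiplies.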

How to Compute (Pseudocode)

Input: functions g: R^n -> R^m, f: R^m -> R^p, point x
Output: Jacobian of f o g at x

y <- g(x)
Jg <- Jacobian of g at x
Jf <- Jacobian of f at y
return Jf * Jg
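The pseudocode above can be sketched as runnable Python, with a finite-difference helper standing in for "Jacobian of" (an autodiff system would supply this step in practice; the maps `f` and `g` at the bottom are illustrative assumptions):

```python
import numpy as np

def jacobian_fd(h, x, eps=1e-6):
    """Central-difference Jacobian of h at x (stand-in for autodiff)."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([
        (h(x + eps * e) - h(x - eps * e)) / (2 * eps)
        for e in np.eye(x.size)
    ])

def chain_jacobian(f, g, x):
    """Jacobian of f o g at x, following the pseudocode."""
    y = g(x)
    Jg = jacobian_fd(g, x)   # m x n
    Jf = jacobian_fd(f, y)   # p x m
    return Jf @ Jg           # p x n

# Illustrative maps (assumptions, not from the text)
g = lambda x: np.array([x[0] * x[1], x[0] + x[1]])
f = lambda y: np.array([y[0] ** 2, y[1], y[0] * y[1]])
J = chain_jacobian(f, g, np.array([1.0, 2.0]))
print(J.shape)  # (3, 2)
```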

Complexity

  • Time: Depends on the cost of computing the Jacobians and multiplying them (matrix multiply is \(O(pmn)\) in the dense naive case once Jacobians are available)
  • Space: Depends on whether full Jacobians are materialized (up to \(O(pm + mn + pn)\))
  • Assumptions: Full-Jacobian workflow shown; autodiff often uses Jacobian-vector/vector-Jacobian products instead of explicit Jacobian matrices

See also