Gradient

Formula

\[ \nabla f(x)= \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix} \]

Parameters

  • \(f:\mathbb{R}^n\to\mathbb{R}\): scalar-valued function
  • \(\nabla f(x)\): vector of partial derivatives

What it means

At each point, the gradient points in the direction of steepest local increase of a scalar-valued function; its magnitude is the rate of increase in that direction.

What it's used for

  • Gradient descent and optimization.
  • Sensitivity analysis in multivariable models.
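The first use case can be sketched in a few lines. This is a minimal gradient-descent loop, assuming a simple quadratic objective \(f(x)=\lVert x\rVert^2\) (the function, starting point, and step size are illustrative choices, not part of the original text):

```python
import numpy as np

def grad_f(x):
    # Gradient of the example objective f(x) = ||x||^2, which is 2x.
    return 2.0 * x

# Plain gradient descent: repeatedly step against the gradient.
x = np.array([3.0, -4.0])   # arbitrary starting point
lr = 0.1                    # step size (learning rate)
for _ in range(100):
    x = x - lr * grad_f(x)

# x converges toward the minimizer at the origin.
```

For this quadratic, each step scales the iterate by \(1 - 2\,\mathrm{lr}\), so the iterates shrink geometrically toward zero; in general the step size must be tuned to the curvature of \(f\).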

Key properties

  • \(-\nabla f(x)\) is the steepest descent direction locally.
  • Orthogonal to level sets (under regularity conditions).
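The orthogonality property can be checked numerically. Here a sketch for \(f(x,y)=x^2+y^2\), whose level sets are circles: the tangent direction \((-y, x)\) to the circle through a point should be perpendicular to the gradient there (function and point are illustrative):

```python
import numpy as np

def grad_f(p):
    # Gradient of f(x, y) = x^2 + y^2.
    x, y = p
    return np.array([2 * x, 2 * y])

p = np.array([1.0, 2.0])
tangent = np.array([-p[1], p[0]])  # tangent to the circular level set through p
dot = grad_f(p) @ tangent
# dot vanishes: the gradient is orthogonal to the level set at p.
```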

Common gotchas

  • Row vs column vector conventions differ.
  • Gradient magnitude depends on feature scaling.

Example

For \(f(x,y)=x^2+y^2\), \(\nabla f(x,y)=(2x,\,2y)\); for instance, at the point \((1,2)\) the gradient is \((2,4)\).
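The analytic gradient in this example can be validated against a central-difference approximation, a common sanity check (the point \((1,2)\) and step size `h` are illustrative choices):

```python
import numpy as np

def f(x, y):
    return x**2 + y**2

def grad_f(x, y):
    # Analytic gradient of f.
    return np.array([2 * x, 2 * y])

# Central differences approximate each partial derivative at (1, 2).
h = 1e-6
x, y = 1.0, 2.0
num = np.array([
    (f(x + h, y) - f(x - h, y)) / (2 * h),
    (f(x, y + h) - f(x, y - h)) / (2 * h),
])
# num agrees with grad_f(1, 2) = (2, 4) up to rounding error.
```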

How to Compute (Pseudocode)

Input: scalar function f(x1, ..., xn), point x in R^n
Output: gradient vector grad

for j from 1 to n:
  grad[j] <- partial derivative of f with respect to x_j evaluated at x

return grad
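The pseudocode above can be turned into runnable code by replacing each exact partial derivative with a central-difference approximation. A minimal sketch (the helper name `numerical_gradient` and the default step size are assumptions, not part of the original):

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of f at x by central differences.

    Mirrors the pseudocode: loop over coordinates j = 1..n and fill
    grad[j] with an estimate of the j-th partial derivative at x.
    """
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h  # perturb only coordinate j
        grad[j] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

g = numerical_gradient(lambda v: np.sum(v**2), [1.0, 2.0, 3.0])
# g approximates the exact gradient (2, 4, 6) of sum of squares.
```

Exact partial derivatives (symbolic or via automatic differentiation) are preferred in practice; finite differences cost one or two function evaluations per coordinate and introduce truncation and rounding error.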

Complexity

  • Time: \(O(n)\) partial-derivative evaluations
  • Space: \(O(n)\) to store the gradient vector
  • Assumptions: each partial-derivative evaluation is counted as one unit, so its internal cost is excluded; reverse-mode automatic differentiation can compute all \(n\) partials at a cost comparable to a small constant number of function evaluations, which changes the practical cost substantially.

See also