Gradient¶
Formula¶
\[
\nabla f(x)=
\begin{bmatrix}
\frac{\partial f}{\partial x_1} \\
\vdots \\
\frac{\partial f}{\partial x_n}
\end{bmatrix}
\]
Parameters¶
- \(f:\mathbb{R}^n\to\mathbb{R}\): scalar-valued function whose partial derivatives exist at \(x\)
- \(\nabla f(x)\): the vector of all \(n\) partial derivatives of \(f\), evaluated at \(x\)
What it means¶
At a point \(x\), the gradient points in the direction of steepest local increase of \(f\), and its magnitude is the rate of increase in that direction.
What it's used for¶
- Gradient descent and optimization (see the sketch after this list).
- Sensitivity analysis in multivariable models.
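As an illustration of the first use, here is a minimal runnable Python sketch of plain gradient descent on the example function \(f(x,y)=x^2+y^2\); the step size, iteration count, and function names are illustrative assumptions, not a fixed API.

def gradient_descent(grad_f, x0, step=0.1, iters=50):
    """Repeatedly step against the gradient to decrease f."""
    x = list(x0)
    for _ in range(iters):
        g = grad_f(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]  # move along -grad f(x)
    return x

# For f(x, y) = x^2 + y^2, grad f = (2x, 2y) and the minimizer is (0, 0).
grad_f = lambda v: [2 * v[0], 2 * v[1]]
print(gradient_descent(grad_f, [3.0, 4.0]))  # approaches [0.0, 0.0]

On this quadratic, each update multiplies the iterate by the constant factor \(1 - 2\cdot\text{step} = 0.8\), so the iterates converge geometrically to the origin.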
Key properties¶
- \(-\nabla f(x)\) is the steepest descent direction locally.
- Orthogonal to level sets (under regularity conditions); see the derivation below.
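Why the orthogonality holds: if \(x(t)\) is a differentiable curve that stays within a level set \(\{x : f(x)=c\}\), then \(f(x(t))\) is constant in \(t\), and the chain rule gives
\[
\frac{d}{dt} f(x(t)) = \nabla f(x(t)) \cdot x'(t) = 0,
\]
so the gradient is perpendicular to every tangent direction of the level set.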
Common gotchas¶
- Row versus column vector conventions differ between texts and libraries; be consistent, especially when mixing gradients with Jacobians.
- Gradient magnitude depends on feature scaling (see the chain-rule note below).
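The scaling gotcha is a direct consequence of the chain rule: if an input is rescaled so that \(g(x) = f(sx)\) for a scalar \(s\), then
\[
\nabla g(x) = s\,\nabla f(sx),
\]
so features measured on larger scales produce proportionally larger gradient components, which can skew descent directions unless inputs are normalized.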
Example¶
For \(f(x,y)=x^2+y^2\), \(\nabla f=(2x,2y)\).
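Concretely, \(\frac{\partial f}{\partial x} = 2x\) and \(\frac{\partial f}{\partial y} = 2y\); evaluating at, say, the point \((3,4)\) gives
\[
\nabla f(3,4) = (6, 8), \qquad \lVert \nabla f(3,4) \rVert = 10,
\]
which points radially away from the origin, the unique minimizer, where \(\nabla f = 0\).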
How to Compute (Pseudocode)¶
Input: scalar function f(x1, ..., xn), point x in R^n
Output: gradient vector grad
for j from 1 to n:
    grad[j] <- partial derivative of f with respect to x_j, evaluated at x
return grad
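A minimal runnable Python version of the loop above, approximating each partial derivative with central finite differences; the step size h and the name numerical_gradient are illustrative assumptions:

def numerical_gradient(f, x, h=1e-6):
    """Approximate grad f at x with central finite differences."""
    grad = [0.0] * len(x)
    for j in range(len(x)):
        x_fwd = list(x); x_fwd[j] += h   # nudge x_j up
        x_bwd = list(x); x_bwd[j] -= h   # nudge x_j down
        grad[j] = (f(x_fwd) - f(x_bwd)) / (2 * h)  # central difference
    return grad

# Check against the worked example: f(x, y) = x^2 + y^2 at (3, 4).
f = lambda v: v[0] ** 2 + v[1] ** 2
print(numerical_gradient(f, [3.0, 4.0]))  # ~[6.0, 8.0]

Central differences carry \(O(h^2)\) truncation error; automatic differentiation avoids this approximation entirely.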
Complexity¶
- Time: \(O(n)\) partial-derivative evaluations, one per coordinate
- Space: \(O(n)\) to store the gradient vector
- Assumptions: excludes the internal cost of each partial-derivative evaluation; reverse-mode automatic differentiation can produce all \(n\) partials at a small constant multiple of the cost of one function evaluation, which changes the practical cost substantially