Newton's Method¶
Formula¶
\[
x_{k+1} = x_k - \big(\nabla^2 f(x_k)\big)^{-1}\nabla f(x_k)
\]
Parameters¶
- \(\nabla^2 f\): Hessian matrix
- \(\nabla f\): gradient
What it means¶
Uses curvature information for faster local convergence.
What it's used for¶
- Fast local convergence for smooth optimization.
- Root-finding for nonlinear equations.
Key properties¶
- Quadratic convergence near a minimizer under regularity conditions
- Often combined with line search or damping
Common gotchas¶
- Hessian must be invertible; not guaranteed in nonconvex problems.
- Can be expensive for large-scale problems.
Example¶
For \(f(x)=x^2-2\), \(x_0=1\): \(x_1=1-(-1)/2=1.5\).
How to Compute (Pseudocode)¶
Input: objective f, gradient grad_f, Hessian hess_f, initial x0, iterations T
Output: approximate minimizer x
x <- x0
for k from 1 to T:
g <- grad_f(x)
H <- hess_f(x)
solve H * delta = g
x <- x - delta # optionally use damping/line search
return x
Complexity¶
- Time: Per iteration cost is dominated by gradient/Hessian evaluation and solving a linear system (dense solve often \(O(p^3)\) for \(p\) variables)
- Space: Often \(O(p^2)\) to store a dense Hessian plus \(O(p)\) vectors
- Assumptions: Dense full-Newton workflow shown; quasi-Newton and Hessian-vector methods have different costs and memory profiles