Newton's Method¶

Formula¶

\[ x_{k+1} = x_k - \big(\nabla^2 f(x_k)\big)^{-1}\nabla f(x_k) \]

Parameters¶

\(\nabla^2 f\): Hessian matrix
\(\nabla f\): gradient

What it means¶

Uses curvature information for faster local convergence.

What it's used for¶

Fast local convergence for smooth optimization.
Root-finding for nonlinear equations.

Key properties¶

Quadratic convergence near a minimizer under regularity conditions
Often combined with line search or damping

Common gotchas¶

Hessian must be invertible; not guaranteed in nonconvex problems.
Can be expensive for large-scale problems.

Example¶

For \(f(x)=x^2-2\), \(x_0=1\): \(x_1=1-(-1)/2=1.5\).

How to Compute (Pseudocode)¶

Input: objective f, gradient grad_f, Hessian hess_f, initial x0, iterations T
Output: approximate minimizer x

x <- x0
for k from 1 to T:
  g <- grad_f(x)
  H <- hess_f(x)
  solve H * delta = g
  x <- x - delta        # optionally use damping/line search
return x

Complexity¶

Time: Per iteration cost is dominated by gradient/Hessian evaluation and solving a linear system (dense solve often \(O(p^3)\) for \(p\) variables)
Space: Often \(O(p^2)\) to store a dense Hessian plus \(O(p)\) vectors
Assumptions: Dense full-Newton workflow shown; quasi-Newton and Hessian-vector methods have different costs and memory profiles

Newton's Method¶

Formula¶

Parameters¶

What it means¶

What it's used for¶

Key properties¶

Common gotchas¶

Example¶

How to Compute (Pseudocode)¶

Complexity¶

See also¶