
Fisher Information

Formula

\[ \mathcal{I}(\theta) = \mathbb{E}\left[\left(\frac{\partial}{\partial\theta} \log p(X;\theta)\right)^2\right] \]

Parameters

  • \(\theta\): parameter
  • \(p(X;\theta)\): likelihood
  • Expectation is over \(X\sim p(\cdot;\theta)\)

What it means

Measures how sensitive the likelihood is to changes in \(\theta\).

What it's used for

  • Lower-bounding the variance of unbiased estimators (Cramér-Rao bound).
  • Sensitivity of likelihood to parameter changes.

Key properties

  • \(\mathcal{I}(\theta) = -\mathbb{E}[\partial^2_{\theta} \log p(X;\theta)]\) under regularity conditions
  • Cramér-Rao: \(\operatorname{Var}(\hat\theta) \ge 1/\mathcal{I}(\theta)\) for any unbiased estimator \(\hat\theta\)
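
The equivalence of the two forms of \(\mathcal{I}(\theta)\) follows from differentiating under the integral sign (a sketch, assuming the support of \(p\) does not depend on \(\theta\)):

\[ \partial^2_{\theta} \log p = \frac{\partial^2_{\theta} p}{p} - \left(\frac{\partial_{\theta} p}{p}\right)^2, \qquad \mathbb{E}\left[\frac{\partial^2_{\theta} p}{p}\right] = \int \partial^2_{\theta}\, p(x;\theta)\,dx = \partial^2_{\theta} \int p(x;\theta)\,dx = \partial^2_{\theta}\, 1 = 0, \]

so taking expectations gives \(-\mathbb{E}[\partial^2_{\theta} \log p(X;\theta)] = \mathbb{E}[(\partial_{\theta} \log p(X;\theta))^2] = \mathcal{I}(\theta)\).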

Common gotchas

  • Regularity conditions are required for the second-derivative form.
  • For vector \(\theta\), Fisher information is a matrix.

Example

For \(X\sim\mathcal{N}(\mu,\sigma^2)\) with known \(\sigma\), \(\mathcal{I}(\mu)=1/\sigma^2\).
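
This follows directly from the score of the Gaussian log-likelihood:

\[ \log p(x;\mu) = -\frac{(x-\mu)^2}{2\sigma^2} + \text{const}, \qquad \frac{\partial}{\partial\mu} \log p(x;\mu) = \frac{x-\mu}{\sigma^2}, \]

\[ \mathcal{I}(\mu) = \mathbb{E}\left[\frac{(X-\mu)^2}{\sigma^4}\right] = \frac{\sigma^2}{\sigma^4} = \frac{1}{\sigma^2}. \]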

How to Compute (Pseudocode)

Input: likelihood model p(x; theta) and parameter value theta (or sample-based estimator setup)
Output: Fisher information I(theta)

compute the score function s(x; theta) <- d/dtheta log p(x; theta)
compute I(theta) = E[s(X; theta)^2] under the model
# in practice, evaluate the expectation analytically or approximate it numerically / by Monte Carlo
return I(theta)
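
The pseudocode above can be sketched in Python using a Monte Carlo estimate of \(\mathbb{E}[s(X;\theta)^2]\), with the score approximated by a central finite difference. The function name `fisher_information_mc` and the Gaussian example are illustrative choices, not part of any standard API:

```python
import math
import random

def fisher_information_mc(log_p, sample, theta, n_samples=100_000, eps=1e-5, seed=0):
    """Monte Carlo estimate of I(theta) = E[s(X; theta)^2].

    log_p(x, theta)  -- log-likelihood of one observation
    sample(rng, theta) -- draws one observation X ~ p(.; theta)
    The score s is approximated by a central finite difference in theta.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = sample(rng, theta)
        # finite-difference approximation to d/dtheta log p(x; theta)
        s = (log_p(x, theta + eps) - log_p(x, theta - eps)) / (2 * eps)
        total += s * s
    return total / n_samples

# Example: N(mu, sigma^2) with known sigma; theory gives I(mu) = 1/sigma^2
sigma = 2.0
log_p = lambda x, mu: -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
sample = lambda rng, mu: rng.gauss(mu, sigma)

print(fisher_information_mc(log_p, sample, theta=0.0))  # close to 1/sigma^2 = 0.25
```

For models with an analytic score, replacing the finite difference with the exact derivative removes the discretization error and roughly halves the number of `log_p` evaluations per sample.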

Complexity

  • Time: depends on whether the expectation is evaluated analytically or estimated; Monte Carlo estimation costs roughly the number of samples times the cost of one score evaluation
  • Space: depends on the estimation method; often \(O(1)\) to \(O(n)\) beyond model parameters and sample storage
  • Assumptions: the scalar-parameter form is shown; vector parameters use the Fisher information matrix, with correspondingly larger computation and storage

See also