
Fisher Information

Formula

\[ \mathcal{I}(\theta) = \mathbb{E}\left[\left(\frac{\partial}{\partial\theta} \log p(X;\theta)\right)^2\right] \]

Parameters

  • \(\theta\): parameter
  • \(p(X;\theta)\): likelihood
  • Expectation is over \(X\sim p(\cdot;\theta)\)

What it means

Measures how sensitive the likelihood is to changes in \(\theta\).

What it's used for

  • Lower-bounding the variance of unbiased estimators (Cramér-Rao bound).
  • Sensitivity of likelihood to parameter changes.

Key properties

  • \(\mathcal{I}(\theta) = -\mathbb{E}[\partial^2_{\theta} \log p(X;\theta)]\) under regularity conditions
  • Cramér-Rao: \(\operatorname{Var}(\hat\theta) \ge 1/\mathcal{I}(\theta)\) for any unbiased estimator \(\hat\theta\)
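
The equivalence of the two forms of \(\mathcal{I}(\theta)\) follows from differentiating under the integral sign (a sketch, assuming the support of \(p\) does not depend on \(\theta\)):

\[ \partial^2_{\theta} \log p = \frac{\partial^2_{\theta} p}{p} - \left(\frac{\partial_{\theta} p}{p}\right)^2, \qquad \mathbb{E}\left[\frac{\partial^2_{\theta} p}{p}\right] = \int \partial^2_{\theta}\, p(x;\theta)\,dx = \partial^2_{\theta} \int p(x;\theta)\,dx = \partial^2_{\theta}\, 1 = 0, \]

so taking expectations gives \(-\mathbb{E}[\partial^2_{\theta} \log p(X;\theta)] = \mathbb{E}[(\partial_{\theta} \log p(X;\theta))^2] = \mathcal{I}(\theta)\).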

Common gotchas

  • Regularity conditions are required for the second-derivative form.
  • For vector \(\theta\), Fisher information is a matrix.

Example

For \(X\sim\mathcal{N}(\mu,\sigma^2)\) with known \(\sigma\), \(\mathcal{I}(\mu)=1/\sigma^2\).
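
This follows directly from the score of the Gaussian log-likelihood:

\[ \log p(x;\mu) = -\frac{(x-\mu)^2}{2\sigma^2} + \text{const}, \qquad \frac{\partial}{\partial\mu} \log p(x;\mu) = \frac{x-\mu}{\sigma^2}, \]

\[ \mathcal{I}(\mu) = \mathbb{E}\left[\frac{(X-\mu)^2}{\sigma^4}\right] = \frac{\sigma^2}{\sigma^4} = \frac{1}{\sigma^2}. \]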

How to Compute (Pseudocode)

Input: likelihood model p(x; theta) and parameter value theta (or sample-based estimator setup)
Output: Fisher information I(theta)

compute the score function s(x; theta) <- d/dtheta log p(x; theta)
compute I(theta) = E[s(X; theta)^2] under the model
# in practice, evaluate the expectation analytically or approximate it numerically / by Monte Carlo
return I(theta)
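
The pseudocode above can be sketched in Python using a Monte Carlo estimate of \(\mathbb{E}[s(X;\theta)^2]\), with the score approximated by a central finite difference. The function name `fisher_information_mc` and the Gaussian example are illustrative choices, not part of any standard API:

```python
import math
import random

def fisher_information_mc(log_p, sample, theta, n_samples=100_000, eps=1e-5, seed=0):
    """Monte Carlo estimate of I(theta) = E[s(X; theta)^2].

    log_p(x, theta)  -- log-likelihood of one observation
    sample(rng, theta) -- draws one observation X ~ p(.; theta)
    The score s is approximated by a central finite difference in theta.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = sample(rng, theta)
        # finite-difference approximation to d/dtheta log p(x; theta)
        s = (log_p(x, theta + eps) - log_p(x, theta - eps)) / (2 * eps)
        total += s * s
    return total / n_samples

# Example: N(mu, sigma^2) with known sigma; theory gives I(mu) = 1/sigma^2
sigma = 2.0
log_p = lambda x, mu: -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
sample = lambda rng, mu: rng.gauss(mu, sigma)

print(fisher_information_mc(log_p, sample, theta=0.0))  # close to 1/sigma^2 = 0.25
```

For models with an analytic score, replacing the finite difference with the exact derivative removes the discretization error and roughly halves the number of `log_p` evaluations per sample.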

Complexity

  • Time: depends on whether the expectation is evaluated analytically or estimated; Monte Carlo estimation costs roughly the number of samples times the cost of one score evaluation
  • Space: depends on the estimation method; often \(O(1)\) to \(O(n)\) beyond model parameters and sample storage
  • Assumptions: the scalar-parameter form is shown; vector parameters use the Fisher information matrix, with correspondingly larger computation and storage

See also