Fisher Information¶
Formula¶
\[
\mathcal{I}(\theta) = \mathbb{E}\left[\left(\frac{\partial}{\partial\theta} \log p(X;\theta)\right)^2\right]
\]
Parameters¶
- \(\theta\): model parameter (scalar here; the vector case gives a matrix, see below)
- \(p(X;\theta)\): likelihood, i.e. the density or mass function of \(X\)
- Expectation is taken over \(X\sim p(\cdot;\theta)\)
What it means¶
Measures how sensitive the log-likelihood is to changes in \(\theta\): the expected squared slope of \(\log p(X;\theta)\). A flat likelihood carries little information about \(\theta\); a sharply peaked one carries a lot.
What it's used for¶
- Lower-bounding the variance of unbiased estimators (Cramér-Rao bound).
- Quantifying how much information a sample carries about \(\theta\).
Key properties¶
- \(\mathcal{I}(\theta) = -\mathbb{E}[\partial^2_{\theta} \log p(X;\theta)]\) under regularity conditions
- Additivity: for \(n\) i.i.d. observations, \(\mathcal{I}_n(\theta) = n\,\mathcal{I}(\theta)\)
- Cramér-Rao: \(\operatorname{Var}(\hat\theta) \ge 1/\mathcal{I}(\theta)\) for any unbiased estimator \(\hat\theta\)
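As a quick numerical illustration of the bound (a sketch with made-up constants, standard library only): for the Gaussian mean with known \(\sigma\), \(n\) i.i.d. samples give \(\mathcal{I}_n(\mu)=n/\sigma^2\), and the sample mean attains the resulting bound \(\sigma^2/n\), so its empirical variance should land right at the bound, up to Monte Carlo noise.

```python
import random
import statistics

# Sketch: empirical variance of the sample mean vs. the Cramér-Rao bound
# for N(mu, sigma^2) with known sigma. Constants are illustrative.
random.seed(0)
mu, sigma, n, trials = 2.0, 3.0, 50, 20_000

# Repeat the experiment `trials` times; each experiment averages n draws.
means = [
    statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(trials)
]

var_hat = statistics.variance(means)  # empirical Var(sample mean)
bound = sigma**2 / n                  # 1 / I_n(mu) = sigma^2 / n

print(var_hat, bound)  # the two numbers should be close
```

Because the sample mean is an efficient estimator here, `var_hat` matches the bound instead of merely exceeding it.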
Common gotchas¶
- Regularity conditions are required for the second-derivative form.
- For vector \(\theta\), Fisher information is a matrix.
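For completeness, in the vector case \(\theta\in\mathbb{R}^d\) the entries of the Fisher information matrix are the expected products of partial scores:

\[
\mathcal{I}(\theta)_{ij} = \mathbb{E}\left[\frac{\partial}{\partial\theta_i}\log p(X;\theta)\,\frac{\partial}{\partial\theta_j}\log p(X;\theta)\right].
\]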
Example¶
For \(X\sim\mathcal{N}(\mu,\sigma^2)\) with known \(\sigma\), \(\mathcal{I}(\mu)=1/\sigma^2\).
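A one-line derivation from the definition above: the score is linear in \(x\), and its second moment reduces to the variance of \(X\).

\[
\frac{\partial}{\partial\mu}\log p(x;\mu) = \frac{x-\mu}{\sigma^2},
\qquad
\mathcal{I}(\mu) = \mathbb{E}\left[\frac{(X-\mu)^2}{\sigma^4}\right] = \frac{\sigma^2}{\sigma^4} = \frac{1}{\sigma^2}.
\]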
How to Compute (Pseudocode)¶
Input: likelihood model p(x; theta) and a parameter value theta
Output: Fisher information I(theta)
compute the score function s(x; theta) <- d/dtheta log p(x; theta)
compute I(theta) = E[s(X; theta)^2] under the model
# in practice, evaluate the expectation analytically or approximate it numerically / by Monte Carlo
return I(theta)
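The pseudocode above can be sketched in Python with a Monte Carlo expectation (the helper names `fisher_information_mc`, `log_pdf`, and `sample` are ours, and the score is approximated by a central finite difference rather than an analytic derivative):

```python
import math
import random

def fisher_information_mc(log_pdf, sample, theta, n=100_000, eps=1e-5, seed=0):
    """Monte Carlo sketch of I(theta) = E[s(X; theta)^2].

    log_pdf(x, theta) -> log p(x; theta)
    sample(theta, rng) -> one draw X ~ p(.; theta)
    The score d/dtheta log p is approximated by a central finite difference.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = sample(theta, rng)
        score = (log_pdf(x, theta + eps) - log_pdf(x, theta - eps)) / (2 * eps)
        total += score**2
    return total / n

# Check against the worked example: N(mu, sigma^2) with known sigma,
# where the exact answer is 1 / sigma^2.
sigma = 2.0
log_pdf = lambda x, mu: (
    -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
)
sample = lambda mu, rng: rng.gauss(mu, sigma)

print(fisher_information_mc(log_pdf, sample, theta=0.0))  # ≈ 0.25 = 1/sigma**2
```

Swapping in an analytic score, when available, removes the finite-difference error and one of the two `log_pdf` evaluations per sample.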
Complexity¶
- Time: \(O(1)\) when the expectation has a closed form; a Monte Carlo estimate with \(n\) samples costs \(O(n)\) score evaluations
- Space: \(O(1)\) for closed-form or streaming Monte Carlo evaluation, \(O(n)\) if samples are stored, beyond model parameters
- Assumptions: scalar-parameter form shown; for \(\theta\in\mathbb{R}^d\) the Fisher information is a \(d\times d\) matrix, with \(O(d^2)\) entries to compute and store