Akaike Information Criterion (AIC)¶
Formula¶
\[
\mathrm{AIC} = 2k - 2\log \hat{L}
\]
Parameters¶
- \(k\): number of fitted parameters
- \(\hat{L}\): maximized likelihood of the model
What it means¶
Estimates out-of-sample predictive quality by balancing fit (likelihood) against model complexity.
What it's used for¶
- Comparing probabilistic models fit to the same dataset.
- Selecting model order (e.g., number of mixture components, ARIMA terms).
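As an illustration of the second use, a model's order can be chosen by fitting candidates of increasing complexity and keeping the one with the lowest AIC. The sketch below (function and variable names are illustrative, not from any library) fits polynomials by least squares and uses the standard Gaussian maximized log-likelihood \(\log \hat{L} = -\tfrac{n}{2}\left(\log(2\pi\hat{\sigma}^2) + 1\right)\) with \(\hat{\sigma}^2 = \mathrm{RSS}/n\):

```python
import numpy as np

def gaussian_ols_aic(y, y_hat, k):
    """AIC for a least-squares fit under a Gaussian error model.

    k counts the regression coefficients plus 1 for the error variance.
    """
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    sigma2 = rss / n  # MLE of the error variance
    log_l = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return 2 * k - 2 * log_l

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 200)
# Synthetic data from a quadratic (true degree: 2) plus noise.
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.1, size=x.size)

# Fit polynomials of increasing degree and pick the lowest AIC.
scores = {}
for degree in range(6):
    coeffs = np.polyfit(x, y, degree)
    y_hat = np.polyval(coeffs, x)
    scores[degree] = gaussian_ols_aic(y, y_hat, k=degree + 2)

best = min(scores, key=scores.get)
print(best)
```

Degrees below the true one fit poorly and score high; degrees above it improve the likelihood only marginally, so the complexity penalty dominates.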
Key properties¶
- Lower is better.
- Only meaningful for relative comparison across candidate models fit to the same data.
- Penalizes complexity linearly in \(k\).
Common gotchas¶
- Absolute AIC values are not interpretable by themselves.
- Do not compare AIC across different datasets or different likelihood definitions.
- Small-sample correction (AICc) is often preferred when \(n\) is not large relative to \(k\).
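For reference, the standard small-sample correction takes the form (valid when \(n > k + 1\)):
\[
\mathrm{AICc} = \mathrm{AIC} + \frac{2k(k+1)}{n - k - 1}
\]
which converges to plain AIC as \(n \to \infty\).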
Example¶
If \(k=5\) and \(\log \hat{L}=-120\), then \(\mathrm{AIC}=2(5)-2(-120)=10+240=250\).
How to Compute (Pseudocode)¶
Input: fitted model log-likelihood logL, parameter count k, sample size n (needed only for the AICc variant)
Output: AIC
AIC ← 2 * k − 2 * logL
return AIC
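The pseudocode translates directly to Python; the function names below are illustrative. Plain AIC ignores the sample size, which enters only through the AICc variant:

```python
def aic(log_l: float, k: int) -> float:
    """AIC = 2k - 2*logL (lower is better)."""
    return 2 * k - 2 * log_l

def aicc(log_l: float, k: int, n: int) -> float:
    """Small-sample corrected AIC; requires n > k + 1."""
    if n <= k + 1:
        raise ValueError("AICc requires n > k + 1")
    return aic(log_l, k) + (2 * k * (k + 1)) / (n - k - 1)

# The worked example from the card: k = 5, logL = -120.
print(aic(-120.0, 5))        # 250.0
print(aicc(-120.0, 5, 100))  # slightly larger, due to the correction term
```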
Complexity¶
- Time: \(O(1)\) once log-likelihood and model metadata are available
- Space: \(O(1)\)
- Assumptions: The cost of fitting the model and evaluating the likelihood is excluded