Bayesian Information Criterion (BIC)¶
Formula¶
\[
\mathrm{BIC} = k\log n - 2\log \hat{L}
\]
Parameters¶
- \(k\): number of fitted parameters
- \(n\): number of observations
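- \(\hat{L}\): maximized likelihood of the model, so \(\log \hat{L}\) is the log-likelihood at the fitted parameters
- \(\log\): natural logarithm throughout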
What it means¶
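A model-selection criterion that balances goodness of fit (the maximized log-likelihood) against model complexity. Its per-parameter penalty, \(\log n\), grows with sample size, whereas AIC's per-parameter penalty is a constant 2, so BIC penalizes complexity more heavily than AIC in all but very small samples.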
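The comparison with AIC can be made explicit. Assuming the standard definition \(\mathrm{AIC} = 2k - 2\log \hat{L}\) (not stated on this card), the two criteria differ only in the penalty term:
\[
\mathrm{BIC} - \mathrm{AIC} = \left(k\log n - 2\log \hat{L}\right) - \left(2k - 2\log \hat{L}\right) = k\,(\log n - 2)
\]
This difference is positive whenever \(\log n > 2\), that is \(n > e^{2} \approx 7.39\), so for any dataset with at least 8 observations BIC charges more per parameter than AIC.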
What it's used for¶
- Selecting parsimonious probabilistic models on the same dataset.
- Common for Gaussian mixture model component selection.
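As a rough illustration of choosing between candidate models on the same dataset, the sketch below compares two Gaussian models using only the standard library. The data values and the helper names `gaussian_loglik` and `bic` are made up for this example; they are assumptions, not part of any particular library.

```python
import math

def gaussian_loglik(xs, mu, var):
    """Log-likelihood of i.i.d. samples xs under a Normal(mu, var) model."""
    n = len(xs)
    sq_err = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi * var) - sq_err / (2 * var)

def bic(log_l, k, n):
    """BIC = k*log(n) - 2*log(L_hat), with natural log (hypothetical helper)."""
    return k * math.log(n) - 2 * log_l

# Made-up illustrative data, roughly centered on zero.
xs = [0.3, -0.1, 0.2, -0.4, 0.1, 0.0, -0.2, 0.2]
n = len(xs)

# Model A: mean fixed at 0, variance estimated by maximum likelihood (k = 1).
var_a = sum(x * x for x in xs) / n
ll_a = gaussian_loglik(xs, 0.0, var_a)

# Model B: mean and variance both estimated by maximum likelihood (k = 2).
mu_b = sum(xs) / n
var_b = sum((x - mu_b) ** 2 for x in xs) / n
ll_b = gaussian_loglik(xs, mu_b, var_b)

bic_a = bic(ll_a, k=1, n=n)
bic_b = bic(ll_b, k=2, n=n)

# Lower BIC is preferred; both models were scored on the same data.
best = "A" if bic_a < bic_b else "B"
print(f"BIC A = {bic_a:.3f}, BIC B = {bic_b:.3f}, preferred: {best}")
```

Model B nests model A, so its maximized likelihood can never be lower. On this data the gain in fit is tiny and does not offset the extra \(\log n\) penalty, so the simpler model A gets the lower BIC.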
Key properties¶
- Lower is better
- Penalizes complexity by \(k\log n\), which grows with sample size
- Often favors simpler models than AIC
Common gotchas¶
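- As with AIC, compare BIC values only across models fit to the same observations with the same likelihood definition; scores from different datasets or from a transformed response are not comparable.
- Can select models that are too simple (underfit) when predictive performance is the main goal, because the \(k\log n\) penalty is heavier than AIC's.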
Example¶
If \(k=5\), \(n=100\), and \(\log \hat{L}=-120\), then, using the natural logarithm, \(\mathrm{BIC}=5\log(100)-2(-120)\approx 23.03+240=263.03\).
How to Compute (Pseudocode)¶
Input: fitted model log-likelihood logL, parameter count k, sample size n
Output: BIC
BIC <- k * ln(n) - 2 * logL
return BIC
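The pseudocode above could be written in Python roughly as follows. This is a minimal sketch: the function name `bic` and its argument names are illustrative choices, and it assumes the log-likelihood was computed with the natural logarithm.

```python
import math

def bic(log_likelihood: float, k: int, n: int) -> float:
    """Return k*log(n) - 2*log_likelihood (natural log)."""
    if n <= 0:
        raise ValueError("n must be a positive number of observations")
    if k < 0:
        raise ValueError("k must be a non-negative parameter count")
    return k * math.log(n) - 2.0 * log_likelihood

# Values from the Example section: k = 5, n = 100, log-likelihood = -120.
score = bic(log_likelihood=-120.0, k=5, n=100)
print(round(score, 2))  # 263.03
```

The function takes the log-likelihood rather than the likelihood itself, since raw likelihoods often underflow to zero for realistic sample sizes.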
Complexity¶
- Time: \(O(1)\) once log-likelihood and model metadata are available
- Space: \(O(1)\)
- Assumptions: The cost of fitting the model and evaluating the likelihood is excluded