Bayesian Information Criterion (BIC)

Formula

\[ \mathrm{BIC} = k\log n - 2\log \hat{L} \]

Parameters

  • \(k\): number of fitted parameters
  • \(n\): number of observations
  • \(\hat{L}\): maximized likelihood of the model

What it means

A model-selection criterion that trades off goodness of fit (the maximized likelihood) against model complexity. Because its penalty is \(k\log n\) rather than AIC's \(2k\), it penalizes complexity more strongly as the sample size grows.

What it's used for

  • Selecting parsimonious probabilistic models on the same dataset.
  • Common for Gaussian mixture model component selection.
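The selection recipe implied above is: fit each candidate model to the same data, compute its BIC, and keep the lowest. A minimal pure-Python sketch, using two nested Gaussian models rather than a full mixture (the data, model pair, and function names here are illustrative assumptions, not from the card):

```python
import math
import random

def gaussian_loglik(data, mu, var):
    # Log-likelihood of i.i.d. N(mu, var) data (natural log).
    n = len(data)
    return (-0.5 * n * math.log(2 * math.pi * var)
            - sum((x - mu) ** 2 for x in data) / (2 * var))

def bic(loglik, k, n):
    # BIC = k*log(n) - 2*log(L); lower is better.
    return k * math.log(n) - 2 * loglik

random.seed(0)
data = [random.gauss(3.0, 1.0) for _ in range(200)]  # true mean is 3
n = len(data)

# Model A: mean fixed at 0, variance fitted (k = 1).
var_a = sum(x * x for x in data) / n  # MLE of variance given mu = 0
bic_a = bic(gaussian_loglik(data, 0.0, var_a), k=1, n=n)

# Model B: mean and variance both fitted (k = 2).
mu_b = sum(data) / n
var_b = sum((x - mu_b) ** 2 for x in data) / n
bic_b = bic(gaussian_loglik(data, mu_b, var_b), k=2, n=n)

# On this data the extra parameter pays for its penalty,
# so the free-mean model has the lower (better) BIC.
print(bic_b < bic_a)
```

For Gaussian mixtures specifically, scikit-learn's `GaussianMixture` exposes a `bic(X)` method, so the same loop-and-compare pattern applies over `n_components`.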

Key properties

  • Lower is better
  • Penalizes complexity by \(k\log n\), which grows with sample size
  • Often favors simpler models than AIC
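The last two properties follow from comparing the penalties directly: BIC charges \(k\log n\) per model while AIC charges \(2k\), so BIC's penalty is larger whenever \(\log n > 2\), i.e. \(n \geq 8\). A quick check (the value of `k` is an illustrative assumption):

```python
import math

k = 5  # illustrative parameter count
for n in (5, 8, 1000):
    bic_penalty = k * math.log(n)  # BIC penalty: k log n
    aic_penalty = 2 * k            # AIC penalty: 2k
    # BIC penalizes harder once log n > 2, i.e. n > e^2 ~= 7.39
    print(n, bic_penalty > aic_penalty)
```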

Common gotchas

  • Like AIC, BIC values are only comparable across models fit to the same dataset with the same likelihood definition (including any constant terms dropped).
  • Can underfit when predictive performance is the main goal.

Example

If \(k=5\), \(n=100\), and \(\log \hat{L}=-120\), then \(\mathrm{BIC}=5\log(100)-2(-120)\approx 263.0\).

How to Compute (Pseudocode)

Input: fitted model log-likelihood logL, parameter count k, sample size n
Output: BIC

BIC ← k · log(n) − 2 · logL    (natural log)
return BIC
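The pseudocode translates directly into a one-line function; this minimal Python sketch (the function name `bic` is my choice) reproduces the worked example above:

```python
import math

def bic(log_likelihood, k, n):
    """BIC = k*log(n) - 2*log(L), with natural logs; lower is better."""
    return k * math.log(n) - 2 * log_likelihood

# Reproduces the card's example: k=5, n=100, log L = -120 gives ~263.0
print(round(bic(-120, k=5, n=100), 1))  # → 263.0
```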

Complexity

  • Time: \(O(1)\) once log-likelihood and model metadata are available
  • Space: \(O(1)\)
  • Assumptions: The cost of fitting the model and evaluating the likelihood is excluded

See also