
Bias-Variance Tradeoff

Formula

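\[ \mathbb{E}\big[(y-\hat f(x))^2\big] = \mathrm{Bias}[\hat f(x)]^2 + \mathrm{Var}[\hat f(x)] + \sigma^2 \]

where \(y = f(x) + \varepsilon\) is a new noisy observation at \(x\) with \(\mathbb{E}[\varepsilon] = 0\) and \(\mathrm{Var}[\varepsilon] = \sigma^2\). The expectation is taken over both the random training set and \(\varepsilon\). Without the noise term, \(\mathbb{E}\big[(\hat f(x)-f(x))^2\big]\) equals only \(\mathrm{Bias}^2 + \mathrm{Var}\).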
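The decomposition can be checked numerically. What follows is a minimal sketch, and its setup is an assumption that is not in the source: the true signal is f(x) = sin(2πx) on [0, 1], the noise is Gaussian with σ = 0.3, and \(\hat f\) is a k-nearest-neighbor average fit on 50 random points. The sketch refits \(\hat f\) on many simulated training sets. It then compares each prediction at one query point with a fresh noisy observation y = f(x) + ε. The names `decompose`, `knn_predict`, and `draw_training_set` are illustrative only.

```python
import math
import random
import statistics

# Assumed toy setup (illustrative, not from the source):
# true signal f(x) = sin(2*pi*x) on [0, 1], Gaussian noise with sd SIGMA,
# and a k-nearest-neighbor average as the learned predictor f_hat.
SIGMA = 0.3
N_TRAIN = 50
X0 = 0.25  # query point at which the error is decomposed


def f(x):
    return math.sin(2 * math.pi * x)


def draw_training_set(rng):
    """One random training set of (x, y) pairs with y = f(x) + noise."""
    xs = [rng.random() for _ in range(N_TRAIN)]
    return [(x, f(x) + rng.gauss(0.0, SIGMA)) for x in xs]


def knn_predict(train, x0, k):
    """Average the y values of the k training points closest to x0."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x0))[:k]
    return sum(y for _, y in nearest) / k


def decompose(k, trials=2000, seed=0):
    """Monte Carlo estimates of Bias^2, Var, and expected squared error at X0."""
    rng = random.Random(seed)
    preds, sq_errors = [], []
    for _ in range(trials):
        pred = knn_predict(draw_training_set(rng), X0, k)
        y_new = f(X0) + rng.gauss(0.0, SIGMA)  # fresh noisy target at X0
        preds.append(pred)
        sq_errors.append((y_new - pred) ** 2)
    mean_pred = statistics.fmean(preds)
    bias_sq = (mean_pred - f(X0)) ** 2
    variance = statistics.pvariance(preds)  # spread of f_hat(X0) across training sets
    mse_signal = statistics.fmean((p - f(X0)) ** 2 for p in preds)
    expected_error = statistics.fmean(sq_errors)
    return bias_sq, variance, mse_signal, expected_error


for k in (1, 5, 25):
    b2, var, _, err = decompose(k)
    print(f"k={k:2d}  bias^2={b2:.4f}  var={var:.4f}  "
          f"sum+sigma^2={b2 + var + SIGMA ** 2:.4f}  measured={err:.4f}")
```

The two returned quantities differ in how exactly they match:

- `mse_signal` equals `bias_sq + variance` up to floating-point rounding, because over the simulated fits that part is an algebraic identity.
- `expected_error` matches `bias_sq + variance + SIGMA**2` only up to Monte Carlo error.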

Plot

Bias-variance tradeoff (illustrative): three curves, Bias^2 (falling), Variance (rising), and the resulting U-shaped Total error.

Parameters

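  • \(\hat f(x)\): predictor learned from a random training set (bias and variance are taken over the distribution of training sets)
  • \(f(x)\): true signal
  • \(\sigma^2\): variance of the irreducible noise in the observed targets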

What it means

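At a fixed input \(x\), the expected squared prediction error decomposes into three parts: squared bias (systematic error of the average fit), variance (how much the fit changes from one training set to another), and irreducible noise.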
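The decomposition follows from a short expansion. It assumes, as is standard for this result, that \(y = f(x) + \varepsilon\), with \(\mathbb{E}[\varepsilon] = 0\), \(\mathrm{Var}[\varepsilon] = \sigma^2\), and \(\varepsilon\) independent of the training set. Write \(\bar f(x) = \mathbb{E}[\hat f(x)]\) for the average prediction over training sets:

```latex
\begin{aligned}
\mathrm{Bias}[\hat f(x)] &= \bar f(x) - f(x), \qquad
\mathrm{Var}[\hat f(x)] = \mathbb{E}\big[(\hat f(x) - \bar f(x))^2\big] \\
y - \hat f(x) &= \big(f(x) - \bar f(x)\big) + \big(\bar f(x) - \hat f(x)\big) + \varepsilon \\
\mathbb{E}\big[(y - \hat f(x))^2\big] &= \big(f(x) - \bar f(x)\big)^2 + \mathbb{E}\big[(\bar f(x) - \hat f(x))^2\big] + \mathbb{E}[\varepsilon^2] \\
&= \mathrm{Bias}[\hat f(x)]^2 + \mathrm{Var}[\hat f(x)] + \sigma^2
\end{aligned}
```

All cross terms vanish after squaring:

- \(\mathbb{E}[\bar f(x) - \hat f(x)] = 0\) by the definition of \(\bar f\).
- \(\varepsilon\) has mean zero and is independent of \(\hat f(x)\).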

What it's used for

  • Reasoning about model complexity and regularization.
  • Choosing simpler vs more flexible models.

Key properties

  • Increasing complexity often lowers bias but raises variance.
  • Regularization usually increases bias and decreases variance.
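For regularization, a case small enough to solve exactly makes the tradeoff concrete. Consider a hypothetical toy model that is not from the source: estimate a constant mean μ from n noisy observations with a ridge penalty. The estimate is \(m_\lambda = \arg\min_m \sum_i (y_i - m)^2 + \lambda m^2 = \sum_i y_i / (n + \lambda)\). In this model, squared bias grows and variance shrinks monotonically in λ. Their sum is minimized at \(\lambda^* = \sigma^2/\mu^2\). The helper `ridge_mean_bias_variance` is an illustrative name.

```python
# Hypothetical toy model: y_i = MU + noise (sd SIGMA), i = 1..N, and the
# ridge-penalized mean m = sum(y) / (N + lam), i.e. the minimizer of
# sum((y_i - m)^2) + lam * m^2. Larger lam = stronger regularization.
MU, SIGMA, N = 1.0, 2.0, 10


def ridge_mean_bias_variance(lam, mu=MU, sigma=SIGMA, n=N):
    """Exact (bias^2, variance) of the ridge-penalized mean."""
    shrink = n / (n + lam)                      # E[m] = shrink * mu
    bias_sq = ((shrink - 1.0) * mu) ** 2        # (E[m] - mu)^2 = (lam*mu/(n+lam))^2
    variance = shrink ** 2 * sigma ** 2 / n     # = n * sigma^2 / (n + lam)^2
    return bias_sq, variance


for lam in (0.0, 1.0, 4.0, 16.0):
    b2, var = ridge_mean_bias_variance(lam)
    print(f"lambda={lam:5.1f}  bias^2={b2:.4f}  variance={var:.4f}  MSE={b2 + var:.4f}")

# For this toy model the MSE is minimized at lam* = sigma^2 / mu^2 (= 4.0 here).
print("analytic optimum:", SIGMA ** 2 / MU ** 2)
```

With these numbers the MSE column first falls and then rises, bottoming out at λ = 4. The unregularized estimate (λ = 0) is unbiased but is not the best choice here.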

Common gotchas

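  • The clean additive form holds for squared-error loss with zero-mean noise that is independent of the training set. Other losses (e.g., 0-1 loss) have no decomposition of the same simple additive form, so treat it as a conceptual guide there.
  • Training error alone cannot reveal the tradeoff: it keeps falling as complexity grows, so variance only shows up on held-out data.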

Example

A fully grown decision tree can fit its training data perfectly (low bias), but its predictions vary a lot across resampled training sets (high variance).
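The deep-tree example can be reproduced in a small simulation. This is a sketch under stated assumptions:

- The data are 1-D, drawn from f(x) = sin(2πx) plus Gaussian noise.
- A hand-rolled greedy regression tree, `fit_tree`, stands in for a library implementation. It and the other helpers are illustrative names.

A fully grown tree (`max_depth=None`) reaches zero training error on every simulated training set. Its prediction at a fixed point, however, varies much more across training sets than a depth-1 stump's.

```python
import math
import random
import statistics

# Assumed toy setup (illustrative only): y = sin(2*pi*x) + Gaussian noise.
SIGMA, N_TRAIN, X0 = 0.3, 40, 0.3


def f(x):
    return math.sin(2 * math.pi * x)


def sample(rng):
    xs = [rng.random() for _ in range(N_TRAIN)]
    return [(x, f(x) + rng.gauss(0.0, SIGMA)) for x in xs]


def sse(ys):
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def fit_tree(points, max_depth=None):
    """Greedy 1-D regression tree; max_depth=None grows until each leaf holds one x value."""
    ys = [y for _, y in points]
    xs = sorted({x for x, _ in points})
    if max_depth == 0 or len(xs) < 2:
        return ("leaf", sum(ys) / len(ys))
    best = None
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2  # candidate threshold between adjacent distinct x values
        left = [p for p in points if p[0] <= t]
        right = [p for p in points if p[0] > t]
        if not left or not right:
            continue
        cost = sse([y for _, y in left]) + sse([y for _, y in right])
        if best is None or cost < best[0]:
            best = (cost, t, left, right)
    if best is None:
        return ("leaf", sum(ys) / len(ys))
    _, t, left, right = best
    depth = None if max_depth is None else max_depth - 1
    return ("split", t, fit_tree(left, depth), fit_tree(right, depth))


def predict(tree, x):
    while tree[0] == "split":
        _, t, left, right = tree
        tree = left if x <= t else right
    return tree[1]


def experiment(max_depth, trials=100, seed=1):
    """Mean training MSE and variance of f_hat(X0) across simulated training sets."""
    rng = random.Random(seed)
    train_mse, preds = [], []
    for _ in range(trials):
        data = sample(rng)
        tree = fit_tree(data, max_depth)
        train_mse.append(statistics.fmean((y - predict(tree, x)) ** 2 for x, y in data))
        preds.append(predict(tree, X0))
    return statistics.fmean(train_mse), statistics.pvariance(preds)


for depth in (1, None):
    train_err, var = experiment(depth)
    print(f"max_depth={depth}: train MSE={train_err:.4f}  Var[f_hat(X0)]={var:.4f}")
```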

How to Compute (Pseudocode)

Input: model family and a resampling/evaluation procedure
Output: qualitative or empirical bias/variance assessment

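fit models of varying complexity (or regularization strength) on each training fold
evaluate training and validation error on every resample/fold (keep the test set for the final check only)
observe patterns:
  high bias -> training and validation error both high and close together
  high variance -> training error low, validation error much higher and unstable across folds
choose the complexity/regularization level that balances the two (e.g., lowest mean validation error)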
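The workflow above can be made runnable as a minimal sketch. It rests on several assumptions that are not in the source:

- a synthetic 1-D data set;
- k-nearest-neighbor regression as the model family (smaller k means a more flexible model);
- 5-fold cross-validation with interleaved folds;
- the rule "pick the k with the lowest mean validation error".

All names are illustrative.

```python
import math
import random
import statistics

# Hypothetical data set for illustration: y = sin(2*pi*x) + Gaussian noise (sd 0.3).
rng = random.Random(42)
DATA = [(x, math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.3))
        for x in (rng.random() for _ in range(120))]


def knn_predict(train, x0, k):
    """k-nearest-neighbor average; small k = more flexible (complex) model."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x0))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, test, k):
    return statistics.fmean((y - knn_predict(train, x, k)) ** 2 for x, y in test)


def cross_validate(data, ks, n_folds=5):
    """Return {k: (mean train MSE, mean validation MSE, sd of validation MSE)}."""
    folds = [data[i::n_folds] for i in range(n_folds)]  # data is already in random order
    results = {}
    for k in ks:
        train_errs, val_errs = [], []
        for i, val in enumerate(folds):
            train = [p for j, fold in enumerate(folds) if j != i for p in fold]
            train_errs.append(mse(train, train, k))
            val_errs.append(mse(train, val, k))
        results[k] = (statistics.fmean(train_errs),
                      statistics.fmean(val_errs),
                      statistics.stdev(val_errs))
    return results


results = cross_validate(DATA, ks=[1, 3, 5, 10, 20, 40, 80])
for k, (tr, va, sd) in results.items():
    print(f"k={k:3d}  train MSE={tr:.3f}  val MSE={va:.3f} (sd {sd:.3f})")
best_k = min(results, key=lambda k: results[k][1])
print("chosen k:", best_k)
```

Reading the printed table follows the pseudocode:

- k = 1 has zero training error but clearly higher validation error (high variance).
- Very large k has training and validation errors that are both high and close together (high bias).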

Complexity

  • Time: Dominated by repeated model fitting/evaluation across model settings and resamples (workflow-dependent)
  • Space: Depends on stored models, predictions, and resampling results
  • Assumptions: This is an empirical model-selection workflow rather than direct computation of the theoretical decomposition terms

See also