Bias-Variance Tradeoff
Formula
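\[
\mathbb{E}\big[(y-\hat f(x))^2\big] = \mathrm{Bias}[\hat f(x)]^2 + \mathrm{Var}[\hat f(x)] + \sigma^2
\]
where \(y = f(x) + \varepsilon\) with \(\mathbb{E}[\varepsilon] = 0\) and \(\mathrm{Var}[\varepsilon] = \sigma^2\), \(\mathrm{Bias}[\hat f(x)] = \mathbb{E}[\hat f(x)] - f(x)\), and the expectations are taken over random training sets and the noise at a fixed \(x\).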
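A Monte Carlo check can make the decomposition concrete. The sketch below rests on an assumed, simulated setup that does not come from the text: f(x) = sin(2πx), Gaussian noise with σ = 0.3, 30 training points per set, and a cubic least-squares fit (`np.polyfit`) playing the role of \(\hat f\). It estimates each term at the single point x0 = 0.25 by refitting on thousands of independent training sets.

```python
import numpy as np

# Assumed setup (illustrative only): f(x) = sin(2*pi*x), Gaussian noise with
# sd sigma, n points per training set, and a cubic least-squares fit as f_hat.
rng = np.random.default_rng(0)
sigma, n, degree, n_sets, x0 = 0.3, 30, 3, 5000, 0.25

def f(x):
    return np.sin(2 * np.pi * x)

preds = np.empty(n_sets)   # f_hat(x0) from each independent training set
sq_err = np.empty(n_sets)  # squared error against a fresh noisy target at x0
for t in range(n_sets):
    x = rng.uniform(0.0, 1.0, n)
    y = f(x) + rng.normal(0.0, sigma, n)      # y = f(x) + eps
    coefs = np.polyfit(x, y, degree)
    preds[t] = np.polyval(coefs, x0)
    y0 = f(x0) + rng.normal(0.0, sigma)       # new observation at x0
    sq_err[t] = (y0 - preds[t]) ** 2

bias2 = (preds.mean() - f(x0)) ** 2   # squared bias at x0
var = preds.var()                     # variance across training sets
mse = sq_err.mean()                   # Monte Carlo estimate of the left-hand side

print(f"bias^2={bias2:.4f}  var={var:.4f}  sigma^2={sigma ** 2:.4f}")
print(f"bias^2 + var + sigma^2 = {bias2 + var + sigma ** 2:.4f}   mse = {mse:.4f}")
```

The two printed totals should agree up to Monte Carlo error. Swapping in a higher `degree` is an easy way to watch the split between bias² and variance move.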
Plot
fns: 0.75*exp(-0.45*x)+0.08 | 0.03*(x^2)+0.05 | 0.75*exp(-0.45*x)+0.03*(x^2)+0.2
colors: #1f6feb | #ff6b2c | #111111
labels: Bias^2 | Variance | Total error
xmin: 0
xmax: 8
ymin: 0
ymax: 1.25
height: 300
title: Bias-variance tradeoff (illustrative)
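The plotted curves are simple closed forms, so the location of the tradeoff's sweet spot can be checked numerically. This sketch evaluates the three expressions from the `fns` line on a fine grid. It also confirms that the total curve is the sum of the other two plus a constant 0.07, which stands in for σ² in this illustration.

```python
import numpy as np

# Evaluate the three expressions from the plot spec on a fine grid over [xmin, xmax].
x = np.linspace(0.0, 8.0, 8001)
bias2 = 0.75 * np.exp(-0.45 * x) + 0.08
variance = 0.03 * x ** 2 + 0.05
total = 0.75 * np.exp(-0.45 * x) + 0.03 * x ** 2 + 0.2

# What remains after removing bias^2 and variance is a constant
# (0.2 - 0.08 - 0.05 = 0.07); it plays the role of sigma^2 in this illustration.
noise = total - (bias2 + variance)

best = np.argmin(total)   # grid index of the lowest total error
print(f"noise term: {noise.min():.3f} to {noise.max():.3f}")
print(f"lowest total error {total[best]:.3f} at x = {x[best]:.2f}")
```

The minimum lands near x ≈ 2.1. At that point the falling slope of the bias² curve and the rising slope of the variance curve cancel.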
Parameters
- \(\hat f(x)\): predictor learned from a random training set
- \(f(x)\): true (noise-free) signal
- \(\sigma^2\): variance of the irreducible noise in the observed target
What it means
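At a fixed input \(x\), the expected squared prediction error splits into three parts:
- squared bias: systematic error, or how far the average prediction over training sets lies from \(f(x)\);
- variance: how much the prediction changes from one training set to another;
- irreducible noise \(\sigma^2\): error that no model can remove.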
What it's used for
- Reasoning about model complexity and regularization.
- Choosing between simpler and more flexible models.
Key properties
- Increasing complexity often lowers bias but raises variance.
- Stronger regularization usually increases bias and decreases variance.
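The regularization property can be observed directly by sweeping a ridge penalty. The sketch below makes several assumptions that are not from the text:
- degree-9 polynomial features on inputs rescaled to [-1, 1];
- ridge regression solved in closed form, with the intercept penalized as well for simplicity;
- simulated data from a sine signal with σ = 0.3.

It averages squared bias and variance over a grid of test points and over 500 training sets per penalty.

```python
import numpy as np

# Hypothetical setup: sine signal, degree-9 polynomial features, and ridge
# regression solved in closed form. None of these choices come from the text.
rng = np.random.default_rng(1)
sigma, n, degree, n_sets = 0.3, 30, 9, 500
grid = np.linspace(0.0, 1.0, 101)

def f(x):
    return np.sin(2 * np.pi * x)

def features(x):
    # Map [0, 1] to [-1, 1] before taking powers to keep X^T X better conditioned.
    return np.vander(2 * x - 1, degree + 1, increasing=True)

def bias_variance(lam):
    """Grid-averaged bias^2 and variance of ridge predictions over many training sets."""
    Xg = features(grid)
    preds = np.empty((n_sets, grid.size))
    for t in range(n_sets):
        x = rng.uniform(0.0, 1.0, n)
        y = f(x) + rng.normal(0.0, sigma, n)
        X = features(x)
        # Ridge solution w = (X^T X + lam * I)^{-1} X^T y (intercept penalized too).
        w = np.linalg.solve(X.T @ X + lam * np.eye(degree + 1), X.T @ y)
        preds[t] = Xg @ w
    bias2 = np.mean((preds.mean(axis=0) - f(grid)) ** 2)
    var = np.mean(preds.var(axis=0))
    return bias2, var

lams = [1e-4, 1e-2, 1.0, 100.0]
results = {lam: bias_variance(lam) for lam in lams}
for lam, (b2, v) in results.items():
    print(f"lambda={lam:g}: bias^2={b2:.4f}  variance={v:.4f}")
```

Comparing the weakest and strongest penalties shows the expected direction: variance falls and bias² rises. The intermediate values depend on the simulated data, so the change between neighboring λ values need not be smooth.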
Common gotchas
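- The additive decomposition is exact only for squared-error loss, with expectations over training sets and noise at a fixed \(x\). For other losses (e.g., 0-1 loss), treat it as a conceptual guide.
- Training error alone cannot reveal the tradeoff, because it typically keeps falling as complexity grows, even after validation error starts to rise.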
Example
A deep, unpruned decision tree may fit the training data perfectly (low bias) but change substantially across resamples (high variance).
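To see the deep-tree behavior numerically without extra libraries, the sketch below uses a deliberately tiny 1-D regression tree written from scratch, with greedy squared-error splits and no pruning. It is a hypothetical stand-in, not any particular library's tree. On simulated sine data (an assumption, σ = 0.5), it compares a depth-1 stump with a fully grown tree across 300 independent training sets.

```python
import numpy as np

# Hypothetical illustration: a tiny 1-D regression tree (greedy squared-error
# splits) written from scratch so the sketch needs only NumPy. The signal,
# noise level and sample sizes are assumptions, not from the original text.
rng = np.random.default_rng(2)
sigma, n, n_sets = 0.5, 50, 300
grid = np.linspace(0.0, 1.0, 101)

def f(x):
    return np.sin(2 * np.pi * x)

def fit_tree(x, y, depth):
    """Return a leaf value (float) or a (threshold, left, right) tuple."""
    if depth == 0 or x.size < 2:
        return float(y.mean())
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    csum, csq = np.cumsum(ys), np.cumsum(ys ** 2)
    best_sse, best_thr = np.inf, None
    for i in range(1, xs.size):            # candidate split: xs[:i] | xs[i:]
        if xs[i] == xs[i - 1]:
            continue
        m = xs.size - i
        sse_left = csq[i - 1] - csum[i - 1] ** 2 / i
        sse_right = (csq[-1] - csq[i - 1]) - (csum[-1] - csum[i - 1]) ** 2 / m
        if sse_left + sse_right < best_sse:
            best_sse, best_thr = sse_left + sse_right, (xs[i - 1] + xs[i]) / 2
    if best_thr is None:                   # all x values equal: no valid split
        return float(ys.mean())
    mask = x <= best_thr
    return (best_thr,
            fit_tree(x[mask], y[mask], depth - 1),
            fit_tree(x[~mask], y[~mask], depth - 1))

def predict(tree, x0):
    while isinstance(tree, tuple):         # walk down until reaching a leaf value
        thr, left, right = tree
        tree = left if x0 <= thr else right
    return tree

def bias_variance(depth):
    """Grid-averaged bias^2 and variance over independent training sets."""
    preds = np.empty((n_sets, grid.size))
    for t in range(n_sets):
        x = rng.uniform(0.0, 1.0, n)
        y = f(x) + rng.normal(0.0, sigma, n)
        tree = fit_tree(x, y, depth)
        preds[t] = [predict(tree, g) for g in grid]
    bias2 = np.mean((preds.mean(axis=0) - f(grid)) ** 2)
    var = np.mean(preds.var(axis=0))
    return bias2, var

stump = bias_variance(depth=1)
deep = bias_variance(depth=n)   # n levels suffice for every leaf to hold one point
print(f"stump: bias^2={stump[0]:.4f}  variance={stump[1]:.4f}")
print(f"deep:  bias^2={deep[0]:.4f}  variance={deep[1]:.4f}")
```

The fully grown tree memorizes each training set, so its bias² is small but its variance is at least roughly σ². The stump has the opposite profile.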
How to Compute (Pseudocode)
Input: model family and a resampling/evaluation procedure
Output: qualitative or empirical bias/variance assessment
for each complexity (or regularization strength) setting:
    fit models on each resample/fold
    record training and validation/test error for each fit
observe patterns:
    high bias -> training and validation errors both high and close together
    high variance -> training error low; validation error much higher and unstable across folds
choose the complexity/regularization level that balances the tradeoff (typically the lowest mean validation error)
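The workflow above can be sketched as runnable code. This is a minimal sketch under assumptions not found in the text:
- polynomial degree is the complexity knob;
- plain 5-fold cross-validation (the same folds for every degree) is the resampling procedure;
- the 60 data points are simulated from a sine signal.

```python
import numpy as np

# A minimal sketch of the workflow above. Assumptions (not from the original):
# polynomial degree is the complexity knob, plain K-fold cross-validation is
# the resampling procedure, and the data are simulated from a sine signal.
rng = np.random.default_rng(3)
n, sigma, k = 60, 0.3, 5
x = rng.uniform(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, sigma, n)
folds = np.array_split(rng.permutation(n), k)   # same folds for every setting

def cv_errors(degree):
    """Per-fold training and validation MSE for one polynomial degree."""
    train_mse, val_mse = [], []
    for i in range(k):
        val = folds[i]
        tr = np.concatenate([folds[j] for j in range(k) if j != i])
        coefs = np.polyfit(x[tr], y[tr], degree)
        train_mse.append(np.mean((np.polyval(coefs, x[tr]) - y[tr]) ** 2))
        val_mse.append(np.mean((np.polyval(coefs, x[val]) - y[val]) ** 2))
    return np.array(train_mse), np.array(val_mse)

results = {}
for degree in range(11):
    tr_err, val_err = cv_errors(degree)
    results[degree] = (tr_err.mean(), val_err.mean(), val_err.std())
    print(f"degree={degree:2d}  train={tr_err.mean():.3f}  "
          f"val={val_err.mean():.3f} (fold sd {val_err.std():.3f})")

# Choose the complexity with the lowest mean validation error.
best_degree = min(results, key=lambda d: results[d][1])
print("chosen degree:", best_degree)
```

Low degrees typically show both errors high. High degrees push training error down, while validation error and its fold-to-fold spread tend to grow. Reporting the fold standard deviation alongside the mean is one simple way to make "unstable" concrete.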
Complexity
- Time: roughly (number of complexity settings) × (number of resamples/folds) × (cost of one fit plus evaluation); the per-fit cost depends on the model and workflow
- Space: Depends on stored models, predictions, and resampling results
- Assumptions: This is an empirical model-selection workflow rather than direct computation of the theoretical decomposition terms