Mean Squared Error (MSE)¶
Formula¶
\[
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^n (y_i-\hat y_i)^2
\]
Plot¶
fn: x^2
xmin: -3
xmax: 3
ymin: 0
ymax: 9.5
height: 280
title: Squared error vs residual r
Parameters¶
- \(y_i\): true value
- \(\hat y_i\): prediction
- \(n\): number of samples
What it means¶
The average of the squared differences between true values and predictions.
What it's used for¶
- Regression model evaluation and training loss.
- Penalizing large errors more than small ones.
Key properties¶
- Nonnegative; equals 0 only when every prediction matches its true value.
- Sensitive to outliers due to squaring.
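To make the outlier sensitivity concrete, here is a minimal sketch that reports how much of the total squared error each residual accounts for. The helper name `squared_error_shares` and the residual values are hypothetical, chosen only for illustration.

```python
def squared_error_shares(residuals):
    """Return each residual's fraction of the total squared error."""
    squares = [r * r for r in residuals]
    total = sum(squares)
    if total == 0:
        # Perfect prediction: there is no error to attribute.
        return [0.0 for _ in residuals]
    return [s / total for s in squares]


# Hypothetical residuals; the last one plays the role of an outlier.
residuals = [1.0, -1.0, 1.0, 10.0]
mse = sum(r * r for r in residuals) / len(residuals)
shares = squared_error_shares(residuals)

print(mse)         # (1 + 1 + 1 + 100) / 4 = 25.75
print(shares[-1])  # 100 / 103, roughly 0.97
```

In this sketch a single residual of 10 accounts for roughly 97% of the squared error, even though it is only one of four samples.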
Common gotchas¶
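- MSE is in squared units of the target, so values are not directly comparable across targets on different scales; taking the square root (RMSE) returns to the target's units.
- Not robust to heavy-tailed noise: a few extreme residuals can dominate the average.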
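The squared-units point can be checked numerically. The sketch below assumes hypothetical length measurements in metres and converts them to centimetres; rescaling the target by a factor \(k\) should multiply MSE by \(k^2\). The helper `mse` and all values are illustrative only.

```python
def mse(y, y_hat):
    """Mean of squared residuals over paired values."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


# Hypothetical lengths in metres, then the same values in centimetres.
y_m, y_hat_m = [1.0, 2.0, 3.0], [1.5, 2.0, 2.0]
k = 100.0
y_cm = [k * v for v in y_m]
y_hat_cm = [k * v for v in y_hat_m]

mse_m = mse(y_m, y_hat_m)     # in m^2
mse_cm = mse(y_cm, y_hat_cm)  # in cm^2

# The ratio equals k**2 up to floating-point rounding.
print(mse_cm / mse_m)
```

The predictions are equally good in both cases, yet the MSE differs by a factor of 10,000, which is why raw MSE values should only be compared on the same target scale.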
Example¶
If errors are \([-1, 2, 0]\), \(\mathrm{MSE}=(1+4+0)/3=5/3\approx 1.667\).
How to Compute (Pseudocode)¶
Input: true values y[1..n], predictions y_hat[1..n]
Output: mse
sum_sq <- 0
for i from 1 to n:
    residual <- y[i] - y_hat[i]
    sum_sq <- sum_sq + residual^2
mse <- sum_sq / n
return mse
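The pseudocode translates almost line for line into Python. The following is a minimal sketch, not a reference implementation: the function name `mean_squared_error` and the input checks (equal lengths, at least one sample) are assumptions added here, and the sample inputs are hypothetical.

```python
from typing import Sequence


def mean_squared_error(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Return the mean of the squared residuals y[i] - y_hat[i]."""
    if len(y) != len(y_hat):
        raise ValueError("y and y_hat must have the same length")
    if len(y) == 0:
        raise ValueError("MSE is undefined for n = 0")

    sum_sq = 0.0
    for target, prediction in zip(y, y_hat):
        residual = target - prediction
        sum_sq += residual * residual  # accumulate the squared residual
    return sum_sq / len(y)


# Hypothetical inputs whose residuals are [-1, 2, 0], as in the worked example.
print(mean_squared_error([2.0, 5.0, 1.0], [3.0, 3.0, 1.0]))  # about 1.667
```

The loop keeps only a running sum, so it makes a single pass over the data and uses constant extra memory.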
Complexity¶
- Time: \(O(n)\)
- Space: \(O(1)\) additional space
- Assumptions: \(n\) is the number of paired predictions/targets, with \(n \ge 1\) so that the division is defined