R^2 (Coefficient of Determination)¶
Formula¶
\[
R^2 = 1 - \frac{\sum_i (y_i-\hat y_i)^2}{\sum_i (y_i-\bar y)^2}
\]
Parameters¶
- \(y_i\): true value
- \(\hat y_i\): prediction
- \(\bar y\): sample mean of \(y_i\)
What it means¶
Fraction of the total variation of \(y\) around its mean \(\bar y\) that the model's predictions account for.
What it's used for¶
- Regression model comparison on the same dataset.
- Quick diagnostic for goodness of fit.
Key properties¶
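- \(R^2=1\) is a perfect fit: every prediction equals its true value.
- \(R^2=0\) matches the baseline that always predicts \(\bar y\).
- Can be negative, with no lower bound, if the model is worse than predicting \(\bar y\).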
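The three cases above can be checked numerically. The following is a minimal sketch, not a library implementation; the helper name `r_squared` and the toy data are assumptions made for illustration, and the helper assumes the targets are not all equal.

```python
def r_squared(y, y_hat):
    """Hypothetical helper: R^2 = 1 - SS_res / SS_tot (assumes SS_tot > 0)."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Toy targets (illustrative only); their mean is 2.5 and SS_tot is 5.
y = [1.0, 2.0, 3.0, 4.0]

perfect = r_squared(y, y)                     # predictions equal the targets
baseline = r_squared(y, [2.5] * len(y))       # always predict the mean
worse = r_squared(y, [4.0, 3.0, 2.0, 1.0])    # predictions in reversed order

print(perfect, baseline, worse)  # 1.0 0.0 -3.0
```

The reversed predictions give \(SS_{res}=20\) against \(SS_{tot}=5\), so \(R^2=1-20/5=-3\).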
Common gotchas¶
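- Not comparable across different datasets: the denominator depends on how spread out the targets are, so the same absolute errors can give very different \(R^2\) values.
- On training data, never decreases when features are added to a least-squares model with an intercept, even if the features are not useful.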
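The first gotcha can be illustrated with a small sketch. The helper `r_squared` and both toy datasets are assumptions made up for this example: every prediction is off by exactly 1 in both cases, and only the spread of the targets differs.

```python
def r_squared(y, y_hat):
    """Hypothetical helper: R^2 = 1 - SS_res / SS_tot (assumes SS_tot > 0)."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Narrow targets: SS_tot = 2, SS_res = 3.
y_narrow = [9.0, 10.0, 11.0]
r2_narrow = r_squared(y_narrow, [v + 1.0 for v in y_narrow])

# Wide targets: SS_tot = 200, SS_res = 3 (same errors as above).
y_wide = [0.0, 10.0, 20.0]
r2_wide = r_squared(y_wide, [v + 1.0 for v in y_wide])

print(r2_narrow, r2_wide)  # about -0.5 and 0.985
```

The errors are identical, yet the score moves from negative to nearly 1, which is why \(R^2\) values from different datasets should not be ranked against each other.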
Example¶
If \(\sum_i (y_i-\hat y_i)^2=20\) and \(\sum_i (y_i-\bar y)^2=50\), then \(R^2=1-20/50=0.6\).
How to Compute (Pseudocode)¶
Input: true values y[1..n], predictions y_hat[1..n]
Output: R^2
y_bar <- mean(y)
ss_res <- sum_i (y[i] - y_hat[i])^2
ss_tot <- sum_i (y[i] - y_bar)^2
if ss_tot == 0:
    return undefined (or a library-specific convention)
return 1 - ss_res / ss_tot
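The pseudocode above could be written in Python roughly as follows. This is a sketch under stated assumptions: the function name `r2_score_sketch` is hypothetical, returning NaN for constant targets is one possible convention (libraries may choose differently), and the sample data is invented so that it reproduces the sums from the Example section.

```python
import math


def r2_score_sketch(y, y_hat):
    """Hypothetical implementation of the pseudocode above."""
    if len(y) != len(y_hat) or len(y) == 0:
        raise ValueError("y and y_hat must be non-empty and of equal length")

    y_bar = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)

    if ss_tot == 0:
        # Assumption: signal "undefined" with NaN when all targets are equal.
        return math.nan
    return 1 - ss_res / ss_tot


# Invented data with SS_res = 20 and SS_tot = 50, as in the Example section.
y = [0.0, 5.0, 10.0]
y_hat = [2.0, 9.0, 10.0]
score = r2_score_sketch(y, y_hat)

# Constant targets make SS_tot zero, so the result is NaN under this convention.
constant_case = r2_score_sketch([3.0, 3.0], [3.0, 4.0])
```

Here `score` comes out to about 0.6. The function makes two passes over the targets (one for the mean, one for the sums) and keeps only a few scalars, which is consistent with \(O(n)\) time and \(O(1)\) extra space.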
Complexity¶
- Time: \(O(n)\)
- Space: \(O(1)\) extra space
- Assumptions: \(n\) paired predictions/targets; prediction-generation cost is excluded