Data Science Field Guide

R^2 (Coefficient of Determination)¶

Formula¶

\[ R^2 = 1 - \frac{\sum_i (y_i-\hat y_i)^2}{\sum_i (y_i-\bar y)^2} \]

Parameters¶

\(y_i\): true value
\(\hat y_i\): prediction
\(\bar y\): sample mean of \(y_i\)

What it means¶

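The fraction of the variance in \(y\) that the model explains, measured against a baseline that always predicts \(\bar y\). This reading is exact for least-squares fits with an intercept, evaluated on the training data; in other settings \(R^2\) can drop below 0.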
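
One way to see the "fraction of variance" reading is to divide the numerator and denominator of the formula by \(n\). The symbols \(n\) (number of samples), \(\mathrm{MSE}\) (mean squared error), and \(\widehat{\mathrm{Var}}(y)\) (variance of \(y\) with a \(1/n\) normalisation) are notation introduced here for illustration:

\[ R^2 = 1 - \frac{\tfrac{1}{n}\sum_i (y_i-\hat y_i)^2}{\tfrac{1}{n}\sum_i (y_i-\bar y)^2} = 1 - \frac{\mathrm{MSE}}{\widehat{\mathrm{Var}}(y)} \]

So \(R^2\) is one minus the share of the variance in \(y\) that remains as squared error.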

What it's used for¶

Regression model comparison on the same dataset.
Quick diagnostic for goodness of fit.

Key properties¶

\(R^2=1\) is a perfect fit: every \(\hat y_i\) equals \(y_i\).
\(R^2=0\) matches always predicting \(\bar y\).
\(R^2\) can be negative if the model is worse than predicting \(\bar y\).
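
These properties can be checked on a tiny made-up dataset. The sketch below is illustrative only: the helper `r_squared` and the numbers are assumptions, not part of any library, and the helper assumes the targets are not all identical.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot; assumes y is not constant."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

y = [1.0, 2.0, 3.0, 4.0]  # made-up targets, mean 2.5

perfect = r_squared(y, y)                          # predictions equal targets
baseline = r_squared(y, [2.5] * len(y))            # always predict the mean
reversed_fit = r_squared(y, [4.0, 3.0, 2.0, 1.0])  # trend in the wrong direction

print(perfect, baseline, reversed_fit)  # 1.0 0.0 -3.0
```

The reversed predictions have four times the squared error of the mean baseline, which is why the score lands at \(-3\).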

Common gotchas¶

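Not comparable across different datasets: the denominator depends on the spread of \(y\), so the same error size can give very different \(R^2\) values.
On training data, the \(R^2\) of a least-squares fit never decreases when features are added, even if they are not useful.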
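
Both gotchas can be reproduced in a few lines. This is a minimal sketch under stated assumptions: `r_squared` and `fit_line` are hypothetical helpers written for this example, all data are made up, and the second part scores the fit on the same data it was trained on.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot; assumes y is not constant."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def fit_line(x, y):
    """Closed-form simple least squares; returns (intercept, slope)."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Gotcha 1: identical errors, different spread in y.
errors = [1.0, -1.0, 1.0, -1.0]
y_narrow = [9.0, 10.0, 11.0, 12.0]
y_wide = [0.0, 10.0, 20.0, 30.0]
r2_narrow = r_squared(y_narrow, [v + e for v, e in zip(y_narrow, errors)])
r2_wide = r_squared(y_wide, [v + e for v, e in zip(y_wide, errors)])
print(round(r2_narrow, 3), round(r2_wide, 3))  # 0.2 0.992

# Gotcha 2: adding a feature with no intended link to y still raises training R^2.
y = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
x_unrelated = [2.0, 7.0, 1.0, 8.0, 2.0, 8.0, 1.0, 8.0]  # arbitrary digits
r2_mean_only = r_squared(y, [sum(y) / len(y)] * len(y))  # intercept-only model
b0, b1 = fit_line(x_unrelated, y)
r2_with_feature = r_squared(y, [b0 + b1 * xi for xi in x_unrelated])
print(r2_mean_only, round(r2_with_feature, 3))  # 0.0 0.044
```

The small gain in the second part comes from chance correlation in the sample, not from any real signal, which is why a held-out set or an adjusted score is usually preferred for comparing models with different feature counts.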

Example¶

If \(\sum (y_i-\hat y_i)^2=20\) and \(\sum (y_i-\bar y)^2=50\), then \(R^2=1-20/50=0.6\).

How to Compute (Pseudocode)¶

Input: true values y[1..n], predictions y_hat[1..n]
Output: R^2

y_bar <- mean(y)
ss_res <- sum_i (y[i] - y_hat[i])^2
ss_tot <- sum_i (y[i] - y_bar)^2
if ss_tot == 0:
  return undefined (or a library-specific convention)
return 1 - ss_res / ss_tot
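
The pseudocode above could be written in Python roughly as follows. This is a sketch, not a reference implementation: the function name `r_squared`, the input validation, and the choice to return NaN for constant targets are all assumptions made for this example, and libraries may use a different convention for that case.

```python
import math

def r_squared(y, y_hat):
    """Coefficient of determination for paired targets and predictions."""
    if len(y) != len(y_hat) or len(y) == 0:
        raise ValueError("y and y_hat must be non-empty and the same length")
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    if ss_tot == 0:
        # Constant targets: R^2 is undefined. Returning NaN is this sketch's
        # own convention (an assumption), not a standard.
        return math.nan
    return 1 - ss_res / ss_tot

# Made-up data chosen so that ss_res = 20 and ss_tot = 50.
y = [0.0, 5.0, 10.0]
y_hat = [2.0, 5.0, 6.0]
score = r_squared(y, y_hat)
print(round(score, 3))  # 0.6
```

The sample data reproduce the sums 20 and 50 used in the worked example, so the result matches \(R^2=0.6\).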

Complexity¶

Time: \(O(n)\)
Space: \(O(1)\) extra space
Assumptions: \(n\) paired predictions/targets; prediction-generation cost is excluded