R^2 (Coefficient of Determination)

Formula

\[ R^2 = 1 - \frac{\sum_i (y_i-\hat y_i)^2}{\sum_i (y_i-\bar y)^2} \]

Parameters

  • \(y_i\): true value
  • \(\hat y_i\): prediction
  • \(\bar y\): sample mean of \(y_i\)

What it means

Fraction of the variance in \(y\) that the model explains, measured against a baseline that always predicts \(\bar y\).

What it's used for

  • Regression model comparison on the same dataset.
  • Quick diagnostic for goodness of fit.

Key properties

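  • \(R^2=1\) indicates a perfect fit: every \(\hat y_i\) equals \(y_i\).
  • \(R^2=0\) means the model's squared error equals that of always predicting \(\bar y\).
  • \(R^2\) can be negative if the model is worse than predicting \(\bar y\).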
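
As a small worked illustration of these properties, take made-up targets \(y=(1,2,3,4)\), so \(\bar y=2.5\) and \(\sum_i (y_i-\bar y)^2=5\). The two prediction vectors below are hypothetical, chosen only to hit the baseline and negative cases:

\[ \hat y=(2.5,\,2.5,\,2.5,\,2.5):\quad R^2 = 1-\frac{2.25+0.25+0.25+2.25}{5} = 1-\frac{5}{5} = 0 \]

\[ \hat y=(4,\,3,\,2,\,1):\quad R^2 = 1-\frac{9+1+1+9}{5} = 1-\frac{20}{5} = -3 \]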

Common gotchas

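  • Not comparable across different datasets: the denominator \(\sum_i (y_i-\bar y)^2\) depends on the spread of \(y\) in each dataset.
  • Evaluated on training data, it can increase as features are added even if they are not useful; for a least-squares fit with an intercept it never decreases.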
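
The second gotcha can be seen in a minimal sketch on synthetic data. It assumes NumPy is available and uses ordinary least squares with an intercept. The helper `train_r2`, the seed, and the data-generating process are all illustrative choices, not part of the metric's definition.

```python
import numpy as np

def train_r2(X, y):
    """Fit OLS with an intercept and return R^2 on the same (training) data."""
    A = np.column_stack([np.ones(len(y)), X])      # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares coefficients
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)                     # fixed seed, synthetic data
n = 50
x = rng.normal(size=(n, 1))                        # one informative feature
y = 2.0 * x[:, 0] + rng.normal(size=n)             # linear signal plus noise
junk = rng.normal(size=(n, 10))                    # 10 features unrelated to y

r2_base = train_r2(x, y)
r2_padded = train_r2(np.hstack([x, junk]), y)
print(r2_base, r2_padded)  # r2_padded is not lower than r2_base
```

The padded model's higher training score reflects extra fitting freedom rather than useful signal, so comparing models with different feature counts on training \(R^2\) alone can mislead.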

Example

If \(\sum_i (y_i-\hat y_i)^2=20\) and \(\sum_i (y_i-\bar y)^2=50\), then \(R^2=1-20/50=0.6\).

How to Compute (Pseudocode)

Input: true values y[1..n], predictions y_hat[1..n]
Output: R^2

y_bar <- mean(y)
ss_res <- sum_i (y[i] - y_hat[i])^2
ss_tot <- sum_i (y[i] - y_bar)^2
if ss_tot == 0:
  return undefined (or a library-specific convention)
return 1 - ss_res / ss_tot
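
The pseudocode above could be written in plain Python roughly as follows. This is a sketch, not a reference implementation. Returning NaN when `ss_tot` is zero is an assumption made here for illustration, since libraries handle that case differently. The sample values are made up so that the two sums come out to 20 and 50.

```python
import math

def r2_score(y, y_hat):
    """Return R^2 = 1 - SS_res / SS_tot, or NaN when SS_tot is zero."""
    if len(y) != len(y_hat) or len(y) == 0:
        raise ValueError("y and y_hat must be non-empty and the same length")
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    if ss_tot == 0:
        return math.nan  # assumption: NaN for constant targets; conventions differ
    return 1 - ss_res / ss_tot

y = [0.0, 5.0, 10.0]        # illustrative targets: SS_tot = 25 + 0 + 25 = 50
y_hat = [2.0, 5.0, 6.0]     # illustrative predictions: SS_res = 4 + 0 + 16 = 20
score = r2_score(y, y_hat)  # approximately 0.6

constant = r2_score([3.0, 3.0], [3.0, 2.0])  # NaN: targets have zero variance
```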

Complexity

  • Time: \(O(n)\)
  • Space: \(O(1)\) extra space
  • Assumptions: \(n\) paired predictions/targets; prediction-generation cost is excluded