ROC Curve¶
Formula¶
\[
\mathcal{C} = \{(\mathrm{FPR}(t),\,\mathrm{TPR}(t)) : t \in \mathbb{R}\}
\]
Plot¶
fns: x | sqrt(x)
colors: #111111 | #ff6b2c
labels: Random baseline | Example model
xmin: 0
xmax: 1
ymin: 0
ymax: 1.05
height: 280
title: ROC curve with random baseline (illustrative)
Parameters¶
- \(\mathrm{TPR} = \mathrm{Recall}\)
- \(\mathrm{FPR} = \mathrm{FP}/(\mathrm{FP}+\mathrm{TN})\)
- \(t\): decision threshold
What it means¶
Tradeoff between true-positive rate and false-positive rate across thresholds.
What it's used for¶
- Visualizing TPR vs FPR across thresholds.
- Comparing classifiers independent of a threshold.
Key properties¶
- AUC-ROC is threshold-invariant
- Equivalent to ranking performance under mild assumptions
Common gotchas¶
- Can look overly optimistic under extreme class imbalance.
- Uses \(\mathrm{TN}\), unlike PR curves.
Example¶
Two thresholds might give points \((\mathrm{FPR},\mathrm{TPR})=(0.1,0.8)\) and \((0.3,0.9)\).
How to Compute (Pseudocode)¶
Input: scores p_hat[1..n], labels y[1..n]
Output: ROC curve points
sort examples by score descending
sweep a threshold from high to low through unique scores
at each threshold, update confusion-matrix counts incrementally
record the corresponding curve point (TPR/FPR for ROC or Precision/Recall for PR)
return all curve points
Complexity¶
- Time: \(O(n\log n)\) due to sorting, plus a linear threshold sweep
- Space: \(O(n)\) for sorted scores/labels and output curve points
- Assumptions: Binary ranking scores; ties and interpolation conventions depend on the implementation