Precision-Recall Curve¶
Formula¶
\[
\mathcal{C} = \{(\mathrm{Recall}(t),\,\mathrm{Precision}(t)) : t \in \mathbb{R}\}
\]
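For reference, the two coordinates can be written in terms of confusion-matrix counts at threshold \(t\). This assumes the common convention (not stated above) that an example is predicted positive when its score is at least \(t\); \(\mathrm{TP}\), \(\mathrm{FP}\), and \(\mathrm{FN}\) denote true positives, false positives, and false negatives.
\[
\mathrm{Precision}(t) = \frac{\mathrm{TP}(t)}{\mathrm{TP}(t) + \mathrm{FP}(t)},
\qquad
\mathrm{Recall}(t) = \frac{\mathrm{TP}(t)}{\mathrm{TP}(t) + \mathrm{FN}(t)}
\]
Precision is undefined when nothing is predicted positive (\(\mathrm{TP}(t) + \mathrm{FP}(t) = 0\)); implementations differ in how they handle that point.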
Plot¶
fn: 1/(1+2*x)
xmin: 0
xmax: 1
ymin: 0.3
ymax: 1.05
height: 280
title: Example precision-recall curve (illustrative)
Parameters¶
- \(t\): decision threshold
What it means¶
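The curve traces the tradeoff between precision and recall as the decision threshold \(t\) varies. Lowering the threshold never decreases recall, but it usually costs precision.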
What it's used for¶
- Visualizing precision vs recall across thresholds.
- Choosing operating points for imbalanced data.
Key properties¶
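- Often more informative than ROC under heavy class imbalance, because neither axis depends on the number of true negatives
- Area under the PR curve equals average precision under common definitions (step-wise summation over recall increments)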
Common gotchas¶
- The baseline (the expected precision of a random classifier) is the positive-class prevalence, not 0.5.
- Different interpolation conventions change AP.
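Both gotchas can be seen on a tiny example. The sketch below is illustrative only: the four curve points are a hypothetical dataset with labels 1, 0, 1, 1 in descending score order, and the helper names `prevalence`, `average_precision_step`, and `trapezoid_area` are made up for this page. It compares a step-wise summary with a trapezoidal (linearly interpolated) one.

```python
from typing import List, Tuple

def prevalence(labels: List[int]) -> float:
    """Fraction of positive labels: the precision of a random classifier."""
    return sum(labels) / len(labels)

def average_precision_step(points: List[Tuple[float, float]]) -> float:
    """Step-wise summary: sum of (R_k - R_{k-1}) * P_k, recall ascending."""
    total, prev_recall = 0.0, 0.0
    for recall, precision in points:
        total += (recall - prev_recall) * precision
        prev_recall = recall
    return total

def trapezoid_area(points: List[Tuple[float, float]]) -> float:
    """Trapezoidal area, assuming the curve starts at (0, first precision)."""
    total = 0.0
    prev_recall, prev_precision = 0.0, points[0][1]
    for recall, precision in points:
        total += (recall - prev_recall) * (precision + prev_precision) / 2
        prev_recall, prev_precision = recall, precision
    return total

# Hypothetical (recall, precision) points, ordered by increasing recall.
labels = [1, 0, 1, 1]
points = [(1 / 3, 1.0), (1 / 3, 0.5), (2 / 3, 2 / 3), (1.0, 0.75)]

baseline = prevalence(labels)              # 0.75, not 0.5
ap_step = average_precision_step(points)   # 29/36, about 0.806
ap_trap = trapezoid_area(points)           # 55/72, about 0.764
```

The two summaries disagree on the same points, so an AP value is only comparable to another AP value computed with the same convention.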
Example¶
Two thresholds might give points \((\mathrm{Recall},\mathrm{Precision})=(0.9,0.6)\) and \((0.6,0.85)\).
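Which of the two points is preferable depends on the selection rule. As a rough sketch, the snippet below applies two common rules to the points above: maximizing F1, and maximizing recall subject to a precision floor. The function names and the 0.8 floor are hypothetical choices for illustration.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (recall, precision)

def f1(recall: float, precision: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def best_recall_at(points: List[Point], min_precision: float) -> Optional[Point]:
    """Highest-recall point whose precision meets the floor, or None."""
    eligible = [p for p in points if p[1] >= min_precision]
    return max(eligible, key=lambda p: p[0]) if eligible else None

points = [(0.9, 0.6), (0.6, 0.85)]

by_f1 = max(points, key=lambda p: f1(*p))   # (0.9, 0.6): F1 of 0.72 vs about 0.70
by_floor = best_recall_at(points, 0.8)      # (0.6, 0.85): only point with precision >= 0.8
```

The two rules select different points here, which is why the operating point should be chosen from the application's cost of false positives versus false negatives.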
How to Compute (Pseudocode)¶
Input: scores p_hat[1..n], labels y[1..n]
Output: PR curve points
sort examples by score descending
sweep a threshold from high to low through unique scores
at each threshold, update confusion-matrix counts incrementally
record the corresponding curve point (Recall, Precision)
return all curve points
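The pseudocode above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the name `pr_curve` is hypothetical, an example is assumed to be predicted positive when its score is at least the threshold, and tied scores are grouped so that each unique score yields one point.

```python
from typing import List, Tuple

def pr_curve(scores: List[float], labels: List[int]) -> List[Tuple[float, float, float]]:
    """Return (threshold, recall, precision) triples, thresholds descending.

    Assumes binary labels in {0, 1} and "score >= threshold" means positive.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("recall is undefined without positive examples")

    # Sort example indices by score, highest first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    points = []
    tp = fp = 0
    for k, i in enumerate(order):
        # Lowering the threshold past this example makes it a predicted positive.
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last example in a group of tied scores.
        is_last = k + 1 == len(order)
        if is_last or scores[order[k + 1]] != scores[i]:
            points.append((scores[i], tp / n_pos, tp / (tp + fp)))
    return points

curve = pr_curve([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 1])
# Recall rises from 1/3 to 1.0; precision goes 1.0, 0.5, 2/3, 0.75.
```

Because counts are updated incrementally, the sweep is linear after the sort. Note that precision is not monotone in the threshold, as the example shows.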
Complexity¶
- Time: \(O(n\log n)\) due to sorting, plus a linear threshold sweep
- Space: \(O(n)\) for sorted scores/labels and output curve points
- Assumptions: Binary labels with real-valued ranking scores; handling of ties and interpolation depends on the implementation