Precision-Recall Curve¶

Formula¶

\[ \mathcal{C} = \{(\mathrm{Recall}(t),\,\mathrm{Precision}(t)) : t \in \mathbb{R}\} \]
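
For reference, the two coordinates can be written in terms of confusion-matrix counts. This assumes the usual convention, not stated in the source, that an example is predicted positive when its score is at least \(t\); \(\mathrm{TP}\), \(\mathrm{FP}\), and \(\mathrm{FN}\) denote true positives, false positives, and false negatives at that threshold.

\[ \mathrm{Precision}(t) = \frac{\mathrm{TP}(t)}{\mathrm{TP}(t)+\mathrm{FP}(t)}, \qquad \mathrm{Recall}(t) = \frac{\mathrm{TP}(t)}{\mathrm{TP}(t)+\mathrm{FN}(t)} \]

Precision is undefined when no example is predicted positive, so implementations have to pick a convention for that end of the curve.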

Plot¶

fn: 1/(1+2*x)
xmin: 0
xmax: 1
ymin: 0.3
ymax: 1.05
height: 280
title: Example precision-recall curve (illustrative)

Parameters¶

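\(t\): decision threshold applied to the classifier score; by the usual convention, an example is predicted positive when its score is at least \(t\)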

What it means¶

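Each point on the curve is the precision and recall obtained at one threshold. Lowering the threshold predicts more positives, so recall never decreases, while precision typically falls, though not monotonically.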

What it's used for¶

Visualizing precision vs recall across thresholds.
Choosing operating points for imbalanced data.
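
One possible way to choose an operating point is to require a minimum precision and take the feasible point with the highest recall. The sketch below assumes the curve is already available as `(threshold, recall, precision)` triples; the function name, the threshold values, and the 0.8 precision floor are hypothetical choices for illustration.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]  # (threshold, recall, precision)


def pick_operating_point(points: List[Point], min_precision: float) -> Optional[Point]:
    """Return the point with the highest recall among those meeting min_precision."""
    feasible = [p for p in points if p[2] >= min_precision]
    if not feasible:
        return None  # no threshold satisfies the precision requirement
    return max(feasible, key=lambda p: p[1])


# Hypothetical curve points; thresholds are made up for the example.
curve = [(0.9, 0.2, 0.95), (0.7, 0.6, 0.85), (0.5, 0.9, 0.6)]
chosen = pick_operating_point(curve, min_precision=0.8)
```

Other criteria, such as a minimum recall or a cost-weighted score, fit the same pattern by changing the filter and the key.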

Key properties¶

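Often more informative than the ROC curve under heavy class imbalance, because precision is not diluted by a large number of true negatives
The area under the PR curve equals average precision (AP) when it is computed as a step-wise sum, \(\mathrm{AP} = \sum_k (R_k - R_{k-1})\,P_k\), over points ordered by increasing recall; trapezoidal interpolation generally gives a different value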
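
To make the relationship between area and average precision concrete, here is a minimal sketch that computes the step-wise sum (the definition used by, for example, scikit-learn's `average_precision_score`) and compares it with a trapezoidal area. The three curve points are hypothetical, and the `(0, 1)` anchor used for the trapezoid is an assumed convention.

```python
from typing import List, Tuple


def average_precision(points: List[Tuple[float, float]]) -> float:
    """Step-wise sum over (recall, precision) points sorted by increasing recall."""
    ap, prev_recall = 0.0, 0.0
    for recall, precision in points:
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def trapezoid_area(points: List[Tuple[float, float]]) -> float:
    """Area under linear interpolation, anchored at (recall=0, precision=1) by assumption."""
    anchored = [(0.0, 1.0)] + list(points)
    area = 0.0
    for (r0, p0), (r1, p1) in zip(anchored, anchored[1:]):
        area += (r1 - r0) * (p0 + p1) / 2
    return area


# Hypothetical (recall, precision) points in increasing-recall order.
points = [(1 / 3, 1.0), (2 / 3, 2 / 3), (1.0, 0.75)]
ap = average_precision(points)   # 29/36, about 0.806
area = trapezoid_area(points)    # 61/72, about 0.847
```

The two numbers differ on the same points, which is why reported AP values are only comparable when the convention is the same.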

Common gotchas¶

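The baseline is the positive-class prevalence, not 0.5: a classifier that scores at random has expected precision equal to the fraction of positives at every recall level.
Different interpolation conventions (step-wise, trapezoidal, 11-point interpolated) change the AP value, so state which one is used when comparing results.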
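
A quick way to see the baseline is to look at the lowest possible threshold, where every example is predicted positive. The sketch below uses hypothetical labels with 5% positives; at that threshold recall is 1 and precision equals the prevalence.

```python
# Hypothetical data: 5 positives out of 100 examples.
labels = [1] * 5 + [0] * 95
prevalence = sum(labels) / len(labels)

# Threshold below every score: all examples are predicted positive.
tp = sum(labels)
fp = len(labels) - tp
precision_all_positive = tp / (tp + fp)
recall_all_positive = tp / sum(labels)
```

A PR curve should therefore be judged against a horizontal line at the prevalence (0.05 here), not against 0.5.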

Example¶

Two thresholds might give points \((\mathrm{Recall},\mathrm{Precision})=(0.9,0.6)\) and \((0.6,0.85)\).
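
To read these two points in terms of counts, the sketch below back-calculates the confusion-matrix entries they imply. The function name and the figure of 100 actual positives are hypothetical, and the results are approximate because the quoted precision and recall are rounded.

```python
from typing import Tuple


def implied_counts(recall: float, precision: float, n_pos: int) -> Tuple[float, float, float]:
    """Return (TP, FP, FN) implied by a recall/precision pair and n_pos actual positives."""
    tp = recall * n_pos
    predicted_pos = tp / precision  # TP + FP
    return tp, predicted_pos - tp, n_pos - tp


# Assumed: 100 actual positives.
tp_a, fp_a, fn_a = implied_counts(0.9, 0.6, 100)    # about 90 TP, 60 FP, 10 FN
tp_b, fp_b, fn_b = implied_counts(0.6, 0.85, 100)   # about 60 TP, 11 FP, 40 FN
```

Under that assumption, moving from the first point to the second gives up about 30 true positives in exchange for about 49 fewer false positives.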

How to Compute (Pseudocode)¶

Input: scores p_hat[1..n], labels y[1..n]
Output: PR curve points

sort examples by score descending
set TP = 0, FP = 0, P = number of positive labels
sweep a threshold from high to low through the unique scores
at each threshold, add the newly included examples to TP or FP (tied scores together)
record the point (Recall, Precision) = (TP / P, TP / (TP + FP))
return all curve points
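
The sweep above can be sketched in plain Python as follows. This is a minimal illustration rather than a reference implementation: the function name `pr_curve` is hypothetical, labels are assumed to be 0 or 1, an example counts as predicted positive when its score is at least the threshold, and tied scores are grouped into a single point.

```python
from typing import List, Tuple


def pr_curve(scores: List[float], labels: List[int]) -> List[Tuple[float, float, float]]:
    """Return (threshold, recall, precision) triples, highest threshold first."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("recall is undefined without at least one positive label")

    # Sort descending so that lowering the threshold only ever adds predictions.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    points = []
    tp = fp = 0
    for rank, i in enumerate(order):
        # Update the confusion-matrix counts incrementally.
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last example of a tied-score group.
        is_last = rank + 1 == len(order)
        if is_last or scores[order[rank + 1]] != scores[i]:
            points.append((scores[i], tp / n_pos, tp / (tp + fp)))
    return points


# Hypothetical scores and labels; the two examples at 0.8 share one point.
pts = pr_curve([0.9, 0.8, 0.8, 0.4, 0.3], [1, 0, 1, 1, 0])
```

Because a point is recorded only after at least one example has been counted, `tp + fp` is never zero and the undefined-precision case at the top of the curve does not arise.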

Complexity¶

Time: \(O(n\log n)\) due to sorting, plus a linear threshold sweep
Space: \(O(n)\) for sorted scores/labels and output curve points
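Assumptions: binary labels with at least one positive, and real-valued scores where higher means more likely positive; the handling of tied scores and the interpolation convention depend on the implementation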