ROC AUC¶
Formula¶
\[
\mathrm{AUC} = P(s(x^+) > s(x^-)) + \tfrac{1}{2}P(s(x^+)=s(x^-))
\]
Plot¶
Illustrative plot ("AUC intuition from area under ROC"): the unit square with a random baseline (the diagonal \(y = x\)) against an example model's ROC curve (\(y = \sqrt{x}\)).
Parameters¶
- \(s(\cdot)\): score function
- \(x^+\): positive example
- \(x^-\): negative example
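A minimal Python sketch of the formula above, computed directly over all positive/negative pairs (the function name `auc_pairwise` and the example scores are made up for illustration):

```python
import itertools

def auc_pairwise(pos_scores, neg_scores):
    """AUC via the pairwise definition:
    P(s(x+) > s(x-)) + 0.5 * P(s(x+) = s(x-))."""
    wins = ties = 0
    for sp, sn in itertools.product(pos_scores, neg_scores):
        if sp > sn:
            wins += 1
        elif sp == sn:
            ties += 1
    return (wins + 0.5 * ties) / (len(pos_scores) * len(neg_scores))

# 8 of the 9 (positive, negative) pairs are ranked correctly.
print(auc_pairwise([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))  # 8/9
```

This O(P·N) loop is the definition made literal; in practice AUC is computed from a sorted ROC curve instead.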
What it means¶
AUC is the area under the ROC curve, which plots true positive rate (TPR) against false positive rate (FPR) as the score threshold is swept from highest to lowest. Equivalently, it is the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as half.
What it's used for¶
- Measuring ranking quality of binary classifiers.
- Comparing models independent of a threshold.
Key properties¶
- Threshold-independent ranking metric
- Invariant to any strictly monotone transform of scores
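The monotone-invariance property can be checked numerically. A quick sketch (the `auc` helper and the scores are hypothetical; `exp` stands in for any strictly increasing transform):

```python
import math

def auc(pos, neg):
    # Pairwise definition: fraction of (pos, neg) pairs ranked
    # correctly, counting ties as half a win.
    total = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return total / (len(pos) * len(neg))

pos, neg = [0.9, 0.6, 0.4], [0.5, 0.3]
print(auc(pos, neg))  # 5/6: both values below are identical
print(auc([math.exp(s) for s in pos], [math.exp(s) for s in neg]))
```

Because `exp` preserves the ordering of the scores, every pairwise comparison, and hence the AUC, is unchanged.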
Common gotchas¶
- AUC can look “good” under extreme class imbalance even when precision is poor
- If you care about top-of-list performance, consider PR AUC / precision@k
Example¶
If all positives are scored above all negatives, AUC = 1.0.
How to Compute (Pseudocode)¶
Input: ROC curve points (FPR, TPR) sorted by FPR
Output: ROC AUC
auc <- 0
for each consecutive pair of ROC points:
    auc <- auc + trapezoid_area_between_points
return auc
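The pseudocode above can be sketched in Python as follows (the function name and the example diagonal curve are illustrative):

```python
def roc_auc_trapezoid(roc_points):
    """Integrate TPR over FPR with the trapezoid rule.

    roc_points: (fpr, tpr) pairs sorted by fpr, starting at (0, 0)
    and ending at (1, 1).
    """
    auc = 0.0
    for (fpr0, tpr0), (fpr1, tpr1) in zip(roc_points, roc_points[1:]):
        auc += (fpr1 - fpr0) * (tpr0 + tpr1) / 2.0
    return auc

# A random classifier's ROC is the diagonal, so its AUC is 0.5.
print(roc_auc_trapezoid([(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]))  # → 0.5
```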
Complexity¶
- Time: \(O(L)\) once \(L\) ROC curve points are available
- Space: \(O(1)\) extra space (excluding stored curve points)
- Assumptions: If starting from raw scores, building the ROC curve dominates at \(O(n\log n)\) due to sorting the scores
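Where the \(O(n\log n)\) comes from: building the ROC points requires sorting the scores once, then a single sweep. A sketch under simplifying assumptions (`roc_points` is a hypothetical helper; tied scores are not merged, which a production version would handle by emitting one point per distinct score):

```python
def roc_points(scores, labels):
    """(FPR, TPR) points from raw scores and 0/1 labels,
    sweeping the threshold from highest score to lowest."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])  # O(n log n)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _score, y in pairs:          # O(n) sweep
        if y:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

# Perfectly separated scores: the curve goes straight up, then across.
print(roc_points([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))
```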