Threshold Selection

Formula

\[ \hat y = \mathbf{1}[\hat p \ge t] \]

Plot

Smooth step approximation \( \sigma(x) = 1/(1+e^{-30(x-0.5)}) \) on \(x \in [0, 1]\), illustrating thresholding behavior around \(t = 0.5\).

Parameters

  • \(\hat p\): predicted score/probability
  • \(t\): decision threshold

What it means

Converts scores or probabilities into decisions by choosing a cutoff aligned with business costs and constraints.
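As a minimal sketch of the rule \(\hat y = \mathbf{1}[\hat p \ge t]\) (the scores and the cutoff \(t = 0.3\) here are arbitrary illustrations, not values from the text):

```python
import numpy as np

# Hypothetical predicted probabilities from some classifier.
p_hat = np.array([0.10, 0.35, 0.62, 0.90])
t = 0.3  # assumed cutoff chosen for the business context

# Elementwise thresholding: 1 where p_hat >= t, else 0.
y_hat = (p_hat >= t).astype(int)
print(y_hat)  # [0 1 1 1]
```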

What it's used for

  • Balancing precision/recall or sensitivity/specificity.
  • Operating-point selection under cost or capacity limits.

Key properties

  • The best threshold depends on objective, prevalence, and calibration.
  • Select the threshold on validation (held-out) data, not on test data, to avoid optimistic bias.

Common gotchas

  • Defaulting to 0.5 is rarely justified in imbalanced or cost-sensitive settings.
  • Thresholds can drift when class prevalence shifts.

Example

Choose a threshold that keeps false positives under a review-team capacity while maximizing recall.
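The example above can be sketched as a constrained scan: among thresholds whose false-positive count stays within the review team's capacity, keep the one with the highest recall. The helper name, toy data, and capacity value are all illustrative assumptions:

```python
import numpy as np

def pick_threshold(p_hat, y, fp_capacity):
    """Hypothetical helper: among thresholds with at most `fp_capacity`
    false positives on validation data, return the one maximizing recall."""
    best_t, best_recall = 1.0, -1.0
    n_pos = max(np.sum(y == 1), 1)
    for t in np.unique(p_hat):
        pred = p_hat >= t
        fp = np.sum(pred & (y == 0))   # flagged negatives (review workload)
        tp = np.sum(pred & (y == 1))
        recall = tp / n_pos
        if fp <= fp_capacity and recall > best_recall:
            best_t, best_recall = t, recall
    return best_t, best_recall

# Toy validation scores and labels.
p = np.array([0.2, 0.4, 0.55, 0.7, 0.8, 0.95])
y = np.array([0,   0,   1,    0,   1,   1])
t, r = pick_threshold(p, y, fp_capacity=1)
```

On this toy data the scan settles on \(t = 0.55\): one false positive (within capacity) and full recall.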

How to Compute (Pseudocode)

Input: validation scores p_hat[1..n], labels y[1..n], objective, constraints
Output: selected threshold t*

candidate_thresholds <- sorted unique scores (or a grid)
best_t <- default_threshold
best_value <- -infinity

for each threshold t in candidate_thresholds:
  predictions <- 1[p_hat >= t]
  value <- objective(y, predictions)
  if constraints hold for predictions and value > best_value:
    best_t <- t
    best_value <- value

return best_t
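The pseudocode above can be made concrete as follows; F1 is used here only as an example objective, and the default threshold of 0.5 is an assumption:

```python
import numpy as np

def select_threshold(p_hat, y, objective):
    """Scan unique validation scores and return the threshold
    maximizing objective(y, predictions) -- a sketch of the pseudocode."""
    best_t, best_value = 0.5, -np.inf
    for t in np.unique(p_hat):
        pred = (p_hat >= t).astype(int)
        value = objective(y, pred)
        if value > best_value:
            best_t, best_value = t, value
    return best_t

def f1(y, pred):
    # Example objective: F1 = 2*TP / (2*TP + FP + FN).
    tp = np.sum((pred == 1) & (y == 1))
    fp = np.sum((pred == 1) & (y == 0))
    fn = np.sum((pred == 0) & (y == 1))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

p = np.array([0.1, 0.3, 0.45, 0.6, 0.85])
y = np.array([0,   0,   1,    1,   1])
t_star = select_threshold(p, y, f1)  # 0.45: separates classes perfectly here
```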

Complexity

  • Time: \(O(nL)\) for a straightforward scan over \(L\) candidate thresholds on \(n\) validation examples (can be improved with sorting/cumulative counts)
  • Space: \(O(n)\) for scores/labels and optional sorted copies
  • Assumptions: Threshold is chosen on validation data; \(L\) depends on whether a full unique-score scan or a coarse grid is used
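The sorting/cumulative-counts improvement mentioned above can be sketched like this: sort scores once, then cumulative sums give the TP/FP counts at every candidate threshold in one pass, for \(O(n \log n)\) total instead of \(O(nL)\). (This simple version treats each score as its own threshold; tied scores would need to be grouped in practice.)

```python
import numpy as np

def sweep_counts(p_hat, y):
    """Return TP/FP counts at every candidate threshold via one sort
    plus cumulative sums, rather than re-scanning per threshold."""
    order = np.argsort(-p_hat)      # indices by descending score
    y_sorted = y[order]
    tp = np.cumsum(y_sorted)        # tp[i]: positives among top i+1 scores
    fp = np.cumsum(1 - y_sorted)    # fp[i]: negatives among top i+1 scores
    thresholds = p_hat[order]       # threshold = i-th highest score
    return thresholds, tp, fp

p = np.array([0.9, 0.1, 0.7, 0.4])
y = np.array([1,   0,   1,   0])
t, tp, fp = sweep_counts(p, y)
# t = [0.9, 0.7, 0.4, 0.1], tp = [1, 2, 2, 2], fp = [0, 0, 1, 2]
```

Any metric built from TP/FP/FN (precision, recall, F1) can then be evaluated at all thresholds from these arrays without another pass over the data.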

See also