Threshold Selection

Formula

\[ \hat y = \mathbf{1}[\hat p \ge t] \]

Plot

Smooth step approximation \( \sigma(x) = 1/(1+e^{-30(x-0.5)}) \) on \(x \in [0, 1]\), illustrating thresholding behavior around \(t = 0.5\).

Parameters

  • \(\hat p\): predicted score/probability
  • \(t\): decision threshold

What it means

Converts scores or probabilities into decisions by choosing a cutoff aligned with business costs and constraints.
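As a minimal sketch of the rule \(\hat y = \mathbf{1}[\hat p \ge t]\) (the scores and the cutoff \(t = 0.3\) here are arbitrary illustrations, not values from the text):

```python
import numpy as np

# Hypothetical predicted probabilities from some classifier.
p_hat = np.array([0.10, 0.35, 0.62, 0.90])
t = 0.3  # assumed cutoff chosen for the business context

# Elementwise thresholding: 1 where p_hat >= t, else 0.
y_hat = (p_hat >= t).astype(int)
print(y_hat)  # [0 1 1 1]
```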

What it's used for

  • Balancing precision/recall or sensitivity/specificity.
  • Operating-point selection under cost or capacity limits.

Key properties

  • The best threshold depends on objective, prevalence, and calibration.
  • Select the threshold on validation (held-out) data, not on test data, to avoid optimistic bias.

Common gotchas

  • Defaulting to 0.5 is rarely justified in imbalanced or cost-sensitive settings.
  • Thresholds can drift when class prevalence shifts.

Example

Choose a threshold that keeps false positives under a review-team capacity while maximizing recall.
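The example above can be sketched as a constrained scan: among thresholds whose false-positive count stays within the review team's capacity, keep the one with the highest recall. The helper name, toy data, and capacity value are all illustrative assumptions:

```python
import numpy as np

def pick_threshold(p_hat, y, fp_capacity):
    """Hypothetical helper: among thresholds with at most `fp_capacity`
    false positives on validation data, return the one maximizing recall."""
    best_t, best_recall = 1.0, -1.0
    n_pos = max(np.sum(y == 1), 1)
    for t in np.unique(p_hat):
        pred = p_hat >= t
        fp = np.sum(pred & (y == 0))   # flagged negatives (review workload)
        tp = np.sum(pred & (y == 1))
        recall = tp / n_pos
        if fp <= fp_capacity and recall > best_recall:
            best_t, best_recall = t, recall
    return best_t, best_recall

# Toy validation scores and labels.
p = np.array([0.2, 0.4, 0.55, 0.7, 0.8, 0.95])
y = np.array([0,   0,   1,    0,   1,   1])
t, r = pick_threshold(p, y, fp_capacity=1)
```

On this toy data the scan settles on \(t = 0.55\): one false positive (within capacity) and full recall.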

How to Compute (Pseudocode)

Input: validation scores p_hat[1..n], labels y[1..n], objective, constraints
Output: selected threshold t*

candidate_thresholds <- sorted unique scores (or a grid)
best_t <- default_threshold
best_value <- -infinity

for each threshold t in candidate_thresholds:
  predictions <- 1[p_hat >= t]
  value <- objective(y, predictions)
  if constraints hold for predictions and value > best_value:
    best_t <- t
    best_value <- value

return best_t
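The pseudocode above can be made concrete as follows; F1 is used here only as an example objective, and the default threshold of 0.5 is an assumption:

```python
import numpy as np

def select_threshold(p_hat, y, objective):
    """Scan unique validation scores and return the threshold
    maximizing objective(y, predictions) -- a sketch of the pseudocode."""
    best_t, best_value = 0.5, -np.inf
    for t in np.unique(p_hat):
        pred = (p_hat >= t).astype(int)
        value = objective(y, pred)
        if value > best_value:
            best_t, best_value = t, value
    return best_t

def f1(y, pred):
    # Example objective: F1 = 2*TP / (2*TP + FP + FN).
    tp = np.sum((pred == 1) & (y == 1))
    fp = np.sum((pred == 1) & (y == 0))
    fn = np.sum((pred == 0) & (y == 1))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

p = np.array([0.1, 0.3, 0.45, 0.6, 0.85])
y = np.array([0,   0,   1,    1,   1])
t_star = select_threshold(p, y, f1)  # 0.45: separates classes perfectly here
```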

Complexity

  • Time: \(O(nL)\) for a straightforward scan over \(L\) candidate thresholds on \(n\) validation examples (can be improved with sorting/cumulative counts)
  • Space: \(O(n)\) for scores/labels and optional sorted copies
  • Assumptions: Threshold is chosen on validation data; \(L\) depends on whether a full unique-score scan or a coarse grid is used
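The sorting/cumulative-counts improvement mentioned above can be sketched like this: sort scores once, then cumulative sums give the TP/FP counts at every candidate threshold in one pass, for \(O(n \log n)\) total instead of \(O(nL)\). (This simple version treats each score as its own threshold; tied scores would need to be grouped in practice.)

```python
import numpy as np

def sweep_counts(p_hat, y):
    """Return TP/FP counts at every candidate threshold via one sort
    plus cumulative sums, rather than re-scanning per threshold."""
    order = np.argsort(-p_hat)      # indices by descending score
    y_sorted = y[order]
    tp = np.cumsum(y_sorted)        # tp[i]: positives among top i+1 scores
    fp = np.cumsum(1 - y_sorted)    # fp[i]: negatives among top i+1 scores
    thresholds = p_hat[order]       # threshold = i-th highest score
    return thresholds, tp, fp

p = np.array([0.9, 0.1, 0.7, 0.4])
y = np.array([1,   0,   1,   0])
t, tp, fp = sweep_counts(p, y)
# t = [0.9, 0.7, 0.4, 0.1], tp = [1, 2, 2, 2], fp = [0, 0, 1, 2]
```

Any metric built from TP/FP/FN (precision, recall, F1) can then be evaluated at all thresholds from these arrays without another pass over the data.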

See also