Early Stopping¶
Formula¶
\[
t^*=\arg\min_t \ \text{ValLoss}(t)
\]
Parameters¶
- \(\text{ValLoss}(t)\): validation loss (or metric-based score) at step/epoch \(t\)
- \(t^*\): selected stopping time
What it means¶
Stop training when validation performance stops improving, instead of continuing to minimize training loss.
What it's used for¶
- Preventing overfitting.
- Saving training time and compute.
Key properties¶
- Uses a held-out validation set.
- Often implemented with patience and best-checkpoint restoration.
Common gotchas¶
- Validation metric noise can trigger premature stopping.
- Using the test set for stopping leaks information.
Example¶
If validation loss does not improve for 5 epochs, stop and restore the best checkpoint.
How to Compute (Pseudocode)¶
Input: training loop, validation metric/loss, patience P
Output: stopping step and best checkpoint
best_score <- worst_possible
best_step <- 0
patience_counter <- 0
for each training epoch/step t:
    train for one step/epoch
    evaluate on validation set
    if validation improves:
        save checkpoint as best
        best_score <- current validation score
        best_step <- t
        patience_counter <- 0
    else:
        patience_counter <- patience_counter + 1
        if patience_counter >= P:
            stop training and restore best checkpoint
return best_step
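The pseudocode above can be sketched in plain Python. This is a minimal illustration, not a framework integration: it assumes the per-epoch validation losses are already available as a list, and it stands in for "restore best checkpoint" by simply remembering the best step and loss. The toy loss curve is invented for the example.

```python
from typing import List, Tuple

def early_stop(val_losses: List[float], patience: int) -> Tuple[int, float]:
    """Apply patience-based early stopping to a sequence of validation
    losses; return (best_step, best_loss), 0-indexed."""
    best_loss = float("inf")   # worst_possible for a loss we minimize
    best_step = 0
    wait = 0                   # patience_counter
    for t, loss in enumerate(val_losses):
        if loss < best_loss:   # improvement: "save checkpoint as best"
            best_loss = loss
            best_step = t
            wait = 0
        else:                  # no improvement: spend one unit of patience
            wait += 1
            if wait >= patience:
                break          # stop training; best_step is restored
    return best_step, best_loss

# Toy validation curve: improves, plateaus with noise, then dips late.
losses = [1.00, 0.90, 0.80, 0.85, 0.82, 0.83, 0.84, 0.78]
print(early_stop(losses, patience=3))  # -> (2, 0.8)
```

Note that with patience 3 the loop stops at step 5 and never sees the late dip at step 7, which is exactly the "validation metric noise can trigger premature stopping" gotcha listed above.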
Complexity¶
- Time: Adds repeated validation evaluations during training; total cost depends on validation frequency and evaluation cost (training usually still dominates)
- Space: Requires storing at least one best-checkpoint copy (model-size dependent) plus validation metrics/history bookkeeping
- Assumptions: Patience-based early stopping shown; checkpointing strategy and validation cadence determine practical overhead