Early Stopping¶
Formula¶
\[
t^*=\arg\min_t \ \text{ValLoss}(t)
\]
Parameters¶
- \(\text{ValLoss}(t)\): validation loss (or metric-based score) at step/epoch \(t\)
- \(t^*\): selected stopping time
What it means¶
Stop training when validation performance stops improving, instead of continuing to minimize training loss.
What it's used for¶
- Preventing overfitting.
- Saving training time and compute.
Key properties¶
- Uses a held-out validation set.
- Often implemented with patience and best-checkpoint restoration.
Common gotchas¶
- Validation metric noise can trigger premature stopping.
- Using the test set for stopping leaks information.
Example¶
If validation loss does not improve for 5 epochs, stop and restore the best checkpoint.
How to Compute (Pseudocode)¶
Input: training loop, validation metric/loss, patience P
Output: stopping step and best checkpoint
best_score <- worst_possible
best_step <- 0
patience_counter <- 0
for each training epoch/step t:
    train for one step/epoch
    evaluate on validation set
    if validation improves:
        save checkpoint as best
        best_score <- current validation score
        best_step <- t
        patience_counter <- 0
    else:
        patience_counter <- patience_counter + 1
        if patience_counter >= P:
            stop training and restore best checkpoint
return best_step
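The pseudocode above can be sketched in plain Python. This is a minimal illustration, not a framework integration: it assumes the per-epoch validation losses are already available as a list, and it stands in for "restore best checkpoint" by simply remembering the best step and loss. The toy loss curve is invented for the example.

```python
from typing import List, Tuple

def early_stop(val_losses: List[float], patience: int) -> Tuple[int, float]:
    """Apply patience-based early stopping to a sequence of validation
    losses; return (best_step, best_loss), 0-indexed."""
    best_loss = float("inf")   # worst_possible for a loss we minimize
    best_step = 0
    wait = 0                   # patience_counter
    for t, loss in enumerate(val_losses):
        if loss < best_loss:   # improvement: "save checkpoint as best"
            best_loss = loss
            best_step = t
            wait = 0
        else:                  # no improvement: spend one unit of patience
            wait += 1
            if wait >= patience:
                break          # stop training; best_step is restored
    return best_step, best_loss

# Toy validation curve: improves, plateaus with noise, then dips late.
losses = [1.00, 0.90, 0.80, 0.85, 0.82, 0.83, 0.84, 0.78]
print(early_stop(losses, patience=3))  # -> (2, 0.8)
```

Note that with patience 3 the loop stops at step 5 and never sees the late dip at step 7, which is exactly the "validation metric noise can trigger premature stopping" gotcha listed above.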
Complexity¶
- Time: Adds repeated validation evaluations during training; total cost depends on validation frequency and evaluation cost (training usually still dominates)
- Space: Requires storing at least one best-checkpoint copy (model-size dependent) plus validation metrics/history bookkeeping
- Assumptions: Patience-based early stopping shown; checkpointing strategy and validation cadence determine practical overhead