Log Loss (Binary Cross-Entropy)¶
Formula¶
\[
\mathrm{LogLoss} = -\frac{1}{n}\sum_{i=1}^n \left[y_i\log(\hat p_i) + (1-y_i)\log(1-\hat p_i)\right]
\]
Plot¶
(Plot: binary log loss \(-\log \hat p\) for a true label \(y=1\), over \(\hat p\in(0.001,\,0.999)\).)
Parameters¶
- \(y_i\in\{0,1\}\): true label
- \(\hat p_i = P(y_i=1)\): predicted probability for class 1
What it means¶
Penalizes confident wrong predictions heavily; corresponds to Bernoulli negative log-likelihood.
What it's used for¶
- Training and evaluating probabilistic classifiers.
- Penalizing confident wrong predictions.
Key properties¶
- Proper scoring rule: minimized in expectation by predicting the true probability
- Unbounded above: predicting \(\hat p = 0\) for a true \(y=1\) gives \(-\log 0 = \infty\)
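The proper-scoring-rule property can be checked numerically: if the true probability of \(y=1\) is \(q\), the expected loss \(-[q\log p + (1-q)\log(1-p)]\) is minimized exactly at \(p=q\). A minimal sketch (the function name and the grid search are illustrative, not part of any library):

```python
import math

def expected_log_loss(q, p):
    # Expected binary log loss when the true probability of y=1 is q
    # and we predict p: q * (-log p) + (1 - q) * (-log(1 - p)).
    return -(q * math.log(p) + (1 - q) * math.log(1 - p))

q = 0.7  # assumed true probability (illustrative)
candidates = [i / 100 for i in range(1, 100)]
best = min(candidates, key=lambda p: expected_log_loss(q, p))
print(best)  # the grid minimizer coincides with q = 0.7
```

Any honest report of the true probability beats every other prediction in expectation, which is what makes log loss useful for calibration.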
Common gotchas¶
- Clip probabilities: \(\hat p \leftarrow \mathrm{clip}(\hat p, \epsilon, 1-\epsilon)\)
- Don’t confuse with accuracy: log loss cares about calibration
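The clipping gotcha in code, for a single example (a hypothetical helper, not a library API; the default \(\epsilon = 10^{-15}\) is a common but arbitrary choice):

```python
import math

def clipped_log_loss(y, p, eps=1e-15):
    # Clip the predicted probability away from 0 and 1 so that
    # log() never receives an argument of exactly 0.
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Without clipping, p = 0.0 for a true y = 1 would give -log(0) = inf;
# with clipping the loss is large but finite (about 34.5 for eps = 1e-15).
print(clipped_log_loss(1, 0.0))
```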
Example¶
For \(y=1\) and \(\hat p=0.8\), the loss is \(-\log 0.8 \approx 0.223\) (natural log).
How to Compute (Pseudocode)¶
Input: predicted probabilities (or model likelihoods) and true labels/observations
Output: log loss
- Accumulate the negative log probability assigned to each observed outcome.
- Average over examples if reporting the mean loss.
- Return the aggregated value.
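The steps above can be sketched as a plain-Python loop for the binary case (the function name `log_loss` and the `eps` clipping value are assumptions of this sketch, not a library API):

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    # Mean binary log loss over paired labels and predicted probabilities.
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # guard against log(0)
        # Negative log probability assigned to the observed outcome.
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)  # average over examples
```

The loop makes the \(O(n)\) time / \(O(1)\) extra-space claim below concrete: one pass, one running accumulator.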
Complexity¶
- Time: \(O(n)\) once per-example predicted probabilities/likelihood terms are available
- Space: \(O(1)\) extra space for running accumulation
- Assumptions: Exact formula depends on binary vs multiclass vs sequence likelihood setup; model forward-pass cost is excluded