Log Loss (Binary Cross-Entropy)

Formula

\[ \mathrm{LogLoss} = -\frac{1}{n}\sum_{i=1}^n \left[y_i\log(\hat p_i) + (1-y_i)\log(1-\hat p_i)\right] \]

Plot

[Plot: binary log loss for y=1 vs predicted p — the curve \(-\log(\hat p)\) over \(\hat p\in(0.001,\,0.999)\)]

Parameters

  • \(y_i\in\{0,1\}\): true label
  • \(\hat p_i = P(y_i=1)\): predicted probability for class 1

What it means

Penalizes confident wrong predictions heavily; corresponds to Bernoulli negative log-likelihood.

What it's used for

  • Training and evaluating probabilistic classifiers.
  • Penalizing confident wrong predictions.

Key properties

  • Proper scoring rule: minimized in expectation by predicting the true probability
  • Unbounded above: predicting \(\hat p = 0\) for a true \(y=1\) gives \(-\log 0 = \infty\)
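The proper-scoring-rule property can be checked numerically. In this sketch, the true class-1 probability \(q\) and the candidate predictions are arbitrary illustrative choices, not values from the text:

```python
import math

def expected_log_loss(q, p):
    """Expected binary log loss when the true P(y=1) is q and we predict p."""
    return -(q * math.log(p) + (1 - q) * math.log(1 - p))

q = 0.7  # assumed true probability of class 1
candidates = [0.5, 0.6, 0.7, 0.8, 0.9]
losses = {p: expected_log_loss(q, p) for p in candidates}
best = min(losses, key=losses.get)  # expected loss is minimized at p == q
```

Here `best` comes out as 0.7, matching the true probability, which is exactly what "proper scoring rule" means.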

Common gotchas

  • Clip probabilities: \(\hat p \leftarrow \mathrm{clip}(\hat p, \epsilon, 1-\epsilon)\)
  • Don’t confuse with accuracy: log loss cares about calibration
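The clipping gotcha above can be sketched as a per-example loss that never evaluates \(\log 0\) (the epsilon value is an illustrative choice; libraries differ):

```python
import math

EPS = 1e-15  # illustrative epsilon; pick per your float precision

def safe_log_loss_term(y, p, eps=EPS):
    """Per-example binary log loss with the probability clipped into [eps, 1-eps]."""
    p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))
```

With clipping, `safe_log_loss_term(1, 0.0)` returns a large but finite penalty instead of infinity.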

Example

For \(y=1\) and \(\hat p=0.8\), the per-example loss is \(-\log(0.8)\approx 0.223\). Compare \(\hat p=0.99\) (loss \(\approx 0.010\)) with \(\hat p=0.1\) (loss \(\approx 2.303\)): the penalty grows sharply as a confident prediction goes wrong.

How to Compute (Pseudocode)

Input: predicted probabilities (or model likelihoods) and true labels/observations
Output: log loss

clip each predicted probability into \([\epsilon, 1-\epsilon]\) to avoid \(\log 0\)
accumulate the negative log probability assigned to each observed outcome
divide by n if reporting the mean loss
return the aggregated value
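The steps above can be sketched as a minimal binary implementation (the function name and epsilon default are assumptions, not from the source):

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary log loss over paired true labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)         # average over examples

# e.g. log_loss([1, 0, 1], [0.8, 0.1, 0.9]) ≈ 0.1446
```

Each example contributes only the negative log of the probability it assigned to the outcome that actually occurred, which is the Bernoulli negative log-likelihood mentioned earlier.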

Complexity

  • Time: \(O(n)\) once per-example predicted probabilities/likelihood terms are available
  • Space: \(O(1)\) extra space for running accumulation
  • Assumptions: Exact formula depends on binary vs multiclass vs sequence likelihood setup; model forward-pass cost is excluded