Log Loss (Binary Cross-Entropy)¶
Formula¶
\[
\mathrm{LogLoss} = -\frac{1}{n}\sum_{i=1}^n \left[y_i\log(\hat p_i) + (1-y_i)\log(1-\hat p_i)\right]
\]
Plot¶
(Plot: binary log loss \(-\log \hat p\) for a true label \(y=1\), over \(\hat p\in(0.001,\,0.999)\).)
Parameters¶
- \(y_i\in\{0,1\}\): true label
- \(\hat p_i = P(y_i=1)\): predicted probability for class 1
What it means¶
Penalizes confident wrong predictions heavily; corresponds to Bernoulli negative log-likelihood.
What it's used for¶
- Training and evaluating probabilistic classifiers.
- Penalizing confident wrong predictions.
Key properties¶
- Proper scoring rule: minimized in expectation by predicting the true probability
- Unbounded above: predicting \(\hat p = 0\) for a true \(y=1\) gives \(-\log 0 = \infty\)
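The proper-scoring-rule property can be checked numerically: if the true probability of \(y=1\) is \(q\), the expected loss \(-[q\log p + (1-q)\log(1-p)]\) is minimized exactly at \(p=q\). A minimal sketch (the function name and the grid search are illustrative, not part of any library):

```python
import math

def expected_log_loss(q, p):
    # Expected binary log loss when the true probability of y=1 is q
    # and we predict p: q * (-log p) + (1 - q) * (-log(1 - p)).
    return -(q * math.log(p) + (1 - q) * math.log(1 - p))

q = 0.7  # assumed true probability (illustrative)
candidates = [i / 100 for i in range(1, 100)]
best = min(candidates, key=lambda p: expected_log_loss(q, p))
print(best)  # the grid minimizer coincides with q = 0.7
```

Any honest report of the true probability beats every other prediction in expectation, which is what makes log loss useful for calibration.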
Common gotchas¶
- Clip probabilities: \(\hat p \leftarrow \mathrm{clip}(\hat p, \epsilon, 1-\epsilon)\)
- Don’t confuse with accuracy: log loss cares about calibration
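The clipping gotcha in code, for a single example (a hypothetical helper, not a library API; the default \(\epsilon = 10^{-15}\) is a common but arbitrary choice):

```python
import math

def clipped_log_loss(y, p, eps=1e-15):
    # Clip the predicted probability away from 0 and 1 so that
    # log() never receives an argument of exactly 0.
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Without clipping, p = 0.0 for a true y = 1 would give -log(0) = inf;
# with clipping the loss is large but finite (about 34.5 for eps = 1e-15).
print(clipped_log_loss(1, 0.0))
```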
Example¶
For \(y=1\) and \(\hat p=0.8\), the loss is \(-\log 0.8 \approx 0.223\) (natural log).
How to Compute (Pseudocode)¶
Input: predicted probabilities (or model likelihoods) and true labels/observations
Output: log loss
- Accumulate the negative log probability assigned to each observed outcome.
- Average over examples if reporting the mean loss.
- Return the aggregated value.
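The steps above can be sketched as a plain-Python loop for the binary case (the function name `log_loss` and the `eps` clipping value are assumptions of this sketch, not a library API):

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    # Mean binary log loss over paired labels and predicted probabilities.
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # guard against log(0)
        # Negative log probability assigned to the observed outcome.
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)  # average over examples
```

The loop makes the \(O(n)\) time / \(O(1)\) extra-space claim below concrete: one pass, one running accumulator.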
Complexity¶
- Time: \(O(n)\) once per-example predicted probabilities/likelihood terms are available
- Space: \(O(1)\) extra space for running accumulation
- Assumptions: Exact formula depends on binary vs multiclass vs sequence likelihood setup; model forward-pass cost is excluded