Skip to content

Shannon Entropy

Formula

\[ H(X) = -\sum_{x} p(x)\,\log p(x) \]

Plot

fn: -(x*log(x)+(1-x)*log(1-x))/log(2)
xmin: 0.001
xmax: 0.999
ymin: 0
ymax: 1.05
height: 280
title: Binary entropy H(p)

Parameters

  • \(p(x)\): probability of outcome \(x\)
  • log base 2 → bits; base \(e\) → nats

What it means

Expected uncertainty (“average surprise”) of outcomes of \(X\).

What it's used for

  • Measuring uncertainty and minimum coding length.
  • Comparing randomness of distributions.

Key properties

  • \(0 \le H(X) \le \log |\mathcal{X}|\) for finite \(\mathcal{X}\)
  • Maximized by uniform distribution
  • \(H(X)=0\) if \(X\) is deterministic

Common gotchas

  • Don’t mix log bases when comparing numeric values.
  • Continuous version is differential entropy (behaves differently).

Example

For a fair coin \(p=[0.5,0.5]\), \(H=1\) bit (log base 2).

How to Compute (Pseudocode)

Input: discrete probabilities p[1..k], log base b
Output: entropy

total <- 0
for i from 1 to k:
  if p[i] == 0:
    continue  // treat 0 * log(0) as 0
  total <- total - p[i] * log_base_b(p[i])

return total

Complexity

  • Time: \(O(k)\) for the discrete sum
  • Space: \(O(1)\) additional space
  • Assumptions: \(k\) is the number of possible outcomes in the discrete support