Skip to content

Data Science Field Guide

Shannon Entropy

Shannon Entropy¶

Formula¶

\[ H(X) = -\sum_{x} p(x)\,\log p(x) \]

Plot¶

fn: -(x*log(x)+(1-x)*log(1-x))/log(2)
xmin: 0.001
xmax: 0.999
ymin: 0
ymax: 1.05
height: 280
title: Binary entropy H(p)

Parameters¶

\(p(x)\): probability of outcome \(x\)
log base 2 → bits; base \(e\) → nats

What it means¶

Expected uncertainty (“average surprise”) of outcomes of \(X\).

What it's used for¶

Measuring uncertainty and minimum coding length.
Comparing randomness of distributions.

Key properties¶

\(0 \le H(X) \le \log |\mathcal{X}|\) for finite \(\mathcal{X}\)
Maximized by uniform distribution
\(H(X)=0\) if \(X\) is deterministic

Common gotchas¶

Don’t mix log bases when comparing numeric values.
Continuous version is differential entropy (behaves differently).

Example¶

For a fair coin \(p=[0.5,0.5]\), \(H=1\) bit (log base 2).

How to Compute (Pseudocode)¶

Input: discrete probabilities p[1..k], log base b
Output: entropy

total <- 0
for i from 1 to k:
  if p[i] == 0:
    continue  // treat 0 * log(0) as 0
  total <- total - p[i] * log_base_b(p[i])

return total

Complexity¶

Time: \(O(k)\) for the discrete sum
Space: \(O(1)\) additional space
Assumptions: \(k\) is the number of possible outcomes in the discrete support