Jensen-Shannon Divergence¶

Formula¶

\[ \operatorname{JSD}(P\|Q) = \frac{1}{2} D_{\mathrm{KL}}(P\|M) + \frac{1}{2} D_{\mathrm{KL}}(Q\|M),\quad M=\frac{1}{2}(P+Q) \]

Plot¶

fn: (-(0.5*x+0.25)*log(0.5*x+0.25)-(0.75-0.5*x)*log(0.75-0.5*x) + 0.5*(x*log(x)+(1-x)*log(1-x)) + 0.5*(0.5*log(0.5)+0.5*log(0.5)))/log(2)
xmin: 0.001
xmax: 0.999
ymin: 0
ymax: 0.35
height: 280
title: Bernoulli JSD(P || Bern(0.5)) (bits)

Parameters¶

\(P,Q\): distributions
\(M\): mixture distribution

What it means¶

A symmetric, smoothed divergence between two distributions.

What it's used for¶

Symmetric distance-like measure between distributions.
Comparing topic models or embedding distributions.

Key properties¶

Symmetric and finite for discrete distributions
\(0 \le \operatorname{JSD}(P\|Q) \le \log 2\) (base 2)

Common gotchas¶

JSD is not a true metric unless you take \(\sqrt{\operatorname{JSD}}\).
Bounds depend on log base.

Example¶

If \(P=Q\), then \(\mathrm{JS}(P,Q)=0\).

How to Compute (Pseudocode)¶

Input: discrete probabilities p[1..k], q[1..k], log base b
Output: js_divergence

for i from 1 to k:
  m[i] <- 0.5 * (p[i] + q[i])

js_divergence <- 0.5 * KL(p || m) + 0.5 * KL(q || m)
return js_divergence

Complexity¶

Time: \(O(k)\) assuming linear-time KL computations on discrete arrays
Space: \(O(k)\) if materializing the mixture distribution \(m\) (or \(O(1)\) extra if streamed)
Assumptions: \(k\) is the number of outcomes in the shared discrete support; KL subroutine handles zero-probability terms safely