Skip to content

Jensen-Shannon Divergence

Formula

\[ \operatorname{JSD}(P\|Q) = \frac{1}{2} D_{\mathrm{KL}}(P\|M) + \frac{1}{2} D_{\mathrm{KL}}(Q\|M),\quad M=\frac{1}{2}(P+Q) \]

Plot

fn: (-(0.5*x+0.25)*log(0.5*x+0.25)-(0.75-0.5*x)*log(0.75-0.5*x) + 0.5*(x*log(x)+(1-x)*log(1-x)) + 0.5*(0.5*log(0.5)+0.5*log(0.5)))/log(2)
xmin: 0.001
xmax: 0.999
ymin: 0
ymax: 0.35
height: 280
title: Bernoulli JSD(P || Bern(0.5)) (bits)

Parameters

  • \(P,Q\): distributions
  • \(M\): mixture distribution

What it means

A symmetric, smoothed divergence between two distributions.

What it's used for

  • Symmetric distance-like measure between distributions.
  • Comparing topic models or embedding distributions.

Key properties

  • Symmetric and finite for discrete distributions
  • \(0 \le \operatorname{JSD}(P\|Q) \le \log 2\) (base 2)

Common gotchas

  • JSD is not a true metric unless you take \(\sqrt{\operatorname{JSD}}\).
  • Bounds depend on log base.

Example

If \(P=Q\), then \(\mathrm{JS}(P,Q)=0\).

How to Compute (Pseudocode)

Input: discrete probabilities p[1..k], q[1..k], log base b
Output: js_divergence

for i from 1 to k:
  m[i] <- 0.5 * (p[i] + q[i])

js_divergence <- 0.5 * KL(p || m) + 0.5 * KL(q || m)
return js_divergence

Complexity

  • Time: \(O(k)\) assuming linear-time KL computations on discrete arrays
  • Space: \(O(k)\) if materializing the mixture distribution \(m\) (or \(O(1)\) extra if streamed)
  • Assumptions: \(k\) is the number of outcomes in the shared discrete support; KL subroutine handles zero-probability terms safely

See also