
Conditional Entropy

Formula

\[ H(X\mid Y) = -\sum_{x,y} p(x,y)\,\log p(x\mid y) \]

Parameters

  • \(X,Y\): random variables
  • \(p(x\mid y)\): conditional distribution

What it means

The average remaining uncertainty about \(X\) after \(Y\) has been observed.
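
Equivalently, it is the average over \(Y\) of the entropy of each conditional distribution \(p(x\mid y)\):

\[ H(X\mid Y) = \sum_{y} p(y)\, H(X\mid Y=y), \qquad H(X\mid Y=y) = -\sum_{x} p(x\mid y)\,\log p(x\mid y) \]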

What it's used for

  • Quantifying remaining uncertainty in \(X\) after observing \(Y\).
  • Feature selection and information-gain calculations (see the identity below).
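
The information gain used in those calculations is the drop in entropy from conditioning, which equals the mutual information between the variables:

\[ IG = H(X) - H(X\mid Y) = I(X;Y) \]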

Key properties

  • \(H(X\mid Y) = H(X,Y) - H(Y)\) (chain rule; derived below)
  • \(H(X\mid Y) \le H(X)\)
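
The chain-rule identity follows directly from the definition, using \(p(x\mid y) = p(x,y)/p(y)\) and \(\sum_x p(x,y) = p(y)\):

\[ H(X\mid Y) = -\sum_{x,y} p(x,y)\,\log\frac{p(x,y)}{p(y)} = -\sum_{x,y} p(x,y)\,\log p(x,y) + \sum_{y} p(y)\,\log p(y) = H(X,Y) - H(Y) \]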

Common gotchas

  • Conditioning reduces entropy on average, but not necessarily for each \(y\) (see the numeric example below).
  • For continuous variables, use differential entropy.
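
As a concrete illustration of the first point (the numbers below are constructed for this example): take \(p(0,0)=0.8\), \(p(0,1)=0.1\), \(p(1,1)=0.1\), \(p(1,0)=0\). Then \(H(X)\approx 0.47\) bits, while \(H(X\mid Y=1)=1\) bit, so observing \(Y=1\) increases the uncertainty about \(X\). On average, however, \(H(X\mid Y) = 0.8\cdot 0 + 0.2\cdot 1 = 0.2\) bits \(\le H(X)\), as the inequality guarantees.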

Example

If \(Y=X\), then \(H(X\mid Y)=0\); if \(X\) and \(Y\) are independent, then \(H(X\mid Y)=H(X)\).

How to Compute (Pseudocode)

Input: joint probabilities p_xy[x,y], log base b
Output: conditional_entropy H(X|Y)

compute marginals: p_y[y] <- sum over x of p_xy[x,y]

total <- 0
for each pair (x, y):
  if p_xy[x,y] == 0:
    continue
  p_x_given_y <- p_xy[x,y] / p_y[y]
  total <- total - p_xy[x,y] * log_base_b(p_x_given_y)

return total
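
A minimal Python sketch of the pseudocode above, assuming the joint distribution is supplied as a dense 2-D NumPy array p_xy with rows indexed by x and columns by y (the function name, array layout, and example values are illustrative, not from any particular library):

import numpy as np

def conditional_entropy(p_xy, base=2.0):
    """H(X|Y) from a dense joint probability table p_xy[x, y]."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_y = p_xy.sum(axis=0)                    # p(y) = sum over x of p(x, y)
    total = 0.0
    for x in range(p_xy.shape[0]):
        for y in range(p_xy.shape[1]):
            pxy = p_xy[x, y]
            if pxy == 0.0:
                continue                      # treat 0 * log(...) as 0
            total -= pxy * np.log(pxy / p_y[y]) / np.log(base)
    return total

# Example: if Y = X exactly, H(X|Y) = 0
p = np.array([[0.5, 0.0],
              [0.0, 0.5]])
print(conditional_entropy(p))                 # 0.0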

Complexity

  • Time: \(O(k_x k_y)\) for a dense discrete joint table
  • Space: \(O(k_y)\) additional space for marginals of \(Y\)
  • Assumptions: \(k_x\) and \(k_y\) are support sizes; assumes \(p_y[y] > 0\) whenever \(p_{xy}[x,y] > 0\)

See also