
Data Processing Inequality

Formula

\[ X \rightarrow Y \rightarrow Z \quad \Rightarrow \quad I(X;Z) \le I(X;Y) \]

Parameters

  • \(X\to Y\to Z\): Markov chain (\(Z\) depends on \(X\) only through \(Y\))
  • \(I(\cdot;\cdot)\): mutual information (defined below)
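
Here, for discrete variables, mutual information is

\[ I(X;Y) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)} \]

with the sum replaced by an integral in the continuous case.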

What it means

Processing data cannot increase the information it contains about the source.
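
Proof sketch (the standard chain-rule argument): expand \(I(X;Y,Z)\) in two ways and apply the Markov condition \(I(X;Z\mid Y)=0\):

\[ I(X;Y,Z) = I(X;Z) + I(X;Y\mid Z) = I(X;Y) + I(X;Z\mid Y) = I(X;Y) \]

Rearranging gives \(I(X;Z) = I(X;Y) - I(X;Y\mid Z) \le I(X;Y)\), since conditional mutual information is nonnegative.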

What it's used for

  • Showing that post-processing cannot increase information.
  • Proving bounds in pipelines \(X\to Y\to Z\).

Key properties

  • Equality \(I(X;Z)=I(X;Y)\) holds iff \(X\to Z\to Y\) is also a Markov chain; for \(Z=g(Y)\), this is precisely the condition that \(Z\) be a sufficient statistic of \(Y\) for \(X\) (see the display after this list)
  • Applies to any (possibly randomized) processing
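
In display form, given the chain \(X\to Y\to Z\):

\[ I(X;Z) = I(X;Y) \quad\iff\quad I(X;Y\mid Z) = 0 \quad\iff\quad X\to Z\to Y \text{ is also a Markov chain} \]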

Common gotchas

  • Requires the Markov condition: processing \(Y\) alone can never violate it, but if \(Z\) draws on \(X\) other than through \(Y\), the inequality can fail (see the counterexample after this list).
  • "Information" here is mutual information, not entropy.

Example

If \(Y=X\) and \(Z\) is a constant, then \(I(X;Z)=0 \le I(X;Y)=H(X)\).

How to Compute (Pseudocode)

Input: a joint model or dataset for X -> Y -> Z, plus a method for computing or estimating mutual information
Output: numerical check of the DPI inequality (or symbolic proof setup)

verify/assume the Markov chain condition X -> Y -> Z
compute or estimate I(X;Y)
compute or estimate I(X;Z)
check that I(X;Z) <= I(X;Y) (allow a small tolerance when MI is estimated from data)
return the inequality comparison
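
A minimal runnable sketch of this workflow in Python, assuming small discrete alphabets and a fully specified chain \(p(x)\), \(p(y\mid x)\), \(p(z\mid y)\); the probability tables below are illustrative assumptions, not values from this page:

import numpy as np

def mutual_information(joint):
    """I(A;B) in bits from a 2-D joint probability table."""
    pa = joint.sum(axis=1, keepdims=True)   # marginal P(A) as a column
    pb = joint.sum(axis=0, keepdims=True)   # marginal P(B) as a row
    mask = joint > 0                        # skip zero entries (0 log 0 = 0)
    return float((joint[mask] * np.log2(joint[mask] / (pa @ pb)[mask])).sum())

# Illustrative chain X -> Y -> Z; the Markov condition holds by construction
# because Z is generated from Y alone.
p_x = np.array([0.5, 0.5])                  # P(X)
p_y_given_x = np.array([[0.9, 0.1],         # P(Y|X), rows indexed by x
                        [0.2, 0.8]])
p_z_given_y = np.array([[0.7, 0.3],         # P(Z|Y), rows indexed by y
                        [0.4, 0.6]])

p_xy = p_x[:, None] * p_y_given_x           # P(X,Y) = P(X) P(Y|X)
p_xz = p_xy @ p_z_given_y                   # P(X,Z) = sum_y P(X,Y) P(Z|Y)

i_xy = mutual_information(p_xy)
i_xz = mutual_information(p_xz)
print(f"I(X;Y) = {i_xy:.4f} bits, I(X;Z) = {i_xz:.4f} bits")
assert i_xz <= i_xy + 1e-12                 # the DPI check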

Complexity

  • Time: Depends on how mutual information is computed/estimated (discrete tables, density estimation, or model-based calculations)
  • Space: Depends on the representation of the distributions/data and the MI estimator
  • Assumptions: the theorem itself is not an algorithm; this is a verification workflow whose exact cost depends on the chosen MI estimator

See also