
Conditional Expectation

Formula

\[ \mathbb{E}[X\mid Y] = \sum_x x\,p(x\mid Y) \quad\text{or}\quad \mathbb{E}[X\mid Y] = \int x\,p(x\mid Y)\,dx \]

Parameters

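  • \(X\): random variable whose conditional mean is taken
  • \(Y\): conditioning variable (the available information)
  • \(p(x\mid Y)\): conditional pmf (discrete case) or conditional density (continuous case) of \(X\) given \(Y\)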
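A minimal sketch of the discrete formula, assuming the joint pmf is stored as a hypothetical table `{(x, y): probability}`. It forms \(p(x\mid y)=p(x,y)/p(y)\) and returns one number per value of \(Y\); the helper name `cond_expectation` is illustrative only.

```python
from fractions import Fraction

# Hypothetical joint pmf p(x, y) for illustration, stored as {(x, y): probability}.
joint = {
    (0, 0): Fraction(1, 4), (2, 0): Fraction(1, 4),
    (1, 1): Fraction(1, 8), (5, 1): Fraction(3, 8),
}

def cond_expectation(joint, y):
    """Return E[X | Y = y] = sum_x x * p(x | y), with p(x | y) = p(x, y) / p(y)."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)  # marginal p(y)
    if p_y == 0:
        raise ValueError("E[X | Y = y] is undefined when p(y) = 0")
    return sum(x * p / p_y for (x, yy), p in joint.items() if yy == y)

# E[X | Y] as a function of Y: one conditional mean per value of Y.
cond_mean = {y: cond_expectation(joint, y) for y in {yy for (_, yy) in joint}}
print(cond_mean)
```

Exact `Fraction` arithmetic is used so the results can be compared without rounding error.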

What it means

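The best mean-squared predictor of \(X\) given \(Y\): among all functions \(h\), \(h(Y)=\mathbb{E}[X\mid Y]\) minimizes \(\mathbb{E}[(X-h(Y))^2]\) (assuming \(\mathbb{E}[X^2]<\infty\)).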
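A small exact check of the best-predictor property on a hypothetical joint pmf: the conditional mean is compared against a shifted version of itself and against the constant \(\mathbb{E}[X]=5/2\), which ignores \(Y\). The table and helper names are assumptions for illustration.

```python
from fractions import Fraction

# Hypothetical joint pmf p(x, y): {(x, y): probability}.
joint = {
    (0, 0): Fraction(1, 4), (2, 0): Fraction(1, 4),
    (1, 1): Fraction(1, 8), (5, 1): Fraction(3, 8),
}

def mse(joint, h):
    """Exact mean squared error E[(X - h(Y))^2] of a predictor h(Y)."""
    return sum(p * (x - h(y)) ** 2 for (x, y), p in joint.items())

def cond_mean(y):
    """E[X | Y = y] computed directly from the joint pmf."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)
    return sum(x * p for (x, yy), p in joint.items() if yy == y) / p_y

best = mse(joint, cond_mean)
shifted = mse(joint, lambda y: cond_mean(y) + Fraction(1, 2))  # biased alternative
constant = mse(joint, lambda y: Fraction(5, 2))                # ignores Y (E[X] = 5/2)
print(best, shifted, constant)
```

Shifting the predictor by a constant \(c\) adds exactly \(c^2\) to the error, and the constant predictor's error is \(\operatorname{Var}(X)\), which is larger than the conditional mean's error here.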

What it's used for

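  • Prediction and regression: \(y\mapsto\mathbb{E}[X\mid Y=y]\) is the regression function of \(X\) on \(Y\).
  • Computing expected values conditional on an event or on observed information, and computing \(\mathbb{E}[X]\) by first conditioning on \(Y\).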
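Conditioning on an event \(A\) with \(P(A)>0\) uses \(\mathbb{E}[X\mid A]=\mathbb{E}[X\,\mathbf{1}_A]/P(A)\). A sketch, assuming a fair six-sided die as the illustrative distribution and a hypothetical helper `expectation_given_event`:

```python
from fractions import Fraction

# A fair six-sided die: outcomes 1..6, each with probability 1/6 (illustrative assumption).
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expectation_given_event(pmf, event):
    """E[X | A] = E[X * 1_A] / P(A) for an event A given as a predicate on outcomes."""
    p_a = sum(p for x, p in pmf.items() if event(x))
    if p_a == 0:
        raise ValueError("conditioning event has probability zero")
    return sum(x * p for x, p in pmf.items() if event(x)) / p_a

even = expectation_given_event(pmf, lambda x: x % 2 == 0)  # condition on "roll is even"
high = expectation_given_event(pmf, lambda x: x > 4)       # condition on "roll exceeds 4"
print(even, high)
```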

Key properties

  • Law of total expectation: \(\mathbb{E}[\mathbb{E}[X\mid Y]] = \mathbb{E}[X]\)
  • Taking out what is known: \(\mathbb{E}[g(Y)X\mid Y]=g(Y)\,\mathbb{E}[X\mid Y]\) for any function \(g\) (when the expectations exist)
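Both properties can be checked exactly on a small hypothetical joint pmf. The sketch below assumes the table shown and an arbitrary \(g(y)=3y-1\) chosen only for illustration; `cond_exp` is a hypothetical helper computing \(\mathbb{E}[f(X,Y)\mid Y=y]\).

```python
from fractions import Fraction

# Hypothetical joint pmf p(x, y): {(x, y): probability}.
joint = {
    (0, 0): Fraction(1, 4), (2, 0): Fraction(1, 4),
    (1, 1): Fraction(1, 8), (5, 1): Fraction(3, 8),
}

def marginal_y(y):
    """Marginal probability p(y)."""
    return sum(p for (_, yy), p in joint.items() if yy == y)

def cond_exp(f, y):
    """E[f(X, Y) | Y = y] from the joint pmf."""
    return sum(f(x, yy) * p for (x, yy), p in joint.items() if yy == y) / marginal_y(y)

ys = {y for (_, y) in joint}

# Law of total expectation: sum_y p(y) E[X | Y = y] should equal E[X].
tower = sum(marginal_y(y) * cond_exp(lambda x, _y: x, y) for y in ys)
direct = sum(x * p for (x, _), p in joint.items())

# Taking out what is known, with an arbitrary g chosen for illustration.
g = lambda y: 3 * y - 1
pull_out_ok = all(
    cond_exp(lambda x, yy: g(yy) * x, y) == g(y) * cond_exp(lambda x, _y: x, y)
    for y in ys
)
print(tower, direct, pull_out_ok)
```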

Common gotchas

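  • \(\mathbb{E}[X\mid Y]\) is a random variable (a function of \(Y\)), not a constant; \(\mathbb{E}[X\mid Y=y]\) for a fixed \(y\) is a number.
  • Conditioning reduces variance on average, not pointwise: \(\operatorname{Var}(\mathbb{E}[X\mid Y])\le\operatorname{Var}(X)\) and \(\mathbb{E}[\operatorname{Var}(X\mid Y)]\le\operatorname{Var}(X)\), but \(\operatorname{Var}(X\mid Y=y)\) can exceed \(\operatorname{Var}(X)\) for particular values \(y\).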
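A hypothetical two-valued \(Y\) makes the variance behaviour concrete: when \(Y=0\) (probability 0.9), \(X=0\); when \(Y=1\), \(X=\pm10\) with equal probability. Then \(\operatorname{Var}(X)=10\) but \(\operatorname{Var}(X\mid Y=1)=100\), while the law of total variance \(\operatorname{Var}(X)=\mathbb{E}[\operatorname{Var}(X\mid Y)]+\operatorname{Var}(\mathbb{E}[X\mid Y])\) still holds. The distribution and helper names are assumptions for illustration.

```python
from fractions import Fraction

# Hypothetical joint pmf {(x, y): p}: if Y = 0 then X = 0; if Y = 1 then X = -10 or 10.
joint = {
    (0, 0): Fraction(9, 10),
    (-10, 1): Fraction(1, 20),
    (10, 1): Fraction(1, 20),
}

def var(pairs):
    """Variance of the distribution given by (value, weight) pairs; weights are normalized."""
    total = sum(w for _, w in pairs)
    mean = sum(v * w for v, w in pairs) / total
    return sum(w * (v - mean) ** 2 for v, w in pairs) / total

ys = sorted({y for (_, y) in joint})
p_y = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in ys}
cond = {y: [(x, p) for (x, yy), p in joint.items() if yy == y] for y in ys}  # unnormalized slices
cond_mean = {y: sum(x * p for x, p in cond[y]) / p_y[y] for y in ys}        # E[X | Y = y]
cond_var = {y: var(cond[y]) for y in ys}                                    # Var(X | Y = y)

var_x = var([(x, p) for (x, _), p in joint.items()])
expected_cond_var = sum(p_y[y] * cond_var[y] for y in ys)  # E[Var(X | Y)]
var_cond_mean = var([(cond_mean[y], p_y[y]) for y in ys])  # Var(E[X | Y])

print(var_x, cond_var, expected_cond_var, var_cond_mean)
```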

Example

If \(\mathbb{E}[X\mid Y=1]=3\) and \(\mathbb{E}[X\mid Y=0]=1\), then \(\mathbb{E}[X\mid Y]\) equals 3 when \(Y=1\) and 1 when \(Y=0\).
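Written as a single formula, with \(Y\in\{0,1\}\) this is \(\mathbb{E}[X\mid Y]=1+2Y\). Assuming additionally that \(P(Y=1)=p\) (a value the example does not give), the law of total expectation recovers the unconditional mean:

\[ \mathbb{E}[X] = \mathbb{E}\big[\mathbb{E}[X\mid Y]\big] = 3\,p + 1\cdot(1-p) = 1 + 2p. \]

For instance, \(p=1/4\) gives \(\mathbb{E}[X]=3/2\).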

How to Compute (Pseudocode)

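Input: the joint distribution of (X, Y), given as a pmf table, a density, or paired samples; a conditioning value y
Output: E[X | Y = y] (exact value or estimate); evaluating it at every y gives the random variable E[X | Y]

if the joint pmf p(x, y) is available:
  compute p(y) = sum over x of p(x, y)
  return (sum over x of x * p(x, y)) / p(y)          (requires p(y) > 0)
if the joint density f(x, y) is available:
  compute f_Y(y) = integral of f(x, y) dx            (analytically or by numerical quadrature)
  return (integral of x * f(x, y) dx) / f_Y(y)       (requires f_Y(y) > 0)
if only paired samples (x_i, y_i) are available:
  return the average of x_i over samples with y_i = y
  (for continuous Y, average over samples with y_i in a small window around y)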
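A runnable Python sketch of the three cases, specialized to \(\mathbb{E}[X\mid Y=y]\). The function names, the trapezoid rule as the quadrature method, the integration window, and the window-based sample estimator are all illustrative assumptions, not a prescribed implementation.

```python
import math
from fractions import Fraction

def cond_exp_discrete(joint, y):
    """Exact E[X | Y = y] from a joint pmf {(x, y): p}: weighted sum over the support."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)
    if p_y == 0:
        raise ValueError("p(y) = 0")
    return sum(x * p for (x, yy), p in joint.items() if yy == y) / p_y

def cond_exp_density(f, y, lo, hi, n=10_000):
    """Approximate E[X | Y = y] from a joint density f(x, y) with the trapezoid rule on [lo, hi].

    Assumes f(., y) is negligible outside [lo, hi]; the normalizer f_Y(y) is
    computed on the same grid, so f need not be normalized in x.
    """
    h = (hi - lo) / n
    xs = [lo + i * h for i in range(n + 1)]
    w = [0.5 if i in (0, n) else 1.0 for i in range(n + 1)]  # trapezoid weights
    num = h * sum(wi * x * f(x, y) for wi, x in zip(w, xs))
    den = h * sum(wi * f(x, y) for wi, x in zip(w, xs))
    return num / den

def cond_exp_samples(pairs, y, tol=0.0):
    """Estimate E[X | Y = y] as the mean of x_i over samples with |y_i - y| <= tol."""
    hits = [x for x, yy in pairs if abs(yy - y) <= tol]
    if not hits:
        raise ValueError("no samples near the conditioning value")
    return sum(hits) / len(hits)

# Usage with illustrative inputs.
rho = 0.5

def bivariate_normal(x, y):
    """Standard bivariate normal density with correlation rho; here E[X | Y = y] = rho * y."""
    z = (x * x - 2 * rho * x * y + y * y) / (2 * (1 - rho * rho))
    return math.exp(-z) / (2 * math.pi * math.sqrt(1 - rho * rho))

joint = {(0, 0): Fraction(1, 4), (2, 0): Fraction(1, 4),
         (1, 1): Fraction(1, 8), (5, 1): Fraction(3, 8)}
samples = [(1.0, 0), (3.0, 0), (2.0, 1), (6.0, 1)]

print(cond_exp_discrete(joint, 1))
print(cond_exp_density(bivariate_normal, 1.0, -10.0, 10.0))
print(cond_exp_samples(samples, 1))
```

The bivariate normal is used only because its conditional mean \(\rho y\) is known in closed form, which makes the quadrature result easy to check.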

Complexity

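  • Time: depends on the representation (support size, number of quadrature points, or number of samples); a sample average over \(n\) samples costs \(O(n)\)
  • Space: \(O(1)\) extra per conditioning value for a streaming sample average (one running sum and one count); more for grids or tables
  • Assumptions: exact analytic results carry no approximation error; quadrature error depends on the grid, and a sample average over \(m\) i.i.d. samples has standard error \(\sigma/\sqrt{m}\), where \(\sigma\) is the conditional standard deviation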
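A sketch of the streaming sample average for a discrete \(Y\), assuming a hypothetical class `StreamingConditionalMean`. Each update is \(O(1)\), and it stores one running sum and one count per distinct conditioning value, so space grows with the number of distinct \(y\) values, not with the number of samples.

```python
from collections import defaultdict

class StreamingConditionalMean:
    """Running estimate of E[X | Y = y] for each discrete y seen so far (hypothetical helper)."""

    def __init__(self):
        self.sums = defaultdict(float)  # running sum of x for each y
        self.counts = defaultdict(int)  # number of samples seen for each y

    def update(self, x, y):
        """Consume one (x, y) pair in O(1) time."""
        self.sums[y] += x
        self.counts[y] += 1

    def estimate(self, y):
        """Current sample-average estimate of E[X | Y = y]."""
        n = self.counts.get(y, 0)
        if n == 0:
            raise KeyError(f"no samples with Y = {y!r}")
        return self.sums[y] / n

# Usage: feed pairs one at a time, as from a stream.
stream = [(1.0, 0), (3.0, 0), (2.0, 1), (6.0, 1)]
acc = StreamingConditionalMean()
for x, y in stream:
    acc.update(x, y)
print(acc.estimate(0), acc.estimate(1))
```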

See also