Conditional Expectation¶

Formula¶

\[ \mathbb{E}[X\mid Y] = \sum_x x\,p(x\mid Y) \quad\text{or}\quad \mathbb{E}[X\mid Y] = \int x\,p(x\mid Y)\,dx \]

Parameters¶

\(X\): random variable
\(Y\): conditioning variable
\(p(x\mid Y)\): conditional distribution

What it means¶

Best mean-squared predictor of \(X\) given \(Y\).

What it's used for¶

Best mean-squared predictor given information.
Computing expected values given events.

Key properties¶

Law of total expectation: \(\mathbb{E}[\mathbb{E}[X\mid Y]] = \mathbb{E}[X]\)
\(\mathbb{E}[g(Y)X\mid Y]=g(Y)\,\mathbb{E}[X\mid Y]\)

Common gotchas¶

\(\mathbb{E}[X\mid Y]\) is a random variable, not a constant.
Conditioning can reduce variance but not always pointwise.

Example¶

If \(E[X\mid Y=1]=3\) and \(E[X\mid Y=0]=1\), then \(E[X\mid Y]\) equals 3 when \(Y=1\) and 1 when \(Y=0\).

How to Compute (Pseudocode)¶

Input: distribution/model (or sample-based estimate) and target function/value definition
Output: expectation quantity

if a discrete distribution is available:
  compute a weighted sum over support values
if a continuous density is available:
  compute an integral (analytically or numerically)
if estimating from samples:
  compute the sample average of the target quantity
return the expectation (or estimate)

Complexity¶

Time: Depends on representation (support size, numerical quadrature, or sample count); sample averages are typically \(O(n)\)
Space: \(O(1)\) extra accumulation space for streaming/sample-average computations (more for grids/tables)
Assumptions: Exact analytic expectations and numerical/sample estimates use different workflows and error tradeoffs