Law of Total Probability¶
Formula¶
\[
P(B) = \sum_i P(B\mid A_i)\,P(A_i)
\]
Parameters¶
- \(\{A_i\}\): partition of the sample space
- \(P(A_i)>0\) and \(\sum_i P(A_i)=1\)
What it means¶
Computes \(P(B)\) by conditioning on a partition of cases.
What it's used for¶
- Decomposing probabilities by cases.
- Mixture models and diagnostic calculations.
Key properties¶
- Extends to integrals for continuous partitions
- Basis for Bayes' rule normalization
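The continuous analogue mentioned above replaces the sum with an integral. As a sketch, writing \(f_A\) for the density of a continuous conditioning variable \(A\):

\[
P(B) = \int P(B \mid A = a)\, f_A(a)\, da
\]

Each term mirrors the discrete formula: \(P(B\mid A=a)\) plays the role of \(P(B\mid A_i)\) and \(f_A(a)\,da\) the role of \(P(A_i)\).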
Common gotchas¶
- The \(A_i\) must be mutually exclusive and exhaustive.
- Omitting a case from the partition silently underestimates \(P(B)\).
Example¶
If \(P(B)=0.3\), \(P(A\mid B)=0.8\), \(P(A\mid \neg B)=0.2\), then \(P(A)=0.8\cdot0.3+0.2\cdot0.7=0.38\).
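The worked example above can be checked numerically; this minimal sketch uses variable names matching the example, with the partition \(\{B, \neg B\}\):

```python
# Given quantities from the example.
p_B = 0.3
p_A_given_B = 0.8
p_A_given_notB = 0.2

# Law of total probability over the partition {B, not-B}:
# P(A) = P(A|B) P(B) + P(A|not-B) P(not-B)
p_A = p_A_given_B * p_B + p_A_given_notB * (1 - p_B)
print(round(p_A, 2))  # 0.38
```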
How to Compute (Pseudocode)¶
Input: event probabilities / joint distribution entries
Output: requested probability quantity
1. Identify the relevant events/variables and the required joint/marginal terms.
2. Apply the probability identity in the card formula.
3. Check that denominator/normalization terms are valid (nonzero where required).
4. Return the computed probability.
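The steps above can be sketched as a small function; `total_probability` and its dictionary-based inputs are illustrative names, not part of any standard library:

```python
def total_probability(cond_probs, priors, tol=1e-9):
    """Compute P(B) = sum_i P(B | A_i) P(A_i) over a partition {A_i}.

    cond_probs: dict mapping each case A_i to P(B | A_i)
    priors:     dict mapping each case A_i to P(A_i)
    """
    # Validate the partition: priors must be positive and sum to 1.
    if any(p <= 0 for p in priors.values()):
        raise ValueError("each P(A_i) must be positive")
    if abs(sum(priors.values()) - 1.0) > tol:
        raise ValueError("priors must sum to 1 (exhaustive partition)")
    # Condition on each case and weight by its prior.
    return sum(cond_probs[a] * priors[a] for a in priors)


# Usage: the card's example with the partition {B, not-B}.
p_A = total_probability({"B": 0.8, "not B": 0.2},
                        {"B": 0.3, "not B": 0.7})
print(round(p_A, 2))  # 0.38
```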
Complexity¶
- Time: \(O(k)\) for a partition with \(k\) cases once the conditional probabilities and priors are available; larger table-based calculations scale with table size
- Space: \(O(1)\) extra space for a single computation
- Assumptions: Probability terms (joint/marginals/conditionals) are already known or computed separately