Multiple Hypothesis Testing¶
Formula¶
\[
P(\text{at least one false positive}) = 1-(1-\alpha)^m
\]
Parameters¶
- \(\alpha\): per-test error rate
- \(m\): number of tests
What it means¶
Running many tests inflates false positives unless you control a family-wise or discovery-based error criterion.
What it's used for¶
- Feature screening, genomics, online experiments, and post-hoc comparisons.
Key properties¶
- FWER controls probability of any false positive; FDR controls expected false discovery proportion.
- Correction choice depends on decision costs.
Common gotchas¶
- Uncorrected p-values across many slices are misleading.
- Correlated tests can affect exact guarantees of some procedures.
Example¶
If you test 100 unrelated features at \(\alpha=0.05\), several "significant" results may appear by chance.
How to Compute (Pseudocode)¶
Input: set of hypotheses/p-values and a target error-rate criterion
Output: adjusted decisions or error-rate summary
collect p-values from the hypothesis family
apply the chosen multiple-testing/error-rate control procedure
report adjusted decision threshold(s), rejections, or error-rate summary
return results
Complexity¶
- Time: Depends on the procedure; many standard methods are dominated by sorting (\(O(m\log m)\) for \(m\) hypotheses)
- Space: \(O(m)\) for p-values and adjusted decisions/ordering
- Assumptions: Hypotheses are treated as a specified family and the chosen procedure's assumptions determine validity