A/B Testing¶
Formula¶
\[
\Delta = \hat{\theta}_B - \hat{\theta}_A
\]
Plot¶
fn: 1/(1+exp(-10*(x-0.2)))
xmin: 0
xmax: 0.5
ymin: 0
ymax: 1.05
height: 280
title: Example test power vs uplift (illustrative)
Parameters¶
- \(\hat{\theta}_A,\hat{\theta}_B\): estimated values of the metric in the control group (A) and the treatment group (B)
- \(\Delta\): estimated absolute uplift (treatment minus control)
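The point estimate is usually reported together with its uncertainty. As a sketch, assuming the two groups are independent and a normal approximation is reasonable, with \(\mathrm{SE}_A\) and \(\mathrm{SE}_B\) denoting the standard errors of the two estimates (symbols introduced here for illustration), a \((1-\alpha)\) interval for the uplift would be:
\[
\mathrm{SE}(\Delta) = \sqrt{\mathrm{SE}_A^2 + \mathrm{SE}_B^2},
\qquad
\Delta \pm z_{1-\alpha/2}\,\mathrm{SE}(\Delta)
\]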
What it means¶
A/B testing compares a control and a treatment to estimate causal impact under random assignment.
What it's used for¶
- Product experimentation and policy rollout decisions.
- Measuring uplift on a primary metric, monitoring guardrail metrics, and quantifying tradeoffs between them.
Key properties¶
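- Randomization supports a causal interpretation only if the experiment is executed soundly (for example, assignment is truly random, units stay in their assigned group, and data are logged the same way in both groups).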
- Primary metric, guardrails, and stopping rules should be defined before launch.
Common gotchas¶
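- Sample ratio mismatch (the observed traffic split differs from the planned allocation), peeking (stopping as soon as a result looks significant, without a sequential correction), and uncorrected multiple comparisons can invalidate inference.
- Metrics with heavy tails (for example, revenue per user) may need robust analysis choices such as capping extreme values, transformations, or resampling-based intervals.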
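A sample ratio mismatch check can be sketched as a chi-square goodness-of-fit test on the group counts. This is a minimal illustration, not a prescribed procedure: the function name `srm_check`, the default 50/50 allocation, and the strict `alpha=0.001` threshold are assumptions chosen for the example.

```python
import math


def srm_check(n_control, n_treatment, expected_treatment_share=0.5, alpha=0.001):
    """Chi-square goodness-of-fit test (1 degree of freedom) on group sizes.

    expected_treatment_share and alpha are illustrative defaults (assumptions).
    """
    total = n_control + n_treatment
    expected_treatment = total * expected_treatment_share
    expected_control = total - expected_treatment
    chi2 = (
        (n_control - expected_control) ** 2 / expected_control
        + (n_treatment - expected_treatment) ** 2 / expected_treatment
    )
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return {"chi2": chi2, "p_value": p_value, "srm_flag": p_value < alpha}


# Hypothetical counts: a small imbalance and a large one.
print(srm_check(50_000, 50_300))
print(srm_check(50_000, 51_500))
```

A flagged mismatch usually points to a problem in assignment or logging, so the uplift estimate should not be trusted until the cause is understood.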
Example¶
Compare conversion rates of two landing pages with randomized traffic assignment and pre-registered metrics.
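For a conversion-rate comparison like this, one common planned analysis is a two-proportion z-test. The sketch below assumes large samples (normal approximation) and uses made-up counts. The function name and the choice of a pooled standard error for the test and an unpooled one for the interval are illustrative assumptions.

```python
import math


def two_proportion_ztest(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Two-sided z-test for a difference in conversion rates (B minus A)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    delta = p_b - p_a

    # Pooled standard error: used for the test under H0 (equal rates).
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = delta / se_pooled
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail

    # Unpooled standard error: used for the confidence interval on delta.
    se_unpooled = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    ci = (delta - z_crit * se_unpooled, delta + z_crit * se_unpooled)
    return {"delta": delta, "z": z, "p_value": p_value, "ci": ci}


# Hypothetical data: 480/10,000 conversions on page A, 540/10,000 on page B.
result = two_proportion_ztest(480, 10_000, 540, 10_000)
print(result)
```

The default `z_crit=1.96` corresponds to an approximate 95% interval. A different confidence level would need a different critical value.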
How to Compute (Pseudocode)¶
Input: randomized control/treatment data, primary metric, analysis plan
Output: estimated lift and inference summary
compute metric estimates for control and treatment groups
compute estimated uplift Delta = theta_hat_B - theta_hat_A
choose and run the planned inference method (for example z/t-test, bootstrap, or permutation)
report effect size, confidence interval, and/or p-value
return experiment summary
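The steps above can be sketched in Python for a mean-type metric. This is one possible realization, not the only valid one. It assumes per-unit metric values, independent groups, and samples large enough for a normal approximation (for small samples a t-based method would be more appropriate). The function name `analyze_experiment` and the returned dictionary keys are hypothetical.

```python
import math
from statistics import NormalDist, fmean, variance


def analyze_experiment(control, treatment, alpha=0.05):
    """Estimate uplift in the mean with a normal-approximation CI and p-value."""
    # Step 1: metric estimates for each group.
    theta_a = fmean(control)
    theta_b = fmean(treatment)

    # Step 2: estimated uplift, Delta = theta_hat_B - theta_hat_A.
    delta = theta_b - theta_a

    # Step 3: planned inference method (here: unequal-variance z-test).
    se = math.sqrt(
        variance(control) / len(control) + variance(treatment) / len(treatment)
    )
    std_normal = NormalDist()
    z = delta / se
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    z_crit = std_normal.inv_cdf(1 - alpha / 2)

    # Steps 4-5: report effect size, interval, and p-value as a summary.
    return {
        "theta_a": theta_a,
        "theta_b": theta_b,
        "delta": delta,
        "ci": (delta - z_crit * se, delta + z_crit * se),
        "p_value": p_value,
    }


# Tiny hypothetical input, only to show the call shape.
print(analyze_experiment([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]))
```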
Complexity¶
- Time: Typically \(O(n)\) to compute group summaries over \(n\) observations, plus the cost of the chosen inference method; resampling methods with \(B\) resamples add roughly \(O(Bn)\)
- Space: Depends on metric aggregation and whether resampled statistics/segments are stored
- Assumptions: Randomized assignment and pre-specified analysis plan; exact cost depends on metric complexity and inference method
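To make the resampling cost concrete, here is a minimal percentile-bootstrap sketch for the uplift in means. Each of the `n_resamples` iterations makes a full pass over both groups, and the resampled statistics are kept in memory until the percentiles are read off. The function name, the default of 2000 resamples, and the fixed seed are assumptions for illustration.

```python
import random
from statistics import fmean


def bootstrap_uplift_ci(control, treatment, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(treatment) - mean(control)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    deltas = []
    for _ in range(n_resamples):
        # One resample per group, drawn with replacement: O(n) work per iteration.
        resampled_a = rng.choices(control, k=len(control))
        resampled_b = rng.choices(treatment, k=len(treatment))
        deltas.append(fmean(resampled_b) - fmean(resampled_a))

    # Storing all resampled uplifts costs O(n_resamples) extra space.
    deltas.sort()
    lo_idx = max(0, int((alpha / 2) * n_resamples))
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples) - 1)
    return deltas[lo_idx], deltas[hi_idx]


# Hypothetical data: treatment values are the control values shifted by 1.
control = [1.0, 2.0, 3.0, 4.0, 5.0] * 20
treatment = [x + 1.0 for x in control]
print(bootstrap_uplift_ci(control, treatment))
```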