
A/B Testing

Formula

\[ \Delta = \hat{\theta}_B - \hat{\theta}_A \]

Plot

fn: 1/(1+exp(-10*(x-0.2)))
xmin: 0
xmax: 0.5
ymin: 0
ymax: 1.05
height: 280
title: Example test power vs uplift (illustrative)
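
A curve like the one plotted here can be produced with the standard normal approximation for a two-sided two-proportion z-test. The sketch below is illustrative, not from the source: the function name, the 5% default significance level, and the baseline/sample-size parameters are all assumptions.

```python
from statistics import NormalDist

def power_two_proportions(p_a: float, uplift: float, n_per_arm: int,
                          alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-proportion z-test.

    p_a: baseline (control) conversion rate
    uplift: absolute difference p_b - p_a under the alternative
    n_per_arm: observations in each group (balanced design assumed)
    """
    nd = NormalDist()
    p_b = p_a + uplift
    z_crit = nd.inv_cdf(1 - alpha / 2)
    # Standard error of the difference in sample proportions.
    se = ((p_a * (1 - p_a) + p_b * (1 - p_b)) / n_per_arm) ** 0.5
    z = uplift / se
    # Probability of rejecting in either tail under the alternative.
    return nd.cdf(z - z_crit) + nd.cdf(-z - z_crit)
```

At zero uplift this returns roughly `alpha` (the false-positive rate), and it rises toward 1 as the uplift or sample size grows, which is the shape the plot illustrates.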

Parameters

  • \(\hat{\theta}_A,\hat{\theta}_B\): estimated metrics for control and treatment
  • \(\Delta\): estimated uplift

What it means

A/B testing compares a control and a treatment to estimate causal impact under random assignment.

What it's used for

  • Product experimentation and policy rollout decisions.
  • Measuring uplift on a primary metric while tracking guardrail metrics and tradeoffs.

Key properties

  • Randomization supports causal interpretation if experiment execution is sound.
  • Primary metric, guardrails, and stopping rules should be defined before launch.

Common gotchas

  • Sample ratio mismatch, peeking, and multiple comparisons can invalidate inference.
  • Metrics with heavy tails may need robust analysis choices.
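
Sample ratio mismatch can be flagged with a chi-square goodness-of-fit test on the observed assignment counts. A minimal stdlib-only sketch, where the function name and the 50/50 default split are assumptions:

```python
import math

def srm_p_value(n_control: int, n_treatment: int,
                expected_ratio: float = 0.5) -> float:
    """Chi-square goodness-of-fit test (1 df) for sample ratio mismatch.

    A very small p-value suggests the traffic split deviates from the
    configured allocation, so experiment results should not be trusted.
    """
    total = n_control + n_treatment
    exp_c = total * expected_ratio
    exp_t = total * (1 - expected_ratio)
    stat = ((n_control - exp_c) ** 2 / exp_c
            + (n_treatment - exp_t) ** 2 / exp_t)
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2))
```

A common practice is to treat p-values below a strict threshold (e.g. 0.001) as evidence of a broken assignment pipeline rather than a chance fluctuation.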

Example

Compare conversion rates of two landing pages with randomized traffic assignment and pre-registered metrics.
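
For binary conversion data like this, a pooled two-proportion z-test is one conventional analysis. A minimal sketch, assuming per-page conversion counts and totals; the function name and return shape are illustrative:

```python
from statistics import NormalDist

def two_proportion_ztest(conv_a: int, n_a: int,
                         conv_b: int, n_b: int):
    """Two-sided z-test for a difference in conversion rates.

    Returns (delta, z, p_value), where delta = p_b - p_a is the
    estimated uplift of the treatment page over the control page.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    delta = p_b - p_a
    # Pooled rate under the null hypothesis of no difference.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = delta / se
    p_value = 2 * NormalDist().cdf(-abs(z))
    return delta, z, p_value
```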

How to Compute (Pseudocode)

Input: randomized control/treatment data, primary metric, analysis plan
Output: estimated lift and inference summary

compute metric estimates for control and treatment groups
compute estimated uplift Delta = theta_hat_B - theta_hat_A
choose and run the planned inference method (for example z/t-test, bootstrap, or permutation)
report effect size, confidence interval, and/or p-value
return experiment summary
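
The pseudocode above, using the permutation option for inference, might be sketched as follows. This is illustrative, assuming numeric per-user metric values and a difference-in-means Delta; the function name and the fixed seed are assumptions for reproducibility:

```python
import random

def permutation_test(control, treatment, n_perm: int = 10_000, seed: int = 0):
    """Permutation test for Delta = mean(treatment) - mean(control).

    Repeatedly shuffles the pooled observations and counts how often a
    random split produces an uplift at least as extreme as the observed
    one, giving a two-sided p-value.
    """
    rng = random.Random(seed)
    observed = sum(treatment) / len(treatment) - sum(control) / len(control)
    pooled = list(control) + list(treatment)
    n_c = len(control)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        delta = (sum(pooled[n_c:]) / len(treatment)
                 - sum(pooled[:n_c]) / n_c)
        if abs(delta) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)
```

This makes no distributional assumptions beyond exchangeability under the null, which is why it appears alongside z/t-tests and the bootstrap as a planned inference option.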

Complexity

  • Time: Typically \(O(n)\) to compute group summaries over \(n\) observations, plus the cost of the chosen inference method (resampling methods add repeated passes)
  • Space: Depends on metric aggregation and whether resampled statistics/segments are stored
  • Assumptions: Randomized assignment and pre-specified analysis plan; exact cost depends on metric complexity and inference method

See also