Bootstrap¶

Formula¶

\[ \hat{\theta}^{*(b)} = s\left(X^{*(b)}\right),\quad b = 1, \dots, B,\quad X^{*(b)} \sim \text{resample with replacement from } X \]

Parameters¶

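\(X\): observed sample
\(X^{*(b)}\): \(b\)-th bootstrap sample, the same size as \(X\) and drawn from it with replacement
\(s(\cdot)\): statistic of interest (e.g., mean, median, a model metric)
\(\hat{\theta}^{*(b)}\): value of the statistic on the \(b\)-th bootstrap sample
\(b\): bootstrap replicate index, \(b = 1, \dots, B\)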

What it means¶

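The bootstrap estimates the sampling variability of a statistic by repeatedly resampling the observed data with replacement and recomputing the statistic on each resample. The spread of the resulting replicates approximates the statistic's sampling distribution.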

What it's used for¶

Confidence intervals, standard errors, and stability checks when analytic formulas are hard.
Uncertainty estimates for model performance metrics (e.g., an interval for accuracy obtained by resampling the evaluation set).
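For reference, the two most common summaries can be written in terms of the replicates \(\hat{\theta}^{*(b)}\) from the formula above. The symbols \(B\) (number of resamples), \(\bar{\theta}^{*}\) (mean of the replicates), \(q_p\) (empirical \(p\)-quantile of the replicates), and \(\alpha\) (one minus the confidence level) are notation introduced here for illustration. The percentile interval shown is only one of several bootstrap interval constructions.

\[ \widehat{\mathrm{SE}}_{\text{boot}} = \sqrt{\frac{1}{B-1}\sum_{b=1}^{B}\left(\hat{\theta}^{*(b)} - \bar{\theta}^{*}\right)^{2}},\qquad \bar{\theta}^{*} = \frac{1}{B}\sum_{b=1}^{B}\hat{\theta}^{*(b)} \]

\[ \mathrm{CI}_{1-\alpha}^{\text{percentile}} = \left[\, q_{\alpha/2},\; q_{1-\alpha/2} \,\right] \]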

Key properties¶

Nonparametric and broadly applicable.
Works best when the sample is representative of the population and the resampled units are independent and identically distributed.

Common gotchas¶

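Resample the right unit (e.g., user, session, or cluster) so that resampling preserves the dependence structure; resampling individual rows from clustered data typically understates the variability.
The naive (i.i.d.) bootstrap can fail for strongly dependent data such as time series, where a dependence-aware variant such as the block bootstrap is usually needed.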
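The first gotcha can be illustrated with a cluster-level resample. The following is a minimal sketch using only the Python standard library; the function name `cluster_bootstrap_means`, the `(user_id, value)` row layout, and the toy data are all hypothetical and chosen only for illustration.

```python
import random
import statistics
from collections import defaultdict

def cluster_bootstrap_means(rows, n_boot=2000, seed=0):
    """Bootstrap the mean of `value`, resampling whole users (clusters).

    rows: list of (user_id, value) pairs; a user may appear many times.
    """
    rng = random.Random(seed)
    # Group observations by the resampling unit (here: user).
    by_user = defaultdict(list)
    for user_id, value in rows:
        by_user[user_id].append(value)
    clusters = list(by_user.values())

    replicates = []
    for _ in range(n_boot):
        # Draw as many users as in the original data, with replacement,
        # keeping all observations of each drawn user together.
        drawn = rng.choices(clusters, k=len(clusters))
        pooled = [v for cluster in drawn for v in cluster]
        replicates.append(statistics.fmean(pooled))
    return replicates

# Hypothetical session-level data: three users with correlated sessions.
rows = [("u1", 10.0), ("u1", 12.0), ("u1", 11.0),
        ("u2", 50.0), ("u2", 55.0),
        ("u3", 0.0)]
reps = cluster_bootstrap_means(rows)
cluster_se = statistics.stdev(reps)  # bootstrap standard error of the mean
```

Because each drawn user contributes all of their rows, within-user correlation is carried into every resample, which row-level resampling would break.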

Example¶

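Estimate a 95% CI for median revenue per user: resample users with replacement 10,000 times, compute the median revenue on each resample, and take the 2.5th and 97.5th percentiles of the 10,000 medians as the interval endpoints (the percentile method).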
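A minimal sketch of this example using only the Python standard library. The revenue values are synthetic (a hypothetical mix of zero-revenue users and log-normally distributed spenders), and the percentile method is assumed for the interval; real data and other interval types would slot in the same way.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical per-user revenue: one value per user, about 30% zeros.
revenue = [0.0 if rng.random() < 0.3 else rng.lognormvariate(3.0, 1.0)
           for _ in range(500)]

n_boot = 10_000
n = len(revenue)

medians = []
for _ in range(n_boot):
    # Resample users with replacement, same size as the original sample.
    resample = rng.choices(revenue, k=n)
    medians.append(statistics.median(resample))

# quantiles(..., n=40) returns 39 cut points at 2.5%, 5%, ..., 97.5%.
cuts = statistics.quantiles(medians, n=40)
lo, hi = cuts[0], cuts[-1]
point = statistics.median(revenue)

print(f"median = {point:.2f}, 95% percentile CI = [{lo:.2f}, {hi:.2f}]")
```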

How to Compute (Pseudocode)¶

Input: dataset, statistic s(.), number of bootstrap resamples B
Output: bootstrap replicates and uncertainty summary

for b from 1 to B:
  draw resampled_data by sampling n units with replacement from dataset (n = number of units in dataset)
  compute theta_star[b] <- s(resampled_data)
aggregate theta_star values (SE, CI, quantiles, etc.)
return bootstrap summary
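The pseudocode above could be sketched in Python roughly as follows. This is an illustrative version, not a reference implementation: the function name `bootstrap`, its parameters, and the returned dictionary keys are assumptions made here, the interval is a percentile interval, and the quantile indexing is a simple lower-rank rule.

```python
import random
import statistics
from typing import Callable, Optional, Sequence

def bootstrap(data: Sequence[float],
              stat: Callable[[Sequence[float]], float],
              n_boot: int = 2000,
              alpha: float = 0.05,
              seed: Optional[int] = None) -> dict:
    """Return the point estimate, bootstrap SE, percentile CI, and replicates."""
    rng = random.Random(seed)
    n = len(data)

    theta_star = []
    for _ in range(n_boot):
        # Resample n units with replacement and recompute the statistic.
        resample = rng.choices(data, k=n)
        theta_star.append(stat(resample))

    # Aggregate: sort once, then read off the percentile interval.
    theta_star.sort()
    lo_idx = int((alpha / 2) * (n_boot - 1))
    hi_idx = int((1 - alpha / 2) * (n_boot - 1))
    return {
        "estimate": stat(data),
        "se": statistics.stdev(theta_star),
        "ci": (theta_star[lo_idx], theta_star[hi_idx]),
        "replicates": theta_star,
    }

# Usage on a small made-up sample, with the mean as the statistic.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]
result = bootstrap(sample, statistics.fmean, n_boot=2000, seed=1)
```

Passing the statistic as a callable keeps the resampling loop independent of what is being estimated, so the same function works for a median, a trimmed mean, or a model metric.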

Complexity¶

Time: \(O(B \cdot \mathrm{StatCost})\), where \(\mathrm{StatCost}\) is the cost to draw one resample and compute the statistic on it
Space: \(O(B)\) to store bootstrap replicates (or \(O(1)\) extra if streaming a summary only) plus resample/statistic workspace
Assumptions: Resampling unit and dependence structure must match the study design; \(B\) controls Monte Carlo error and runtime
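The \(O(1)\) extra-space case can be sketched by updating a running variance instead of storing replicates. The version below uses Welford's online algorithm; the function name `streaming_bootstrap_se` and the toy data are hypothetical. It covers only the standard error: percentile intervals still require the stored replicates or an approximate quantile summary.

```python
import math
import random
import statistics

def streaming_bootstrap_se(data, stat, n_boot=2000, seed=0):
    """Bootstrap standard error without storing the replicates."""
    rng = random.Random(seed)
    n = len(data)
    count, mean, m2 = 0, 0.0, 0.0
    for _ in range(n_boot):
        theta = stat(rng.choices(data, k=n))
        # Welford update: running mean and sum of squared deviations.
        count += 1
        delta = theta - mean
        mean += delta / count
        m2 += delta * (theta - mean)
    # Sample standard deviation of the replicates (divisor B - 1).
    return math.sqrt(m2 / (count - 1))

data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]
se_stream = streaming_bootstrap_se(data, statistics.fmean)
```

Each resample still needs \(O(n)\) workspace, which corresponds to the "resample/statistic workspace" term above.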