Bootstrap

Formula

\[ \hat{\theta}^{*(b)} = s(X^{*(b)}),\quad X^{*(b)}\sim \text{resample with replacement from } X \]

Parameters

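  • \(X\): observed sample of \(n\) units
  • \(s(\cdot)\): statistic of interest (e.g., mean, median, or a model metric)
  • \(X^{*(b)}\): the \(b\)-th resample, \(n\) units drawn from \(X\) with replacement
  • \(b\): bootstrap replicate index, \(b = 1, \dots, B\)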

What it means

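The bootstrap estimates sampling variability by repeatedly resampling the observed data with replacement and recomputing the statistic each time. The spread of the recomputed values approximates how much the statistic would vary across new samples from the population.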

What it's used for

  • Confidence intervals, standard errors, and stability checks when analytic formulas are hard.
  • Model performance uncertainty estimates.
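
As a sketch of how the replicates turn into the summaries listed above, two common choices are the bootstrap standard error and the percentile interval. The symbols \(B\) (number of resamples), \(\bar{\theta}^{*}\), \(\alpha\), and \(q_{p}\) are notation introduced here for illustration:

\[ \widehat{\mathrm{SE}}_{\text{boot}} = \sqrt{\frac{1}{B-1}\sum_{b=1}^{B}\left(\hat{\theta}^{*(b)} - \bar{\theta}^{*}\right)^{2}},\qquad \bar{\theta}^{*} = \frac{1}{B}\sum_{b=1}^{B}\hat{\theta}^{*(b)} \]

\[ \text{CI}_{1-\alpha} = \left[\, q_{\alpha/2},\; q_{1-\alpha/2} \,\right],\qquad q_{p} = \text{empirical } p\text{-quantile of } \hat{\theta}^{*(1)},\dots,\hat{\theta}^{*(B)} \]

The percentile interval is only one of several bootstrap intervals; it is shown here because it is the simplest to state.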

Key properties

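  • Nonparametric in its standard form: it assumes no distributional model for the data and applies to almost any statistic that can be recomputed on a resample.
  • Works best when the sample represents the population and the resampled units are independent of one another.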

Common gotchas

  • Resample the right unit (e.g., user/session/cluster) to match dependence structure.
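  • The naive (i.i.d.) bootstrap can fail under strong dependence, such as time series, because resampling individual observations destroys the serial correlation; block-based variants are the usual remedy.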
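
The first gotcha can be made concrete with a minimal sketch of resampling whole clusters rather than rows. The function name `cluster_bootstrap_se`, the `(cluster_id, value)` row layout, and the toy data are all assumptions made for illustration, not part of any library:

```python
import random
from collections import defaultdict
from statistics import mean, stdev


def cluster_bootstrap_se(rows, statistic, n_resamples=2000, seed=0):
    """Bootstrap SE of statistic(values), resampling whole clusters.

    rows: iterable of (cluster_id, value) pairs (hypothetical layout).
    """
    rng = random.Random(seed)

    # Group observations so each cluster (e.g., one user) moves as a unit.
    by_cluster = defaultdict(list)
    for cluster_id, value in rows:
        by_cluster[cluster_id].append(value)
    clusters = list(by_cluster.values())

    replicates = []
    for _ in range(n_resamples):
        # Draw as many clusters as were observed, with replacement.
        drawn = rng.choices(clusters, k=len(clusters))
        values = [v for cluster in drawn for v in cluster]
        replicates.append(statistic(values))
    return stdev(replicates)


# Made-up toy data: several sessions per user, correlated within user.
rows = [
    ("u1", 10.0), ("u1", 12.0), ("u1", 11.0),
    ("u2", 0.0), ("u2", 1.0),
    ("u3", 55.0), ("u3", 60.0), ("u3", 58.0), ("u3", 57.0),
    ("u4", 3.0),
    ("u5", 20.0), ("u5", 22.0),
]
se_by_user = cluster_bootstrap_se(rows, mean)
print(f"SE of mean session revenue, resampling users: {se_by_user:.2f}")
```

Resampling individual sessions instead would treat the twelve rows as twelve independent observations, which typically understates the uncertainty when sessions from the same user look alike.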

Example

Estimate a 95% CI for median revenue per user by resampling users with replacement 10,000 times and recomputing the median on each resample.
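
A minimal sketch of this example using only the Python standard library follows. The revenue values are synthetic (a log-normal draw chosen as a stand-in for skewed revenue), and the percentile interval is one assumed choice among several possible bootstrap intervals:

```python
import random
from statistics import median, quantiles

rng = random.Random(42)

# Synthetic stand-in for per-user revenue: 200 users, right-skewed.
revenue_by_user = [rng.lognormvariate(3.0, 1.0) for _ in range(200)]

n_resamples = 10_000
n_users = len(revenue_by_user)

# Each replicate: resample users with replacement, then recompute the median.
replicates = [
    median(rng.choices(revenue_by_user, k=n_users))
    for _ in range(n_resamples)
]

# Percentile interval: n=40 puts the cut points at 2.5%, 5%, ..., 97.5%.
cuts = quantiles(replicates, n=40)
ci_low, ci_high = cuts[0], cuts[-1]

print(f"sample median: {median(revenue_by_user):.2f}")
print(f"95% percentile CI: [{ci_low:.2f}, {ci_high:.2f}]")
```

Here each list element is one user, so drawing elements with replacement is the same as bootstrapping users. With raw transaction rows, the rows would first need to be aggregated per user.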

How to Compute (Pseudocode)

Input: dataset, statistic s(.), number of bootstrap resamples B
Output: bootstrap replicates and uncertainty summary

for b from 1 to B:
  resampled_data <- draw n units from dataset with replacement (n = number of units in dataset)
  compute theta_star[b] <- s(resampled_data)
aggregate theta_star values (SE, CI, quantiles, etc.)
return bootstrap summary
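
The pseudocode above could be sketched in Python roughly as follows. The function name `bootstrap`, its parameters, and the ratio statistic `revenue_per_visit` are hypothetical, the data are made up, and the confidence interval uses simple order-statistic percentiles as an approximation:

```python
import random
from statistics import stdev


def bootstrap(data, statistic, n_resamples=2000, alpha=0.05, seed=0):
    """Return (replicates, standard_error, (ci_low, ci_high))."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_resamples):
        resample = rng.choices(data, k=n)       # n units, with replacement
        replicates.append(statistic(resample))  # theta_star[b]

    # Aggregate: SE is the spread of the replicates; the CI takes
    # approximate percentiles from the sorted replicates.
    se = stdev(replicates)
    ordered = sorted(replicates)
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return replicates, se, (ordered[lo_idx], ordered[hi_idx])


def revenue_per_visit(pairs):
    """Ratio of totals: a statistic whose SE is awkward to derive by hand."""
    total_revenue = sum(r for r, _ in pairs)
    total_visits = sum(v for _, v in pairs)
    return total_revenue / total_visits


rng = random.Random(1)
# Made-up (revenue, visits) pairs, one per user.
users = [(rng.expovariate(1 / 20), rng.randint(1, 10)) for _ in range(150)]

replicates, se, (lo, hi) = bootstrap(users, revenue_per_visit)
print(f"estimate: {revenue_per_visit(users):.3f}")
print(f"bootstrap SE: {se:.3f}, 95% CI: [{lo:.3f}, {hi:.3f}]")
```

Passing the statistic as a function keeps the resampling loop independent of what is being estimated, which mirrors the role of s(.) in the pseudocode.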

Complexity

  • Time: \(O(B \cdot \mathrm{StatCost})\), where \(\mathrm{StatCost}\) is the cost to compute the statistic on one resample
  • Space: \(O(B)\) to store bootstrap replicates (or \(O(1)\) extra if streaming a summary only) plus resample/statistic workspace
  • Assumptions: Resampling unit and dependence structure must match the study design; \(B\) controls Monte Carlo error and runtime
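
The \(O(1)\) extra-space option mentioned above can be illustrated with a running-variance update, so that only a few scalars are kept instead of all \(B\) replicates. This sketch uses Welford's online algorithm as one assumed way to stream the summary; the function name `streaming_bootstrap_se` and the data are hypothetical:

```python
import math
import random
from statistics import mean


def streaming_bootstrap_se(data, statistic, n_resamples=2000, seed=0):
    """Bootstrap SE without storing the replicates (Welford's update)."""
    rng = random.Random(seed)
    n = len(data)
    count, running_mean, m2 = 0, 0.0, 0.0
    for _ in range(n_resamples):
        theta = statistic(rng.choices(data, k=n))
        # Update the running mean and sum of squared deviations in place.
        count += 1
        delta = theta - running_mean
        running_mean += delta / count
        m2 += delta * (theta - running_mean)
    return math.sqrt(m2 / (count - 1))


rng = random.Random(7)
data = [rng.gauss(100, 15) for _ in range(300)]  # made-up measurements
se_stream = streaming_bootstrap_se(data, mean)
print(f"streaming bootstrap SE of the mean: {se_stream:.3f}")
```

The trade-off is that exact percentile intervals still need the stored (or at least sorted) replicates, so streaming suits standard errors better than quantile-based summaries. Each resample still occupies \(O(n)\) workspace while its statistic is computed.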

See also