Skip to content

Data Science Field Guide

Two-Sample t-Test (Welch)

Two-Sample t-Test (Welch)¶

Formula¶

\[ t = \frac{\bar x_1 - \bar x_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} \]

Plot¶

fn: 1/(1+exp(-5*(x-0.5)))
xmin: 0
xmax: 1.5
ymin: 0
ymax: 1.05
height: 280
title: Example power curve for two-sample t-test (illustrative)

Parameters¶

\(\bar x_1, \bar x_2\): sample means
\(s_1, s_2\): sample standard deviations
\(n_1, n_2\): sample sizes

What it means¶

Tests whether two population means differ.

What it's used for¶

Comparing means from two independent groups.
Works with unequal variances (Welch's t-test).

Key properties¶

Degrees of freedom use the Welch-Satterthwaite approximation.
Does not assume equal variances.

Common gotchas¶

Assumes independent samples and roughly normal data in each group.
If samples are paired, use a paired t-test instead.

Example¶

If \(\bar x_1=5\), \(\bar x_2=3\), \(s_1=2\), \(s_2=1\), \(n_1=n_2=10\), then \(t=(5-3)/\sqrt{4/10+1/10}=2.828\).

How to Compute (Pseudocode)¶

Input: data, null hypothesis H0, test statistic T
Output: test statistic and p-value decision summary

compute the observed test statistic T_obs from the data
obtain the null distribution (analytic approximation or exact table, depending on the test)
compute the p-value from the null distribution and tail convention
compare p-value to alpha (if making a decision)
return T_obs and p-value

Complexity¶

Time: Depends on the specific test (summary-statistic computation is often linear in sample size; p-value computation may be constant-time with a CDF call or more expensive if resampling is used)
Space: Depends on whether intermediate summaries or resampled/null distributions are materialized
Assumptions: Test-specific assumptions (independence, variance structure, distributional assumptions) determine validity and exact computation details