Data Science Field Guide

Mean (Expected Value)¶

Formula¶

\[ \mu = \mathbb{E}[X] \]

\[ \bar x = \frac{1}{n}\sum_{i=1}^n x_i \]
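The two formulas can be illustrated side by side. The following is a minimal sketch, not a definitive implementation: the function names `expected_value` and `sample_mean` are illustrative, the discrete case \(\mathbb{E}[X]=\sum_x x\,P(X=x)\) is assumed, and the fair six-sided die is an assumed example distribution.

```python
from fractions import Fraction


def expected_value(pmf):
    """Expected value of a discrete random variable given as {value: probability}."""
    return sum(value * prob for value, prob in pmf.items())


def sample_mean(samples):
    """Arithmetic mean of a non-empty sequence of numbers."""
    if not samples:
        raise ValueError("sample_mean requires at least one sample")
    return sum(samples) / len(samples)


# Assumed example: a fair six-sided die, each face with probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
mu = expected_value(die_pmf)    # exact population mean as a Fraction
x_bar = sample_mean([1, 2, 4])  # sample mean of three observations
```

Using `Fraction` for the probabilities keeps the population mean exact; with floats the same code works but the result is subject to rounding.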

Parameters¶

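\(X\): random variable
\(\mu\): population mean (expected value of \(X\))
\(x_i\): the \(i\)-th observed sample
\(\bar x\): sample mean
\(n\): number of samples (\(n \ge 1\))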

What it means¶

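The average value of a distribution or dataset: \(\mu\) is the probability-weighted average of the values \(X\) can take, and \(\bar x\) is the arithmetic average of the observed samples, which serves as an estimate of \(\mu\).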

What it's used for¶

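Summarizing the central tendency of a distribution or dataset.
Serving as a baseline predictor in regression: predict the mean of the training targets for every input.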
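The baseline-predictor use can be sketched as follows. This is an illustration under stated assumptions: the helper name `mean_baseline_mse` and the target values are hypothetical, and mean squared error is assumed as the evaluation metric.

```python
def mean_baseline_mse(y_train, y_test):
    """Predict the training mean for every test point; return (prediction, MSE)."""
    prediction = sum(y_train) / len(y_train)
    mse = sum((y - prediction) ** 2 for y in y_test) / len(y_test)
    return prediction, mse


# Hypothetical target values, chosen only for illustration.
y_train = [2.0, 4.0, 6.0]
y_test = [3.0, 5.0]
prediction, mse = mean_baseline_mse(y_train, y_test)
```

A regression model is only useful if its error on held-out data is lower than the error of this constant prediction.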

Key properties¶

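Linear: \(\mathbb{E}[aX+b]=a\mathbb{E}[X]+b\) for constants \(a\) and \(b\).
Minimizes expected squared error: among all constants \(c\), the choice \(c=\mathbb{E}[X]\) minimizes \(\mathbb{E}[(X-c)^2]\).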
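Both properties can be checked numerically on a small sample. The sketch below is a sanity check rather than a proof; the constants `a` and `b`, the candidate grid, and the helper names are assumptions made for illustration.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def mean_squared_error(values, c):
    """Average squared distance between the values and a constant c."""
    return sum((v - c) ** 2 for v in values) / len(values)


x = [1.0, 2.0, 4.0]
a, b = 3.0, -1.0  # arbitrary constants chosen for illustration

# Linearity: mean(a*x + b) equals a*mean(x) + b, up to floating-point rounding.
lhs = mean([a * v + b for v in x])
rhs = a * mean(x) + b

# Squared-error minimization: the best constant on a 0.01-step grid over [0, 5]
# is the grid point closest to the mean.
candidates = [i / 100 for i in range(0, 501)]
best_on_grid = min(candidates, key=lambda c: mean_squared_error(x, c))
```

The grid search is deliberately crude; it only shows that no nearby constant yields a lower squared error than the mean itself.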

Common gotchas¶

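Sensitive to outliers: a single extreme value can shift the mean arbitrarily far.
For heavy-tailed distributions, the mean may not exist; the Cauchy distribution is a standard example.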
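Outlier sensitivity is easy to see by comparing the mean with the median. This is a small sketch using Python's standard `statistics` module; the extreme value 1000 is a hypothetical outlier added for illustration.

```python
from statistics import mean, median

data = [1, 2, 4]
with_outlier = data + [1000]  # hypothetical extreme value added for illustration

# The mean moves by orders of magnitude; the median barely changes.
mean_before, mean_after = mean(data), mean(with_outlier)
median_before, median_after = median(data), median(with_outlier)
```

When such a shift is undesirable, a robust summary such as the median may be a better description of the typical value.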

Example¶

For samples \([1, 2, 4]\), \(\bar x=(1+2+4)/3=7/3\approx 2.333\).

How to Compute (Pseudocode)¶

Input: samples \(x_1, \dots, x_n\) with \(n \ge 1\)
Output: sample mean \(\bar x\)

total ← 0
for each sample x_i: total ← total + x_i
return total / n
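One possible Python rendering of these steps for the mean is shown here. It is a minimal sketch: the function name `sample_mean` is illustrative, and raising `ValueError` on empty input is an assumed design choice rather than something the card specifies.

```python
def sample_mean(samples):
    """One-pass arithmetic mean of an iterable of numbers."""
    total = 0.0
    count = 0
    for x in samples:
        total += x   # accumulate the running sum
        count += 1   # count samples so generators work too
    if count == 0:
        raise ValueError("mean of an empty sample is undefined")
    return total / count


result = sample_mean([1, 2, 4])
```

In practice, the standard library's `statistics.mean` or `statistics.fmean` (Python 3.8+) covers the same computation.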

Complexity¶

Time: \(O(n)\) for \(n\) samples; a single pass that accumulates the sum is enough.
Space: \(O(1)\) extra memory, since the values do not need to be stored or sorted.
Assumptions: A batch computation over a finite sample is shown; streaming/online algorithms update the mean one value at a time and can change constants and memory usage.
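The streaming/online case can be sketched with an incremental update that never stores the samples. The class name `RunningMean` is hypothetical; the update rule \(\bar x_k = \bar x_{k-1} + (x_k - \bar x_{k-1})/k\) is the standard incremental form of the sample mean.

```python
class RunningMean:
    """Incrementally updated mean using O(1) memory."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, x):
        """Fold one new value into the mean and return the updated mean."""
        self.count += 1
        # new_mean = old_mean + (x - old_mean) / count
        self.mean += (x - self.mean) / self.count
        return self.mean


rm = RunningMean()
for value in [1, 2, 4]:
    rm.update(value)
```

Each update costs \(O(1)\) time, so processing \(n\) values is still \(O(n)\) overall, and the running value is available after every sample.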