Correlation

Formula

\[ \rho_{XY} = \frac{\operatorname{Cov}(X,Y)}{\sigma_X\,\sigma_Y} \]

Parameters

  • \(\operatorname{Cov}(X,Y)=\mathbb{E}\big[(X-\mathbb{E}[X])(Y-\mathbb{E}[Y])\big]\)
  • \(\sigma_X=\sqrt{\operatorname{Var}(X)}\)
  • \(\sigma_Y=\sqrt{\operatorname{Var}(Y)}\)

What it means

A normalized measure of linear dependence: the covariance rescaled by both standard deviations, so the result is unitless.

What it's used for

  • Measuring linear association between variables.
  • Feature screening and diagnostics.

Key properties

  • \(-1 \le \rho_{XY} \le 1\)
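  • Invariant to positive affine transformations of \(X\) or \(Y\) (shifts and positive rescalings); a negative scale factor on one variable flips the sign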
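
The behaviour under affine maps follows directly from the formula. A short derivation, assuming constants \(a, c \neq 0\) and arbitrary \(b, d\) (symbols introduced here for illustration only):

\[ \operatorname{Cov}(aX+b,\,cY+d) = ac\,\operatorname{Cov}(X,Y), \qquad \sigma_{aX+b} = |a|\,\sigma_X, \qquad \sigma_{cY+d} = |c|\,\sigma_Y \]

\[ \rho_{aX+b,\,cY+d} = \frac{ac}{|a|\,|c|}\,\rho_{XY} = \operatorname{sign}(ac)\,\rho_{XY} \]

So the value is unchanged when \(a\) and \(c\) share a sign and negated when they do not.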

Common gotchas

  • Correlation measures only linear relationships; \(\rho_{XY}=0\) does not imply independence.
  • Undefined if either variance is zero.
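
A standard illustration of the first gotcha, assuming \(X\) takes the values \(-1, 0, 1\) with equal probability and \(Y = X^2\) (an example chosen here, not taken from the card). \(Y\) is completely determined by \(X\), and both variances are positive, yet:

\[ \operatorname{Cov}(X,Y) = \mathbb{E}[X^3] - \mathbb{E}[X]\,\mathbb{E}[X^2] = 0 - 0\cdot\tfrac{2}{3} = 0 \quad\Longrightarrow\quad \rho_{XY} = 0 \]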

Example

If \(Y=X\) and \(\operatorname{Var}(X)>0\), then \(\rho_{XY}=1\).

How to Compute (Pseudocode)

Input: paired samples (x_1, y_1), ..., (x_n, y_n) with n >= 2
Output: sample correlation r

compute the sample means x_bar and y_bar
accumulate S_xx = sum (x_i - x_bar)^2, S_yy = sum (y_i - y_bar)^2, S_xy = sum (x_i - x_bar)(y_i - y_bar)
if S_xx = 0 or S_yy = 0, report the correlation as undefined
return r = S_xy / sqrt(S_xx * S_yy)
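
The steps above can be sketched in Python roughly as follows. This is a minimal two-pass illustration, not a reference implementation: the function name `pearson_correlation` is hypothetical, and raising `ValueError` for the zero-variance case is one possible convention (returning NaN is another).

```python
import math
from typing import Sequence


def pearson_correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sample Pearson correlation of two equal-length sequences (two-pass)."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")
    if n < 2:
        raise ValueError("need at least two paired observations")

    # Pass 1: sample means.
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Pass 2: sums of squared deviations and of cross-deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # Assumed convention: signal the undefined case with an exception.
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined when either variance is zero")

    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return sxy / math.sqrt(sxx * syy)
```

Because the normalizing factors cancel, the result is the same whether the variances are computed with \(1/n\) or \(1/(n-1)\).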

Complexity

  • Time: \(O(n)\) for \(n\) paired samples, using either a two-pass computation (means, then deviations) or a one-pass running update
  • Space: \(O(1)\) beyond the input when only running sums are kept
  • Assumptions: Sample-statistic workflow shown; parameter-estimation and streaming/online algorithms can change constants and memory usage
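
For the streaming case mentioned above, one common approach is a Welford-style running update of the means and co-moments. The sketch below assumes that technique; the class name `OnlineCorrelation` and its methods are hypothetical, and it keeps \(O(1)\) state regardless of how many pairs are seen.

```python
import math


class OnlineCorrelation:
    """Running Pearson correlation using Welford-style co-moment updates."""

    def __init__(self) -> None:
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.m2_x = 0.0  # running sum of squared deviations of x
        self.m2_y = 0.0  # running sum of squared deviations of y
        self.c_xy = 0.0  # running sum of cross-deviations

    def update(self, x: float, y: float) -> None:
        """Fold one (x, y) pair into the running statistics."""
        self.n += 1
        dx = x - self.mean_x  # deviation from the previous mean of x
        dy = y - self.mean_y  # deviation from the previous mean of y
        self.mean_x += dx / self.n
        self.mean_y += dy / self.n
        # Each update multiplies an old-mean deviation by a new-mean deviation.
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.c_xy += dx * (y - self.mean_y)

    def value(self) -> float:
        """Current correlation; raises if it is not yet defined."""
        if self.n < 2 or self.m2_x == 0.0 or self.m2_y == 0.0:
            raise ValueError("correlation is undefined")
        return self.c_xy / math.sqrt(self.m2_x * self.m2_y)
```

A typical use is to call `update(x, y)` once per incoming pair and read `value()` whenever an estimate is needed.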

See also