Correlation¶
Formula¶
\[
\rho_{XY} = \frac{\operatorname{Cov}(X,Y)}{\sigma_X\,\sigma_Y}
\]
Parameters¶
- \(\sigma_X=\sqrt{\operatorname{Var}(X)}\)
- \(\sigma_Y=\sqrt{\operatorname{Var}(Y)}\)
What it means¶
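A unitless, normalized measure of the strength and direction of linear dependence between \(X\) and \(Y\): the covariance rescaled by both standard deviations.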
What it's used for¶
- Measuring linear association between variables.
- Feature screening and diagnostics.
Key properties¶
- \(-1 \le \rho_{XY} \le 1\)
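- Invariant under positive affine transformations of \(X\) or \(Y\) (shifts and positive rescaling); a negative scale factor on exactly one variable flips the sign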
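The effect of affine maps on correlation can be checked directly from the definition. A short derivation, assuming constants \(a, c \neq 0\) and arbitrary offsets \(b, d\) (symbols introduced here for illustration only):
\[
\operatorname{Cov}(aX+b,\,cY+d) = ac\,\operatorname{Cov}(X,Y), \qquad \sigma_{aX+b} = |a|\,\sigma_X, \qquad \sigma_{cY+d} = |c|\,\sigma_Y
\]
\[
\rho_{aX+b,\,cY+d} = \frac{ac}{|a|\,|c|}\,\rho_{XY} = \operatorname{sgn}(ac)\,\rho_{XY}
\]
So the value is unchanged when \(a\) and \(c\) share a sign and is negated otherwise.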
Common gotchas¶
- Correlation measures only linear relationships.
- Undefined if either variance is zero.
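Both gotchas can be illustrated with a small sketch. The helper `pearson` below is a hypothetical name written for this example, not a library function, and it returns `None` for the undefined case purely as an illustrative convention.

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation; returns None when a variance is zero."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centered sums; the 1/n (or 1/(n-1)) factors cancel in the ratio.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined for a constant variable
    return sxy / math.sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]
quadratic = pearson(xs, [x * x for x in xs])  # Y is fully determined by X, but not linearly
constant = pearson(xs, [3, 3, 3, 3, 3])       # Y has zero variance
```

Here `quadratic` evaluates to `0.0` even though \(Y=X^2\) depends on \(X\) exactly, and `constant` is `None` because the denominator would be zero.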
Example¶
If \(Y=X\) and \(\operatorname{Var}(X)>0\), then \(\rho_{XY}=1\).
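The result follows in one line from the formula, since \(\operatorname{Cov}(X,X)=\operatorname{Var}(X)=\sigma_X^2\):
\[
\rho_{XX} = \frac{\operatorname{Cov}(X,X)}{\sigma_X\,\sigma_X} = \frac{\sigma_X^2}{\sigma_X^2} = 1
\]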
How to Compute (Pseudocode)¶
Input: paired samples (x_1, y_1), ..., (x_n, y_n) with n >= 2
Output: sample correlation r
compute the sample means x_bar and y_bar
accumulate S_xy = sum of (x_i - x_bar)(y_i - y_bar), S_xx = sum of (x_i - x_bar)^2, S_yy = sum of (y_i - y_bar)^2
if S_xx = 0 or S_yy = 0, report the correlation as undefined
return r = S_xy / sqrt(S_xx * S_yy)
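One way the steps above might look in Python is sketched below. It is a minimal two-pass version, not a definitive implementation: the function name `pearson_correlation` and the choice to raise `ValueError` for undefined cases are assumptions made for this example.

```python
import math
from typing import Sequence

def pearson_correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Two-pass sample Pearson correlation of paired observations."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two paired samples are required")

    # Pass 1: sample means.
    mean_x = math.fsum(xs) / n
    mean_y = math.fsum(ys) / n

    # Pass 2: centered sums. Normalizing constants cancel in the ratio,
    # so they are omitted.
    sxy = math.fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = math.fsum((x - mean_x) ** 2 for x in xs)
    syy = math.fsum((y - mean_y) ** 2 for y in ys)

    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined when a variance is zero")

    r = sxy / math.sqrt(sxx * syy)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return max(-1.0, min(1.0, r))
```

For instance, `pearson_correlation([1, 2, 3, 4], [2, 4, 6, 8])` returns `1.0`, matching the example where \(Y\) is an exact increasing linear function of \(X\). Centering before multiplying avoids the cancellation that the raw-sum shortcut formula can suffer on data with a large mean.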
Complexity¶
- Time: \(O(n)\) for \(n\) paired samples, using either two passes (means, then centered sums) or a single pass over running sums
- Space: \(O(1)\) beyond the input, since only a handful of running sums are kept
- Assumptions: Sample correlation computed from paired data; the single-pass raw-sum formula can lose precision to cancellation, so a two-pass or Welford-style update is usually preferred for numerical stability