Principal Component Analysis (PCA)¶
Formula¶
\[
\Sigma=\frac{1}{n-1}X_c^T X_c
\]
\[
w_1=\arg\max_{\|w\|=1}\; w^T \Sigma w
\]
\[
Z=X_c W_k
\]
Parameters¶
- \(X_c\): centered data matrix (rows are samples)
- \(\Sigma\): sample covariance matrix
- \(w_1\): first principal direction (top eigenvector of \(\Sigma\))
- \(W_k\): matrix of top \(k\) principal directions
- \(Z\): projected \(k\)-dimensional representation
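The definitions above can be sketched directly in NumPy (a minimal illustration with synthetic data; variable names mirror the formulas):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))            # n samples x d features

# Center: X_c = X - column means
X_c = X - X.mean(axis=0)

# Sample covariance: Sigma = (1/(n-1)) X_c^T X_c
n = X_c.shape[0]
Sigma = (X_c.T @ X_c) / (n - 1)

# First principal direction w_1 = top eigenvector of Sigma
eigvals, eigvecs = np.linalg.eigh(Sigma)  # eigenvalues in ascending order
w1 = eigvals_top = eigvecs[:, -1]

# w_1 is a unit vector maximizing w^T Sigma w
assert abs(np.linalg.norm(w1) - 1.0) < 1e-9
```

Note that `np.linalg.eigh` returns eigenvalues in ascending order, so the top eigenvector is the last column.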
What it means¶
Finds orthogonal directions of maximum variance and projects data onto them.
What it's used for¶
- Dimensionality reduction and visualization.
- Denoising and compression.
- Preprocessing before clustering or regression/classification.
Key properties¶
- Principal components are orthogonal
- Equivalent to SVD on centered data
- Explained variance comes from eigenvalues of \(\Sigma\)
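The SVD equivalence and the eigenvalue/variance connection can be checked numerically (a sketch with random data): if \(X_c = USV^T\), then the eigenvalues of \(\Sigma\) are \(\lambda_i = s_i^2/(n-1)\).

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
X_c = X - X.mean(axis=0)
n = X_c.shape[0]

# Eigenvalues of the covariance matrix, in descending order
Sigma = (X_c.T @ X_c) / (n - 1)
eigvals = np.linalg.eigvalsh(Sigma)[::-1]

# Singular values of the centered data give the same spectrum:
# lambda_i = s_i^2 / (n - 1)
s = np.linalg.svd(X_c, compute_uv=False)
assert np.allclose(eigvals, s**2 / (n - 1))

# Explained variance ratio per component
evr = eigvals / eigvals.sum()
```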
Common gotchas¶
- Center the data first: PCA is defined on the centered matrix \(X_c\).
- Standardize features when their scales differ a lot; otherwise the largest-scale features dominate the components.
- PCA is linear and may miss nonlinear structure.
Example¶
Reducing 100-dimensional embeddings to 2D for visualization projects with \(Z=X_cW_2\), where the columns of \(W_2\) are the top two principal directions.
How to Compute (Pseudocode)¶
Input: data matrix X (n samples x d features), target components k
Output: principal components and projected data
center data: X_c <- X - column_means(X)
compute a thin SVD of X_c (or an eigendecomposition of Sigma = X_c^T X_c / (n-1))
select the top k principal directions W_k
project data: Z <- X_c W_k
return W_k, Z
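The pseudocode above can be made runnable in a few lines (a sketch using the SVD route; `pca` and its arguments are illustrative names, not a library API):

```python
import numpy as np

def pca(X, k):
    """Top-k PCA of X (n samples x d features) via SVD of the centered data."""
    X_c = X - X.mean(axis=0)                  # center: X_c <- X - column_means(X)
    # Thin SVD: X_c = U S Vt; rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(X_c, full_matrices=False)
    W_k = Vt[:k].T                            # d x k matrix of top k directions
    Z = X_c @ W_k                             # n x k projected representation
    return W_k, Z

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 6))
W_k, Z = pca(X, 2)
print(W_k.shape, Z.shape)                     # (6, 2) (50, 2)
```

Using the SVD of \(X_c\) avoids explicitly forming \(\Sigma\), which is numerically preferable when \(d\) is large.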
Complexity¶
- Time: Dominated by the SVD/eigendecomposition, roughly \(O(nd^2)\) for a thin SVD of an \(n \times d\) matrix with \(n \ge d\), or \(O(nd^2 + d^3)\) to form and decompose the \(d \times d\) covariance
- Space: \(O(nd)\) for the data, plus \(O(d^2)\) if the full covariance is materialized (or \(O(nd)\) for thin SVD factors)
- Assumptions: Dense PCA workflow shown; randomized/incremental PCA methods have different runtime/memory tradeoffs