Principal Component Analysis (PCA)¶
Formula¶
\[
\Sigma=\frac{1}{n-1}X_c^T X_c
\]
\[
w_1=\arg\max_{\|w\|=1}\; w^T \Sigma w
\]
\[
Z=X_c W_k
\]
Parameters¶
- \(X_c\): centered data matrix (rows are samples)
- \(\Sigma\): sample covariance matrix
- \(w_1\): first principal direction (top eigenvector of \(\Sigma\))
- \(W_k\): matrix of top \(k\) principal directions
- \(Z\): projected \(k\)-dimensional representation
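The definitions above can be sketched directly in NumPy (a minimal illustration with synthetic data; variable names mirror the formulas):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))            # n samples x d features

# Center: X_c = X - column means
X_c = X - X.mean(axis=0)

# Sample covariance: Sigma = (1/(n-1)) X_c^T X_c
n = X_c.shape[0]
Sigma = (X_c.T @ X_c) / (n - 1)

# First principal direction w_1 = top eigenvector of Sigma
eigvals, eigvecs = np.linalg.eigh(Sigma)  # eigenvalues in ascending order
w1 = eigvals_top = eigvecs[:, -1]

# w_1 is a unit vector maximizing w^T Sigma w
assert abs(np.linalg.norm(w1) - 1.0) < 1e-9
```

Note that `np.linalg.eigh` returns eigenvalues in ascending order, so the top eigenvector is the last column.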
What it means¶
Finds orthogonal directions of maximum variance and projects data onto them.
What it's used for¶
- Dimensionality reduction and visualization.
- Denoising and compression.
- Preprocessing before clustering or regression/classification.
Key properties¶
- Principal components are orthogonal
- Equivalent to SVD on centered data
- Explained variance comes from eigenvalues of \(\Sigma\)
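The SVD equivalence and the eigenvalue/variance connection can be checked numerically (a sketch with random data): if \(X_c = USV^T\), then the eigenvalues of \(\Sigma\) are \(\lambda_i = s_i^2/(n-1)\).

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
X_c = X - X.mean(axis=0)
n = X_c.shape[0]

# Eigenvalues of the covariance matrix, in descending order
Sigma = (X_c.T @ X_c) / (n - 1)
eigvals = np.linalg.eigvalsh(Sigma)[::-1]

# Singular values of the centered data give the same spectrum:
# lambda_i = s_i^2 / (n - 1)
s = np.linalg.svd(X_c, compute_uv=False)
assert np.allclose(eigvals, s**2 / (n - 1))

# Explained variance ratio per component
evr = eigvals / eigvals.sum()
```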
Common gotchas¶
- Center the data first: PCA is defined on the centered matrix \(X_c\).
- Standardize features when their scales differ a lot; otherwise the largest-scale features dominate the components.
- PCA is linear and may miss nonlinear structure.
Example¶
Reducing 100-dimensional embeddings to 2D for visualization projects with \(Z=X_cW_2\), where the columns of \(W_2\) are the top two principal directions.
How to Compute (Pseudocode)¶
Input: data matrix X (n samples x d features), target components k
Output: principal components and projected data
center data: X_c <- X - column_means(X)
compute a thin SVD of X_c (or an eigendecomposition of Sigma = X_c^T X_c / (n-1))
select the top k principal directions W_k
project data: Z <- X_c W_k
return W_k, Z
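The pseudocode above can be made runnable in a few lines (a sketch using the SVD route; `pca` and its arguments are illustrative names, not a library API):

```python
import numpy as np

def pca(X, k):
    """Top-k PCA of X (n samples x d features) via SVD of the centered data."""
    X_c = X - X.mean(axis=0)                  # center: X_c <- X - column_means(X)
    # Thin SVD: X_c = U S Vt; rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(X_c, full_matrices=False)
    W_k = Vt[:k].T                            # d x k matrix of top k directions
    Z = X_c @ W_k                             # n x k projected representation
    return W_k, Z

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 6))
W_k, Z = pca(X, 2)
print(W_k.shape, Z.shape)                     # (6, 2) (50, 2)
```

Using the SVD of \(X_c\) avoids explicitly forming \(\Sigma\), which is numerically preferable when \(d\) is large.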
Complexity¶
- Time: Dominated by the SVD/eigendecomposition, roughly \(O(nd^2)\) for a thin SVD of an \(n \times d\) matrix with \(n \ge d\), or \(O(nd^2 + d^3)\) to form and decompose the \(d \times d\) covariance
- Space: \(O(nd)\) for the data, plus \(O(d^2)\) if the full covariance is materialized (or \(O(nd)\) for thin SVD factors)
- Assumptions: Dense PCA workflow shown; randomized/incremental PCA methods have different runtime/memory tradeoffs