Spectral Clustering¶
Formula¶
\[
L_{\mathrm{sym}} = I - D^{-1/2} A D^{-1/2}
\]
Parameters¶
- \(A\): adjacency matrix
- \(D\): degree matrix
- \(L_{\mathrm{sym}}\): normalized Laplacian
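As a quick check of the formula, here is a minimal NumPy sketch that builds \(L_{\mathrm{sym}}\) from an adjacency matrix (the 4-node path graph is a made-up example; any symmetric nonnegative \(A\) with no isolated nodes works the same way):

```python
import numpy as np

# Path graph on 4 nodes: 0-1-2-3 (illustrative adjacency matrix).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

deg = A.sum(axis=1)                          # node degrees (diagonal of D)
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))

# L_sym = I - D^{-1/2} A D^{-1/2}
L_sym = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt

eigvals = np.linalg.eigvalsh(L_sym)          # ascending order
print(np.round(eigvals, 3))                  # all lie in [0, 2]; smallest is 0
```

For any connected graph the smallest eigenvalue of \(L_{\mathrm{sym}}\) is exactly 0 (with eigenvector \(D^{1/2}\mathbf{1}\)), and all eigenvalues lie in \([0, 2]\).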
What it means¶
Clusters nodes using eigenvectors of the graph Laplacian.
What it's used for¶
- Clustering graph nodes using Laplacian eigenvectors.
- Segmenting data when a similarity graph is available.
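As a usage sketch for the second case, scikit-learn's `SpectralClustering` builds the similarity graph and runs the whole pipeline in one call (the two-blob data below is made up for illustration; `affinity="rbf"` is the library default and builds a Gaussian similarity graph from the points):

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Two well-separated 2-D blobs (hypothetical data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, size=(10, 2)),
               rng.normal(5.0, 0.1, size=(10, 2))])

# affinity="rbf" builds a Gaussian similarity graph from X internally.
sc = SpectralClustering(n_clusters=2, affinity="rbf", random_state=0)
labels = sc.fit_predict(X)
print(labels)
```

Passing `affinity="precomputed"` instead lets you supply your own similarity matrix, which matters in practice because the graph construction strongly affects the result.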
Key properties¶
- Use the eigenvectors of the \(k\) smallest eigenvalues as an embedding, then run k-means on its rows
- Relates to normalized cut objectives
Common gotchas¶
- Choice of similarity graph and \(k\) strongly affects results.
- With \(L_{\mathrm{sym}}\), the rows of the eigenvector matrix should be normalized to unit length before k-means.
Example¶
If the graph is two disconnected pairs of nodes, the two smallest eigenvalues are both zero and their eigenvectors are constant on each pair, so the spectral embedding separates the pairs into two clusters.
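A minimal NumPy sketch of that example (the 4-node graph is hard-coded): with two disconnected pairs, the eigenvectors for the two zero eigenvalues have identical rows within each pair, so simply grouping equal embedding rows recovers the clusters without even needing k-means.

```python
import numpy as np

# Two disconnected pairs: edge {0,1} and edge {2,3}.
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)

deg = A.sum(axis=1)                        # all degrees are 1 here
d_inv_sqrt = 1.0 / np.sqrt(deg)
L_sym = np.eye(4) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

vals, vecs = np.linalg.eigh(L_sym)         # eigenvalues ascending: 0, 0, 2, 2
U = vecs[:, :2]                            # embedding from the 2 smallest

# Rows are identical within a component, so equality grouping clusters them.
labels = [0 if np.allclose(U[i], U[0]) else 1 for i in range(4)]
print(labels)                              # → [0, 0, 1, 1]
```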
How to Compute (Pseudocode)¶
Input: similarity graph (or similarity matrix), number of clusters k
Output: cluster assignments
build adjacency/similarity matrix A and degree matrix D
construct a graph Laplacian (for example, normalized Laplacian)
compute the k eigenvectors associated with the smallest relevant eigenvalues
form an embedding from those eigenvectors (often row-normalized)
run k-means on the embedding rows
return k-means cluster labels
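The pseudocode above can be sketched end to end in NumPy. The two-triangle test graph and the plain Lloyd's k-means with farthest-point initialization are illustrative choices, not part of any standard API; a real implementation would use a sparse eigensolver and a library k-means.

```python
import numpy as np

def spectral_clustering(A, k, iters=50):
    """Sketch of the pipeline above; assumes A is symmetric, nonnegative,
    and has no zero-degree nodes."""
    # build degree info and the normalized Laplacian L_sym
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

    # k eigenvectors of the smallest eigenvalues, rows normalized to unit length
    _, vecs = np.linalg.eigh(L)                  # eigenvalues ascending
    U = vecs[:, :k].copy()
    U /= np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)

    # plain Lloyd's k-means with deterministic farthest-point initialization
    centers = [U[0]]
    for _ in range(k - 1):
        dists = np.min([((U - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(U[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((U[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels

# Two triangles {0,1,2} and {3,4,5} joined by one bridge edge (2,3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
print(spectral_clustering(A, 2))
```

On this graph the minimum normalized cut removes the single bridge edge, so the two triangles come back as the two clusters.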
Complexity¶
- Time: Dominated by eigendecomposition/eigensolver cost plus k-means on the spectral embedding (dense full eigendecomposition can be \(O(n^3)\); sparse/partial methods can be much cheaper)
- Space: Often dominated by graph/similarity storage and eigenvector embeddings (dense \(O(n^2)\), sparse representations lower)
- Assumptions: \(n\) nodes/points; complexity depends strongly on how the similarity graph is built and whether dense or sparse spectral solvers are used