Skip to content

Spectral Clustering

Formula

\[ L_{\mathrm{sym}} = I - D^{-1/2} A D^{-1/2} \]

Parameters

  • \(A\): adjacency matrix
  • \(D\): degree matrix
  • \(L_{\mathrm{sym}}\): normalized Laplacian

What it means

Clusters nodes using eigenvectors of the graph Laplacian.

What it's used for

  • Clustering graph nodes using Laplacian eigenvectors.
  • Segmenting data when a similarity graph is available.

Key properties

  • Use \(k\) smallest eigenvectors, then run k-means
  • Relates to normalized cut objectives

Common gotchas

  • Choice of similarity graph and \(k\) strongly affects results.
  • Eigenvectors must be normalized before k-means.

Example

If the graph is two disconnected pairs, the 2 smallest eigenvectors separate the pairs into two clusters.

How to Compute (Pseudocode)

Input: similarity graph (or similarity matrix), number of clusters k
Output: cluster assignments

build adjacency/similarity matrix A and degree matrix D
construct a graph Laplacian (for example, normalized Laplacian)
compute the k eigenvectors associated with the smallest relevant eigenvalues
form an embedding from those eigenvectors (often row-normalized)
run k-means on the embedding rows
return k-means cluster labels

Complexity

  • Time: Dominated by eigendecomposition/eigensolver cost plus k-means on the spectral embedding (dense full eigendecomposition can be \(O(n^3)\); sparse/partial methods can be much cheaper)
  • Space: Often dominated by graph/similarity storage and eigenvector embeddings (dense \(O(n^2)\), sparse representations lower)
  • Assumptions: \(n\) nodes/points; complexity depends strongly on how the similarity graph is built and whether dense or sparse spectral solvers are used

See also