Silhouette Score¶
Formula¶
\[
s(i)=\frac{b(i)-a(i)}{\max\{a(i),\,b(i)\}}
\]
\[
\mathrm{Silhouette}=\frac{1}{n}\sum_{i=1}^n s(i)
\]
Parameters¶
- \(a(i)\): mean distance from point \(i\) to points in its own cluster
- \(b(i)\): smallest mean distance from point \(i\) to another cluster (nearest cluster)
- \(n\): number of points
What it means¶
Measures how well each point fits its assigned cluster versus neighboring clusters.
What it's used for¶
- Comparing clustering quality across different \(k\).
- Detecting overlapping or poorly separated clusters.
Key properties¶
- Range \([-1,1]\)
- Higher is better
- Works with any distance metric (not just Euclidean)
Common gotchas¶
- Expensive for large datasets if pairwise distances are computed naively.
- Favors convex/spherical clusters more than irregular density-based shapes.
Example¶
If a point has \(a(i)=0.4\) and \(b(i)=1.0\), then \(s(i)=(1.0-0.4)/1.0=0.6\).
How to Compute (Pseudocode)¶
Input: data points, cluster labels (and metric-specific settings)
Output: silhouette score
compute the per-cluster and/or per-point quantities required by the metric
aggregate them according to the metric definition
return the metric value
Complexity¶
- Time: Depends on the metric and implementation; many clustering validation metrics require at least \(O(n)\) scans, and some need pairwise distances (up to \(O(n^2)\))
- Space: Depends on whether pairwise distances are materialized (from \(O(1)\)/\(O(k)\) summaries up to \(O(n^2)\))
- Assumptions: \(n\) points and cluster labels are given; exact cost depends on distance computation/caching and metric-specific formulas