Fowlkes-Mallows Index¶
Formula¶
\[
\mathrm{FM}=\frac{\mathrm{TP}}{\sqrt{(\mathrm{TP}+\mathrm{FP})(\mathrm{TP}+\mathrm{FN})}}
=\sqrt{\mathrm{Precision}\cdot \mathrm{Recall}}
\]
Parameters¶
- \(\mathrm{TP},\mathrm{FP},\mathrm{FN}\): pair-counts over point pairs for two partitions
- Precision/Recall: computed on pairwise same-cluster decisions
What it means¶
Geometric mean of pairwise precision and recall for clustering agreement.
What it's used for¶
- External clustering validation against reference labels.
- Comparing clustering outputs using pairwise membership decisions.
Key properties¶
- Range \([0,1]\), higher is better
- Invariant to cluster label permutations
Common gotchas¶
- Pairwise counts scale as \(O(n^2)\) conceptually (implementations optimize this).
- Not adjusted for chance (unlike ARI).
Example¶
If pairwise precision is \(0.81\) and recall is \(0.64\), then \(\mathrm{FM}=\sqrt{0.81\cdot0.64}=0.72\).
How to Compute (Pseudocode)¶
Input: true labels and predicted labels (or sets/masks, depending on the metric)
Output: Fowlkes-Mallows score
build the contingency table / overlap counts needed by the metric
compute the metric numerator and denominator from those counts
apply any normalization/adjustment terms required by the definition
return the score
Complexity¶
- Time: Typically \(O(n)\) to accumulate counts over \(n\) labeled examples once labels/sets are aligned (plus optional \(O(k^2)\) work on contingency tables for some metrics)
- Space: Depends on the contingency-table size (from \(O(1)\) count accumulators to \(O(k_1 k_2)\) for label-table storage)
- Assumptions: Exact complexity depends on binary-mask vs multiclass-label formulation and whether pair-count terms are computed from counts or explicit pairs