Jaccard Similarity (Intersection over Union)¶
Formula¶
\[
J(A,B) = \frac{|A\cap B|}{|A\cup B|}
\]
\[
J = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}+\mathrm{FN}}
\]
Parameters¶
- \(A, B\): sets (top equation)
- \(\mathrm{TP}, \mathrm{FP}, \mathrm{FN}\): binary counts (bottom equation)
What it means¶
Overlap ratio between predicted and true items.
What it's used for¶
- Evaluating overlap in segmentation or set prediction tasks.
- Comparing binary masks (IoU).
Key properties¶
- Relation to Dice / F1 (binary) shown below.
\[
D = \frac{2\mathrm{TP}}{2\mathrm{TP}+\mathrm{FP}+\mathrm{FN}}
\]
\[
D = \frac{2J}{1+J},\quad J=\frac{D}{2-D}
\]
Common gotchas¶
- Undefined when both sets empty; choose a convention (often 1.0).
Example¶
If \(\mathrm{TP}=30, \mathrm{FP}=10, \mathrm{FN}=5\), \(J=30/(30+10+5)=0.667\).
How to Compute (Pseudocode)¶
Input: true labels and predicted labels (or sets/masks, depending on the metric)
Output: Jaccard score
build the contingency table / overlap counts needed by the metric
compute the metric numerator and denominator from those counts
apply any normalization/adjustment terms required by the definition
return the score
Complexity¶
- Time: Typically \(O(n)\) to accumulate counts over \(n\) labeled examples once labels/sets are aligned (plus optional \(O(k^2)\) work on contingency tables for some metrics)
- Space: Depends on the contingency-table size (from \(O(1)\) count accumulators to \(O(k_1 k_2)\) for label-table storage)
- Assumptions: Exact complexity depends on binary-mask vs multiclass-label formulation and whether pair-count terms are computed from counts or explicit pairs