Jaccard Similarity (Intersection over Union)

Formula

\[ J(A,B) = \frac{|A\cap B|}{|A\cup B|} \]
\[ J = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}+\mathrm{FN}} \]

Parameters

  • \(A, B\): sets (top equation)
  • \(\mathrm{TP}, \mathrm{FP}, \mathrm{FN}\): binary counts (bottom equation)
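The two equations agree when \(A\) is the set of predicted positives and \(B\) is the set of true positives. The sketch below illustrates this on made-up item ids; the function names `jaccard_sets` and `jaccard_counts` are illustrative rather than taken from any library, and the empty-union case is deliberately left unhandled here.

```python
def jaccard_sets(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|; assumes the union is non-empty."""
    return len(a & b) / len(a | b)


def jaccard_counts(tp: int, fp: int, fn: int) -> float:
    """TP / (TP + FP + FN); assumes the denominator is non-zero."""
    return tp / (tp + fp + fn)


# Hypothetical data: ids predicted positive vs. ids that are truly positive.
predicted = {1, 2, 3, 4}
actual = {3, 4, 5}

tp = len(predicted & actual)  # in both sets
fp = len(predicted - actual)  # predicted but not true
fn = len(actual - predicted)  # true but missed

print(jaccard_sets(predicted, actual))  # 0.4
print(jaccard_counts(tp, fp, fn))       # 0.4
```

True negatives never enter either form, which is why the score is unaffected by how many items are correctly left out of both sets.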

What it means

The fraction of items in the union of the predicted and true sets that appear in both. It ranges from 0 (no overlap) to 1 (identical sets).

What it's used for

  • Evaluating overlap in segmentation or set prediction tasks.
  • Comparing binary masks (IoU).

Key properties

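  • Monotonically related to the Dice coefficient \(D\) (equal to F1 in the binary case), so the two rank results identically; the relation is shown below.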
\[ D = \frac{2\mathrm{TP}}{2\mathrm{TP}+\mathrm{FP}+\mathrm{FN}} \]
\[ D = \frac{2J}{1+J},\quad J=\frac{D}{2-D} \]
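As a quick numerical check of the conversion formulas, the following sketch computes \(J\) and \(D\) directly from counts and confirms that each conversion recovers the other. The helper names are hypothetical; the counts are the ones used in the Example section.

```python
import math


def dice_from_jaccard(j: float) -> float:
    """D = 2J / (1 + J)."""
    return 2 * j / (1 + j)


def jaccard_from_dice(d: float) -> float:
    """J = D / (2 - D)."""
    return d / (2 - d)


tp, fp, fn = 30, 10, 5
j = tp / (tp + fp + fn)          # 30 / 45
d = 2 * tp / (2 * tp + fp + fn)  # 60 / 75

# Each conversion should reproduce the directly computed value.
assert math.isclose(dice_from_jaccard(j), d)
assert math.isclose(jaccard_from_dice(d), j)
```

Because \(D \ge J\) for every value in \([0, 1]\), a Dice score always looks at least as good as the Jaccard score for the same prediction.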

Common gotchas

  • Undefined (\(0/0\)) when both sets are empty; choose a convention (often 1.0) and apply it consistently.
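One way to make the convention explicit is to pass it as a parameter instead of hard-coding it. This is a minimal sketch; the name `jaccard_or_default` and its `empty_value` argument are assumptions made for illustration, not part of any particular library.

```python
def jaccard_or_default(a: set, b: set, empty_value: float = 1.0) -> float:
    """Jaccard similarity, returning `empty_value` when both sets are empty."""
    union = a | b
    if not union:
        # 0/0 case: no items in either set, so fall back to the convention.
        return empty_value
    return len(a & b) / len(union)


print(jaccard_or_default(set(), set()))                   # 1.0
print(jaccard_or_default(set(), set(), empty_value=0.0))  # 0.0
print(jaccard_or_default({1}, set()))                     # 0.0
```

Only the case where both sets are empty needs the convention; if exactly one set is empty, the score is well defined and equals 0.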

Example

If \(\mathrm{TP}=30, \mathrm{FP}=10, \mathrm{FN}=5\), then \(J=30/(30+10+5)=30/45\approx 0.667\).

How to Compute (Pseudocode)

Input: true and predicted binary labels or masks of equal length (or two sets A and B)
Output: Jaccard score

count TP (predicted 1, true 1), FP (predicted 1, true 0), and FN (predicted 0, true 1)
  (for sets: intersection = |A ∩ B|, union = |A| + |B| - |A ∩ B|)
if TP + FP + FN = 0, return the value chosen for the empty case
return TP / (TP + FP + FN)
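The steps above could be written in Python roughly as follows, for binary labels or flattened masks given as sequences of 0/1 values. The function name `jaccard_binary` and the `empty_score` parameter are illustrative choices, and the 1.0 default is just one possible convention for the empty case.

```python
from typing import Sequence


def jaccard_binary(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    empty_score: float = 1.0,
) -> float:
    """Jaccard score for aligned binary labels (0/1) in a single pass."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p and t:
            tp += 1  # predicted positive, truly positive
        elif p:
            fp += 1  # predicted positive, truly negative
        elif t:
            fn += 1  # predicted negative, truly positive
        # true negatives are ignored by the metric

    denom = tp + fp + fn
    if denom == 0:
        return empty_score  # no positives anywhere: apply the convention
    return tp / denom


# TP=2, FP=1, FN=1, so the score is 2 / 4.
print(jaccard_binary([1, 1, 0, 0, 1], [1, 0, 1, 0, 1]))  # 0.5
```

A 2-D mask can be passed by flattening it first, since only element-wise agreement matters.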

Complexity

  • Time: \(O(n)\) to accumulate TP, FP, and FN in one pass over \(n\) aligned labels or mask elements; variants that work from a contingency table can add \(O(k^2)\) work
  • Space: \(O(1)\) for the three binary count accumulators, up to \(O(k_1 k_2)\) when a full \(k_1 \times k_2\) label table is stored
  • Assumptions: Inputs are already aligned; exact cost depends on the binary-mask vs multiclass-label formulation and, for pair-counting variants, on whether pair terms are computed from counts or from explicit pairs