Dice Coefficient (Sørensen-Dice)¶

Formula¶

\[ D(A,B) = \frac{2|A\cap B|}{|A|+|B|} \]

\[ D = \frac{2\mathrm{TP}}{2\mathrm{TP}+\mathrm{FP}+\mathrm{FN}} \]

Parameters¶

\(A, B\): the two sets being compared (first equation)
\(\mathrm{TP}, \mathrm{FP}, \mathrm{FN}\): counts of true positives, false positives, and false negatives in a binary comparison (second equation)
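The two forms agree once \(A\) is read as the set of predicted positives and \(B\) as the set of actual positives, so that \(|A\cap B|=\mathrm{TP}\), \(|A|=\mathrm{TP}+\mathrm{FP}\), and \(|B|=\mathrm{TP}+\mathrm{FN}\). A minimal Python sketch of that correspondence follows; the function names `dice_sets` and `dice_counts` and the sample sets are illustrative assumptions, not part of any library.

```python
def dice_sets(a: set, b: set) -> float:
    """Set form: 2|A ∩ B| / (|A| + |B|)."""
    total = len(a) + len(b)
    if total == 0:
        raise ValueError("Dice is undefined when both sets are empty")
    return 2 * len(a & b) / total


def dice_counts(tp: int, fp: int, fn: int) -> float:
    """Count form: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    if denom == 0:
        raise ValueError("Dice is undefined when TP, FP and FN are all zero")
    return 2 * tp / denom


# Hypothetical example: item IDs predicted positive vs. actually positive.
predicted = {1, 2, 3, 4}
actual = {3, 4, 5}

tp = len(predicted & actual)  # in both sets
fp = len(predicted - actual)  # predicted only
fn = len(actual - predicted)  # actual only

set_score = dice_sets(predicted, actual)
count_score = dice_counts(tp, fp, fn)
print(set_score, count_score)  # both equal 4/7
```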

What it means¶

An overlap measure: twice the size of the intersection, divided by the sum of the two set sizes.

What it's used for¶

Scoring the overlap between a predicted set and a reference set, for example in image segmentation.
Comparing binary masks when true negatives are less important.

Key properties¶

Range \([0,1]\); 0 means no overlap and 1 means perfect overlap.
Symmetric: \(D(A,B)=D(B,A)\).
Relation to Jaccard: \(D=\frac{2J}{1+J}\), \(J=\frac{D}{2-D}\).
For binary classification, Dice equals \(F_1\).
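These properties can be checked numerically. The sketch below uses illustrative counts and hypothetical helper names (`dice`, `jaccard`, `f1`); it assumes the counts are large enough that no denominator is zero.

```python
def dice(tp: int, fp: int, fn: int) -> float:
    return 2 * tp / (2 * tp + fp + fn)


def jaccard(tp: int, fp: int, fn: int) -> float:
    return tp / (tp + fp + fn)


def f1(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


tp, fp, fn = 30, 10, 5  # illustrative counts

d = dice(tp, fp, fn)
j = jaccard(tp, fp, fn)

dice_from_jaccard = 2 * j / (1 + j)  # D = 2J / (1 + J)
jaccard_from_dice = d / (2 - d)      # J = D / (2 - D)
f1_score = f1(tp, fp, fn)            # F1 on the positive class

# Swapping A and B swaps the roles of FP and FN; the score is unchanged.
swapped = dice(tp, fn, fp)
```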

Common gotchas¶

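Undefined when both sets are empty (the denominator is 0); choose a convention (often 1.0) and apply it consistently.
Ignores true negatives, so under strong class imbalance it can look high without reflecting performance on negatives. For example, when positives dominate, predicting everything as positive still scores close to 1.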
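Both gotchas are easy to reproduce. The sketch below makes the empty-case convention an explicit parameter (the name `empty_value` is an assumption for illustration) and then shows one imbalanced case, with hypothetical counts, where positives dominate and an all-positive prediction scores well even though no negative is identified correctly.

```python
def dice_with_convention(tp: int, fp: int, fn: int, empty_value: float = 1.0) -> float:
    """Dice from counts; returns `empty_value` when the score is undefined."""
    denom = 2 * tp + fp + fn
    if denom == 0:
        # Both the prediction and the reference are empty.
        return empty_value
    return 2 * tp / denom


# Empty prediction vs. empty reference: the result is whatever convention we chose.
empty_as_one = dice_with_convention(0, 0, 0)
empty_as_zero = dice_with_convention(0, 0, 0, empty_value=0.0)

# Hypothetical imbalanced data: 95 positives, 5 negatives, everything predicted positive.
tp, fp, fn, tn = 95, 5, 0, 0
all_positive_dice = dice_with_convention(tp, fp, fn)  # 190 / 195, about 0.974
specificity = tn / (tn + fp)                          # 0.0: every negative is misclassified
```

Reporting a metric that does use true negatives (specificity here) alongside Dice is one way to surface this blind spot.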

Example¶

If \(\mathrm{TP}=30, \mathrm{FP}=10, \mathrm{FN}=5\), then \(D=2\cdot 30/(2\cdot 30+10+5)=60/75=0.800\).

How to Compute (Pseudocode)¶

Input: true and predicted binary labels (or two sets/masks) over the same items
Output: Dice score

count TP (positive in both), FP (predicted positive only), and FN (truly positive only)
compute the numerator 2*TP and the denominator 2*TP + FP + FN
if the denominator is 0 (both sides empty), return the chosen convention (often 1.0)
return numerator / denominator
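The steps above can be sketched in plain Python for aligned binary labels (for example, flattened masks). This is a minimal illustration under stated assumptions, not a reference implementation: the function name `dice_from_labels`, the `empty_value` parameter, and the sample labels are all hypothetical, and any truthy value is treated as positive.

```python
from typing import Sequence


def dice_from_labels(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    empty_value: float = 1.0,
) -> float:
    """Dice score for two aligned binary label sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Single pass: accumulate only the three counts the formula needs.
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t and p:
            tp += 1
        elif p:
            fp += 1  # predicted positive, actually negative
        elif t:
            fn += 1  # actually positive, predicted negative
        # True negatives are deliberately not counted.

    denom = 2 * tp + fp + fn
    if denom == 0:
        return empty_value  # convention for the undefined empty-vs-empty case
    return 2 * tp / denom


y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
score = dice_from_labels(y_true, y_pred)  # TP=2, FP=1, FN=1 -> 4/6
```

For large arrays a vectorized version would usually be preferred; the loop form is used here only because it mirrors the counting steps directly.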

Complexity¶

Time: \(O(n)\) for a single pass over \(n\) aligned labels or mask elements; for two sets stored as hash sets, expected \(O(|A|+|B|)\)
Space: \(O(1)\) for the three count accumulators when labels or masks are already aligned; hashing unaligned sets needs extra space proportional to the set sizes
Assumptions: Binary (single-class) formulation; multiclass variants that compute Dice per class need \(O(k)\) counters for \(k\) classes