Dice Coefficient (Sørensen-Dice)
Formula
\[
D(A,B) = \frac{2|A\cap B|}{|A|+|B|}
\]
\[
D = \frac{2\mathrm{TP}}{2\mathrm{TP}+\mathrm{FP}+\mathrm{FN}}
\]
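Parameters
- \(A, B\): finite sets, e.g. the ground-truth and predicted positive elements (top equation)
- \(\mathrm{TP}, \mathrm{FP}, \mathrm{FN}\): counts of true positives, false positives, and false negatives for a binary labeling (bottom equation); with \(A\) as ground truth and \(B\) as prediction, \(\mathrm{TP}=|A\cap B|\), \(\mathrm{FP}=|B\setminus A|\), \(\mathrm{FN}=|A\setminus B|\), so \(|A|+|B|=2\mathrm{TP}+\mathrm{FP}+\mathrm{FN}\) and the two equations agree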
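A minimal Python sketch of both equations, assuming \(A\) is the ground-truth set and \(B\) the predicted set. The function names `dice_sets` and `dice_counts` and the example sets are hypothetical; the sketch derives TP, FP, FN from the sets and checks that the set form and the count form give the same value.

```python
def dice_sets(a: set, b: set) -> float:
    """Set form: 2|A & B| / (|A| + |B|). Assumes at least one set is non-empty."""
    return 2 * len(a & b) / (len(a) + len(b))


def dice_counts(tp: int, fp: int, fn: int) -> float:
    """Count form: 2TP / (2TP + FP + FN). Assumes the denominator is non-zero."""
    return 2 * tp / (2 * tp + fp + fn)


truth = {1, 2, 3, 4}  # A: ground-truth positives (illustrative data)
pred = {3, 4, 5}      # B: predicted positives (illustrative data)

tp = len(truth & pred)  # in both sets
fp = len(pred - truth)  # predicted but not true
fn = len(truth - pred)  # true but not predicted

print(dice_sets(truth, pred))   # 0.5714... (= 4/7)
print(dice_counts(tp, fp, fn))  # same value: TP=2, FP=1, FN=2
```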
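What it means
Measures overlap as twice the intersection size divided by the sum of the two set sizes. The factor 2 compensates for shared elements being counted once in \(|A|\) and again in \(|B|\), so identical sets score exactly 1.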
What it's used for
- Measuring overlap in image segmentation and set prediction.
- Comparing binary masks when true negatives are less important.
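Key properties
- Range \([0,1]\); 1 means perfect overlap, 0 means no overlap.
- Symmetric: \(D(A,B)=D(B,A)\).
- Relation to Jaccard: \(D=\frac{2J}{1+J}\), \(J=\frac{D}{2-D}\); \(D\) is a strictly increasing function of \(J\).
- For binary classification, Dice computed on the positive class equals \(F_1\), the harmonic mean of precision and recall.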
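The symmetry and Jaccard relations can be checked numerically. This is a small sketch with hypothetical helper names (`dice`, `jaccard`) and made-up set pairs, assuming neither pair has two empty sets.

```python
def dice(a: set, b: set) -> float:
    # Set form of the Dice coefficient; assumes a and b are not both empty.
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(a: set, b: set) -> float:
    # Jaccard index |A & B| / |A | B|; assumes a and b are not both empty.
    return len(a & b) / len(a | b)


# Illustrative set pairs (not from the source).
pairs = [
    ({1, 2, 3}, {2, 3, 4}),
    ({"x"}, {"x", "y", "z"}),
    ({1, 2}, {3, 4}),  # disjoint: D = J = 0
    ({7, 8}, {7, 8}),  # identical: D = J = 1
]

for a, b in pairs:
    d, j = dice(a, b), jaccard(a, b)
    assert d == dice(b, a)                   # symmetry
    assert abs(d - 2 * j / (1 + j)) < 1e-12  # D = 2J / (1 + J)
    assert abs(j - d / (2 - d)) < 1e-12      # J = D / (2 - D)
    print(f"D={d:.3f}  J={j:.3f}")
```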
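Common gotchas
- Undefined (\(0/0\)) when both sets are empty; choose a convention and report it (1.0 is common, treating two empty masks as perfect agreement).
- Ignores true negatives, so the score depends on which class is treated as positive. It is not inflated by a large negative background the way accuracy is, but if the positive class is the majority, even a trivial predict-everything-positive labeling can score high (95 positives, 5 negatives: \(D=190/195\approx 0.974\)).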
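Both gotchas can be made concrete with a count-based helper. In this sketch the name `dice_counts` and its `empty_score` parameter are hypothetical, and the 95/5 class split is illustrative data rather than anything from the source.

```python
def dice_counts(tp: int, fp: int, fn: int, empty_score: float = 1.0) -> float:
    """Dice from counts; returns empty_score when 2TP + FP + FN == 0 (both sets empty)."""
    denom = 2 * tp + fp + fn
    if denom == 0:
        return empty_score  # convention for the 0/0 case; report which one you use
    return 2 * tp / denom


# Gotcha 1: two empty masks give 0/0, resolved only by the chosen convention.
print(dice_counts(0, 0, 0))                   # 1.0
print(dice_counts(0, 0, 0, empty_score=0.0))  # 0.0

# Gotcha 2: the score depends on which class is "positive".
# Illustrative data: 95 items of class X, 5 of class Y; the prediction says X for all 100.
# Treating X as positive: TP=95, FP=5 (the Y items), FN=0.
print(dice_counts(95, 5, 0))  # about 0.974
# Treating Y as positive: TP=0, FP=0, FN=5 (every Y item was missed).
print(dice_counts(0, 0, 5))   # 0.0
```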
Example
If \(\mathrm{TP}=30, \mathrm{FP}=10, \mathrm{FN}=5\), then \(D=2\cdot 30/(2\cdot 30+10+5)=60/75=0.800\).
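The example's arithmetic can be checked exactly with Python's standard `fractions.Fraction`. The sketch also confirms, for these counts, that Dice equals the harmonic mean of precision and recall (\(F_1\)) and converts the score to Jaccard with \(J=D/(2-D)\).

```python
from fractions import Fraction

tp, fp, fn = 30, 10, 5  # counts from the example

dice = Fraction(2 * tp, 2 * tp + fp + fn)           # 60/75
precision = Fraction(tp, tp + fp)                   # 30/40
recall = Fraction(tp, tp + fn)                      # 30/35
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
jaccard = dice / (2 - dice)                         # J = D / (2 - D)

print(dice, float(dice))  # 4/5 0.8
print(precision, recall)  # 3/4 6/7
print(f1 == dice)         # True
print(jaccard)            # 2/3
```

The Jaccard value matches the direct count form \(\mathrm{TP}/(\mathrm{TP}+\mathrm{FP}+\mathrm{FN})=30/45=2/3\).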
How to Compute (Pseudocode)
Input: aligned ground-truth and predicted binary labels (or two sets / binary masks)
Output: Dice score in \([0,1]\)
count TP (both positive), FP (predicted positive, truly negative), FN (truly positive, predicted negative)
denominator = 2·TP + FP + FN
if denominator = 0 (both sets empty), return the chosen empty-set convention (e.g., 1.0)
return 2·TP / denominator
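A runnable Python version of the pseudocode, as a sketch under these assumptions: labels arrive as two aligned sequences, `positive` names the positive label, and `empty_score` sets the both-empty convention. `dice_score` and both parameters are hypothetical names, not a library API.

```python
from collections.abc import Sequence


def dice_score(
    y_true: Sequence,
    y_pred: Sequence,
    positive: object = 1,
    empty_score: float = 1.0,
) -> float:
    """Dice score for aligned binary labels (hypothetical helper, not a library API)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        t_pos, p_pos = t == positive, p == positive
        if t_pos and p_pos:
            tp += 1  # both positive
        elif p_pos:
            fp += 1  # predicted positive, truly negative
        elif t_pos:
            fn += 1  # truly positive, predicted negative
        # true negatives are never counted: Dice ignores them

    denominator = 2 * tp + fp + fn
    if denominator == 0:  # no positives in either input (both sets empty)
        return empty_score
    return 2 * tp / denominator


y_true = [1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0]
print(dice_score(y_true, y_pred))  # TP=2, FP=1, FN=1 -> 4/6
```

Binary masks of any shape can be flattened into such sequences first; the single loop is what gives the linear running time.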
Complexity
- Time: \(O(n)\) for \(n\) aligned labels or mask elements (one pass to accumulate TP, FP, FN); for the set form with hash sets, \(O(\min(|A|,|B|))\) expected time for the intersection; per-class Dice over \(k\) classes takes \(O(n+k)\)
- Space: \(O(1)\) beyond the inputs (three counters) for binary labels or masks; \(O(k)\) counters for a per-class multiclass variant
- Assumptions: labels or masks are already aligned element-wise; for the set form, hash-based membership tests take \(O(1)\) time on average
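For multiclass labels, one common extension (assumed here; the text above does not fix one) is one-vs-rest Dice per class. This sketch, with the hypothetical name `per_class_dice`, accumulates all per-class counts in a single pass using `collections.Counter`, matching the \(O(n+k)\) time and \(O(k)\) extra space noted above.

```python
from collections import Counter
from collections.abc import Sequence


def per_class_dice(y_true: Sequence, y_pred: Sequence) -> dict:
    """One-vs-rest Dice for every class seen in either input (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must be aligned (same length)")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):  # single O(n) pass
        if t == p:
            tp[t] += 1  # correct prediction for class t
        else:
            fp[p] += 1  # predicted p, but the truth is another class
            fn[t] += 1  # truth t was missed
    classes = tp.keys() | fp.keys() | fn.keys()  # O(k) distinct classes
    # Every listed class occurs at least once, so 2TP + FP + FN > 0.
    return {c: 2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in classes}


scores = per_class_dice([0, 0, 1, 1, 2], [0, 1, 1, 1, 2])
for c, d in sorted(scores.items()):
    print(c, round(d, 3))  # 0 0.667 / 1 0.8 / 2 1.0
```

Averaging the per-class scores (macro Dice) is a further design choice that should be stated when reported.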