Sample difference metrics used in ANALOG
In the following equations, n is the number of taxa that are
present in both samples to be compared. Subscripted values of s
and t denote counts or proportions of taxa in the two samples,
respectively. The result d is either a distance measure (low
values indicate like samples) or a similarity measure (high values
indicate like samples).
Manhattan distance
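As a minimal sketch, assuming the usual definition d = sum over i of |s_i - t_i|, the Manhattan distance can be computed as follows. The function name and the list-based inputs are illustrative and are not taken from ANALOG.

```python
def manhattan_distance(s, t):
    """Sum of absolute differences between paired taxon values.

    s and t are equal-length sequences of counts or proportions,
    one entry per taxon (an assumed input layout).
    """
    if len(s) != len(t):
        raise ValueError("samples must list the same taxa")
    return sum(abs(si - ti) for si, ti in zip(s, t))
```

Identical samples give 0, and the value grows as the samples diverge.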
Squared Euclidean distance
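The sketch below assumes the standard form d = sum over i of (s_i - t_i)^2. The function name is hypothetical, chosen only for this illustration.

```python
def squared_euclidean_distance(s, t):
    """Sum of squared differences between paired taxon values."""
    if len(s) != len(t):
        raise ValueError("samples must list the same taxa")
    return sum((si - ti) ** 2 for si, ti in zip(s, t))
```

Because the differences are squared, one large disagreement in a single taxon outweighs several small ones.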
Euclidean distance
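Assuming the standard form d = sqrt(sum over i of (s_i - t_i)^2), a short illustrative implementation is shown here. The function name is not from ANALOG.

```python
import math


def euclidean_distance(s, t):
    """Square root of the summed squared differences."""
    if len(s) != len(t):
        raise ValueError("samples must list the same taxa")
    return math.sqrt(sum((si - ti) ** 2 for si, ti in zip(s, t)))
```

For example, `euclidean_distance([0, 0], [3, 4])` evaluates to 5.0.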
Canberra distance
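The Canberra distance is commonly written d = sum over i of |s_i - t_i| / (s_i + t_i) for non-negative values. The sketch below assumes that form; skipping taxa where both values are zero is an assumption made here to avoid division by zero, not documented ANALOG behavior.

```python
def canberra_distance(s, t):
    """Sum of absolute differences, each scaled by the pair's total."""
    if len(s) != len(t):
        raise ValueError("samples must list the same taxa")
    total = 0.0
    for si, ti in zip(s, t):
        denom = si + ti
        if denom == 0:
            # Assumption: a taxon absent from both samples contributes nothing.
            continue
        total += abs(si - ti) / denom
    return total
```

Each taxon contributes at most 1, so rare taxa weigh as much as abundant ones.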
Squared chord distance
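A minimal sketch, assuming the usual definition d = sum over i of (sqrt(s_i) - sqrt(t_i))^2 applied to non-negative values. The function name is illustrative only.

```python
import math


def squared_chord_distance(s, t):
    """Sum of squared differences between square-rooted taxon values."""
    if len(s) != len(t):
        raise ValueError("samples must list the same taxa")
    return sum((math.sqrt(si) - math.sqrt(ti)) ** 2 for si, ti in zip(s, t))
```

When s and t are proportions that each sum to 1, the result lies between 0 (identical samples) and 2 (no taxa in common).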
Squared chi-squared distance
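One common form of this measure is d = sum over i of (s_i - t_i)^2 / (s_i + t_i). The sketch below assumes that form, which may differ in detail from the one ANALOG uses. As a further assumption, taxa with s_i + t_i = 0 are skipped.

```python
def squared_chi_squared_distance(s, t):
    """Squared differences, each divided by the pair's total."""
    if len(s) != len(t):
        raise ValueError("samples must list the same taxa")
    total = 0.0
    for si, ti in zip(s, t):
        denom = si + ti
        if denom == 0:
            # Assumption: a taxon absent from both samples contributes nothing.
            continue
        total += (si - ti) ** 2 / denom
    return total
```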
Jaccard's similarity
where a is the number of taxa that are present in both samples and
c is the number of taxa that are present in one sample and absent
in the other.
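With a and c defined as above, Jaccard's similarity is conventionally a / (a + c). The sketch below derives a and c from two abundance lists by treating any value greater than zero as "present". That presence rule and the function name are assumptions made for illustration.

```python
def jaccard_similarity(s, t):
    """Shared taxa divided by shared plus unshared taxa."""
    if len(s) != len(t):
        raise ValueError("samples must list the same taxa")
    # a: taxa present in both samples
    a = sum(1 for si, ti in zip(s, t) if si > 0 and ti > 0)
    # c: taxa present in exactly one of the two samples
    c = sum(1 for si, ti in zip(s, t) if (si > 0) != (ti > 0))
    if a + c == 0:
        raise ValueError("neither sample contains any taxa")
    return a / (a + c)
```

The result is 1 when both samples contain exactly the same taxa and 0 when they share none.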
Sorensen's index (similarity)
where a is the number of taxa that are present in the modern sample,
b is the number present in the fossil sample, and
c is the number of taxa that occur in both samples.
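Using the a, b, and c just defined, Sorensen's index is conventionally 2c / (a + b). The following sketch counts taxa from two abundance lists, treating values greater than zero as "present". The argument names `modern` and `fossil` and the presence rule are illustrative assumptions.

```python
def sorensen_index(modern, fossil):
    """Twice the shared taxa divided by the two samples' taxon counts."""
    if len(modern) != len(fossil):
        raise ValueError("samples must list the same taxa")
    a = sum(1 for m in modern if m > 0)   # taxa in the modern sample
    b = sum(1 for f in fossil if f > 0)   # taxa in the fossil sample
    # c: taxa that occur in both samples
    c = sum(1 for m, f in zip(modern, fossil) if m > 0 and f > 0)
    if a + b == 0:
        raise ValueError("neither sample contains any taxa")
    return 2 * c / (a + b)
```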
Dot product (similarity)
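A plain dot product is d = sum over i of s_i * t_i. The sketch below assumes this unnormalized form. Whether ANALOG scales the vectors first is not stated here, so treat this as an illustration only.

```python
def dot_product_similarity(s, t):
    """Sum of the products of paired taxon values."""
    if len(s) != len(t):
        raise ValueError("samples must list the same taxa")
    return sum(si * ti for si, ti in zip(s, t))
```

The value is 0 when no taxon has a nonzero value in both samples.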
Correlation (similarity)
where each sum runs over i from 0 to n-1.
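A minimal sketch, assuming the correlation meant here is the Pearson product-moment coefficient in its sums-based form: d = (n*Sst - Ss*St) / sqrt((n*Sss - Ss^2) * (n*Stt - St^2)), where Ss, St, Sst, Sss, and Stt are the sums of s_i, t_i, s_i*t_i, s_i^2, and t_i^2. These symbol names and the function name are introduced only for this example.

```python
import math


def correlation_similarity(s, t):
    """Pearson correlation computed from running sums over i = 0..n-1."""
    if len(s) != len(t):
        raise ValueError("samples must list the same taxa")
    n = len(s)
    sum_s = sum(s)
    sum_t = sum(t)
    sum_st = sum(si * ti for si, ti in zip(s, t))
    sum_ss = sum(si * si for si in s)
    sum_tt = sum(ti * ti for ti in t)
    var_term = (n * sum_ss - sum_s ** 2) * (n * sum_tt - sum_t ** 2)
    if var_term <= 0:
        # A constant sample has no variance, so the correlation is undefined.
        raise ValueError("correlation is undefined for a constant sample")
    return (n * sum_st - sum_s * sum_t) / math.sqrt(var_term)
```

The result ranges from -1 to 1, with values near 1 indicating like samples.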
Maintained by Peter Schweitzer