Sample difference metrics used in ANALOG

In the following equations, n is the number of taxa that are present in both samples being compared. Subscripted values of s and t (s_i and t_i) denote the counts or proportions of taxon i in the first and second samples, respectively. The result d is either a distance measure (low values indicate similar samples) or a similarity measure (high values indicate similar samples).

Manhattan distance

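A minimal Python sketch, assuming the standard form d = sum over i of |s_i - t_i|, with both samples listing the same n taxa in the same order. The function name `manhattan` is illustrative, not part of ANALOG.

```python
from typing import Sequence


def manhattan(s: Sequence[float], t: Sequence[float]) -> float:
    """Manhattan (city-block) distance: sum of absolute differences per taxon."""
    if len(s) != len(t):
        raise ValueError("samples must list the same taxa in the same order")
    return sum(abs(si - ti) for si, ti in zip(s, t))


# Proportions of three taxa in a modern and a fossil sample (made-up values).
print(manhattan([0.5, 0.3, 0.2], [0.4, 0.4, 0.2]))
```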

Squared euclidean distance

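A minimal sketch, assuming the usual definition d = sum over i of (s_i - t_i)^2. Squaring weights large per-taxon differences more heavily than the Manhattan distance does. The function name is hypothetical.

```python
from typing import Sequence


def squared_euclidean(s: Sequence[float], t: Sequence[float]) -> float:
    """Squared Euclidean distance: sum of squared per-taxon differences."""
    if len(s) != len(t):
        raise ValueError("samples must list the same taxa in the same order")
    return sum((si - ti) ** 2 for si, ti in zip(s, t))


print(squared_euclidean([3, 1], [1, 2]))  # differences 2 and -1
```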

Euclidean distance

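A minimal sketch, assuming d is the square root of the squared Euclidean distance, d = sqrt(sum over i of (s_i - t_i)^2). The name `euclidean` is illustrative only.

```python
import math
from typing import Sequence


def euclidean(s: Sequence[float], t: Sequence[float]) -> float:
    """Euclidean distance: square root of the summed squared differences."""
    if len(s) != len(t):
        raise ValueError("samples must list the same taxa in the same order")
    return math.sqrt(sum((si - ti) ** 2 for si, ti in zip(s, t)))


print(euclidean([3, 0], [0, 4]))  # a 3-4-5 right triangle
```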

Canberra distance

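A minimal sketch, assuming the common unnormalized form d = sum over i of |s_i - t_i| / (s_i + t_i) for non-negative values. Some implementations divide the total by n; ANALOG's exact variant may differ. Skipping taxa absent from both samples (where the term would be 0/0) is an assumption made here, and the function name is hypothetical.

```python
from typing import Sequence


def canberra(s: Sequence[float], t: Sequence[float]) -> float:
    """Canberra distance: each taxon's difference scaled by its combined value."""
    if len(s) != len(t):
        raise ValueError("samples must list the same taxa in the same order")
    total = 0.0
    for si, ti in zip(s, t):
        denom = si + ti
        if denom == 0:
            # Taxon absent from both samples: term skipped (assumption, 0/0 is undefined).
            continue
        total += abs(si - ti) / denom
    return total


print(canberra([1, 0, 2], [3, 0, 2]))
```

Because every term lies between 0 and 1, a single rare taxon can contribute as much as an abundant one, which is the usual reason for choosing this metric.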

Squared chord distance

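A minimal sketch, assuming the standard form d = sum over i of (sqrt(s_i) - sqrt(t_i))^2, normally applied to proportions rather than raw counts. The square root damps the influence of the most abundant taxa. The function name is illustrative.

```python
import math
from typing import Sequence


def squared_chord(s: Sequence[float], t: Sequence[float]) -> float:
    """Squared chord distance on non-negative proportions."""
    if len(s) != len(t):
        raise ValueError("samples must list the same taxa in the same order")
    return sum((math.sqrt(si) - math.sqrt(ti)) ** 2 for si, ti in zip(s, t))


print(squared_chord([1.0, 0.0], [0.0, 1.0]))  # two samples with no taxa in common
```

For proportions that each sum to 1, this form ranges from 0 (identical samples) to 2 (no shared taxa).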

Squared Chi-squared distance

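A minimal sketch, assuming the widely used form d = sum over i of (s_i - t_i)^2 / (s_i + t_i). As with the Canberra sketch, taxa absent from both samples are skipped to avoid 0/0; that handling and the function name are assumptions.

```python
from typing import Sequence


def squared_chi_squared(s: Sequence[float], t: Sequence[float]) -> float:
    """Squared chi-squared distance: squared differences scaled by combined value."""
    if len(s) != len(t):
        raise ValueError("samples must list the same taxa in the same order")
    total = 0.0
    for si, ti in zip(s, t):
        denom = si + ti
        if denom == 0:
            continue  # taxon absent from both samples (assumed to contribute nothing)
        total += (si - ti) ** 2 / denom
    return total


print(squared_chi_squared([1.0, 0.0], [0.0, 1.0]))
```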

Jaccard's similarity

where a is the number of taxa that are present in both samples and c is the number of taxa that are present in one sample and absent in the other.

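With a and c defined as above, Jaccard's similarity works out to d = a / (a + c). The sketch below assumes a taxon is "present" when its count or proportion is greater than zero; the function name is hypothetical.

```python
from typing import Sequence


def jaccard(s: Sequence[float], t: Sequence[float]) -> float:
    """Jaccard similarity a / (a + c) from presence/absence of each taxon."""
    if len(s) != len(t):
        raise ValueError("samples must list the same taxa in the same order")
    a = sum(1 for si, ti in zip(s, t) if si > 0 and ti > 0)  # present in both
    c = sum(1 for si, ti in zip(s, t) if (si > 0) != (ti > 0))  # present in only one
    if a + c == 0:
        raise ValueError("similarity is undefined when neither sample has any taxa")
    return a / (a + c)


print(jaccard([5, 0, 2, 1], [3, 4, 0, 1]))  # 2 shared taxa, 2 unshared
```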

Sorensen's index (similarity)

where a is the number of taxa that are present in the modern sample, b is the number present in the fossil sample, and c is the number of taxa that occur in both samples.

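Using the labels above (a modern taxa, b fossil taxa, c shared taxa), Sorensen's index is usually written d = 2c / (a + b). A sketch under the same presence assumption (value greater than zero), with an illustrative function name:

```python
from typing import Sequence


def sorensen(modern: Sequence[float], fossil: Sequence[float]) -> float:
    """Sorensen similarity 2c / (a + b) from presence/absence of each taxon."""
    if len(modern) != len(fossil):
        raise ValueError("samples must list the same taxa in the same order")
    a = sum(1 for m in modern if m > 0)  # taxa present in the modern sample
    b = sum(1 for f in fossil if f > 0)  # taxa present in the fossil sample
    c = sum(1 for m, f in zip(modern, fossil) if m > 0 and f > 0)  # shared taxa
    if a + b == 0:
        raise ValueError("similarity is undefined when neither sample has any taxa")
    return 2 * c / (a + b)


print(sorensen([5, 0, 2, 1], [3, 4, 0, 1]))
```

Unlike Jaccard's similarity, the shared taxa are counted twice in the numerator, so Sorensen's index is never lower than Jaccard's for the same pair of samples.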

Dot product (similarity)

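A minimal sketch, assuming the plain (unnormalized) dot product d = sum over i of s_i * t_i. Some programs instead divide by the product of the two vector lengths (cosine similarity); which form ANALOG uses is not stated here. The function name is hypothetical.

```python
from typing import Sequence


def dot_product(s: Sequence[float], t: Sequence[float]) -> float:
    """Unnormalized dot product of two taxon vectors."""
    if len(s) != len(t):
        raise ValueError("samples must list the same taxa in the same order")
    return sum(si * ti for si, ti in zip(s, t))


print(dot_product([1, 2, 3], [4, 5, 6]))
```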

Correlation (similarity)

where the sums are calculated as i goes from 0 to n-1

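A minimal sketch, assuming the correlation is Pearson's r computed in mean-centred form, with sums over i = 0 to n-1 as stated above. Textbook computational forms are algebraically equivalent; the function name and the error on constant samples are assumptions.

```python
import math
from typing import Sequence


def correlation(s: Sequence[float], t: Sequence[float]) -> float:
    """Pearson correlation coefficient between two taxon vectors."""
    n = len(s)
    if n != len(t) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 taxa")
    mean_s = sum(s) / n
    mean_t = sum(t) / n
    # Sums run over i = 0 .. n-1.
    sxy = sum((s[i] - mean_s) * (t[i] - mean_t) for i in range(n))
    sxx = sum((s[i] - mean_s) ** 2 for i in range(n))
    syy = sum((t[i] - mean_t) ** 2 for i in range(n))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a sample is constant")
    return sxy / math.sqrt(sxx * syy)


print(correlation([1, 2, 3], [2, 4, 6]))  # perfectly proportional samples
```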

Maintained by Peter Schweitzer


U.S. Department of the Interior | U.S. Geological Survey
URL: https://pubsdata.usgs.gov/pubs/of/1994/of94-645/distance.html
Page Contact Information: Publication Services Group
Page Last Modified: 19:30:59 Wed 07 Dec 2016