In this paper we propose a new index Z for measuring the dissimilarity between two hierarchical clusterings (or dendrograms). This index is a metric since it satisfies the axioms of non-negativity, symmetry and triangle inequality. A desirable property of this index is that it can be decomposed into the contributions pertaining to each stage of the hierarchies. We show the relations of such components with the currently used criteria for comparing two partitions. We obtain a global similarity index as the complement to one of the suggested dissimilarity and we derive its adjustment for agreement due to chance. We obtain similarity indexes pertaining to each stage of the hierarchies as the complement to one of the additive parts of the global distance Z. We consider the use of the proposed distance for more than two dendrograms and its use for the consensus of classifications and variable selection in cluster analysis. A series of simulation experiments and an application to a real data set are presented.

Dissimilarity and Similarity Measures for Comparing Dendrograms and their Applications / Morlini, Isabella; Zani, Sergio. - In: ADVANCES IN DATA ANALYSIS AND CLASSIFICATION. - ISSN 1862-5347. - 6, n. 2:(2012), pp. 85-105.

Dissimilarity and Similarity Measures for Comparing Dendrograms and their Applications

MORLINI, Isabella;ZANI, Sergio
2012-01-01

Abstract

In this paper we propose a new index Z for measuring the dissimilarity between two hierarchical clusterings (or dendrograms). This index is a metric since it satisfies the axioms of non-negativity, symmetry and triangle inequality. A desirable property of this index is that it can be decomposed into the contributions pertaining to each stage of the hierarchies. We show the relations of such components with the currently used criteria for comparing two partitions. We obtain a global similarity index as the complement to one of the suggested dissimilarity and we derive its adjustment for agreement due to chance. We obtain similarity indexes pertaining to each stage of the hierarchies as the complement to one of the additive parts of the global distance Z. We consider the use of the proposed distance for more than two dendrograms and its use for the consensus of classifications and variable selection in cluster analysis. A series of simulation experiments and an application to a real data set are presented.
2012
Dissimilarity and Similarity Measures for Comparing Dendrograms and their Applications / Morlini, Isabella; Zani, Sergio. - In: ADVANCES IN DATA ANALYSIS AND CLASSIFICATION. - ISSN 1862-5347. - 6, n. 2:(2012), pp. 85-105.
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11381/2434860
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact