Publications by Cirad staff


A criterion based on the Mahalanobis distance for cluster analysis with subsampling

Picard N., Bar-Hen A. 2012. Journal of Classification, 29 (1) : p. 23-49.

DOI: 10.1007/s00357-012-9100-9

A two-level data set consists of entities of a higher level (say populations), each one being composed of several units of the lower level (say individuals). Observations are made at the individual level, whereas population characteristics are aggregated from individual data. Cluster analysis with subsampling of populations is a cluster analysis based on individual data that aims at clustering populations rather than individuals. In this article, we extend existing optimality criteria for cluster analysis with subsampling of populations to deal with situations where population characteristics are not the mean of individual data. A new criterion that depends on the Mahalanobis distance is also defined. The criteria are compared using simulated examples and an ecological data set of tree species in a tropical rain forest.
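As a rough illustration of the setting described above, the following minimal Python sketch computes a Mahalanobis distance between two population means, using a covariance pooled from individual-level observations. It is not the criterion defined in the article: the function names, the restriction to two variables, and the use of population means as the aggregated characteristic are all assumptions made for illustration.

```python
from statistics import mean


def pooled_covariance_2d(populations):
    """Pooled within-population covariance for two variables.

    `populations` maps a (hypothetical) population name to a list of
    (x, y) observations made on its individuals. Each population must
    contain at least two individuals overall for the pooling to work.
    Returns the tuple (s_xx, s_xy, s_yy).
    """
    sxx = sxy = syy = 0.0
    dof = 0
    for observations in populations.values():
        mx = mean(x for x, _ in observations)
        my = mean(y for _, y in observations)
        for x, y in observations:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
        dof += len(observations) - 1
    return sxx / dof, sxy / dof, syy / dof


def mahalanobis_2d(u, v, cov):
    """Mahalanobis distance between points u and v for a 2x2 covariance.

    Uses the closed-form inverse of the 2x2 matrix [[s_xx, s_xy], [s_xy, s_yy]].
    """
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = u[0] - v[0], u[1] - v[1]
    squared = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return squared ** 0.5


def population_mean(observations):
    """Aggregate individual data into a population-level characteristic (here, the mean)."""
    return (mean(x for x, _ in observations), mean(y for _, y in observations))


if __name__ == "__main__":
    # Made-up individual measurements for two hypothetical populations.
    data = {
        "pop_a": [(1.0, 2.0), (2.0, 1.5), (1.5, 2.5), (2.5, 3.0)],
        "pop_b": [(4.0, 5.0), (5.0, 4.5), (4.5, 6.0), (5.5, 5.5)],
    }
    cov = pooled_covariance_2d(data)
    d = mahalanobis_2d(population_mean(data["pop_a"]),
                       population_mean(data["pop_b"]), cov)
    print(f"Mahalanobis distance between population means: {d:.3f}")
```

Unlike the Euclidean distance, this distance rescales differences between populations by the variability observed among individuals, which is the general motivation for using it when populations are described through subsampled individuals.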

Keywords: mathematical model; sampling; forest stand; classification; species; French Guiana; France

Associated documents

Article (a-journal with impact factor)