Working paper

Detecting outliers in compositional data using Invariant Coordinate Selection

Anne Ruiz-Gazen, Christine Thomas-Agnan, Thibault Laurent, and Camille Mondon

Abstract

Invariant Coordinate Selection (ICS) is a multivariate statistical method introduced by Tyler et al. (2009) and based on the simultaneous diagonalization of two scatter matrices. A model-based approach to ICS, called Invariant Coordinate Analysis, has already been adapted to compositional data in Muehlmann et al. (2021). In a model-free context, ICS is also helpful for identifying outliers (Nordhausen and Ruiz-Gazen, 2022). We propose a version of ICS for outlier detection in compositional data. This version is first introduced in coordinate space, for a specific choice of ilr coordinate system associated with a contrast matrix, and follows the outlier detection procedure proposed by Archimbaud et al. (2018a). We then show that the procedure does not depend on the choice of contrast matrix and can be defined directly in the simplex. To do so, we first establish some properties of the set of matrices satisfying the zero-sum property, and we introduce simplex definitions of the Mahalanobis distance and of the class of one-step M-estimators of scatter. We also define the family of elliptical distributions in the simplex. Finally, we show how to interpret the results directly in the simplex using two artificial datasets and a real dataset of market shares in the automobile industry.
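As an illustration of the coordinate-space step described in the abstract, the following Python sketch maps compositions to ilr coordinates with a contrast matrix and then simultaneously diagonalizes two scatter matrices. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not name the scatter pair, so the pair COV and COV4 (a common choice in the ICS literature) is assumed here, and the function names `helmert_contrast`, `ilr`, `cov4`, and `ics`, as well as the simulated data, are hypothetical. The end of the block reruns the procedure with a second, rotated contrast matrix to illustrate the claim that the result does not depend on that choice.

```python
import numpy as np

def helmert_contrast(D):
    """Hypothetical helper: D x (D-1) matrix with orthonormal, zero-sum columns."""
    V = np.zeros((D, D - 1))
    for j in range(1, D):
        V[:j, j - 1] = 1.0 / j
        V[j, j - 1] = -1.0
        V[:, j - 1] *= np.sqrt(j / (j + 1.0))  # scale the column to unit norm
    return V

def ilr(X, V):
    """ilr coordinates of the compositions in the rows of X for contrast matrix V."""
    logX = np.log(X)
    clr = logX - logX.mean(axis=1, keepdims=True)  # centred log-ratio transform
    return clr @ V

def cov4(Z):
    """Scatter matrix based on fourth moments (assumed second scatter, COV4)."""
    n, p = Z.shape
    Zc = Z - Z.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(Z, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", Zc, S_inv, Zc)  # squared Mahalanobis distances
    return (Zc * d2[:, None]).T @ Zc / (n * (p + 2))

def ics(Z):
    """Simultaneously diagonalize COV and COV4 of the coordinate data Z."""
    S1 = np.cov(Z, rowvar=False)
    S2 = cov4(Z)
    w, U = np.linalg.eigh(S1)
    S1_inv_sqrt = U @ np.diag(w ** -0.5) @ U.T   # whitening with respect to S1
    M = S1_inv_sqrt @ S2 @ S1_inv_sqrt
    rho, Q = np.linalg.eigh((M + M.T) / 2.0)
    order = np.argsort(rho)[::-1]                # decreasing generalized kurtosis
    rho, Q = rho[order], Q[:, order]
    B = S1_inv_sqrt @ Q                          # B.T @ S1 @ B = I, B.T @ S2 @ B = diag(rho)
    scores = (Z - Z.mean(axis=0)) @ B            # invariant coordinates
    return rho, B, scores

# Simulated 4-part compositions (illustrative data only).
rng = np.random.default_rng(0)
raw = rng.lognormal(size=(200, 4))
X = raw / raw.sum(axis=1, keepdims=True)

# First contrast matrix, and a second one obtained by an orthogonal rotation.
V_a = helmert_contrast(4)
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
V_b = V_a @ R

Z_a = ilr(X, V_a)
rho_a, B_a, scores_a = ics(Z_a)
rho_b, B_b, scores_b = ics(ilr(X, V_b))
print(np.allclose(rho_a, rho_b))  # compares the eigenvalues under both contrast matrices
```

In this sketch the components with the largest eigenvalues are the natural candidates for revealing a small group of outliers; how many components to keep and which cut-off to apply to the resulting distances are left to the procedure of Archimbaud et al. (2018a), which is not reproduced here.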

Reference

Anne Ruiz-Gazen, Christine Thomas-Agnan, Thibault Laurent, and Camille Mondon, "Detecting outliers in compositional data using Invariant Coordinate Selection", TSE Working Paper, No. 22-1320, March 2022.

Published in

TSE Working Paper, No. 22-1320, March 2022