Working paper

ICS for complex data with application to outlier detection for density data objects

Camille Mondon, Thi-Huong Trinh, Anne Ruiz-Gazen, and Christine Thomas-Agnan

Abstract

ICS (invariant coordinate selection) is a dimension-reduction method used as a preliminary step for clustering and outlier detection. It can be applied to multivariate or functional data. This work introduces a coordinate-free definition of ICS and extends the method to distributional data, since the inherent constraints of density functions require an adaptation of functional ICS. Our first contribution is a coordinate-free version of ICS within the framework of Hilbert spaces, assuming that the data lie almost surely in a finite-dimensional subspace. Using the Bayes space framework tailored to density functions, we express the centred log-ratio of the density curves in a subspace of $L^2_0(a, b)$ consisting of zero-integral spline functions and conduct ICS in this finite-dimensional subspace. We describe the steps of the outlier detection procedure and study the impact of some of its parameters on the results. The methodology is then illustrated on a sample of daily maximum temperature densities recorded across northern Vietnamese provinces between 1987 and 2016.
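As a rough illustration of the two ingredients named in the abstract, the sketch below computes a centred log-ratio of a density and then runs ICS on a finite-dimensional data matrix. It rests on assumptions that the abstract does not state: densities are sampled on a grid rather than expanded in a zero-integral spline basis, and the scatter pair is COV-COV4, a common choice in the ICS literature that may differ from the one used in the paper. The function names `clr` and `ics_cov_cov4` are hypothetical.

```python
import numpy as np


def clr(density, grid):
    """Centred log-ratio of a strictly positive density sampled on a grid.

    Assumption: the integral over (a, b) is approximated by the
    trapezoidal rule on the given grid.
    """
    log_f = np.log(density)
    widths = np.diff(grid)
    integral = np.sum(widths * (log_f[:-1] + log_f[1:]) / 2.0)
    # Subtract the mean of log f over (a, b) so the result integrates to zero.
    return log_f - integral / (grid[-1] - grid[0])


def ics_cov_cov4(X):
    """ICS with the COV-COV4 scatter pair (an assumed choice) on an (n, p) matrix.

    Returns the generalized eigenvalues in decreasing order and the
    invariant coordinates (scores) of the centred observations.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S1 = Xc.T @ Xc / (n - 1)  # first scatter: sample covariance

    # Whiten with the symmetric inverse square root of S1.
    evals, evecs = np.linalg.eigh(S1)
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ W

    # Second scatter (COV4) expressed in whitened coordinates:
    # observations are weighted by their squared Mahalanobis distance.
    r2 = np.sum(Z ** 2, axis=1)
    S2_white = (Z * r2[:, None]).T @ Z / (n * (p + 2))

    lam, U = np.linalg.eigh(S2_white)
    order = np.argsort(lam)[::-1]
    lam, U = lam[order], U[:, order]
    return lam, Z @ U
```

In such a sketch, the rows of `X` would hold the coordinates of the centred log-ratio curves in the chosen finite-dimensional basis, and observations with large scores on the first or last invariant coordinates would be the outlier candidates.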

Keywords

Bayes spaces, distributional data, functional data, invariant coordinate selection, outlier detection, Vietnam temperature densities

Reference

Camille Mondon, Thi-Huong Trinh, Anne Ruiz-Gazen, and Christine Thomas-Agnan, "ICS for complex data with application to outlier detection for density data objects", TSE Working Paper, No. 24_1585, October 2024.

Published in

TSE Working Paper, No. 24_1585, October 2024