Homogeneity and change-point detection tests for multivariate data using rank statistics
Detecting and locating changes in high-dimensional multivariate data is a major concern in several current statistical applications. In this context, the first contribution of the paper is a novel non-parametric two-sample homogeneity test for multivariate data based on the well-known Wilcoxon rank statistic. The proposed test statistic can be extended to deal with ordinal or censored data, as well as to test the homogeneity of more than two samples. We also provide a detailed analysis of the power of the proposed test statistic (in the two-sample case) against asymptotic local shift alternatives. The second contribution of the paper concerns the use of the proposed test statistic for retrospective change-point detection. We first show that, thanks to dynamic programming, the approach is computationally feasible even when looking for a large number of change-points. Computable asymptotic p-values for the test are available when a single potential change-point is to be detected. The proposed approach is particularly recommended when the correlations between the coordinates of the data are moderate, when the marginal distributions are not well modelled by usual parametric assumptions (e.g., in the presence of outliers), and when change patterns are highly variable, for instance when the potential changes affect only subsets of the coordinates of the data.
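The idea of a multivariate two-sample test built from coordinate-wise Wilcoxon rank statistics can be illustrated with a minimal sketch. The code below is not taken from the paper. It assumes continuous data without ties, ranks each coordinate separately in the pooled sample, and combines the per-coordinate rank contrasts in a quadratic form normalized by an empirical covariance of the rank scores. The function names `centered_rank_scores` and `rank_homogeneity_statistic` are hypothetical, and the normalization is one plausible choice rather than the authors' exact definition.

```python
import numpy as np


def centered_rank_scores(pooled):
    """Return an (n, K) array of centered, scaled ranks, one column per coordinate.

    Assumption: no ties within a column (continuous data).
    """
    n = pooled.shape[0]
    # A double argsort gives 0-based ranks column by column; +1 makes them 1-based.
    ranks = np.argsort(np.argsort(pooled, axis=0), axis=0) + 1
    # Center at the mean rank (n + 1) / 2 and scale by n.
    return (ranks - (n + 1) / 2.0) / n


def rank_homogeneity_statistic(x, y):
    """Quadratic form of coordinate-wise Wilcoxon-type contrasts (illustrative).

    x has shape (n1, K) and y has shape (n2, K); rows are observations.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = x.shape[0], y.shape[0]
    n = n1 + n2
    scores = centered_rank_scores(np.vstack([x, y]))
    # Per-coordinate contrast: scaled sum of the first sample's centered ranks.
    u = 2.0 * np.sqrt(n / (n1 * n2)) * scores[:n1].sum(axis=0)
    # Empirical covariance of the rank scores across coordinates.
    sigma = 4.0 / n * scores.T @ scores
    # Solve a linear system instead of inverting sigma explicitly.
    return float(u @ np.linalg.solve(sigma, u))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((100, 3))
    y = rng.standard_normal((100, 3))
    y[:, 0] += 2.0  # shift only the first coordinate
    print(rank_homogeneity_statistic(x, y))
```

Because the statistic depends on the data only through marginal ranks, it is unchanged by monotone increasing transformations of each coordinate, which is what makes this kind of test robust to outliers and to non-standard marginal distributions. Under the homogeneity hypothesis and suitable regularity conditions, a statistic of this form would typically be compared with a chi-square distribution with K degrees of freedom; that calibration is stated here as an assumption of the sketch.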
Keywords: change-point detection, homogeneity test, Kruskal-Wallis test, Mann-Whitney/Wilcoxon test, multivariate data, rank statistics
@article{JSFS_2015__156_4_133_0,
  author = {Lung-Yut-Fong, Alexandre and L\'evy-Leduc, C\'eline and Capp\'e, Olivier},
  title = {Homogeneity and change-point detection tests for multivariate data using rank statistics},
  journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique},
  pages = {133--162},
  publisher = {Soci\'et\'e fran\c{c}aise de statistique},
  volume = {156},
  number = {4},
  year = {2015},
  zbl = {1338.62134},
  language = {en},
  url = {http://archive.numdam.org/item/JSFS_2015__156_4_133_0/}
}
Lung-Yut-Fong, Alexandre; Lévy-Leduc, Céline; Cappé, Olivier. Homogeneity and change-point detection tests for multivariate data using rank statistics. Journal de la société française de statistique, Volume 156 (2015) no. 4, pp. 133-162. http://archive.numdam.org/item/JSFS_2015__156_4_133_0/