Détection non-supervisée d’observations atypiques en contrôle de qualité : un survol
Journal de la société française de statistique, Tome 159 (2018) no. 3, pp. 1-39.

Detecting outlying observations, or anomalies, is a challenge in many fields. This article reviews the literature on unsupervised methods, with the main focus on quality control. Throughout, the notion of outlyingness follows the definition given by Hawkins (1980), namely that an observation is outlying if it is generated by a mechanism different from the one that generated the bulk of the data. The first section focuses on quality control in the industry of electronic components for automotive applications and takes stock of the methods used in practice. It appears that mainly univariate methods are integrated into fault detection processes; only a few multivariate methods, such as the Mahalanobis distance or Principal Component Analysis, seem to be known to some manufacturers. The following sections attempt to summarize the full range of unsupervised methods for outlier detection, together with their implementation in the R software (R Core Team, 2017). A distinction is made between methods designed for standard data, i.e. with more observations than variables, and those that can handle high-dimensional data with a small sample size.

Keywords: anomaly detection, multivariate analysis, low sample size, high reliability
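
As a rough illustration of the Mahalanobis-type multivariate screening mentioned in the abstract, the sketch below flags points whose squared Mahalanobis distance to the sample mean exceeds a chi-square quantile. It is not taken from the article (which works in R): the function names `mahalanobis_sq_2d` and `flag_outliers_2d`, the toy data, and the 97.5% cutoff are assumptions made for this example, and the classical (non-robust) mean and covariance are used, in the standard setting with more observations than variables.

```python
import math


def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each 2-D point to the sample mean."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the sample covariance matrix (denominator n - 1).
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("singular covariance matrix")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form (dx, dy) S^{-1} (dx, dy)^T with the closed-form 2x2 inverse.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


def flag_outliers_2d(points, level=0.975):
    """Flag points whose squared distance exceeds the chi-square(2) quantile."""
    # With 2 degrees of freedom the chi-square quantile is -2 * ln(1 - level).
    cutoff = -2.0 * math.log(1.0 - level)
    return [d2 > cutoff for d2 in mahalanobis_sq_2d(points)]


# Toy data (assumption): 24 regular points plus one shifted observation.
points = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)] * 6 + [(6.0, 6.0)]
flags = flag_outliers_2d(points)
print([i for i, flagged in enumerate(flags) if flagged])  # [24]
```

Because the mean and covariance are themselves computed from contaminated data, a single outlier already inflates the covariance here; robust estimators are the usual remedy for this masking effect.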
@article{JSFS_2018__159_3_1_0,
     author = {Archimbaud, Aurore},
     title = {D\'etection non-supervis\'ee d{\textquoteright}observations atypiques en contr\^ole de qualit\'e~: un survol},
     journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique},
     pages = {1--39},
     publisher = {Soci\'et\'e fran\c{c}aise de statistique},
     volume = {159},
     number = {3},
     year = {2018},
     mrnumber = {3901134},
     zbl = {1410.62213},
     language = {fr},
     url = {http://archive.numdam.org/item/JSFS_2018__159_3_1_0/}
}
Archimbaud, Aurore. Détection non-supervisée d’observations atypiques en contrôle de qualité : un survol. Journal de la société française de statistique, Tome 159 (2018) no. 3, pp. 1-39. http://archive.numdam.org/item/JSFS_2018__159_3_1_0/

[1] Automotive Electronics Council Guidelines for part average testing, AEC-Q001, rev-D (2011)

[2] Agyemang, Malik; Barker, Ken; Alhajj, Rada A comprehensive survey of numeric and symbolic outlier mining techniques, Intelligent Data Analysis, Volume 10 (2006) no. 6, pp. 521-538

[3] An, Jinwon; Cho, Sungzoon Variational Autoencoder based Anomaly Detection using Reconstruction Probability, SNU Data Mining Center - Special Lecture on IE (2015)

[4] Aggarwal, Charu C. Outlier Analysis, Springer Publishing Company, Incorporated, 2013 | MR | Zbl

[5] Aggarwal, Charu C. Outlier Analysis, 2nd edition, Springer Publishing Company, Incorporated, 2017 | MR

[6] Aggarwal, Charu C; Hinneburg, Alexander; Keim, Daniel A On the surprising behavior of distance metrics in high dimensional space, International Conference on Database Theory, Springer (2001), pp. 420-434 | Zbl

[7] Alashwali, Fatimah; Kent, John The use of a common location measure in the invariant coordinate selection and projection pursuit, Journal of Multivariate Analysis, Volume 152 (2016), pp. 145-161 | MR | Zbl

[8] Archimbaud, Aurore; May, Joris; Nordhausen, Klaus; Ruiz-Gazen, Anne ICSShiny: Invariant Coordinate Selection With a Shiny App (2017) (R package version 0.5)

[9] Archimbaud, A.; Nordhausen, K.; Ruiz-Gazen, A. ICS for Multivariate Outlier Detection with Application to Quality Control, Computational Statistics & Data Analysis, Volume 128 (2018), pp. 184-199 | MR | Zbl

[10] Archimbaud, Aurore; Nordhausen, Klaus; Ruiz-Gazen, Anne ICSOutlier: Outlier Detection Using Invariant Coordinate Selection (2016) (R package version 0.3-0)

[11] Archimbaud, Aurore; Nordhausen, Klaus; Ruiz-Gazen, Anne ICSOutlier: Unsupervised Outlier Detection for Low-Dimensional Contamination Structure, The R Journal, Volume 10 (2018) no. 1, pp. 234-250 https://journal.r-project.org/archive/2018/RJ-2018-034/index.html

[12] Archimbaud, Aurore Statistical methods for outlier detection for high-dimensional data, Université Toulouse 1 Capitole (2018) (Ph. D. Thesis)

[13] Aggarwal, Charu C; Sathe, Saket Outlier Ensembles: An Introduction, Springer, 2017 | MR

[14] Aggarwal, Charu C; Yu, Philip S Outlier detection for high dimensional data, ACM Sigmod Record, ACM (2001), pp. 37-46

[15] Beckman, Richard J; Cook, R Dennis Outliers, Technometrics, Volume 25 (1983) no. 2, pp. 119-149 | MR | Zbl

[16] Becker, Claudia; Gather, Ursula The masking breakdown point of multivariate outlier identification rules, Journal of the American Statistical Association, Volume 94 (1999) no. 447, pp. 947-955 | MR | Zbl

[17] Beyer, Kevin; Goldstein, Jonathan; Ramakrishnan, Raghu; Shaft, Uri When is “nearest neighbor” meaningful?, International Conference on Database Theory, Springer (1999), pp. 217-235

[18] Billor, Nedret; Hadi, Ali S; Velleman, Paul F BACON: blocked adaptive computationally efficient outlier nominators, Computational Statistics & Data Analysis, Volume 34 (2000) no. 3, pp. 279-298 | Zbl

[19] Breunig, Markus M; Kriegel, Hans-Peter; Ng, Raymond T; Sander, Jörg LOF: identifying density-based local outliers, ACM Sigmod Record, ACM (2000), pp. 93-104

[20] Breunig, Markus M; Kriegel, Hans-Peter; Ng, Raymond T; Sander, Jörg OPTICS-OF: Identifying local outliers, European Conference on Principles of Data Mining and Knowledge Discovery, Springer (1999), pp. 262-270

[21] Barnett, V; Lewis, T Outliers in Statistical Data, Wiley, 1994 | MR | Zbl

[22] Bookstein, Fred L; Mitteroecker, Philipp Comparing covariance matrices by relative eigenanalysis, with applications to organismal biology, Evolutionary Biology, Volume 41 (2014) no. 2, pp. 336-350

[23] Branco, J. A.; Pires, A. M. High dimensionality: the trouble with Mahalanobis distance (2015) http://mcs.open.ac.uk/statistics_images/ProgrammeAbstractsWOMAT.pdf (WOMAT: Workshop On Multivariate Analysis Today)

[24] Bay, Stephen D; Schwabacher, Mark Mining distance-based outliers in near linear time with randomization and a simple pruning rule, Proceedings of the ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM (2003), pp. 29-38

[25] Bernard, Anne; Saporta, Gilbert Analyse en Composantes Principales Sparse pour données multiblocs et extension à l’Analyse des Correspondances Multiples Sparse, 45èmes Journées de Statistique (2013)

[26] Campbell, Norm A Robust procedures in multivariate analysis I: Robust covariance estimation, Applied Statistics, Volume 29 (1980) no. 3, pp. 231-237 | Zbl

[27] Chandola, Varun; Banerjee, Arindam; Kumar, Vipin Outlier detection: A survey (2007) (Technical report)

[28] Chandola, Varun; Banerjee, Arindam; Kumar, Vipin Anomaly detection: A survey, ACM Computing Surveys (CSUR), Volume 41 (2009) no. 3 | DOI

[29] Cerioli, Andrea Multivariate outlier detection with high-breakdown estimators, Journal of the American Statistical Association, Volume 105 (2010) no. 489, pp. 147-156 | MR | Zbl

[30] Chalmers, R. Philip; Flora, David B. faoutlier: An R Package for Detecting Influential Cases in Exploratory and Confirmatory Factor Analysis, Applied Psychological Measurement, Volume 39 (2015) no. 7, pp. 573-574

[31] Croux, Christophe; Filzmoser, Peter; Fritz, Heinrich Robust sparse principal component analysis, Technometrics, Volume 55 (2013) no. 2, pp. 202-214 | MR

[32] Caussinus, H; Fekri, M; Hakam, S; Ruiz-Gazen, Anne A monitoring display of multivariate outliers, Computational Statistics & Data Analysis, Volume 44 (2003) no. 1, pp. 237-252 | MR | Zbl

[33] Croux, Christophe; Filzmoser, Peter; Oliveira, Maria Rosario Algorithms for Projection-Pursuit robust principal component analysis, Chemometrics and Intelligent Laboratory Systems, Volume 87 (2007) no. 2, pp. 218-225

[34] Cardot, Hervé; Godichon, Antoine Robust Principal Components Analysis based on the Median Covariation Matrix, arXiv preprint arXiv:1504.02852 (2015)

[35] Candès, Emmanuel J; Li, Xiaodong; Ma, Yi; Wright, John Robust principal component analysis?, Journal of the ACM (JACM), Volume 58 (2011) no. 3 | DOI | MR | Zbl

[36] Cerioli, Andrea; Riani, Marco; Atkinson, Anthony C Controlling the size of multivariate outlier tests with the MCD estimator of scatter, Statistics and Computing, Volume 19 (2009) no. 3, pp. 341-353 | MR

[37] Croux, Christophe; Ruiz-Gazen, Anne High breakdown estimators for principal components: the projection-pursuit approach revisited, Journal of Multivariate Analysis, Volume 95 (2005) no. 1, pp. 206-226 | MR | Zbl

[38] Caussinus, H; Ruiz-Gazen, A Interesting projections of multidimensional data by means of generalized principal component analyses, Proceedings of COMPSTAT’1990, Springer (1990), pp. 121-126

[39] Cinar, Ali; Undey, Cenk Statistical process and controller performance monitoring. A tutorial on current methods and future directions, Proceedings of the American Control Conference, Volume 4, IEEE (1999), pp. 2625-2639

[40] Cateni, Silvia; Vannucci, Marco; Colla, Valentina Outlier detection methods for industrial applications, INTECH Open Access Publisher, 2008

[41] Campos, Guilherme O; Zimek, Arthur; Sander, Jörg; Campello, Ricardo JGB; Micenková, Barbora; Schubert, Erich; Assent, Ira; Houle, Michael E On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study, Data Mining and Knowledge Discovery (2015), pp. 1-37 | MR

[42] Dang, Xuan Hong; Assent, Ira; Ng, Raymond T; Zimek, Arthur; Schubert, Erich Discriminative features for identifying and interpreting outliers, IEEE 30th International Conference on Data Engineering (ICDE), IEEE (2014), pp. 88-99

[43] Dutta, Haimonti; Giannella, Chris; Borne, Kirk; Kargupta, Hillol Distributed top-k outlier detection from astronomy catalogs using the demac system, Proceedings of the 2007 SIAM International Conference on Data Mining, SIAM (2007), pp. 473-478

[44] Devlin, Susan J; Gnanadesikan, Ramanathan; Kettenring, Jon R Robust estimation of dispersion matrices and principal components, Journal of the American Statistical Association, Volume 76 (1981) no. 374, pp. 354-362 | Zbl

[45] Debruyne, Michiel; Hubert, Mia The influence function of the Stahel-Donoho covariance estimator of smallest outlyingness, Statistics & Probability Letters, Volume 79 (2009) no. 3, pp. 275-282 | MR | Zbl

[46] Duda, Richard O; Hart, Peter E; Stork, David G Pattern classification, John Wiley & Sons, 2012 | MR | Zbl

[47] Debruyne, M.; Höppner, S.; Serneels, S.; Verdonck, T. Outlyingness: why do outliers lie out?, arXiv preprint arXiv:1708.03761v1 (2017)

[48] Dempster, Arthur P; Laird, Nan M; Rubin, Donald B Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society. Series B (Methodological), Volume 39 (1977) no. 1, pp. 1-38 | MR | Zbl

[49] Droesbeke, Jean-Jacques; Saporta, Gilbert; Thomas-Agnan, Christine Méthodes robustes en statistique, 2015, 206 pages

[50] Engelen, S; Hubert, Mia; Branden, K Vanden A comparison of three procedures for robust PCA in high dimensions, Austrian Journal of Statistics, Volume 34 (2005) no. 2, pp. 117-126

[51] Fan, Cheng HighDimOut: Outlier Detection Algorithms for High-Dimensional Data (2015) https://CRAN.R-project.org/package=HighDimOut (R package version 1.0.0)

[52] Fischer, Daniel; Berro, Alain; Nordhausen, Klaus; Ruiz-Gazen, Anne REPPlab: R Interface to ’EPP-Lab’, a Java Program for Exploratory Projection Pursuit (2016) https://CRAN.R-project.org/package=REPPlab (R package version 0.9.4)

[53] Filzmoser, Peter; Gschwandtner, Moritz mvoutlier: Multivariate outlier detection based on robust methods (2015) https://CRAN.R-project.org/package=mvoutlier (R package version 2.0.6)

[54] Farcomeni, Alessio; Greco, Luca Robust methods for data reduction, CRC press, 2016

[55] Filzmoser, Peter; Garrett, Robert G; Reimann, Clemens Multivariate outlier detection in exploration geochemistry, Computers & Geosciences, Volume 31 (2005) no. 5, pp. 579-587

[56] Fischer, Daniel; Honkatukia, Mervi; Tuiskula-Haavisto, Maria; Nordhausen, Klaus; Cavero, David; Preisinger, Rudolf; Vilkki, Johanna Subgroup detection in genotype data using invariant coordinate selection, BMC Bioinformatics, Volume 18 (2017) no. 1

[57] Filzmoser, Peter; Maronna, Ricardo; Werner, Mark Outlier identification in high dimensions, Computational Statistics & Data Analysis, Volume 52 (2008) no. 3, pp. 1694-1711 | MR | Zbl

[58] Filzmoser, Peter; Todorov, Valentin Review of robust multivariate statistical methods in high dimension, Analytica Chimica Acta, Volume 705 (2011) no. 1, pp. 2-14

[59] Filzmoser, Peter; Todorov, Valentin Robust tools for the imperfect world, Information Sciences, Volume 245 (2013), pp. 4-20 | MR | Zbl

[60] Friedman, Jerome H; Tukey, John W A projection pursuit algorithm for exploratory data analysis, IEEE Transactions on Computers, Volume C-23 (1974) no. 9, pp. 881-890 | Zbl

[61] Fujimaki, Ryohei; Yairi, Takehisa; Machida, Kazuo An approach to spacecraft anomaly detection problem using kernel feature space, Proceedings of the eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, ACM (2005), pp. 401-410

[62] Kim, Myung Geun Multivariate outliers and decompositions of Mahalanobis distance, Communications in Statistics-Theory and Methods, Volume 29 (2000) no. 7, pp. 1511-1526 | MR | Zbl

[63] Gnanadesikan, Ramanathan; Kettenring, Jon R Robust estimates, residuals, and outlier detection with multiresponse data, Biometrics, Volume 28 (1972) no. 1, pp. 81-124

[64] Green, Christopher G.; Martin, Doug CerioliOutlierDetection: Outlier Detection Using the Iterated RMCD Method of Cerioli (2010) (2017) https://CRAN.R-project.org/package=CerioliOutlierDetection (R package version 1.1.9)

[65] Green, Christopher G; Martin, R Douglas An extension of a method of Hardin and Rocke, with an application to multivariate outlier detection via the IRMCD method of Cerioli (2017) (Technical report)

[66] Genest, Maxime; Massé, Jean-Claude; Plante, Jean-François depth: Depth functions tools for multivariate analysis (2012) http://CRAN.R-project.org/package=depth (R package version 2.0-0)

[67] Ghoting, Amol; Parthasarathy, Srinivasan; Otey, Matthew Eric Fast mining of distance-based outliers in high-dimensional datasets, Proceedings of the 2006 SIAM International Conference on Data Mining, SIAM (2006), pp. 609-613 | MR

[68] Grubbs, Frank E Sample criteria for testing outlying observations, The Annals of Mathematical Statistics, Volume 21 (1950) no. 1, pp. 27-58 | MR | Zbl

[69] Grubbs, Frank E Procedures for detecting outlying observations in samples, Technometrics, Volume 11 (1969) no. 1, pp. 1-21

[70] Gao, Jing; Tan, Pang-Ning Converting output scores from outlier detection algorithms into probability estimates, ICDM’06 - Sixth International Conference on Data Mining, IEEE (2006), pp. 212-221

[71] Hodge, Victoria J; Austin, Jim A survey of outlier detection methodologies, Artificial Intelligence Review, Volume 22 (2004) no. 2, pp. 85-126 | Zbl

[72] Hassan, Ali Hajj Détection multidimensionnelle au test paramétrique avec recherche automatique des causes, Université Grenoble (2014) (Ph. D. Thesis)

[73] Hawkins, Douglas M Identification of outliers, 11, Springer, 1980 | MR

[74] Hadi, Ali S; Imon, AHM; Werner, Mark Detection of outliers, Wiley Interdisciplinary Reviews: Computational Statistics, Volume 1 (2009) no. 1, pp. 57-70

[75] Harkat, Mohamed-Faouzi; Mourot, Gilles; Ragot, José Différentes méthodes de localisation de défauts basées sur les dernières composantes principales, Conférence Internationale Francophone d’Automatique (CIFA) (2002)

[76] Hu, Yingsong; Murray, Wayne; Shan, Yin Rlof: R Parallel Implementation of Local Outlier Factor (LOF) (2015) https://CRAN.R-project.org/package=Rlof (R package version 1.1.1)

[77] Holm, Sture A simple sequentially rejective multiple test procedure, Scandinavian journal of statistics (1979), pp. 65-70 | MR | Zbl

[78] Hotelling, Harold The Generalization of Student’s Ratio, The Annals of Mathematical Statistics, Volume 2 (1931) no. 3, pp. 360-378 | JFM | Zbl

[79] Hotelling, H Multivariate Quality Control Illustrated by the Air Testing of Sample Bombsights, Selected Techniques of Statistical Analysis (1947), pp. 111-184

[80] Howe, David Charles kmodR: K-Means with Simultaneous Outlier Detection (2015) http://CRAN.R-project.org/package=kmodR (R package version 0.1.0)

[81] Hardin, Johanna; Rocke, David M The distribution of robust distances, Journal of Computational and Graphical Statistics, Volume 14 (2005) no. 4, pp. 928-946 | MR

[82] Hubert, Mia; Reynkens, Tom; Schmitt, Eric; Verdonck, Tim Sparse PCA for high-dimensional data with outliers, Technometrics, Volume 58 (2016) no. 4, pp. 424-434 | MR

[83] Hubert, Mia; Rousseeuw, Peter J; Verboven, Sabine A fast method for robust principal components with applications to chemometrics, Chemometrics and Intelligent Laboratory Systems, Volume 60 (2002) no. 1, pp. 101-111

[84] Hubert, Mia; Rousseeuw, Peter J; Vanden Branden, Karlien ROBPCA: a new approach to robust principal component analysis, Technometrics, Volume 47 (2005) no. 1, pp. 64-79 | MR

[85] Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome The Elements of Statistical Learning, Springer Series in Statistics, Springer New York Inc., New York, NY, USA, 2001 | MR | Zbl

[86] Huber, Peter J Projection pursuit, The Annals of Statistics, Volume 13 (1985) no. 2, pp. 435-475 | MR | Zbl

[87] JEDEC Outlier identification and management system for electronic components (2009) (Technical report)

[88] Jensen, Willis A; Birch, Jeffrey B; Woodall, William H High breakdown estimation methods for phase I multivariate control charts, Quality and Reliability Engineering International, Volume 23 (2007) no. 5, pp. 615-629

[89] Jain, Anil K; Dubes, Richard C Algorithms for clustering data, Prentice-Hall, Inc., 1988 | MR | Zbl

[90] Jimenez, Jose abodOutlier: Angle-Based Outlier Detection (2015) https://CRAN.R-project.org/package=abodOutlier (R package version 0.1)

[91] Jolliffe, Ian Principal component analysis, Wiley Online Library, 2002 | MR

[92] Jobe, J Marcus; Pokojovy, Michael A cluster-based outlier detection scheme for multivariate data, Journal of the American Statistical Association, Volume 110 (2015) no. 512, pp. 1543-1551 | MR | Zbl

[93] Josse, Julie; Sardy, Sylvain Adaptive shrinkage of singular values, Statistics and Computing, Volume 26 (2016) no. 3, pp. 715-724 | MR | Zbl

[94] Josse, Julie; Sardy, Sylvain; Wager, Stefan denoiseR: A Package for Low Rank Matrix Estimation, arXiv preprint arXiv:1602.01206 (2016)

[95] Josse, Julie; Sardy, Sylvain; Wager, Stefan denoiseR: Regularized Low Rank Matrix Estimation (2016) https://CRAN.R-project.org/package=denoiseR (R package version 1.0)

[96] Johnson, Richard A.; Wichern, Dean W. Applied Multivariate Statistical Analysis (6th Edition), Prentice Hall, 1998 | MR | Zbl

[97] Kwitt, Roland; Hofmann, Ulrich Robust Methods for Unsupervised PCA-based Anomaly Detection, Proc. of IEEE/IST Workshop on Monitoring, Attack Detection and Mitigation (2006), pp. 1-3

[98] Kriegel, Hans-Peter; Kröger, Peer; Schubert, Erich; Zimek, Arthur Outlier detection in axis-parallel subspaces of high dimensional data, Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer (2009), pp. 831-838

[99] Kriegel, Hans-Peter; Kröger, Peer; Schubert, Erich; Zimek, Arthur Interpreting and unifying outlier scores, Proceedings of the 2011 SIAM International Conference on Data Mining, SIAM (2011), pp. 13-24

[100] Kriegel, Hans-Peter; Kröger, Peer; Schubert, Erich; Zimek, Arthur Outlier detection in arbitrarily oriented subspaces, IEEE 12th International Conference on Data Mining (ICDM), IEEE (2012), pp. 379-388

[101] Kriegel, Hans-Peter; Kröger, Peer; Zimek, Arthur Outlier detection techniques, Tutorial at KDD, Volume 10 (2010)

[102] Keller, Fabian; Müller, Emmanuel; Böhm, Klemens HiCS: High contrast subspaces for density-based outlier ranking, IEEE 28th International Conference on Data Engineering (ICDE), IEEE (2012), pp. 1037-1048

[103] Knorr, Edwin M; Ng, Raymond T Algorithms for mining distance-based outliers in large datasets, Proceedings of the International Conference on Very Large Data Bases, Citeseer (1998), pp. 392-403

[104] Knorr, Edwin M; Ng, Raymond T Finding intensional knowledge of distance-based outliers, VLDB, Volume 99 (1999), pp. 211-222

[105] Komsta, Lukasz outliers: Tests for outliers (2011) https://CRAN.R-project.org/package=outliers (R package version 0.14)

[106] Kriegel, Hans-Peter; Zimek, Arthur Angle-based outlier detection in high-dimensional data, Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM (2008), pp. 444-452

[107] Lafaye de Micheaux, Daniel Prolonger la MSP par la “Maîtrise Globale du Processus”, Qualité références (Juillet 2000), pp. 47-55

[108] Lafaye de Micheaux, Daniel; Cembrynski, Thierry; Dalancon, Thomas; Demonsant, Jacques Réduction de la dispersion des caractéristiques produit, méthodologie GPC et application en carrosserie automobile, 7ième édition du Congrès International Pluridisciplinaire Qualita 2007, Tanger (Maroc) (2007)

[109] Lafaye de Micheaux, Daniel; Vieux, Didier MSP multidimensionnelle, Détecter et identifier “L’invisible”, Qualité références (Janvier 2005), pp. 79-82

[110] Laurikkala, Jorma; Juhola, Martti; Kentala, Erna; Lavrac, N; Miksch, S; Kavsek, B Informal identification of outliers in medical data, Fifth International Workshop on Intelligent Data Analysis in Medicine and Pharmacology, Volume 1 (2000), pp. 20-24

[111] Lazarevic, Aleksandar; Kumar, Vipin Feature bagging for outlier detection, Proceedings of the eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, ACM (2005), pp. 157-166

[112] Locantore, N; Marron, JS; Simpson, DG; Tripoli, N; Zhang, JT; Cohen, KL; Boente, Graciela; Fraiman, Ricardo; Brumback, Babette; Croux, Christophe Robust principal component analysis for functional data, Test, Volume 8 (1999) no. 1, pp. 1-73 | MR | Zbl

[113] Ledoit, Olivier; Wolf, Michael A well-conditioned estimator for large-dimensional covariance matrices, Journal of Multivariate Analysis, Volume 88 (2004) no. 2, pp. 365-411 | MR | Zbl

[114] Ledoit, Olivier; Wolf, Michael Nonlinear shrinkage estimation of large-dimensional covariance matrices, The Annals of Statistics, Volume 40 (2012) no. 2 | MR | Zbl

[115] Lee, Jong-Min; Yoo, ChangKyoo; Choi, Sang Wook; Vanrolleghem, Peter A; Lee, In-Beum Nonlinear process monitoring using kernel principal component analysis, Chemical Engineering Science, Volume 59 (2004) no. 1, pp. 223-234

[116] Lee, Jong-Min; Yoo, ChangKyoo; Lee, In-Beum Statistical process monitoring with independent component analysis, Journal of Process Control, Volume 14 (2004) no. 5, pp. 467-485

[117] Müller, Emmanuel; Assent, Ira; Iglesias, Patricia; Mülle, Yvonne; Böhm, Klemens Outlier ranking via subspace analysis in multiple views of the data, IEEE 12th International Conference on Data Mining (ICDM), IEEE (2012), pp. 529-538

[118] Mnassri, Baligh; Ananou, Bouchra; Ouladsine, Mustapha; Gasnier, Franck Détection et localisation de défauts des Wafers par des approches statistiques multivariées et calcul des contributions, CIFA 2008, Conférence Internationale Francophone d’Automatique (2008)

[119] Müller, Emmanuel; Assent, Ira; Steinhausen, Uwe; Seidl, Thomas OutRank: ranking outliers in high dimensional data, ICDEW 2008, IEEE 24th International Conference on Data Engineering Workshop, IEEE (2008), pp. 600-603

[120] Mercier, Sabine; Bergeret, François Maîtrise Statistique des procédés - Principes et cas industriels, Dunod/Usine Nouvelle, 2011

[121] Mardia, Kantilal Varichand; Kent, John T; Bibby, John M Multivariate analysis, Academic press, 1979 | MR

[122] Moreno-Lizaranzu, Manuel J; Cuesta, Federico Improving electronic sensor reliability by robust outlier screening, Sensors, Volume 13 (2013) no. 10, pp. 13521-13542

[123] Maronna, Ricardo A; Martin, R Douglas; Yohai, Victor J Robust statistics, John Wiley & Sons, Chichester, 2006 | MR | Zbl

[124] Markou, Markos; Singh, Sameer Novelty detection: a review, part 1: statistical approaches, Signal Processing, Volume 83 (2003) no. 12, pp. 2481-2497 | Zbl

[125] Müller, Emmanuel; Schiffer, Matthias; Gerwert, Patrick; Hannen, Matthias; Jansen, Timm; Seidl, Thomas SOREX: Subspace outlier ranking exploration toolkit, Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer (2010), pp. 607-610

[126] Müller, Emmanuel; Schiffer, Matthias; Seidl, Thomas Adaptive outlierness for subspace outlier ranking, Proceedings of the 19th ACM International Conference on Information and Knowledge Management, ACM (2010), pp. 1629-1632

[127] Müller, Emmanuel; Schiffer, Matthias; Seidl, Thomas Statistical selection of relevant subspace projections for outlier ranking, IEEE 27th International Conference on Data Engineering (ICDE), IEEE (2011), pp. 434-445

[128] Maronna, Ricardo A; Yohai, Victor J The behavior of the Stahel-Donoho robust multivariate estimator, Journal of the American Statistical Association, Volume 90 (1995) no. 429, pp. 330-341 | MR | Zbl

[129] Maronna, Ricardo A; Zamar, Ruben H Robust estimates of location and dispersion for high-dimensional datasets, Technometrics, Volume 44 (2002) no. 4 | MR

[130] Nguyen, Hoang Vu; Ang, Hock Hee; Gopalkrishnan, Vivekanand Mining outliers with ensemble of heterogeneous detectors on random subspaces, International Conference on Database Systems for Advanced Applications, Springer (2010), pp. 368-383

[131] Nordhausen, Klaus; Oja, Hannu; Tyler, David E. Tools for Exploring Multivariate Data: The Package ICS, Journal of Statistical Software, Volume 28 (2008) no. 6, pp. 1-31 http://www.jstatsoft.org/v28/i06/

[132] Ollila, Esa; Tyler, David E Regularized M-Estimators of Scatter Matrix, IEEE Transactions on Signal Processing, Volume 62 (2014) no. 22, pp. 6059-6070 | MR | Zbl

[133] Pimentel, Marco AF; Clifton, David A; Clifton, Lei; Tarassenko, Lionel A review of novelty detection, Signal Processing, Volume 99 (2014), pp. 215-249

[134] Parra, Lucas; Deco, Gustavo; Miesbach, Stefan Statistical independence and novelty detection with information preserving nonlinear maps, Neural Computation, Volume 8 (1996) no. 2, pp. 260-269

[135] Peirce, Benjamin Criterion for the rejection of doubtful observations, The Astronomical Journal, Volume 2 (1852), pp. 161-163

[136] Penny, Kay I; Jolliffe, Ian T Multivariate outlier detection applied to multiply imputed laboratory data, Statistics in Medicine, Volume 18 (1999) no. 14, pp. 1879-1895

[137] Peña, Daniel; Prieto, Francisco J Cluster identification using projections, Journal of the American Statistical Association, Volume 96 (2001) no. 456 | MR | Zbl

[138] Peña, Daniel; Prieto, Francisco J Multivariate outlier detection and robust covariance matrix estimation, Technometrics, Volume 43 (2001) no. 3 | MR

[139] Pham, Ninh; Pagh, Rasmus A near-linear time approximation algorithm for angle-based outlier detection in high-dimensional data, Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM (2012), pp. 877-885

[140] Pearson, E. S.; Sekar, C. Chandra The efficiency of statistical tools and a criterion for the rejection of outlying observations, Biometrika, Volume 28 (1936) no. 3/4, pp. 308-320 | Zbl

[141] R Core Team R: A Language and Environment for Statistical Computing (2017) https://www.R-project.org/

[142] Rousseeuw, Peter; Croux, Christophe; Todorov, Valentin; Ruckstuhl, Andreas; Salibian-Barrera, Matias; Verbeke, Tobias; Koller, Manuel; Maechler, Martin robustbase: Basic Robust Statistics (2016) http://CRAN.R-project.org/package=robustbase (R package version 0.92-6)

[143] Ruiz-Gazen, Anne; Marie-Sainte, Souad Larabi; Berro, Alain Detecting multivariate outliers using projection pursuit with particle swarm optimization, Proceedings of COMPSTAT’2010, Springer (2010), pp. 89-98 | Zbl

[144] Reynkens, Tom; Hubert, Mia; Schmitt, Eric; Verdonck, Tim Sparse PCA for high-dimensional data with outliers, Technometrics (2015) | MR

[145] Rider, Paul Reece Criteria for rejection of observations, Washington University Studies, St. Louis, 1933

[146] Rehage, Andre; Kuhnt, Sonja alphaOutlier: Obtain Alpha-Outlier Regions for Well-Known Probability Distributions (2016) https://CRAN.R-project.org/package=alphaOutlier (R package version 1.2.0)

[147] Rousseeuw, Peter J; Kaufman, L Finding Groups in Data, Wiley Online Library, 1990 | MR

[148] Rousseeuw, Peter J; Leroy, Annick M Robust regression and outlier detection, 589, John Wiley & Sons, 2005 | MR | Zbl

[149] Radovanović, Milos; Nanopoulos, Alexandros; Ivanović, Mirjana On the existence of obstinate results in vector space models, Proceedings of the 33rd international ACM SIGIR conference on Research and Development in Information Retrieval, ACM (2010), pp. 186-193

[150] Rocke, David M Robust control charts, Technometrics, Volume 31 (1989) no. 2, pp. 173-184 | MR

[151] Rocke, David M X_Q and R_Q Charts: Robust Control Charts, The Statistician, Volume 41 (1992) no. 1, pp. 97-104

[152] Rohlf, F James Generalization of the gap test for the detection of multivariate outliers, Biometrics, Volume 31 (1975) no. 1, pp. 93-101 | Zbl

[153] Rousseeuw, P. J. Multivariate estimation with high breakdown point, Mathematical Statistics and Applications (Grossmann, W.; Pflug, G.; Vincze, I.; Wertz, W., eds.), Reidel, Dordrecht, 1986, pp. 283-297 | MR | Zbl

[154] Ruts, Ida; Rousseeuw, Peter J Computing depth contours of bivariate point clouds, Computational Statistics & Data Analysis, Volume 23 (1996) no. 1, pp. 153-168

[155] Ramaswamy, Sridhar; Rastogi, Rajeev; Shim, Kyuseok Efficient algorithms for mining outliers from large data sets, ACM SIGMOD Record, ACM (2000), pp. 427-438

[156] Rousseeuw, Peter J; Van Zomeren, Bert C Unmasking multivariate outliers and leverage points, Journal of the American Statistical Association, Volume 85 (1990) no. 411, pp. 633-639

[157] Rocke, David M; Woodruff, David L Identification of outliers in multivariate data, Journal of the American Statistical Association, Volume 91 (1996) no. 435, pp. 1047-1061

[158] Ro, Kwangil; Zou, Changliang; Wang, Zhaojun; Yin, Guosheng Outlier detection for high-dimensional data, Biometrika, Volume 102 (2015) no. 3, pp. 589-599

[159] Smith, Rasheda; Bivens, Alan; Embrechts, Mark; Palagiri, Chandrika; Szymanski, Boleslaw Clustering approaches for anomaly based intrusion detection, Proceedings of Intelligent Engineering Systems Through Artificial Neural Networks (2002), pp. 579-584

[160] Schott, James R Matrix analysis for statistics, Wiley, 2005

[161] Shyu, Mei-Ling; Chen, Shu-Ching; Sarinnapakorn, Kanoksri; Chang, LiWu A novel anomaly detection scheme based on principal component classifier (2003) (Technical report)

[162] Serfling, Robert J Approximation theorems of mathematical statistics, 162, John Wiley & Sons, 1980

[163] Shen, Haipeng; Huang, Jianhua Z Sparse principal component analysis via regularized low rank matrix approximation, Journal of Multivariate Analysis, Volume 99 (2008) no. 6, pp. 1015-1034

[164] She, Yiyuan; Li, Shijie; Wu, Dapeng Robust orthogonal complement principal component analysis, Journal of the American Statistical Association, Volume 111 (2016) no. 514, pp. 763-771

[165] Serfling, Robert; Mazumder, Satyaki Computationally easy outlier detection via projection pursuit with finitely many directions, Journal of Nonparametric Statistics, Volume 25 (2013) no. 2, pp. 447-461

[166] Stahel, Werner; Maechler, Martin; and potentially others robustX : eXperimental Functionality for Robust Statistics (2013) http://CRAN.R-project.org/package=robustX (R package version 1.1-4)

[167] Singh, Karanjit; Upadhyaya, Shuchita Outlier detection : applications and techniques, International Journal of Computer Science Issues, Volume 9 (2012) no. 1, pp. 307-323

[168] Sullivan, Joe H; Woodall, William H A comparison of multivariate control charts for individual observations, Journal of Quality Technology, Volume 28 (1996) no. 4, pp. 398-408

[169] Schubert, Erich; Wojdanowski, Remigius; Zimek, Arthur; Kriegel, Hans-Peter On evaluation of outlier rankings and outlier scores, Proceedings of the 2012 SIAM International Conference on Data Mining, SIAM (2012), pp. 1047-1058

[170] Tatum, Lawrence G Robust estimation of the process standard deviation for control charts, Technometrics, Volume 39 (1997) no. 2, pp. 127-141

[171] Tyler, David E.; Critchley, Frank; Dümbgen, Lutz; Oja, Hannu Invariant coordinate selection, Journal of the Royal Statistical Society : Series B (Statistical Methodology), Volume 71 (2009) no. 3, pp. 549-592

[172] Tang, Jian; Chen, Zhixiang; Fu, Ada Wai-Chee; Cheung, David W Enhancing effectiveness of outlier detections for low density patterns, Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer (2002), pp. 535-548

[173] Tellaroli, Paola; Donato, Michele A Partial Clustering Algorithm with Automatic Estimation of the Number of Clusters and Identification of Outliers (2016) https://CRAN.R-project.org/package=CrossClustering (R package version 3.0)

[174] Todorov, Valentin; Filzmoser, Peter An Object-Oriented Framework for Robust Multivariate Analysis, Journal of Statistical Software, Volume 32 (2009) no. 3, pp. 1-47 http://www.jstatsoft.org/v32/i03/

[175] Taouali, Okba; Jaffel, Ines; Lahdhiri, Hajer; Harkat, Mohamed Faouzi; Messaoud, Hassani New fault detection method based on reduced kernel principal component analysis (RKPCA), The International Journal of Advanced Manufacturing Technology, Volume 85 (2016) no. 5-8, pp. 1547-1552

[176] Todorov, Valentin rrcovHD : Robust Multivariate Methods for High Dimensional Data (2016) https://CRAN.R-project.org/package=rrcovHD (R package version 0.2-4)

[177] Torgo, L. Data Mining with R, learning with case studies, 2nd edition, Chapman and Hall/CRC, 2016 http://ltorgo.github.io/DMwR2

[178] Tao, Yufei; Xiao, Xiaokui; Zhou, Shuigeng Mining distance-based outliers from large databases in any metric space, Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM (2006), pp. 394-403

[179] Tyler, David E A note on multivariate location and scatter statistics for sparse data sets, Statistics & Probability Letters, Volume 80 (2010) no. 17, pp. 1409-1413

[180] van der Loo, Mark extremevalues, an R package for outlier detection in univariate data (2010) http://www.github.com/markvanderloo/extremevalues (R package version 2.3)

[181] Vargas N., José Alberto Robust estimation in multivariate control charts for individual observations, Journal of Quality Technology, Volume 35 (2003) no. 4, pp. 367-376

[182] Verbanck, Marie; Josse, Julie; Husson, François Regularised PCA to denoise and visualise data, Statistics and Computing, Volume 25 (2015) no. 2, pp. 471-486

[183] Venkatasubramanian, Venkat; Rengaswamy, Raghunathan; Yin, Kewen; Kavuri, Surya N A review of process fault detection and diagnosis : Part I : Quantitative model-based methods, Computers & Chemical Engineering, Volume 27 (2003) no. 3, pp. 293-311

[184] Wilks, S.S. Mathematical Statistics, John Wiley & Sons, 1962

[185] Wu, Mingxi; Jermaine, Christopher Outlier detection by sampling with accuracy guarantees, Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM (2006), pp. 767-772

[186] Xiong, Liang; Chen, Xi; Schneider, Jeff Direct robust matrix factorization for anomaly detection, IEEE 11th International Conference on Data Mining (ICDM), IEEE (2011), pp. 844-853

[187] Zimek, Arthur; Campello, Ricardo JGB; Sander, Jörg Data perturbation for outlier detection ensembles, Proceedings of the 26th International Conference on Scientific and Statistical Database Management, ACM (2014), 13 pages

[188] Zimek, Arthur; Campello, Ricardo JGB; Sander, Jörg Ensembles for unsupervised outlier detection : challenges and research questions, a position paper, ACM SIGKDD Explorations Newsletter, Volume 15, ACM (2014) no. 1, pp. 11-22

[189] Zimek, Arthur; Gaudet, Matthew; Campello, Ricardo JGB; Sander, Jörg Subsampling for efficient and effective unsupervised outlier detection ensembles, Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM (2013), pp. 428-436

[190] Zou, Hui; Hastie, Trevor; Tibshirani, Robert Sparse principal component analysis, Journal of Computational and Graphical Statistics, Volume 15 (2006) no. 2, pp. 265-286

[191] Zhang, Ji; Lou, Meng; Ling, Tok Wang; Wang, Hai HOS-Miner : a system for detecting outlying subspaces of high-dimensional data, Proceedings of the Thirtieth International Conference on Very Large Data Bases, Volume 30, VLDB Endowment (2004), pp. 1265-1268

[192] Zimek, Arthur; Schubert, Erich; Kriegel, Hans-Peter A survey on unsupervised outlier detection in high-dimensional numerical data, Statistical Analysis and Data Mining : The ASA Data Science Journal, Volume 5 (2012) no. 5, pp. 363-387