Detecting outlying observations, or anomalies, is a challenge in many fields. This article reviews the literature on unsupervised detection methods, with a primary focus on quality control. Throughout, the notion of outlyingness follows the definition given by Hawkins (1980): an observation is outlying if it is generated by a mechanism different from the one that generated the bulk of the data. The first section addresses quality control in the industry of electronic components for automotive applications and takes stock of the methods used in practice. It appears that mainly univariate methods are integrated into fault detection processes; only a few multivariate methods, such as the Mahalanobis distance or Principal Component Analysis, appear to be known to a few manufacturers. The following sections attempt to summarize the full range of unsupervised outlier detection methods, together with their implementation in the R software (R Core Team, 2017). A distinction is made between methods that only handle data of standard dimension, i.e. with more observations than variables, and those that accept high-dimensional data with a small sample size.
Keywords: anomaly detection, multivariate analysis, low sample size, high reliability
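To illustrate why a multivariate rule such as the Mahalanobis distance can catch what univariate screening misses, here is a minimal sketch in Python. It is not taken from the article, which works in R. The toy data, the function name `mahalanobis_sq_2d`, and the 0.975 chi-square cutoff are assumptions made for illustration only. The sketch also uses the classical mean and covariance, whereas much of the surveyed literature replaces them with robust estimators.

```python
import math

def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each 2-D point to the sample mean,
    using the classical (non-robust) sample covariance matrix."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the sample covariance matrix S = [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0.0:
        raise ValueError("covariance matrix is singular")
    # d^2 = (dx, dy) S^{-1} (dx, dy)^T, with the closed-form 2x2 inverse.
    return [
        (syy * (x - mx) ** 2 - 2.0 * sxy * (x - mx) * (y - my)
         + sxx * (y - my) ** 2) / det
        for x, y in points
    ]

# Hypothetical toy data: eleven strongly correlated measurements plus one
# point, (3.0, 8.0), that lies inside the range of each variable taken alone.
data = [
    (1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1), (6.0, 6.0),
    (7.0, 6.8), (8.0, 8.1), (9.0, 9.0), (10.0, 9.9), (11.0, 11.2),
    (3.0, 8.0),
]

d2 = mahalanobis_sq_2d(data)

# Assumed cutoff: 0.975 quantile of the chi-square law with 2 degrees of
# freedom, which has the closed form -2 * ln(1 - p).
cutoff = -2.0 * math.log(1.0 - 0.975)
flagged = [i for i, d in enumerate(d2) if d > cutoff]
print(flagged)
```

Only the last point is flagged, although neither of its coordinates is extreme on its own. With a single covariance estimate computed from all points, several outliers can mask one another, which is one motivation for the robust alternatives reviewed in the article.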
Archimbaud, Aurore. Détection non-supervisée d’observations atypiques en contrôle de qualité : un survol. Journal de la société française de statistique, Tome 159 (2018) no. 3, pp. 1-39. http://archive.numdam.org/item/JSFS_2018__159_3_1_0/
[1] Guidelines for part average testing, AEC-Q001, rev-D (2011)
[2] A comprehensive survey of numeric and symbolic outlier mining techniques, Intelligent Data Analysis, Volume 10 (2006) no. 6, pp. 521-538
[3] Variational Autoencoder based Anomaly Detection using Reconstruction Probability, SNU Data Mining Center - Special Lecture on IE (2015)
[4] Outlier Analysis, Springer Publishing Company, Incorporated, 2013 | MR | Zbl
[5] Outlier Analysis, 2nd edition, Springer Publishing Company, Incorporated, 2017 | MR
[6] On the surprising behavior of distance metrics in high dimensional space, International Conference on Database Theory, Springer (2001), pp. 420-434 | Zbl
[7] The use of a common location measure in the invariant coordinate selection and projection pursuit, Journal of Multivariate Analysis, Volume 152 (2016), pp. 145-161 | MR | Zbl
[8] ICSShiny : Invariant Coordinate Selection With a Shiny App (2017) (R package version 0.5)
[9] ICS for Multivariate Outlier Detection with Application to Quality Control, Computational Statistics & Data Analysis, Volume 128 (2018), pp. 184-199 | MR | Zbl
[10] ICSOutlier : Outlier Detection Using Invariant Coordinate Selection (2016) (R package version 0.3-0)
[11] ICSOutlier : Unsupervised Outlier Detection for Low-Dimensional Contamination Structure, The R Journal, Volume 10 (2018) no. 1, pp. 234-250 https://journal.r-project.org/archive/2018/RJ-2018-034/index.html
[12] Statistical methods for outlier detection for high-dimensional data, Université Toulouse 1 Capitole (2018) (Ph. D. Thesis)
[13] Outlier Ensembles : An Introduction, Springer, 2017 | MR
[14] Outlier detection for high dimensional data, ACM Sigmod Record, ACM (2001), pp. 37-46
[15] Outliers, Technometrics, Volume 25 (1983) no. 2, pp. 119-149 | MR | Zbl
[16] The masking breakdown point of multivariate outlier identification rules, Journal of the American Statistical Association, Volume 94 (1999) no. 447, pp. 947-955 | MR | Zbl
[17] When is “nearest neighbor” meaningful ?, International Conference on Database Theory, Springer (1999), pp. 217-235
[18] BACON : blocked adaptive computationally efficient outlier nominators, Computational Statistics & Data Analysis, Volume 34 (2000) no. 3, pp. 279-298 | Zbl
[19] LOF : identifying density-based local outliers, ACM Sigmod Record, ACM (2000), pp. 93-104
[20] Optics-of : Identifying local outliers, European Conference on Principles of Data Mining and Knowledge Discovery, Springer (1999), pp. 262-270
[21] Outliers in Statistical Data, Wiley, 1994 | MR | Zbl
[22] Comparing covariance matrices by relative eigenanalysis, with applications to organismal biology, Evolutionary Biology, Volume 41 (2014) no. 2, pp. 336-350
[23] High dimensionality : the trouble with Mahalanobis distance (2015) http://mcs.open.ac.uk/statistics_images/ProgrammeAbstractsWOMAT.pdf (WOMAT : Workshop On Multivariate Analysis Today)
[24] Mining distance-based outliers in near linear time with randomization and a simple pruning rule, Proceedings of the ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM (2003), pp. 29-38
[25] Analyse en Composantes Principales Sparse pour données multiblocs et extension à l’Analyse des Correspondances Multiples Sparse, 45èmes Journées de Statistique (2013)
[26] Robust procedures in multivariate analysis I : Robust covariance estimation, Applied Statistics, Volume 29 (1980) no. 3, pp. 231-237 | Zbl
[27] Outlier detection : A survey (2007) (Technical report)
[28] Anomaly detection : A survey, ACM Computing Surveys (CSUR), Volume 41 (2009) no. 3 | DOI
[29] Multivariate outlier detection with high-breakdown estimators, Journal of the American Statistical Association, Volume 105 (2010) no. 489, pp. 147-156 | MR | Zbl
[30] faoutlier : An R Package for Detecting Influential Cases in Exploratory and Confirmatory Factor Analysis, Applied Psychological Measurement, Volume 39 (2015) no. 7, pp. 573-574
[31] Robust sparse principal component analysis, Technometrics, Volume 55 (2013) no. 2, pp. 202-214 | MR
[32] A monitoring display of multivariate outliers, Computational Statistics & Data Analysis, Volume 44 (2003) no. 1, pp. 237-252 | MR | Zbl
[33] Algorithms for Projection-Pursuit robust principal component analysis, Chemometrics and Intelligent Laboratory Systems, Volume 87 (2007) no. 2, pp. 218-225
[34] Robust Principal Components Analysis based on the Median Covariation Matrix, arXiv preprint arXiv:1504.02852 (2015)
[35] Robust principal component analysis ?, Journal of the ACM (JACM), Volume 58 (2011) no. 3 | DOI | MR | Zbl
[36] Controlling the size of multivariate outlier tests with the MCD estimator of scatter, Statistics and Computing, Volume 19 (2009) no. 3, pp. 341-353 | MR
[37] High breakdown estimators for principal components : the projection-pursuit approach revisited, Journal of Multivariate Analysis, Volume 95 (2005) no. 1, pp. 206-226 | MR | Zbl
[38] Interesting projections of multidimensional data by means of generalized principal component analyses, Proceedings of COMPSTAT’1990, Springer (1990), pp. 121-126
[39] Statistical process and controller performance monitoring. A tutorial on current methods and future directions, Proceedings of the American Control Conference, Volume 4, IEEE (1999), pp. 2625-2639
[40] Outlier detection methods for industrial applications, INTECH Open Access Publisher, 2008
[41] On the evaluation of unsupervised outlier detection : measures, datasets, and an empirical study, Data Mining and Knowledge Discovery (2015), pp. 1-37 | MR
[42] Discriminative features for identifying and interpreting outliers, IEEE 30th International Conference on Data Engineering (ICDE), IEEE (2014), pp. 88-99
[43] Distributed top-k outlier detection from astronomy catalogs using the demac system, Proceedings of the 2007 SIAM International Conference on Data Mining, SIAM (2007), pp. 473-478
[44] Robust estimation of dispersion matrices and principal components, Journal of the American Statistical Association, Volume 76 (1981) no. 374, pp. 354-362 | Zbl
[45] The influence function of the Stahel-Donoho covariance estimator of smallest outlyingness, Statistics & Probability Letters, Volume 79 (2009) no. 3, pp. 275-282 | MR | Zbl
[46] Pattern classification, John Wiley & Sons, 2012 | MR | Zbl
[47] Outlyingness : why do outliers lie out ?, arXiv preprint arXiv:1708.03761v1 (2017)
[48] Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society. Series B (Methodological), Volume 39 (1977) no. 1, pp. 1-38 | MR | Zbl
[49] Méthodes robustes en statistique, 2015, 206 pages
[50] A comparison of three procedures for robust PCA in high dimensions, Austrian Journal of Statistics, Volume 34 (2005) no. 2, pp. 117-126
[51] HighDimOut : Outlier Detection Algorithms for High-Dimensional Data (2015) https://CRAN.R-project.org/package=HighDimOut (R package version 1.0.0)
[52] REPPlab : R Interface to ’EPP-Lab’, a Java Program for Exploratory Projection Pursuit (2016) https://CRAN.R-project.org/package=REPPlab (R package version 0.9.4)
[53] mvoutlier : Multivariate outlier detection based on robust methods (2015) https://CRAN.R-project.org/package=mvoutlier (R package version 2.0.6)
[54] Robust methods for data reduction, CRC press, 2016
[55] Multivariate outlier detection in exploration geochemistry, Computers & Geosciences, Volume 31 (2005) no. 5, pp. 579-587
[56] Subgroup detection in genotype data using invariant coordinate selection, BMC Bioinformatics, Volume 18 (2017) no. 1
[57] Outlier identification in high dimensions, Computational Statistics & Data Analysis, Volume 52 (2008) no. 3, pp. 1694-1711 | MR | Zbl
[58] Review of robust multivariate statistical methods in high dimension, Analytica Chimica Acta, Volume 705 (2011) no. 1, pp. 2-14
[59] Robust tools for the imperfect world, Information Sciences, Volume 245 (2013), pp. 4-20 | MR | Zbl
[60] A projection pursuit algorithm for exploratory data analysis, IEEE Transactions on Computers, Volume C-23 (1974) no. 9, pp. 881-890 | Zbl
[61] An approach to spacecraft anomaly detection problem using kernel feature space, Proceedings of the eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, ACM (2005), pp. 401-410
[62] Multivariate outliers and decompositions of Mahalanobis distance, Communications in Statistics-Theory and Methods, Volume 29 (2000) no. 7, pp. 1511-1526 | MR | Zbl
[63] Robust estimates, residuals, and outlier detection with multiresponse data, Biometrics, Volume 28 (1972) no. 1, pp. 81-124
[64] CerioliOutlierDetection : Outlier Detection Using the Iterated RMCD Method of Cerioli (2010) (2017) https://CRAN.R-project.org/package=CerioliOutlierDetection (R package version 1.1.9)
[65] An extension of a method of Hardin and Rocke, with an application to multivariate outlier detection via the IRMCD method of Cerioli (2017) (Technical report)
[66] depth : Depth functions tools for multivariate analysis (2012) http://CRAN.R-project.org/package=depth (R package version 2.0-0)
[67] Fast mining of distance-based outliers in high-dimensional datasets, Proceedings of the 2006 SIAM International Conference on Data Mining, SIAM (2006), pp. 609-613 | MR
[68] Sample criteria for testing outlying observations, The Annals of Mathematical Statistics, Volume 21 (1950) no. 1, pp. 27-58 | MR | Zbl
[69] Procedures for detecting outlying observations in samples, Technometrics, Volume 11 (1969) no. 1, pp. 1-21
[70] Converting output scores from outlier detection algorithms into probability estimates, ICDM’06 - Sixth International Conference on Data Mining, IEEE (2006), pp. 212-221
[71] A survey of outlier detection methodologies, Artificial Intelligence Review, Volume 22 (2004) no. 2, pp. 85-126 | Zbl
[72] Détection multidimensionnelle au test paramétrique avec recherche automatique des causes, Université Grenoble (2014) (Ph. D. Thesis)
[73] Identification of outliers, 11, Springer, 1980 | MR
[74] Detection of outliers, Wiley Interdisciplinary Reviews : Computational Statistics, Volume 1 (2009) no. 1, pp. 57-70
[75] Différentes méthodes de localisation de défauts basées sur les dernières composantes principales, Conférence Internationale Francophone d’Automatique (CIFA) (2002)
[76] Rlof : R Parallel Implementation of Local Outlier Factor (LOF) (2015) https://CRAN.R-project.org/package=Rlof (R package version 1.1.1)
[77] A simple sequentially rejective multiple test procedure, Scandinavian journal of statistics (1979), pp. 65-70 | MR | Zbl
[78] The Generalization of Student’s Ratio, The Annals of Mathematical Statistics, Volume 2 (1931) no. 3, pp. 360-378 | JFM | Zbl
[79] Multivariate Quality Control Illustrated by the Air Testing of Sample Bombsights, Selected Techniques of Statistical Analysis (1947), pp. 111-184
[80] kmodR : K-Means with Simultaneous Outlier Detection (2015) http://CRAN.R-project.org/package=kmodR (R package version 0.1.0)
[81] The distribution of robust distances, Journal of Computational and Graphical Statistics, Volume 14 (2005) no. 4, pp. 928-946 | MR
[82] Sparse PCA for high-dimensional data with outliers, Technometrics, Volume 58 (2016) no. 4, pp. 424-434 | MR
[83] A fast method for robust principal components with applications to chemometrics, Chemometrics and Intelligent Laboratory Systems, Volume 60 (2002) no. 1, pp. 101-111
[84] ROBPCA : a new approach to robust principal component analysis, Technometrics, Volume 47 (2005) no. 1, pp. 64-79 | MR
[85] The Elements of Statistical Learning, Springer Series in Statistics, Springer New York Inc., New York, NY, USA, 2001 | MR | Zbl
[86] Projection pursuit, The Annals of Statistics, Volume 13 (1985) no. 2, pp. 435-475 | MR | Zbl
[87] Outlier identification and management system for electronic components (2009) (Technical report)
[88] High breakdown estimation methods for phase I multivariate control charts, Quality and Reliability Engineering International, Volume 23 (2007) no. 5, pp. 615-629
[89] Algorithms for clustering data, Prentice-Hall, Inc., 1988 | MR | Zbl
[90] abodOutlier : Angle-Based Outlier Detection (2015) https://CRAN.R-project.org/package=abodOutlier (R package version 0.1)
[91] Principal component analysis, Wiley Online Library, 2002 | MR
[92] A cluster-based outlier detection scheme for multivariate data, Journal of the American Statistical Association, Volume 110 (2015) no. 512, pp. 1543-1551 | MR | Zbl
[93] Adaptive shrinkage of singular values, Statistics and Computing, Volume 26 (2016) no. 3, pp. 715-724 | MR | Zbl
[94] denoiseR : A Package for Low Rank Matrix Estimation, arXiv preprint arXiv:1602.01206 (2016)
[95] denoiseR : Regularized Low Rank Matrix Estimation (2016) https://CRAN.R-project.org/package=denoiseR (R package version 1.0)
[96] Applied Multivariate Statistical Analysis (6th Edition), Prentice Hall, 1998 | MR | Zbl
[97] Robust Methods for Unsupervised PCA-based Anomaly Detection, Proc. of IEEE/IST Workshop on Monitoring, Attack Detection and Mitigation (2006), pp. 1-3
[98] Outlier detection in axis-parallel subspaces of high dimensional data, Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer (2009), pp. 831-838
[99] Interpreting and unifying outlier scores, Proceedings of the 2011 SIAM International Conference on Data Mining, SIAM (2011), pp. 13-24
[100] Outlier detection in arbitrarily oriented subspaces, IEEE 12th International Conference on Data Mining (ICDM), IEEE (2012), pp. 379-388
[101] Outlier detection techniques, Tutorial at KDD, Volume 10 (2010)
[102] HiCS : High contrast subspaces for density-based outlier ranking, IEEE 28th International Conference on Data Engineering (ICDE), IEEE (2012), pp. 1037-1048
[103] Algorithms for mining distance-based outliers in large datasets, Proceedings of the International Conference on Very Large Data Bases, Citeseer (1998), pp. 392-403
[104] Finding intensional knowledge of distance-based outliers, VLDB, Volume 99 (1999), pp. 211-222
[105] outliers : Tests for outliers (2011) https://CRAN.R-project.org/package=outliers (R package version 0.14)
[106] Angle-based outlier detection in high-dimensional data, Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM (2008), pp. 444-452
[107] Prolonger la MSP par la “Maîtrise Globale du Processus”, Qualité références (Juillet 2000), pp. 47-55
[108] Réduction de la dispersion des caractéristiques produit, méthodologie GPC et application en carrosserie automobile, 7ième édition du Congrès International Pluridisciplinaire Qualita 2007, Tanger (Maroc) (2007)
[109] MSP multidimensionnelle, Détecter et identifier “L’invisible”, Qualité références (Janvier 2005), pp. 79-82
[110] Informal identification of outliers in medical data, Fifth International Workshop on Intelligent Data Analysis in Medicine and Pharmacology, Volume 1 (2000), pp. 20-24
[111] Feature bagging for outlier detection, Proceedings of the eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, ACM (2005), pp. 157-166
[112] Robust principal component analysis for functional data, Test, Volume 8 (1999) no. 1, pp. 1-73 | MR | Zbl
[113] A well-conditioned estimator for large-dimensional covariance matrices, Journal of Multivariate Analysis, Volume 88 (2004) no. 2, pp. 365-411 | MR | Zbl
[114] Nonlinear shrinkage estimation of large-dimensional covariance matrices, The Annals of Statistics, Volume 40 (2012) no. 2 | MR | Zbl
[115] Nonlinear process monitoring using kernel principal component analysis, Chemical Engineering Science, Volume 59 (2004) no. 1, pp. 223-234
[116] Statistical process monitoring with independent component analysis, Journal of Process Control, Volume 14 (2004) no. 5, pp. 467-485
[117] Outlier ranking via subspace analysis in multiple views of the data, IEEE 12th International Conference on Data Mining (ICDM), IEEE (2012), pp. 529-538
[118] Détection et localisation de défauts des Wafers par des approches statistiques multivariées et calcul des contributions, CIFA 2008, Conférence Internationale Francophone d’Automatique (2008)
[119] OutRank : ranking outliers in high dimensional data, ICDEW 2008, IEEE 24th International Conference on Data Engineering Workshop, IEEE (2008), pp. 600-603
[120] Maîtrise Statistique des procédés - Principes et cas industriels, Dunod/Usine Nouvelle, 2011
[121] Multivariate analysis, Academic press, 1979 | MR
[122] Improving electronic sensor reliability by robust outlier screening, Sensors, Volume 13 (2013) no. 10, pp. 13521-13542
[123] Robust statistics, John Wiley & Sons, Chichester, 2006 | MR | Zbl
[124] Novelty detection : a review part 1 : statistical approaches, Signal Processing, Volume 83 (2003) no. 12, pp. 2481-2497 | Zbl
[125] SOREX : Subspace outlier ranking exploration toolkit, Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer (2010), pp. 607-610
[126] Adaptive outlierness for subspace outlier ranking, Proceedings of the 19th ACM International Conference on Information and Knowledge Management, ACM (2010), pp. 1629-1632
[127] Statistical selection of relevant subspace projections for outlier ranking, IEEE 27th International Conference on Data Engineering (ICDE), IEEE (2011), pp. 434-445
[128] The behavior of the Stahel-Donoho robust multivariate estimator, Journal of the American Statistical Association, Volume 90 (1995) no. 429, pp. 330-341 | MR | Zbl
[129] Robust estimates of location and dispersion for high-dimensional datasets, Technometrics, Volume 44 (2002) no. 4 | MR
[130] Mining outliers with ensemble of heterogeneous detectors on random subspaces, International Conference on Database Systems for Advanced Applications, Springer (2010), pp. 368-383
[131] Tools for Exploring Multivariate Data : The Package ICS, Journal of Statistical Software, Volume 28 (2008) no. 6, pp. 1-31 http://www.jstatsoft.org/v28/i06/
[132] Regularized-Estimators of Scatter Matrix, IEEE Transactions on Signal Processing, Volume 62 (2014) no. 22, pp. 6059-6070 | MR | Zbl
[133] A review of novelty detection, Signal Processing, Volume 99 (2014), pp. 215-249
[134] Statistical independence and novelty detection with information preserving nonlinear maps, Neural Computation, Volume 8 (1996) no. 2, pp. 260-269
[135] Criterion for the rejection of doubtful observations, The Astronomical Journal, Volume 2 (1852), pp. 161-163
[136] Multivariate outlier detection applied to multiply imputed laboratory data, Statistics in Medicine, Volume 18 (1999) no. 14, pp. 1879-1895
[137] Cluster identification using projections, Journal of the American Statistical Association, Volume 96 (2001) no. 456 | MR | Zbl
[138] Multivariate outlier detection and robust covariance matrix estimation, Technometrics, Volume 43 (2001) no. 3 | MR
[139] A near-linear time approximation algorithm for angle-based outlier detection in high-dimensional data, Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM (2012), pp. 877-885
[140] The efficiency of statistical tools and a criterion for the rejection of outlying observations, Biometrika, Volume 28 (1936) no. 3/4, pp. 308-320 | Zbl
[141] R : A Language and Environment for Statistical Computing (2017) https://www.R-project.org/
[142] robustbase : Basic Robust Statistics (2016) http://CRAN.R-project.org/package=robustbase (R package version 0.92-6)
[143] Detecting multivariate outliers using projection pursuit with particle swarm optimization, Proceedings of COMPSTAT’2010, Springer (2010), pp. 89-98 | Zbl
[144] Sparse PCA for high-dimensional data with outliers, Technometrics (2015) | MR
[145] Criteria for rejection of observations, Washington University Studies, St. Louis, 1933
[146] alphaOutlier : Obtain Alpha-Outlier Regions for Well-Known Probability Distributions (2016) https://CRAN.R-project.org/package=alphaOutlier (R package version 1.2.0)
[147] Finding Groups in Data, Wiley Online Library, 1990 | MR
[148] Robust regression and outlier detection, 589, John Wiley & Sons, 2005 | MR | Zbl
[149] On the existence of obstinate results in vector space models, Proceedings of the 33rd international ACM SIGIR conference on Research and Development in Information Retrieval, ACM (2010), pp. 186-193
[150] Robust control charts, Technometrics, Volume 31 (1989) no. 2, pp. 173-184 | MR
[151] X_Q and R_Q Charts : Robust Control Charts, The Statistician, Volume 41 (1992) no. 1, pp. 97-104
[152] Generalization of the gap test for the detection of multivariate outliers, Biometrics, Volume 31 (1975) no. 1, pp. 93-101 | Zbl
[153] Multivariate estimation with high breakdown point, Mathematical Statistics and Applications (Grossman, W.; Pflug, G.; Vincze, I.; Wertz, W., eds.), Reidel, Dordrecht, 1986, pp. 283-297 | MR | Zbl
[154] Computing depth contours of bivariate point clouds, Computational Statistics & Data Analysis, Volume 23 (1996) no. 1, pp. 153-168 | Zbl
[155] Efficient algorithms for mining outliers from large data sets, ACM Sigmod Record, ACM (2000), pp. 427-438
[156] Unmasking multivariate outliers and leverage points, Journal of the American Statistical Association, Volume 85 (1990) no. 411, pp. 633-639
[157] Identification of outliers in multivariate data, Journal of the American Statistical Association, Volume 91 (1996) no. 435, pp. 1047-1061 | MR | Zbl
[158] Outlier detection for high-dimensional data, Biometrika, Volume 102 (2015) no. 3, pp. 589-599 | MR | Zbl
[159] Clustering approaches for anomaly based intrusion detection, Proceedings of Intelligent Engineering Systems Through Artificial Neural Networks (2002), pp. 579-584
[160] Matrix analysis for statistics, Wiley, 2005 | MR | Zbl
[161] A novel anomaly detection scheme based on principal component classifier (2003) (Technical report)
[162] Approximation theorems of mathematical statistics, 162, John Wiley & Sons, 1980 | MR | Zbl
[163] Sparse principal component analysis via regularized low rank matrix approximation, Journal of Multivariate Analysis, Volume 99 (2008) no. 6, pp. 1015-1034 | MR | Zbl
[164] Robust orthogonal complement principal component analysis, Journal of the American Statistical Association, Volume 111 (2016) no. 514, pp. 763-771 | MR
[165] Computationally easy outlier detection via projection pursuit with finitely many directions, Journal of Nonparametric Statistics, Volume 25 (2013) no. 2, pp. 447-461 | MR | Zbl
[166] robustX : eXperimental Functionality for Robust Statistics (2013) http://CRAN.R-project.org/package=robustX (R package version 1.1-4)
[167] Outlier detection : applications and techniques, International Journal of Computer Science Issues, Volume 9 (2012) no. 1, pp. 307-323
[168] A comparison of multivariate control charts for individual observations, Journal of Quality Technology, Volume 28 (1996) no. 4, pp. 398-408
[169] On evaluation of outlier rankings and outlier scores, Proceedings of the 2012 SIAM International Conference on Data Mining, SIAM (2012), pp. 1047-1058
[170] Robust estimation of the process standard deviation for control charts, Technometrics, Volume 39 (1997) no. 2, pp. 127-141 | MR | Zbl
[171] Invariant coordinate selection, Journal of the Royal Statistical Society : Series B (Statistical Methodology), Volume 71 (2009) no. 3, pp. 549-592 | MR | Zbl
[172] Enhancing effectiveness of outlier detections for low density patterns, Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer (2002), pp. 535-548 | Zbl
[173] A Partial Clustering Algorithm with Automatic Estimation of the Number of Clusters and Identification of Outliers (2016) https://CRAN.R-project.org/package=CrossClustering (R package version 3.0)
[174] An Object-Oriented Framework for Robust Multivariate Analysis, Journal of Statistical Software, Volume 32 (2009) no. 3, pp. 1-47 http://www.jstatsoft.org/v32/i03/
[175] New fault detection method based on reduced kernel principal component analysis (RKPCA), The International Journal of Advanced Manufacturing Technology, Volume 85 (2016) no. 5-8, pp. 1547-1552
[176] rrcovHD : Robust Multivariate Methods for High Dimensional Data (2016) https://CRAN.R-project.org/package=rrcovHD (R package version 0.2-4)
[177] Data Mining with R, learning with case studies, 2nd edition, Chapman and Hall/CRC, 2016 http://ltorgo.github.io/DMwR2
[178] Mining distance-based outliers from large databases in any metric space, Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM (2006), pp. 394-403
[179] A note on multivariate location and scatter statistics for sparse data sets, Statistics & Probability Letters, Volume 80 (2010) no. 17, pp. 1409-1413 | MR | Zbl
[180] extremevalues, an R package for outlier detection in univariate data (2010) http://www.github.com/markvanderloo/extremevalues (R package version 2.3)
[181] Robust estimation in multivariate control charts for individual observations, Journal of Quality Technology, Volume 35 (2003) no. 4, pp. 367-376
[182] Regularised PCA to denoise and visualise data, Statistics and Computing, Volume 25 (2015) no. 2, pp. 471-486 | MR | Zbl
[183] A review of process fault detection and diagnosis : Part I : Quantitative model-based methods, Computers & Chemical Engineering, Volume 27 (2003) no. 3, pp. 293-311
[184] Mathematical Statistics, John Wiley & Sons, 1962 | MR | Zbl
[185] Outlier detection by sampling with accuracy guarantees, Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM (2006), pp. 767-772
[186] Direct robust matrix factorization for anomaly detection, IEEE 11th International Conference on Data Mining (ICDM), IEEE (2011), pp. 844-853
[187] Data perturbation for outlier detection ensembles, Proceedings of the 26th International Conference on Scientific and Statistical Database Management, ACM (2014), 13 pages
[188] Ensembles for unsupervised outlier detection : challenges and research questions a position paper, Volume 15, ACM (2014) no. 1, pp. 11-22
[189] Subsampling for efficient and effective unsupervised outlier detection ensembles, Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM (2013), pp. 428-436
[190] Sparse principal component analysis, Journal of Computational and Graphical Statistics, Volume 15 (2006) no. 2, pp. 265-286 | MR
[191] Hos-miner : a system for detecting outlying subspaces of high-dimensional data, Proceedings of the Thirtieth International Conference on Very Large Data Bases, Volume 30, VLDB Endowment (2004), pp. 1265-1268
[192] A survey on unsupervised outlier detection in high-dimensional numerical data, Statistical Analysis and Data Mining : The ASA Data Science Journal, Volume 5 (2012) no. 5, pp. 363-387 | MR | Zbl