Theory of classification : a survey of some recent advances
ESAIM: Probability and Statistics, Tome 9 (2005), p. 323-375
The last few years have witnessed important new developments in the theory and practice of pattern classification. We intend to survey some of the main new ideas that have led to these recent results.
DOI : https://doi.org/10.1051/ps:2005018
Classification:  62G08,  60E15,  68Q32
@article{PS_2005__9__323_0,
     author = {Boucheron, St\'ephane and Bousquet, Olivier and Lugosi, G\'abor},
     title = {Theory of classification : a survey of some recent advances},
     journal = {ESAIM: Probability and Statistics},
     publisher = {EDP-Sciences},
     volume = {9},
     year = {2005},
     pages = {323-375},
     doi = {10.1051/ps:2005018},
     zbl = {1136.62355},
     mrnumber = {2182250},
     language = {en},
     url = {http://www.numdam.org/item/PS_2005__9__323_0}
}
Boucheron, Stéphane; Bousquet, Olivier; Lugosi, Gábor. Theory of classification : a survey of some recent advances. ESAIM: Probability and Statistics, Tome 9 (2005) pp. 323-375. doi : 10.1051/ps:2005018. https://www.numdam.org/item/PS_2005__9__323_0/

[1] R. Ahlswede, P. Gács and J. Körner, Bounds on conditional probabilities with applications in multi-user communication. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 34 (1976) 157-177. (correction in 39 (1977) 353-354). | Zbl 0359.94011

[2] M.A. Aizerman, E.M. Braverman and L.I. Rozonoer, The method of potential functions for the problem of restoring the characteristic of a function converter from randomly observed points. Automat. Remote Control 25 (1964) 1546-1556. | Zbl 0137.39301

[3] M.A. Aizerman, E.M. Braverman and L.I. Rozonoer, The probability problem of pattern recognition learning and the method of potential functions. Automat. Remote Control 25 (1964) 1307-1323. | Zbl 0137.37903

[4] M.A. Aizerman, E.M. Braverman and L.I. Rozonoer, Theoretical foundations of the potential function method in pattern recognition learning. Automat. Remote Control 25 (1964) 917-936. | Zbl 0151.24701

[5] M.A. Aizerman, E.M. Braverman and L.I. Rozonoer, Method of potential functions in the theory of learning machines. Nauka, Moscow (1970).

[6] H. Akaike, A new look at the statistical model identification. IEEE Trans. Automat. Control 19 (1974) 716-723. | MR 423716 | Zbl 0314.62039

[7] S. Alesker, A remark on the Szarek-Talagrand theorem. Combin. Probab. Comput. 6 (1997) 139-144. | MR 1447809 | Zbl 0898.46013

[8] N. Alon, S. Ben-David, N. Cesa-Bianchi and D. Haussler, Scale-sensitive dimensions, uniform convergence, and learnability. J. ACM 44 (1997) 615-631. | MR 1481318 | Zbl 0891.68086

[9] M. Anthony and P.L. Bartlett, Neural Network Learning: Theoretical Foundations. Cambridge University Press, Cambridge (1999). | MR 1741038 | Zbl 0968.68126

[10] M. Anthony and N. Biggs, Computational Learning Theory. Cambridge Tracts in Theoretical Computer Science (30). Cambridge University Press, Cambridge (1992). | MR 1159707 | Zbl 0755.68115

[11] M. Anthony and J. Shawe-Taylor, A result of Vapnik with applications. Discrete Appl. Math. 47 (1993) 207-217. | Zbl 0801.68147

[12] A Antos, L. Devroye and L. Györfi, Lower bounds for Bayes error estimation. IEEE Trans. Pattern Anal. Machine Intelligence 21 (1999) 643-645.

[13] A. Antos, B. Kégl, T. Linder and G. Lugosi, Data-dependent margin-based generalization bounds for classification. J. Machine Learning Res. 3 (2002) 73-98. | Zbl 1088.68688

[14] A. Antos and G. Lugosi, Strong minimax lower bounds for learning. Machine Learning 30 (1998) 31-56. | Zbl 0892.68083

[15] P. Assouad, Densité et dimension. Annales de l'Institut Fourier 33 (1983) 233-282. | Numdam | Zbl 0504.60006

[16] J.-Y. Audibert and O. Bousquet, Pac-Bayesian generic chaining, in Advances in Neural Information Processing Systems 16, L. Saul, S. Thrun and B. Schölkopf Eds., Cambridge, Mass., MIT Press (2004).

[17] J.-Y. Audibert, PAC-Bayesian Statistical Learning Theory. Ph.D. Thesis, Université Paris 6, Pierre et Marie Curie (2004).

[18] K. Azuma, Weighted sums of certain dependent random variables. Tohoku Math. J. 68 (1967) 357-367. | Zbl 0178.21103

[19] Y. Baraud, Model selection for regression on a fixed design. Probability Theory and Related Fields 117 (2000) 467-493. | Zbl 0997.62027

[20] A.R. Barron, L. Birgé and P. Massart, Risks bounds for model selection via penalization. Probab. Theory Related Fields 113 (1999) 301-415. | Zbl 0946.62036

[21] A.R. Barron, Logically smooth density estimation. Technical Report TR 56, Department of Statistics, Stanford University (1985).

[22] A.R. Barron, Complexity regularization with application to artificial neural networks, in Nonparametric Functional Estimation and Related Topics, G. Roussas Ed. NATO ASI Series, Kluwer Academic Publishers, Dordrecht (1991) 561-576. | Zbl 0739.62001

[23] A.R. Barron and T.M. Cover, Minimum complexity density estimation. IEEE Trans. Inform. Theory 37 (1991) 1034-1054. | Zbl 0743.62003

[24] P. Bartlett, S. Boucheron and G. Lugosi, Model selection and error estimation. Machine Learning 48 (2001) 85-113. | Zbl 0998.68117

[25] P. Bartlett, O. Bousquet and S. Mendelson, Localized Rademacher complexities. Ann. Statist. 33 (2005) 1497-1537. | Zbl 1083.62034

[26] P.L. Bartlett and S. Ben-David, Hardness results for neural network approximation problems. Theoret. Comput. Sci. 284 (2002) 53-66. | Zbl 0997.68098

[27] P.L. Bartlett, M.I. Jordan and J.D. Mcauliffe, Convexity, classification, and risk bounds. J. Amer. Statis. Assoc., to appear (2005). | MR 2268032 | Zbl 1118.62330

[28] P.L. Bartlett and W. Maass, Vapnik-Chervonenkis dimension of neural nets, in Handbook Brain Theory Neural Networks, M.A. Arbib Ed. MIT Press, second edition. (2003) 1188-1192.

[29] P.L. Bartlett and S. Mendelson, Rademacher and gaussian complexities: risk bounds and structural results. J. Machine Learning Res. 3 (2002) 463-482. | Zbl 1084.68549

[30] P. L. Bartlett, S. Mendelson and P. Philips, Local Complexities for Empirical Risk Minimization, in Proc. of the 17th Annual Conference on Learning Theory (COLT), Springer (2004). | MR 2177915 | Zbl 1078.68046

[31] O. Bashkirov, E.M. Braverman and I.E. Muchnik, Potential function algorithms for pattern recognition learning machines. Automat. Remote Control 25 (1964) 692-695. | Zbl 0221.68057

[32] S. Ben-David, N. Eiron and H.-U. Simon, Limitations of learning via embeddings in Euclidean half spaces. J. Machine Learning Res. 3 (2002) 441-461. | Zbl 1084.68551

[33] G. Bennett, Probability inequalities for the sum of independent random variables. J. Amer. Statis. Assoc. 57 (1962) 33-45. | Zbl 0104.11905

[34] S.N. Bernstein, The Theory of Probabilities. Gostehizdat Publishing House, Moscow (1946).

[35] L. Birgé, An alternative point of view on Lepski's method, in State of the art in probability and statistics (Leiden, 1999), Inst. Math. Statist., Beachwood, OH, IMS Lecture Notes Monogr. Ser. 36 (2001) 113-133.

[36] L. Birgé and P. Massart, Rates of convergence for minimum contrast estimators. Probab. Theory Related Fields 97 (1993) 113-150. | Zbl 0805.62037

[37] L. Birgé and P. Massart, From model selection to adaptive estimation, in Festschrift for Lucien Le Cam: Research papers in Probability and Statistics, E. Torgersen D. Pollard and G. Yang Eds., Springer, New York (1997) 55-87. | Zbl 0920.62042

[38] L. Birgé and P. Massart, Minimum contrast estimators on sieves: exponential bounds and rates of convergence. Bernoulli 4 (1998) 329-375. | Zbl 0954.62033

[39] G. Blanchard, O. Bousquet and P. Massart, Statistical performance of support vector machines. Ann. Statist., to appear (2006). | MR 2396805 | Zbl 1133.62044

[40] G. Blanchard, G. Lugosi and N. Vayatis, On the rates of convergence of regularized boosting classifiers. J. Machine Learning Res. 4 (2003) 861-894. | Zbl 1083.68109

[41] A. Blumer, A. Ehrenfeucht, D. Haussler and M.K. Warmuth, Learnability and the Vapnik-Chervonenkis dimension. J. ACM 36 (1989) 929-965. | Zbl 0697.68079

[42] S. Bobkov and M. Ledoux, Poincaré's inequalities and Talagrands's concentration phenomenon for the exponential distribution. Probab. Theory Related Fields 107 (1997) 383-400. | Zbl 0878.60014

[43] B. Boser, I. Guyon and V.N. Vapnik, A training algorithm for optimal margin classifiers, in Proc. of the Fifth Annual ACM Workshop on Computational Learning Theory (COLT). Association for Computing Machinery, New York, NY (1992) 144-152.

[44] S. Boucheron, O. Bousquet, G. Lugosi and P. Massart, Moment inequalities for functions of independent random variables. Ann. Probab. 33 (2005) 514-560. | Zbl 1074.60018

[45] S. Boucheron, G. Lugosi and P. Massart, A sharp concentration inequality with applications. Random Structures Algorithms 16 (2000) 277-292. | Zbl 0954.60008

[46] S. Boucheron, G. Lugosi and P. Massart, Concentration inequalities using the entropy method. Ann. Probab. 31 (2003) 1583-1614. | Zbl 1051.60020

[47] O. Bousquet, A Bennett concentration inequality and its application to suprema of empirical processes. C. R. Acad. Sci. Paris 334 (2002) 495-500. | Zbl 1001.60021

[48] O. Bousquet, Concentration inequalities for sub-additive functions using the entropy method, in Stochastic Inequalities and Applications, C. Houdré E. Giné and D. Nualart Eds., Birkhauser (2003). | MR 2073435 | Zbl 1037.60015

[49] O. Bousquet and A. Elisseeff, Stability and generalization. J. Machine Learning Res. 2 (2002) 499-526. | Zbl 1007.68083

[50] O. Bousquet, V. Koltchinskii and D. Panchenko, Some local measures of complexity of convex hulls and generalization bounds, in Proceedings of the 15th Annual Conference on Computational Learning Theory (COLT), Springer (2002) 59-73. | Zbl 1050.68055

[51] L. Breiman, Arcing classifiers. Ann. Statist. 26 (1998) 801-849. | Zbl 0934.62064

[52] L. Breiman, Some infinite theory for predictor ensembles. Ann. Statist. 32 (2004) 1-11. | Zbl 1105.62308

[53] L. Breiman, J.H. Friedman, R.A. Olshen and C.J. Stone, Classification and Regression Trees. Wadsworth International, Belmont, CA (1984). | MR 726392 | Zbl 0541.62042

[54] P. Bühlmann and B. Yu, Boosting with the l 2 -loss: Regression and classification. J. Amer. Statis. Assoc. 98 (2004) 324-339. | Zbl 1041.62029

[55] A. Cannon, J.M. Ettinger, D. Hush and C. Scovel, Machine learning with data dependent hypothesis classes. J. Machine Learning Res. 2 (2002) 335-358. | Zbl 1007.68156

[56] G. Castellan, Density estimation via exponential model selection. IEEE Trans. Inform. Theory 49 (2003) 2052-2060.

[57] O. Catoni, Randomized estimators and empirical complexity for pattern recognition and least square regression. Preprint PMA-677.

[58] O. Catoni, Statistical learning theory and stochastic optimization. École d'été de Probabilités de Saint-Flour XXXI. Springer-Verlag. Lect. Notes Math. 1851 (2004). | Zbl 1076.93002

[59] O. Catoni, Localized empirical complexity bounds and randomized estimators (2003). Preprint.

[60] N. Cesa-Bianchi and D. Haussler, A graph-theoretic generalization of the Sauer-Shelah lemma. Discrete Appl. Math. 86 (1998) 27-35. | Zbl 0918.05066

[61] M. Collins, R.E. Schapire and Y. Singer, Logistic regression, AdaBoost and Bregman distances. Machine Learning 48 (2002) 253-285. | Zbl 0998.68123

[62] C. Cortes and V.N. Vapnik, Support vector networks. Machine Learning 20 (1995) 1-25. | Zbl 0831.68098

[63] T.M. Cover, Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Trans. Electronic Comput. 14 (1965) 326-334. | Zbl 0152.18206

[64] P. Craven and G. Wahba, Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math. 31 (1979) 377-403. | Zbl 0377.65007

[65] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and other kernel-based learning methods. Cambridge University Press, Cambridge, UK (2000). | Zbl 0994.68074

[66] I. Csiszár, Large-scale typicality of Markov sample paths and consistency of MDL order estimators. IEEE Trans. Inform. Theory 48 (2002) 1616-1628. | Zbl 1060.62092

[67] I. Csiszár and P. Shields, The consistency of the BIC Markov order estimator. Ann. Statist. 28 (2000) 1601-1619. | Zbl 1105.62311

[68] F. Cucker and S. Smale, On the mathematical foundations of learning. Bull. Amer. Math. Soc. (2002) 1-50. | Zbl 0983.68162

[69] A. Dembo, Information inequalities and concentration of measure. Ann. Probab. 25 (1997) 927-939. | Zbl 0880.60018

[70] P.A. Devijver and J. Kittler, Pattern Recognition: A Statistical Approach. Prentice-Hall, Englewood Cliffs, NJ (1982). | MR 692767 | Zbl 0542.68071

[71] L. Devroye, Automatic pattern recognition: A study of the probability of error. IEEE Trans. Pattern Anal. Machine Intelligence 10 (1988) 530-543. | Zbl 0661.62056

[72] L. Devroye, L. Györfi and G. Lugosi, A Probabilistic Theory of Pattern Recognition. Springer-Verlag, New York (1996). | MR 1383093 | Zbl 0853.68150

[73] L. Devroye and G. Lugosi, Lower bounds in pattern recognition and learning. Pattern Recognition 28 (1995) 1011-1018.

[74] L. Devroye and T. Wagner, Distribution-free inequalities for the deleted and holdout error estimates. IEEE Trans. Inform. Theory 25(2) (1979) 202-207. | Zbl 0408.62055

[75] L. Devroye and T. Wagner, Distribution-free performance bounds for potential function rules. IEEE Trans. Inform. Theory 25(5) (1979) 601-604. | Zbl 0432.62040

[76] D.L. Donoho and I.M. Johnstone, Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3) (1994) 425-455. | Zbl 0815.62019

[77] R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis. John Wiley, New York (1973). | Zbl 0277.68056

[78] R.O. Duda, P.E. Hart and D.G. Stork, Pattern Classification. John Wiley and Sons (2000). | MR 1802993 | Zbl 0968.68140

[79] R.M. Dudley, Central limit theorems for empirical measures. Ann. Probab. 6 (1978) 899-929. | Zbl 0404.60016

[80] R.M. Dudley, Balls in R k do not cut all subsets of k+2 points. Advances Math. 31 (3) (1979) 306-308. | Zbl 0408.05001

[81] R.M. Dudley, Empirical processes, in École de Probabilité de St. Flour 1982. Lect. Notes Math. 1097 (1984). | Zbl 0554.60029

[82] R.M. Dudley, Universal Donsker classes and metric entropy. Ann. Probab. 15 (1987) 1306-1326. | Zbl 0631.60004

[83] R.M. Dudley, Uniform Central Limit Theorems. Cambridge University Press, Cambridge (1999). | MR 1720712 | Zbl 0951.60033

[84] R.M. Dudley, E. Giné and J. Zinn, Uniform and universal Glivenko-Cantelli classes. J. Theoret. Probab. 4 (1991) 485-510. | Zbl 0732.60035

[85] B. Efron, Bootstrap methods: another look at the jackknife. Ann. Statist. 7 (1979) 1-26. | Zbl 0406.62024

[86] B. Efron, The jackknife, the bootstrap, and other resampling plans. SIAM, Philadelphia (1982). | MR 659849 | Zbl 0496.62036

[87] B. Efron and R.J. Tibshirani, An Introduction to the Bootstrap. Chapman and Hall, New York (1994). | MR 1270903 | Zbl 0835.62038

[88] A. Ehrenfeucht, D. Haussler, M. Kearns and L. Valiant, A general lower bound on the number of examples needed for learning. Inform. Comput. 82 (1989) 247-261. | Zbl 0679.68158

[89] T. Evgeniou, M. Pontil and T. Poggio, Regularization networks and support vector machines, in Advances in Large Margin Classifiers, A.J. Smola, P.L. Bartlett B. Schölkopf and D. Schuurmans, Eds., Cambridge, MA, MIT Press. (2000) 171-203.

[90] P. Frankl, On the trace of finite sets. J. Combin. Theory, Ser. A 34 (1983) 41-45. | Zbl 0505.05001

[91] Y. Freund, Boosting a weak learning algorithm by majority. Inform. Comput. 121 (1995) 256-285. | Zbl 0833.68109

[92] Y. Freund, Self bounding learning algorithms, in Proceedings of the 11th Annual Conference on Computational Learning Theory (1998) 127-135.

[93] Y. Freund, Y. Mansour and R.E. Schapire, Generalization bounds for averaged classifiers (how to be a Bayesian without believing). Ann. Statist. (2004). | MR 2089139 | Zbl 1045.62056

[94] Y. Freund and R. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55 (1997) 119-139. | Zbl 0880.68103

[95] J. Friedman, T. Hastie and R. Tibshirani, Additive logistic regression: a statistical view of boosting. Ann. Statist. 28 (2000) 337-374. | Zbl 1106.62323

[96] M. Fromont, Some problems related to model selection: adaptive tests and bootstrap calibration of penalties. Thèse de doctorat, Université Paris-Sud (December 2003).

[97] K. Fukunaga, Introduction to Statistical Pattern Recognition. Academic Press, New York (1972). | MR 1075415 | Zbl 0711.62052

[98] E. Giné, Empirical processes and applications: an overview. Bernoulli 2 (1996) 1-28. | Zbl 0849.62026

[99] E. Giné and J. Zinn, Some limit theorems for empirical processes. Ann. Probab. 12 (1984) 929-989. | Zbl 0553.60037

[100] E. Giné, Lectures on some aspects of the bootstrap, in Lectures on probability theory and statistics (Saint-Flour, 1996). Lect. Notes Math. 1665 (1997) 37-151. | Zbl 0882.62040

[101] P. Goldberg and M. Jerrum, Bounding the Vapnik-Chervonenkis dimension of concept classes parametrized by real numbers. Machine Learning 18 (1995) 131-148. | Zbl 0831.68087

[102] U. Grenander, Abstract inference. John Wiley & Sons Inc., New York (1981). | MR 599175 | Zbl 0505.62069

[103] P. Hall, Large sample optimality of least squares cross-validation in density estimation. Ann. Statist. 11 (1983) 1156-1174. | Zbl 0599.62051

[104] T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning. Springer Series in Statistics. Springer-Verlag, New York (2001). | MR 1851606 | Zbl 0973.62007

[105] D. Haussler, Decision theoretic generalizations of the pac model for neural nets and other learning applications. Inform. Comput. 100 (1992) 78-150. | Zbl 0762.68050

[106] D. Haussler, Sphere packing numbers for subsets of the boolean n-cube with bounded Vapnik-Chervonenkis dimension. J. Combin. Theory, Ser. A 69 (1995) 217-232. | Zbl 0818.60005

[107] D. Haussler, N. Littlestone and M. Warmuth, Predicting {0,1} functions from randomly drawn points, in Proc. of the 29th IEEE Symposium on the Foundations of Computer Science, IEEE Computer Society Press, Los Alamitos, CA (1988) 100-109.

[108] R. Herbrich and R.C. Williamson, Algorithmic luckiness. J. Machine Learning Res. 3 (2003) 175-212. | Zbl 1088.68151

[109] W. Hoeffding, Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58 (1963) 13-30. | Zbl 0127.10602

[110] P. Huber, The behavior of the maximum likelihood estimates under non-standard conditions, in Proc. Fifth Berkeley Symposium on Probability and Mathematical Statistics, Univ. California Press (1967) 221-233. | Zbl 0212.21504

[111] W. Jiang, Process consistency for adaboost. Ann. Statist. 32 (2004) 13-29. | Zbl 1105.62316

[112] D.S. Johnson and F.P. Preparata, The densest hemisphere problem. Theoret. Comput. Sci. 6 (1978) 93-107. | Zbl 0368.68053

[113] I. Johnstone, Function estimation and gaussian sequence models. Technical Report. Department of Statistics, Stanford University (2002).

[114] M. Karpinski and A. Macintyre, Polynomial bounds for vc dimension of sigmoidal and general pfaffian neural networks. J. Comput. Syst. Sci. 54 (1997). | MR 1463257 | Zbl 0869.68088

[115] M. Kearns, Y. Mansour, A.Y. Ng and D. Ron, An experimental and theoretical comparison of model selection methods, in Proc. of the Eighth Annual ACM Workshop on Computational Learning Theory, Association for Computing Machinery, New York (1995) 21-30.

[116] M.J. Kearns and D. Ron, Algorithmic stability and sanity-check bounds for leave-one-out cross-validation. Neural Comput. 11(6) (1999) 1427-1453.

[117] M.J. Kearns and U.V. Vazirani, An Introduction to Computational Learning Theory. MIT Press, Cambridge, Massachusetts (1994). | MR 1331838

[118] A.G. Khovanskii, Fewnomials. Translations of Mathematical Monographs 88, American Mathematical Society (1991). | MR 1108621 | Zbl 0728.12002

[119] J.C. Kieffer, Strongly consistent code-based identification and order estimation for constrained finite-state model classes. IEEE Trans. Inform. Theory 39 (1993) 893-902. | Zbl 0784.94005

[120] G.S. Kimeldorf and G. Wahba, A correspondence between Bayesian estimation on stochastic processes and smoothing by splines. Ann. Math. Statist. 41 (1970) 495-502. | Zbl 0193.45201

[121] P. Koiran and E.D. Sontag, Neural networks with quadratic vc dimension. J. Comput. Syst. Sci. 54 (1997). | MR 1463259 | Zbl 0869.68089

[122] A.N. Kolmogorov, On the representation of continuous functions of several variables by superposition of continuous functions of one variable and addition. Dokl. Akad. Nauk SSSR 114 (1957) 953-956. | Zbl 0090.27103

[123] A.N. Kolmogorov and V.M. Tikhomirov, ε-entropy and ε-capacity of sets in functional spaces. Amer. Math. Soc. Transl., Ser. 2 17 (1961) 277-364. | Zbl 0133.06703

[124] V. Koltchinskii, Rademacher penalties and structural risk minimization. IEEE Trans. Inform. Theory 47 (2001) 1902-1914. | Zbl 1008.62614

[125] V. Koltchinskii, Local Rademacher complexities and oracle inequalities in risk minimization. Manuscript (September 2003). | Zbl 1118.62065

[126] V. Koltchinskii and D. Panchenko, Rademacher processes and bounding the risk of function learning, in High Dimensional Probability II, E. Giné, D.M. Mason and J.A. Wellner, Eds. (2000) 443-459. | Zbl 1106.68385

[127] V. Koltchinskii and D. Panchenko, Empirical margin distributions and bounding the generalization error of combined classifiers. Ann. Statist. 30 (2002). | MR 1892654 | Zbl 1012.62004

[128] S. Kulkarni, G. Lugosi and S. Venkatesh, Learning pattern classification - a survey. IEEE Trans. Inform. Theory 44 (1998) 2178-2206. Information Theory: 1948-1998. Commemorative special issue. | Zbl 0935.68093

[129] S. Kutin and P. Niyogi, Almost-everywhere algorithmic stability and generalization error, in UAI-2002: Uncertainty in Artificial Intelligence (2002).

[130] J. Langford and M. Seeger, Bounds for averaging classifiers. CMU-CS 01-102, Carnegie Mellon University (2001).

[131] M. Ledoux, Isoperimetry and gaussian analysis in Lectures on Probability Theory and Statistics, P. Bernard Ed., École d'Été de Probabilités de St-Flour XXIV-1994 (1996) 165-294. | Zbl 0874.60005

[132] M. Ledoux, On Talagrand's deviation inequalities for product measures. ESAIM: PS 1 (1997) 63-87. | Numdam | Zbl 0869.60013

[133] M. Ledoux and M. Talagrand, Probability in Banach Space. Springer-Verlag, New York (1991). | MR 1102015 | Zbl 0748.60004

[134] W.S. Lee, P.L. Bartlett and R.C. Williamson, The importance of convexity in learning with squared loss. IEEE Trans. Inform. Theory 44 (1998) 1974-1980. | Zbl 0935.68091

[135] O.V. Lepskiĭ, E. Mammen and V.G. Spokoiny, Optimal spatial adaptation to inhomogeneous smoothness: an approach based on kernel estimates with variable bandwidth selectors. Ann. Statist. 25 (1997) 929-947. | Zbl 0885.62044

[136] O.V. Lepskiĭ, A problem of adaptive estimation in Gaussian white noise. Teor. Veroyatnost. i Primenen. 35 (1990) 459-470. | Zbl 0725.62075

[137] O.V. Lepskiĭ, Asymptotically minimax adaptive estimation. I. Upper bounds. Optimally adaptive estimates. Teor. Veroyatnost. i Primenen. 36 (1991) 645-659. | Zbl 0738.62045

[138] Y. Li, P.M. Long and A. Srinivasan, Improved bounds on the sample complexity of learning. J. Comput. Syst. Sci. 62 (2001) 516-527. | Zbl 0990.68081

[139] Y. Lin, A note on margin-based loss functions in classification. Technical Report 1029r, Department of Statistics, University Wisconsin, Madison (1999). | Zbl 1058.62052

[140] Y. Lin, Some asymptotic properties of the support vector machine. Technical Report 1044r, Department of Statistics, University of Wisconsin, Madison (1999).

[141] Y. Lin, Support vector machines and the bayes rule in classification. Data Mining and Knowledge Discovery 6 (2002) 259-275.

[142] F. Lozano, Model selection using Rademacher penalization, in Proceedings of the Second ICSC Symposia on Neural Computation (NC2000). ICSC Adademic Press (2000).

[143] M.J. Luczak and C. Mcdiarmid, Concentration for locally acting permutations. Discrete Math. 265 (2003) 159-171. | Zbl 1025.60003

[144] G. Lugosi, Pattern classification and learning theory, in Principles of Nonparametric Learning, L. Györfi Ed., Springer, Wien (2002) 5-62.

[145] G. Lugosi and A. Nobel, Adaptive model selection using empirical complexities. Ann. Statist. 27 (1999) 1830-1864. | Zbl 0962.62034

[146] G. Lugosi and N. Vayatis, On the Bayes-risk consistency of regularized boosting methods. Ann. Statist. 32 (2004) 30-55. | Zbl 1105.62319

[147] G. Lugosi and M. Wegkamp, Complexity regularization via localized random penalties. Ann. Statist. 2 (2004) 1679-1697. | Zbl 1045.62060

[148] G. Lugosi and K. Zeger, Concept learning using complexity regularization. IEEE Trans. Inform. Theory 42 (1996) 48-54. | Zbl 0844.62006

[149] A. Macintyre and E.D. Sontag, Finiteness results for sigmoidal “neural” networks, in Proc. of the 25th Annual ACM Symposium on the Theory of Computing, Association of Computing Machinery, New York (1993) 325-334.

[150] C.L. Mallows, Some comments on C p . Technometrics 15 (1997) 661-675. | Zbl 0269.62061

[151] E. Mammen and A. Tsybakov, Smooth discrimination analysis. Ann. Statist. 27(6) (1999) 1808-1829. | Zbl 0961.62058

[152] S. Mannor and R. Meir, Weak learners and improved convergence rate in boosting, in Advances in Neural Information Processing Systems 13: Proc. NIPS'2000 (2001).

[153] S. Mannor, R. Meir and T. Zhang, The consistency of greedy algorithms for classification, in Proceedings of the 15th Annual Conference on Computational Learning Theory (2002). | MR 2040422 | Zbl 1050.68581

[154] K. Marton, A simple proof of the blowing-up lemma. IEEE Trans. Inform. Theory 32 (1986) 445-446. | Zbl 0594.94003

[155] K. Marton, Bounding d ¯-distance by informational divergence: a way to prove measure concentration. Ann. Probab. 24 (1996) 857-866. | Zbl 0865.60017

[156] K. Marton, A measure concentration inequality for contracting Markov chains. Geometric Functional Analysis 6 (1996) 556-571. Erratum: 7 (1997) 609-613. | Zbl 0895.60073

[157] L. Mason, J. Baxter, P.L. Bartlett and M. Frean, Functional gradient techniques for combining hypotheses, in Advances in Large Margin Classifiers, A.J. Smola, P.L. Bartlett, B. Schölkopf and D. Schuurmans Eds., MIT Press, Cambridge, MA (1999) 221-247.

[158] P. Massart, Optimal constants for Hoeffding type inequalities. Technical report, Mathematiques, Université de Paris-Sud, Report 98.86, 1998.

[159] P. Massart, About the constants in Talagrand's concentration inequalities for empirical processes. Ann. Probab. 28 (2000) 863-884. | Zbl 1140.60310

[160] P. Massart, Some applications of concentration inequalities to statistics. Ann. Fac. Sci. Toulouse IX (2000) 245-303. | Numdam | Zbl 0986.62002

[161] P. Massart, École d'Eté de Probabilité de Saint-Flour XXXIII, chapter Concentration inequalities and model selection, LNM. Springer-Verlag (2003). | Zbl pre05150953

[162] P. Massart and E. Nédélec, Risk bounds for statistical learning, Ann. Statist., to appear. | MR 2291502 | Zbl 1108.62007

[163] D.A. Mcallester, Some pac-Bayesian theorems, in Proc. of the 11th Annual Conference on Computational Learning Theory, ACM Press (1998) 230-234.

[164] D.A. Mcallester, pac-Bayesian model averaging, in Proc. of the 12th Annual Conference on Computational Learning Theory. ACM Press (1999). | MR 1811612

[165] D.A. Mcallester, pac-Bayesian stochastic model selection. Machine Learning 51 (2003) 5-21. | Zbl 1056.68122

[166] C. Mcdiarmid, On the method of bounded differences, in Surveys in Combinatorics 1989, Cambridge University Press, Cambridge (1989) 148-188. | Zbl 0712.05012

[167] C. Mcdiarmid, Concentration, in Probabilistic Methods for Algorithmic Discrete Mathematics, M. Habib, C. McDiarmid, J. Ramirez-Alfonsin and B. Reed Eds., Springer, New York (1998) 195-248. | Zbl 0927.60027

[168] C. Mcdiarmid, Concentration for independent permutations. Combin. Probab. Comput. 2 (2002) 163-178. | Zbl 1001.60014

[169] G.J. Mclachlan, Discriminant Analysis and Statistical Pattern Recognition. John Wiley, New York (1992). | MR 1190469 | Zbl 1108.62317

[170] S. Mendelson, Improving the sample complexity using global data. IEEE Trans. Inform. Theory 48 (2002) 1977-1991. | Zbl 1061.68128

[171] S. Mendelson, A few notes on statistical learning theory, in Advanced Lectures in Machine Learning. Lect. Notes Comput. Sci. 2600, S. Mendelson and A. Smola Eds., Springer (2003) 1-40. | Zbl 1019.68093

[172] S. Mendelson and P. Philips, On the importance of “small” coordinate projections. J. Machine Learning Res. 5 (2004) 219-238.

[173] S. Mendelson and R. Vershynin, Entropy and the combinatorial dimension. Inventiones Mathematicae 152 (2003) 37-55. | Zbl 1039.60016

[174] V. Milman and G. Schechman, Asymptotic theory of finite-dimensional normed spaces, Springer-Verlag, New York (1986). | MR 856576

[175] B.K. Natarajan, Machine Learning: A Theoretical Approach, Morgan Kaufmann, San Mateo, CA (1991). | MR 1137519

[176] D. Panchenko, A note on Talagrand's concentration inequality. Electron. Comm. Probab. 6 (2001). | Zbl 0977.60008

[177] D. Panchenko, Some extensions of an inequality of Vapnik and Chervonenkis. Electron. Comm. Probab. 7 (2002). | MR 1887174 | Zbl 1008.60035

[178] D. Panchenko, Symmetrization approach to concentration inequalities for empirical processes. Ann. Probab. 31 (2003) 2068-2081. | Zbl 1042.60008

[179] T. Poggio, S. Rifkin, S. Mukherjee and P. Niyogi, General conditions for predictivity in learning theory. Nature 428 (2004) 419-422.

[180] D. Pollard, Convergence of Stochastic Processes, Springer-Verlag, New York (1984). | MR 762984 | Zbl 0544.60045

[181] D. Pollard, Uniform ratio limit theorems for empirical processes. Scand. J. Statist. 22 (1995) 271-278. | Zbl 0835.62051

[182] W. Polonik, Measuring mass concentrations and estimating density contour clusters-an excess mass approach. Ann. Statist. 23(3) (1995) 855-881. | Zbl 0841.62045

[183] E. Rio, Inégalités de concentration pour les processus empiriques de classes de parties. Probab. Theory Related Fields 119 (2001) 163-175. | Zbl 0976.60033

[184] E. Rio, Une inegalité de Bennett pour les maxima de processus empiriques, in Colloque en l'honneur de J. Bretagnolle, D. Dacunha-Castelle et I. Ibragimov, Annales de l'Institut Henri Poincaré (2001). | Numdam | Zbl 1014.60011

[185] B.D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press (1996). | MR 1438788 | Zbl 0853.62046

[186] W.H. Rogers and T.J. Wagner, A finite sample distribution-free performance bound for local discrimination rules. Ann. Statist. 6 (1978) 506-514. | Zbl 0385.62041

[187] M. Rudelson, R. Vershynin, Combinatorics of random processes and sections of convex bodies. Ann. Math, to appear (2004). | MR 2247969 | Zbl 1114.60009

[188] N. Sauer, On the density of families of sets. J. Combin. Theory, Ser A 13 (1972) 145-147. | Zbl 0248.05005

[189] R.E. Schapire, The strength of weak learnability. Machine Learning 5 (1990) 197-227. | Zbl 0747.68058

[190] R.E. Schapire, Y. Freund, P. Bartlett and W.S. Lee, Boosting the margin: a new explanation for the effectiveness of voting methods. Ann. Statist. 26 (1998) 1651-1686. | Zbl 0929.62069

[191] B. Schölkopf and A. J. Smola, Learning with Kernels. MIT Press, Cambridge, MA (2002).

[192] D. Schuurmans, Characterizing rational versus exponential learning curves, in Computational Learning Theory: Second European Conference. EuroCOLT'95, Springer-Verlag (1995) 272-286.

[193] C. Scovel and I. Steinwart, Fast rates for support vector machines. Los Alamos National Laboratory Technical Report LA-UR 03-9117 (2003).

[194] M. Seeger, PAC-Bayesian generalisation error bounds for gaussian process classification. J. Machine Learning Res. 3 (2002) 233-269. | Zbl 1088.68745

[195] J. Shawe-Taylor, P.L. Bartlett, R.C. Williamson and M. Anthony, Structural risk minimization over data-dependent hierarchies. IEEE Trans. Inform. Theory 44 (1998) 1926-1940. | Zbl 0935.68090

[196] S. Shelah, A combinatorial problem: Stability and order for models and theories in infinity languages. Pacific J. Mathematics 41 (1972) 247-261. | Zbl 0239.02024

[197] G.R. Shorack and J. Wellner, Empirical Processes with Applications in Statistics. Wiley, New York (1986). | MR 838963

[198] H.U. Simon, General lower bounds on the number of examples needed for learning probabilistic concepts, in Proc. of the Sixth Annual ACM Conference on Computational Learning Theory, Association for Computing Machinery, New York (1993) 402-412.

[199] A.J. Smola, P.L. Bartlett, B. Schölkopf and D. Schuurmans Eds, Advances in Large Margin Classifiers. MIT Press, Cambridge, MA (2000). | MR 1820960 | Zbl 0988.68145

[200] A.J. Smola, B. Schölkopf and K.-R. Müller, The connection between regularization operators and support vector kernels. Neural Networks 11 (1998) 637-649.

[201] D.F. Specht, Probabilistic neural networks and the polynomial Adaline as complementary techniques for classification. IEEE Trans. Neural Networks 1 (1990) 111-121.

[202] J.M. Steele, Existence of submatrices with all possible columns. J. Combin. Theory, Ser. A 28 (1978) 84-88. | Zbl 0373.05004

[203] I. Steinwart, On the influence of the kernel on the consistency of support vector machines. J. Machine Learning Res. (2001) 67-93. | Zbl 1009.68143

[204] I. Steinwart, Consistency of support vector machines and other regularized kernel machines. IEEE Trans. Inform. Theory 51 (2005) 128-142.

[205] I. Steinwart, Support vector machines are universally consistent. J. Complexity 18 (2002) 768-791. | Zbl 1030.68074

[206] I. Steinwart, On the optimal parameter choice in ν-support vector machines. IEEE Trans. Pattern Anal. Machine Intelligence 25 (2003) 1274-1284.

[207] I. Steinwart, Sparseness of support vector machines. J. Machine Learning Res. 4 (2003) 1071-1105. | Zbl 1094.68082

[208] S.J. Szarek and M. Talagrand, On the convexified Sauer-Shelah theorem. J. Combin. Theory, Ser. B 69 (1997) 183-192. | Zbl 0868.05051

[209] M. Talagrand, The Glivenko-Cantelli problem. Ann. Probab. 15 (1987) 837-870. | Zbl 0632.60024

[210] M. Talagrand, Sharper bounds for Gaussian and empirical processes. Ann. Probab. 22 (1994) 28-76. | Zbl 0798.60051

[211] M. Talagrand, Concentration of measure and isoperimetric inequalities in product spaces. Publications Mathématiques de l'I.H.E.S. 81 (1995) 73-205. | Numdam | Zbl 0864.60013

[212] M. Talagrand, The Glivenko-Cantelli problem, ten years later. J. Theoret. Probab. 9 (1996) 371-384. | Zbl 0880.60023

[213] M. Talagrand, Majorizing measures: the generic chaining. Ann. Probab. 24 (1996) 1049-1103. (Special Invited Paper). | Zbl 0867.60017

[214] M. Talagrand, New concentration inequalities in product spaces. Inventiones Mathematicae 126 (1996) 505-563. | Zbl 0893.60001

[215] M. Talagrand, A new look at independence. Ann. Probab. 24 (1996) 1-34. (Special Invited Paper). | Zbl 0858.60019

[216] M. Talagrand, Vapnik-Chervonenkis type conditions and uniform Donsker classes of functions. Ann. Probab. 31 (2003) 1565-1582. | Zbl 1044.60027

[217] M. Talagrand, The generic chaining: upper and lower bounds for stochastic processes. Springer-Verlag, New York (2005). | MR 2133757 | Zbl 1075.60001

[218] A. Tsybakov. On nonparametric estimation of density level sets. Ann. Stat. 25 (1997) 948-969. | Zbl 0881.62039

[219] A.B. Tsybakov, Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32 (2004) 135-166. | Zbl 1105.62353

[220] A.B. Tsybakov, Introduction à l'estimation non-paramétrique. Springer (2004). | Zbl 1029.62034

[221] A. Tsybakov and S. Van De Geer, Square root penalty: adaptation to the margin in classification and in edge estimation. Ann. Statist., to appear (2005). | MR 2195633 | Zbl 1080.62047

[222] S. Van De Geer, A new approach to least-squares estimation, with applications. Ann. Statist. 15 (1987) 587-602. | Zbl 0625.62046

[223] S. Van De Geer, Estimating a regression function. Ann. Statist. 18 (1990) 907-924. | Zbl 0709.62040

[224] S. Van De Geer, Empirical Processes in M-Estimation. Cambridge University Press, Cambridge, UK (2000). | MR 1739079

[225] A.W. Van Der Waart and J.A. Wellner, Weak convergence and empirical processes. Springer-Verlag, New York (1996). | MR 1385671 | Zbl 0862.60002

[226] V. Vapnik and A. Lerner, Pattern recognition using generalized portrait method. Automat. Remote Control 24 (1963) 774-780.

[227] V.N. Vapnik, Estimation of Dependencies Based on Empirical Data. Springer-Verlag, New York (1982). | MR 672244 | Zbl 0499.62005

[228] V.N. Vapnik, The Nature of Statistical Learning Theory. Springer-Verlag, New York (1995). | MR 1367965 | Zbl 0833.62008

[229] V.N. Vapnik, Statistical Learning Theory. John Wiley, New York (1998). | MR 1641250 | Zbl 0935.62007

[230] V.N. Vapnik and A.Ya. Chervonenkis, On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab. Appl. 16 (1971) 264-280. | Zbl 0247.60005

[231] V.N. Vapnik and A.Ya. Chervonenkis, Theory of Pattern Recognition. Nauka, Moscow (1974). (in Russian); German translation: Theorie der Zeichenerkennung, Akademie Verlag, Berlin (1979). | MR 474638

[232] V.N. Vapnik and A.Ya. Chervonenkis, Necessary and sufficient conditions for the uniform convergence of means to their expectations. Theory Probab. Appl. 26 (1981) 821-832. | Zbl 0487.60036

[233] M. Vidyasagar, A Theory of Learning and Generalization. Springer, New York (1997). | MR 1482231 | Zbl 0928.68061

[234] V. Vu, On the infeasibility of training neural networks with small mean squared error. IEEE Trans. Inform. Theory 44 (1998) 2892-2900. | Zbl 0981.68138

[235] M. Wegkamp, Model selection in nonparametric regression. Ann. Statist. 31(1) (2003) 252-273. | Zbl 1019.62037

[236] R.S. Wenocur and R.M. Dudley, Some special Vapnik-Chervonenkis classes. Discrete Math. 33 (1981) 313-318. | Zbl 0459.60008

[237] Y. Yang, Minimax nonparametric classification. I. Rates of convergence. IEEE Trans. Inform. Theory 45(7) (1999) 2271-2284. | Zbl 0962.62026

[238] Y. Yang, Minimax nonparametric classification. II. Model selection for adaptation. IEEE Trans. Inform. Theory 45(7) (1999) 2285-2292. | Zbl 0962.62027

[239] Y. Yang, Adaptive estimation in pattern recognition by combining different procedures. Statistica Sinica 10 (2000) 1069-1089. | Zbl 0979.62019

[240] V.V. Yurinksii, Exponential bounds for large deviations. Theory Probab. Appl. 19 (1974) 154-155. | Zbl 0323.60029

[241] V.V. Yurinksii, Exponential inequalities for sums of random vectors. J. Multivariate Anal. 6 (1976) 473-499. | Zbl 0346.60001

[242] T. Zhang, Statistical behavior and consistency of classification methods based on convex risk minimization. Ann. Statist. 32 (2004) 56-85. | Zbl 1105.62323

[243] D.-X. Zhou, Capacity of reproducing kernel spaces in learning theory. IEEE Trans. Inform. Theory 49 (2003) 1743-1752.