Agrégation d'estimateurs et optimisation stochastique
Journal de la Société française de statistique & Revue de statistique appliquée, Tome 149 (2008) no. 1, pp. 3-26.

Cet article fait suite à la Conférence Lucien Le Cam que j’ai eu l’honneur de donner lors des XXXVIIèmes Journées de Statistique à Pau, en 2005. Il présente un aperçu de quelques résultats récents sur les méthodes d’agrégation d’estimateurs. Ces méthodes consistent à construire, à partir d’un ensemble de M estimateurs donnés, une combinaison linéaire ou convexe de ces estimateurs avec des poids aléatoires choisis de façon optimale. Nous mettons l’accent sur le lien entre agrégation et optimisation stochastique, ce qui nous permet d’aboutir à de nouvelles procédures recursives d’agrégation très performantes.

This paper is a written version of the Conférence Lucien Le Cam delivered at the XXXVIIèmes Journées de Statistique in Pau, 2005. It presents an overview of some recent results on the methods of aggregation of estimators. Given a collection of M estimators, aggregation procedures consist in constructing their convex or linear combination with optimally chosen random weights. We mainly focus on the link between aggregation and stochastic optimization which leads us to the construction of some new highly efficient recursive aggregation procedures.

Mot clés : agrégation, optimisation stochastique, descente miroir, estimation adaptative, vitesse optimale d'agrégation
Mots-clés : aggregation, stochastic optimization, mirror descent, adaptive estimation, optimal rate of aggregation
@article{JSFS_2008__149_1_3_0,
     author = {Tsybakov, Alexandre B.},
     title = {Agr\'egation d'estimateurs et optimisation stochastique},
     journal = {Journal de la Soci\'et\'e fran\c{c}aise de statistique & Revue de statistique appliqu\'ee},
     pages = {3--26},
     publisher = {Soci\'et\'e fran\c{c}aise de statistique},
     volume = {149},
     number = {1},
     year = {2008},
     language = {fr},
     url = {http://archive.numdam.org/item/JSFS_2008__149_1_3_0/}
}
TY  - JOUR
AU  - Tsybakov, Alexandre B.
TI  - Agrégation d'estimateurs et optimisation stochastique
JO  - Journal de la Société française de statistique & Revue de statistique appliquée
PY  - 2008
SP  - 3
EP  - 26
VL  - 149
IS  - 1
PB  - Société française de statistique
UR  - http://archive.numdam.org/item/JSFS_2008__149_1_3_0/
LA  - fr
ID  - JSFS_2008__149_1_3_0
ER  - 
%0 Journal Article
%A Tsybakov, Alexandre B.
%T Agrégation d'estimateurs et optimisation stochastique
%J Journal de la Société française de statistique & Revue de statistique appliquée
%D 2008
%P 3-26
%V 149
%N 1
%I Société française de statistique
%U http://archive.numdam.org/item/JSFS_2008__149_1_3_0/
%G fr
%F JSFS_2008__149_1_3_0
Tsybakov, Alexandre B. Agrégation d'estimateurs et optimisation stochastique. Journal de la Société française de statistique & Revue de statistique appliquée, Tome 149 (2008) no. 1, pp. 3-26. http://archive.numdam.org/item/JSFS_2008__149_1_3_0/

[1] Audibert J.-Y. (2004). Aggregated estimators and empirical complexity for least square regression. Annales de l'Institut Henri Poincaré (B) Probabilités et Statistiques 40 : 685-736. | Numdam | MR | Zbl

[2] Barron A., Cohen A., Dahmen W. et Devore R. (2005). Approximation and learning by greedy algorithms. Annals of Statistics, à paraître. | Zbl

[3] Ben-Tal A. et Nemirovski A.S. (1999). The conjugate barrier mirror descent method for non-smooth convex optimization. MINERVA Optim. Center Report., Haifa : Faculty of Industrial Engineering and Management, Technion - Israel Institute of Technology. http://iew3.technion.ac.il/Labs/Opt/opt/Pap/CP_MD.pdf

[4] Birgé L. (2006). Model selection via testing : an alternative to (penalized) maximum likelihood estimators. Annales de l'Institut Henri Poincaré (B) Probabilités et Statistiques 42 : 273 - 325. | Numdam | MR

[5] Bickel P.J., Ritov Y. et Zakai A. (2006). Some theory for generalized boosting algorithms. J. Machine Learning Research 7 : 705-732. | MR

[6] Bühlman P. et Yu B. (2005). Sparse boosting. J. Machine Learning Research 7 : 1001-1024. | MR

[7] Bunea F., Tsybakov A.B. et Wegkamp M.H. (2004). Aggregation for regression learning. Preprint LPMA, Universities Paris 6 - Paris 7, n. 948, arXiv :math.ST/0410214 et https://hal.ccsd.cnrs.fr/ccsd-00003205

[8] Bunea F., Tsybakov A.B. et Wegkamp M.H. (2006). Aggregation and sparsity via 1 penalized least squares. Proceedings of 19th Annual Conference on Learning Theory (COLT 2006), Lecture Notes in Artificial Intelligence v. 4005 (Lugosi, G. and Simon, H.U., eds.), Springer-Verlag, Berlin-Heidelberg, 379-391. | MR | Zbl

[9] Bunea F., Tsybakov A.B. et Wegkamp M.H. (2007a). Aggregation for Gaussian regression. Annals of Statistics 35 : 1674-1697. | MR

[10] Bunea F., Tsybakov A.B. et Wegkamp M.H. (2007b). Sparsity oracle inequalities for the Lasso. Electronic Journal of Statistics 1 : 169-194. | MR | Zbl

[11] Bunea F., Tsybakov A.B. et Wegkamp M.H. (2007c). Sparse density estimation with 1 penalties. Proceedings of 20th Annual Conference on Learning Theory (COLT 2007), Lecture Notes in Artificial Intelligence, v. 4539 (N.H. Bshouty and C. Gentile, eds.), Springer-Verlag, Berlin-Heidelberg, 530-543. | MR

[12] Catoni O. (1999). “Universal” aggregation rules with exact bias bounds. Preprint LPMA, Universités Paris 6 - Paris 7, n.510.

[13] Catoni O. (2004). Statistical Learning Theory and Stochastic Optimization. Ecole d'Eté de Probabilités de Saint-Flour XXXI - 2001. Lecture Notes in Mathematics, vol. 1851, Springer, New York. | MR | Zbl

[14] Cesa-Bianchi N. et Lugosi G. (2006). Prediction, Learning, and Games. Cambridge Univ. Press. | MR | Zbl

[15] Dalalyan A., Juditsky A. et Spokoiny V. (2007). A new algorithm for estimating the effective dimension-reduction subspace. arXiv :math/0701887v1.

[16] Dalalyan A. et Tsybakov A.B. (2007). Aggregation by exponential weighting and sharp oracle inequalities. Proceedings of the 20th Annual Conference on Learning Theory (COLT-2007), Lecture Notes in Artificial Intelligence, v. 4539 (N.H. Bshouty and C. Gentile, eds.), Springer-Verlag, Berlin-Heidelberg, 97-111. | MR

[17] Freund Y. (1995). Boosting a weak learning algorithm by majority. Information and Computation 121 : 256-285. | MR | Zbl

[18] Hristache M., Juditsky A. et Spokoiny V. (2001). Direct estimation of the index coefficient in a single-index model. Annals of Statistics 29 : 593-623. | MR | Zbl

[19] Huber P.J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. Proc. Fifth Berkeley Symp. Math. Statist. Prob. 1 : 221-234. | MR | Zbl

[20] Juditsky A.B., Nazin A.V., Tsybakov A.B. et Vayatis N. (2005). Recursive aggregation of estimators by the mirror descent algorithm with averaging. Problems of Information Transmission 41 : 368-384. | MR | Zbl

[21] Juditsky A. et Nemirovski A. (2000). Functional aggregation for nonparametric estimation. Annals of Statistics 28 : 681-712. | MR | Zbl

[22] Juditsky A., Rigollet Ph. et Tsybakov A.B. (2005). Learning by mirror averaging. Annals of Statistics, à paraître.

[23] Klemelä J. (2006). Density estimation with stagewise optimization of the empirical risk. Machine Learning 67 : 169-195.

[24] Koltchinskii V. (2006a). Local Rademacher complexities and oracle inequalities in empirical risk minimization (with discusssion). Annals of Statistics 34 : 2697-2706. | MR | Zbl

[25] Koltchinskii V. (2006b). Sparsity in penalized empirical risk minimization. Annales de l'Institut Henri Poincaré (B) Probabilités et Statistiques, à paraître.

[26] Le Cam L. (1953). On some asymptotic properties of maximum likelihood estimates and related Bayes estimates. University of California Publications in Statistics 1 : 277-329. | MR | Zbl

[27] Lecué G. (2005). Lower bounds and aggregation in density estimation. J. Machine Learning Research 7 : 971-981. | MR

[28] Lecué G. (2007a). Méthodes d'agrégation : optimalité et vitesses rapides. Thèse de doctorat, Université Paris 6. http://tel.archives-ouvertes.fr/tel-00150402

[29] Lecué G. (2007b). Suboptimality of penalized empirical risk minimization in classification. Proceedings of 20th Annual Conference on Learning Theory (COLT 2007), Lecture Notes in Artificial Intelligence, v. 4539 (N.H. Bshouty and C.Gentile, eds.), Springer-Verlag, Berlin-Heidelberg, 142-156. | MR

[30] Lugosi G. (2006). Prédiction randomisée de suites individuelles. J. Société Française de Statistique 147 : 5-37.

[31] Lounici K. (2007). Generalized mirror averaging and D-convex aggregation. Mathematical Methods of Statistics 16 : 246-259. | MR

[32] Lugosi G. et Vayatis N. (2004). On the Bayes-risk consistency of regularized boosting methods (with discussion). Annals of Statistics 32 : 30-55. | MR | Zbl

[33] Mannor S., Meir R. et Zhang T. (2003). Greedy algorithms for classification - consistency, convergence rates, and adaptivity. Journal of Machine Learning Research 4 : 713-742. | MR | Zbl

[34] Nemirovski A. (2000). Topics in Non-parametric Statistics. Ecole d'Eté de Probabilités de Saint-Flour XXVIII - 1998, Lecture Notes in Mathematics, v. 1738, Springer : New York. | MR | Zbl

[35] Nemirovski A.S. et Yudin D.B. (1983). Problem Complexity and Method Efficiency in Optimization, Wiley, Chichester. | MR | Zbl

[36] Nesterov Yu. (2005). Primal-dual subgradient methods for convex problems. CORE discussion paper 2005/67. Center for Operations Research and Econometrics, Louvain-la-Neuve, Belgique.

[37] Nesterov Yu. (2007). Primal-dual subgradient methods for convex problems. Mathematical Programming, publié en ligne, DOI : 10.1007/s10107-007-0149-x.

[38] Rigollet Ph. (2006). Inégalités d'oracle, agrégation et adaptation. Thèse de doctorat, Université Paris 6. http://tel.archives-ouvertes.fr/tel-00115494

[39] Rigollet Ph. et Tsybakov A. (2007). Linear and convex aggregation of density estimators. Mathematical Methods of Statistics 16 : 260-280. | MR

[40] Robbins H. et Monro S. (1951). A stochastic approximation method. Annals of Mathematical Statistics 22 : 400-407. | MR | Zbl

[41] Rockafellar R.T. et Wets R.J.B. (1998). Variational Analysis. Springer, N.Y. | MR | Zbl

[42] Samarov A. et Tsybakov A. (2007) Aggregation of density estimators and dimension reduction. Advances in Statistical Modeling and Inference. Essays in Honor of Kjell A. Doksum (V. Nair, ed.), World Scientific, Singapore e.a., 233-251. | MR

[43] Schapire R.E. (1990). The strength of weak learnability. Machine Learning 5 : 197-227. | Zbl

[44] Tsybakov A. (2003). Optimal rates of aggregation. Computational Learning Theory and Kernel Machines, (B. Schölkopf and M. Warmuth, eds.), Lecture Notes in Artificial Intelligence, v. 2777. Springer, Heidelberg, 303-313.

[45] Vajda I. (1986). Theory of Statistical Inference and Information. Kluwer, Dordrecht. | Zbl

[46] Van De Geer S.A. (2006). High dimensional generalized linear models and the Lasso. Annals of Statistics, à paraître. | Zbl

[47] Vapnik V. et Chervonenkis A. (1974). Theory of Pattern Recognition Nauka, Moscow (in Russian). Traduction allemande : Wapnik W. und Tscherwonenkis A. Theorie der Zeichenerkennung, Berlin : Akademie-Verlag, 1979. | MR

[48] Wand M.P. et Jones M.C. (1995). Kernel Smoothing. Chapman and Hall, London. | MR | Zbl

[49] Wegkamp M.H. (2003). Model selection in nonparametric regression. Annals of Statistics 31 : 252-273. | MR | Zbl

[50] Yang Y. (2000a). Mixing strategies for density estimation. Annals of Statistics 28 : 75-87. | MR | Zbl

[51] Yang Y. (2000b). Combining different procedures for adaptive regression. Journal of Multivariate Analysis 74 : 135-161. | MR | Zbl

[52] Yang Y. (2001). Adaptive regression by mixing. Journal of the American Statistical Association 96 : 574-588. | MR | Zbl

[53] Yang Y. (2004). Aggregating regression procedures for a better performance. Bernoulli 10 : 25-47. | MR | Zbl

[54] Zhang T. et Yu B. (2005) Boosting with early stopping : convergence and cosistency. Annals of Statistics 33 : 1538-1579. | MR | Zbl