Model selection via testing: an alternative to (penalized) maximum likelihood estimators
Annales de l'I.H.P. Probabilités et statistiques, Volume 42 (2006) no. 3, pp. 273-325.
@article{AIHPB_2006__42_3_273_0,
     author = {Birg\'e, Lucien},
     title = {Model selection via testing: an alternative to (penalized) maximum likelihood estimators},
     journal = {Annales de l'I.H.P. Probabilit\'es et statistiques},
     pages = {273--325},
     publisher = {Elsevier},
     volume = {42},
     number = {3},
     year = {2006},
     doi = {10.1016/j.anihpb.2005.04.004},
     mrnumber = {2219712},
     zbl = {05024238},
     language = {en},
     url = {http://archive.numdam.org/articles/10.1016/j.anihpb.2005.04.004/}
}
TY  - JOUR
AU  - Birgé, Lucien
TI  - Model selection via testing: an alternative to (penalized) maximum likelihood estimators
JO  - Annales de l'I.H.P. Probabilités et statistiques
PY  - 2006
SP  - 273
EP  - 325
VL  - 42
IS  - 3
PB  - Elsevier
UR  - http://archive.numdam.org/articles/10.1016/j.anihpb.2005.04.004/
DO  - 10.1016/j.anihpb.2005.04.004
LA  - en
ID  - AIHPB_2006__42_3_273_0
ER  - 
%0 Journal Article
%A Birgé, Lucien
%T Model selection via testing: an alternative to (penalized) maximum likelihood estimators
%J Annales de l'I.H.P. Probabilités et statistiques
%D 2006
%P 273-325
%V 42
%N 3
%I Elsevier
%U http://archive.numdam.org/articles/10.1016/j.anihpb.2005.04.004/
%R 10.1016/j.anihpb.2005.04.004
%G en
%F AIHPB_2006__42_3_273_0
Birgé, Lucien. Model selection via testing: an alternative to (penalized) maximum likelihood estimators. Annales de l'I.H.P. Probabilités et statistiques, Volume 42 (2006) no. 3, pp. 273-325. doi: 10.1016/j.anihpb.2005.04.004. http://archive.numdam.org/articles/10.1016/j.anihpb.2005.04.004/

[1] P. Assouad, Deux remarques sur l'estimation, C. R. Acad. Sci. Paris, Sér. I Math. 296 (1983) 1021-1024.

[2] J.-Y. Audibert, Théorie statistique de l'apprentissage : une approche PAC-bayésienne, Thèse de doctorat, Laboratoire de Probabilités et Modèles Aléatoires, Université Paris VI, Paris, 2004.

[3] Y. Baraud, Model selection for regression on a random design, ESAIM Probab. Statist. 6 (2002) 127-146.

[4] A.R. Barron, Complexity regularization with applications to artificial neural networks, in: Roussas G. (Ed.), Nonparametric Functional Estimation, Kluwer, Dordrecht, 1991, pp. 561-576.

[5] A.R. Barron, L. Birgé, P. Massart, Risk bounds for model selection via penalization, Probab. Theory Related Fields 113 (1999) 301-415.

[6] A.R. Barron, T.M. Cover, Minimum complexity density estimation, IEEE Trans. Inform. Theory 37 (1991) 1034-1054.

[7] J. Beirlant, L. Györfi, On the asymptotic normality of the $L_2$-error in partitioning regression estimation, J. Statist. Plann. Inference 71 (1998) 93-107.

[8] L. Birgé, Approximation dans les espaces métriques et théorie de l'estimation, Z. Wahrscheinlichkeitstheorie Verw. Gebiete 65 (1983) 181-237.

[9] L. Birgé, Sur un théorème de minimax et son application aux tests, Probab. Math. Statist. 3 (1984) 259-282.

[10] L. Birgé, Stabilité et instabilité du risque minimax pour des variables indépendantes équidistribuées, Ann. Inst. H. Poincaré Sect. B 20 (1984) 201-223.

[11] L. Birgé, On estimating a density using Hellinger distance and some other strange facts, Probab. Theory Related Fields 71 (1986) 271-291.

[12] L. Birgé, Model selection for Gaussian regression with random design, Bernoulli 10 (2004) 1039-1051.

[13] L. Birgé, P. Massart, Rates of convergence for minimum contrast estimators, Probab. Theory Related Fields 97 (1993) 113-150.

[14] L. Birgé, P. Massart, From model selection to adaptive estimation, in: Pollard D., Torgersen E., Yang G. (Eds.), Festschrift for Lucien Le Cam: Research Papers in Probability and Statistics, Springer-Verlag, New York, 1997, pp. 55-87.

[15] L. Birgé, P. Massart, Minimum contrast estimators on sieves: exponential bounds and rates of convergence, Bernoulli 4 (1998) 329-375.

[16] L. Birgé, P. Massart, An adaptive compression algorithm in Besov spaces, Constr. Approx. 16 (2000) 1-36.

[17] L. Birgé, P. Massart, Gaussian model selection, J. Eur. Math. Soc. 3 (2001) 203-268.

[18] M.S. Birman, M.Z. Solomjak, Piecewise-polynomial approximation of functions of the classes $W_p^{\alpha}$, Mat. Sb. 73 (1967) 295-317.

[19] L.D. Brown, M.G. Low, Asymptotic equivalence of nonparametric regression and white noise, Ann. Statist. 24 (1996) 2384-2398.

[20] F. Bunea, A.B. Tsybakov, M.H. Wegkamp, Aggregation for regression learning, Technical report 948, Laboratoire de Probabilités, Université Paris VI, 2004, http://www.proba.jussieu.fr/mathdoc/preprints/index.html#2004.

[21] G. Castellan, Modified Akaike's criterion for histogram density estimation, Technical report 99.61, Université Paris-Sud, Orsay, 1999, http://www.math.u-psud.fr/~biblio/pub/1999/.

[22] G. Castellan, Sélection d'histogrammes à l'aide d'un critère de type Akaike, C. R. Acad. Sci. Paris 330 (2000) 729-732.

[23] O. Catoni, The mixture approach to universal model selection, Technical report LMENS-97-22, Ecole Normale Supérieure, Paris, 1997, http://www.dma.ens.fr/edition/publis/1997/titre97.html.

[24] O. Catoni, Statistical learning theory and stochastic optimization, in: Picard J. (Ed.), Lectures on Probability Theory and Statistics, Ecole d'Eté de Probabilités de Saint-Flour XXXI - 2001, Lecture Notes in Math., vol. 1851, Springer-Verlag, Berlin, 2004.

[25] H. Chernoff, A measure of asymptotic efficiency of tests of a hypothesis based on a sum of observations, Ann. Math. Statist. 23 (1952) 493-507.

[26] R.A. DeVore, G. Kerkyacharian, D. Picard, V. Temlyakov, Mathematical methods for supervised learning, Technical report 0422, IMI, University of South Carolina, Columbia, 2004, http://www.math.sc.edu/imip/preprints/04.html.

[27] R.A. DeVore, G.G. Lorentz, Constructive Approximation, Springer-Verlag, Berlin, 1993.

[28] L. Devroye, G. Lugosi, Combinatorial Methods in Density Estimation, Springer-Verlag, New York, 2001.

[29] D.L. Donoho, I.M. Johnstone, G. Kerkyacharian, D. Picard, Density estimation by wavelet thresholding, Ann. Statist. 24 (1996) 508-539.

[30] D.L. Donoho, R.C. Liu, B. MacGibbon, Minimax risk over hyperrectangles, and implications, Ann. Statist. 18 (1990) 1416-1437.

[31] P.P.B. Eggermont, V.N. LaRiccia, Maximum Penalized Likelihood Estimation, vol. I: Density Estimation, Springer, New York, 2001.

[32] P. Groeneboom, Some current developments in density estimation, in: de Bakker J.W., Hazewinkel M., Lenstra J.K. (Eds.), Mathematics and Computer Science, CWI Monograph, vol. 1, Elsevier, Amsterdam, 1986, pp. 163-192.

[33] L. Györfi, M. Kohler, A. Krzyżak, H. Walk, A Distribution-Free Theory of Nonparametric Regression, Springer, New York, 2002.

[34] P.J. Huber, A robust version of the probability ratio test, Ann. Math. Statist. 36 (1965) 1753-1758.

[35] P.J. Huber, Robust Statistics, John Wiley, New York, 1981.

[36] I.M. Johnstone, Chi-square oracle inequalities, in: de Gunst M.C.M., Klaassen C.A.J., van der Vaart A.W. (Eds.), State of the Art in Probability and Statistics, Festschrift for Willem R. van Zwet, Lecture Notes Monograph Ser., vol. 36, Institute of Mathematical Statistics, 2001, pp. 399-418.

[37] A. Juditsky, A.S. Nemirovski, Functional aggregation for nonparametric estimation, Ann. Statist. 28 (2000) 681-712.

[38] G. Kerkyacharian, D. Picard, Thresholding algorithms, maxisets and well-concentrated bases, Test 9 (2000) 283-344.

[39] A.N. Kolmogorov, V.M. Tikhomirov, ε-entropy and ε-capacity of sets in function spaces, Amer. Math. Soc. Transl. (2) 17 (1961) 277-364.

[40] B. Laurent, P. Massart, Adaptive estimation of a quadratic functional by model selection, Ann. Statist. 28 (2000) 1302-1338.

[41] L.M. Le Cam, On the assumptions used to prove asymptotic normality of maximum likelihood estimates, Ann. Math. Statist. 41 (1970) 802-828.

[42] L.M. Le Cam, Limits of experiments, in: Proc. 6th Berkeley Symp. on Math. Stat. and Prob. I, 1972, pp. 245-261.

[43] L.M. Le Cam, Convergence of estimates under dimensionality restrictions, Ann. Statist. 1 (1973) 38-53.

[44] L.M. Le Cam, On local and global properties in the theory of asymptotic normality of experiments, in: Puri M. (Ed.), Stochastic Processes and Related Topics, vol. 1, Academic Press, New York, 1975, pp. 13-54.

[45] L.M. Le Cam, Asymptotic Methods in Statistical Decision Theory, Springer-Verlag, New York, 1986.

[46] L.M. Le Cam, Maximum likelihood: an introduction, Inter. Statist. Rev. 58 (1990) 153-171.

[47] L.M. Le Cam, Metric dimension and statistical estimation, CRM Proc. and Lecture Notes 11 (1997) 303-311.

[48] G.G. Lorentz, Approximation of Functions, Holt, Rinehart, Winston, New York, 1966.

[49] G.G. Lorentz, M. von Golitschek, Y. Makovoz, Constructive Approximation, Advanced Problems, Springer, Berlin, 1996.

[50] A.S. Nemirovski, Topics in non-parametric statistics, in: Bernard P. (Ed.), Lectures on Probability Theory and Statistics, Ecole d'Eté de Probabilités de Saint-Flour XXVIII - 1998, Lecture Notes in Math., vol. 1738, Springer-Verlag, Berlin, 2000, pp. 85-297.

[51] M. Nussbaum, Asymptotic equivalence of density estimation and Gaussian white noise, Ann. Statist. 24 (1996) 2399-2430.

[52] A. Pinkus, n-widths in Approximation Theory, Springer-Verlag, Berlin, 1985.

[53] M.S. Pinsker, Optimal filtration of square-integrable signals in Gaussian noise, Problems Inform. Transmission 16 (1980) 120-133.

[54] X. Shen, W.H. Wong, Convergence rates of sieve estimates, Ann. Statist. 22 (1994) 580-615.

[55] B.W. Silverman, On the estimation of a probability density function by the maximum penalized likelihood method, Ann. Statist. 10 (1982) 795-810.

[56] A.B. Tsybakov, Optimal rates of aggregation, in: Proceedings of 16th Annual Conference on Learning Theory (COLT) and 7th Annual Workshop on Kernel Machines, Lecture Notes in Artificial Intelligence, vol. 2777, Springer-Verlag, Berlin, 2003, pp. 303-313.

[57] S. van de Geer, Estimating a regression function, Ann. Statist. 18 (1990) 907-924.

[58] S. van de Geer, Hellinger-consistency of certain nonparametric maximum likelihood estimates, Ann. Statist. 21 (1993) 14-44.

[59] S. van de Geer, Empirical Processes in M-Estimation, Cambridge University Press, Cambridge, 2000.

[60] A.W. van der Vaart, Asymptotic Statistics, Cambridge University Press, Cambridge, 1998.

[61] G. Wahba, Spline Models for Observational Data, SIAM, Philadelphia, PA, 1990.

[62] A. Wald, Note on the consistency of the maximum likelihood estimate, Ann. Math. Statist. 20 (1949) 595-601.

[63] M.H. Wegkamp, Model selection in nonparametric regression, Ann. Statist. 31 (2003) 252-273.

[64] W.H. Wong, X. Shen, Probability inequalities for likelihood ratios and convergence rates of sieve MLEs, Ann. Statist. 23 (1995) 339-362.

[65] Y. Yang, Minimax optimal density estimation, Ph.D. dissertation, Dept. of Statistics, Yale University, New Haven, 1996.

[66] Y. Yang, Mixing strategies for density estimation, Ann. Statist. 28 (2000) 75-87.

[67] Y. Yang, Combining different procedures for adaptive regression, J. Multivariate Anal. 74 (2000) 135-161.

[68] Y. Yang, Adaptive regression by mixing, J. Amer. Statist. Assoc. 96 (2001) 574-588.

[69] Y. Yang, How accurate can any regression procedure be?, Technical report, Iowa State University, Ames, 2001, http://www.public.iastate.edu/yyang/papers/index.html.

[70] Y. Yang, Aggregating regression procedures to improve performance, Bernoulli 10 (2004) 25-47.

[71] Y. Yang, A.R. Barron, An asymptotic property of model selection criteria, IEEE Trans. Inform. Theory 44 (1998) 95-116.

[72] Y. Yang, A.R. Barron, Information-theoretic determination of minimax rates of convergence, Ann. Statist. 27 (1999) 1564-1599.

[73] Y.G. Yatracos, Rates of convergence of minimum distance estimates and Kolmogorov's entropy, Ann. Statist. 13 (1985) 768-774.

[74] B. Yu, Assouad, Fano and Le Cam, in: Pollard D., Torgersen E., Yang G. (Eds.), Festschrift for Lucien Le Cam: Research Papers in Probability and Statistics, Springer-Verlag, New York, 1997, pp. 423-435.
