Biau, Gérard; Cérou, Frédéric; Guyader, Arnaud
New insights into Approximate Bayesian Computation
Annales de l'I.H.P. Probabilités et statistiques, Volume 51 (2015) no. 1, p. 376-403
Zbl 06412909 | MR 3300975
doi: 10.1214/13-AIHP590
Stable URL: http://www.numdam.org/item?id=AIHPB_2015__51_1_376_0

Classification:  62C10,  62F15,  62G20
The term "Approximate Bayesian Computation" (ABC for short) denotes a family of Bayesian techniques for simulating from a probability distribution when the posterior likelihood is unavailable or cannot be evaluated numerically. In this article, we consider the procedure from the point of view of k-nearest neighbor theory, focusing in particular on the statistical properties of the algorithm's outputs. This leads us to analyze the asymptotic behavior of a conditional density estimator that is naturally associated with ABC, is used in practice, and has the characteristics of both a k-nearest neighbor estimator and a kernel method.
Approximate Bayesian Computation (ABC for short) is a family of computational techniques which offer an almost automated solution in situations where evaluation of the posterior likelihood is computationally prohibitive, or whenever suitable likelihoods are not available. In the present paper, we analyze the procedure from the point of view of k-nearest neighbor theory and explore the statistical properties of its outputs. We discuss in particular some asymptotic features of the genuine conditional density estimate associated with ABC, which is an interesting hybrid between a k-nearest neighbor and a kernel method.
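To make the k-nearest neighbor reading of ABC concrete, here is a minimal Python sketch and not the authors' estimator. It assumes a toy model chosen purely for illustration (a standard normal prior on the parameter and a single normal observation centered at it). The names `knn_abc`, `kernel_density`, `bandwidth` and all numerical settings are hypothetical. The sketch simulates parameter/data pairs, keeps the k parameters whose simulated data lie closest to the observation, and then smooths the retained parameters with a Gaussian kernel.

```python
import math
import random


def knn_abc(y_obs, n_sim, k, rng):
    """Return the k simulated parameters whose data fall closest to y_obs."""
    pairs = []
    for _ in range(n_sim):
        theta = rng.gauss(0.0, 1.0)   # draw from the assumed prior N(0, 1)
        y = rng.gauss(theta, 1.0)     # simulate data from the assumed model N(theta, 1)
        pairs.append((abs(y - y_obs), theta))
    pairs.sort(key=lambda p: p[0])    # rank simulations by distance to y_obs
    return [theta for _, theta in pairs[:k]]


def kernel_density(theta_grid, accepted, bandwidth):
    """Gaussian-kernel density estimate built on the accepted parameters."""
    norm = len(accepted) * bandwidth * math.sqrt(2.0 * math.pi)
    return [
        sum(math.exp(-0.5 * ((t - a) / bandwidth) ** 2) for a in accepted) / norm
        for t in theta_grid
    ]


rng = random.Random(0)                                  # fixed seed for repeatability
accepted = knn_abc(y_obs=1.0, n_sim=20000, k=200, rng=rng)
posterior_mean = sum(accepted) / len(accepted)
grid = [i / 10.0 for i in range(-30, 41)]               # evaluation grid on [-3, 4]
density = kernel_density(grid, accepted, bandwidth=0.3)
```

In this toy model the exact posterior is normal with mean `y_obs / 2` and variance 1/2, so `posterior_mean` can be compared against 0.5 as a rough sanity check. The values of `k` and `bandwidth` are arbitrary choices for the illustration, not recommendations drawn from the article.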

Bibliography

[1] I. S. Abramson. On bandwidth variation in kernel estimates – A square root law. Ann. Statist. 10 (1982) 1217–1223. MR 673656 | Zbl 0507.62040

[2] D. M. Bashtannyk and R. J. Hyndman. Bandwidth selection for kernel conditional density estimation. Comput. Statist. Data Anal. 36 (2001) 279–298. MR 1836204 | Zbl 1038.62034

[3] M. Beaumont, J.-M. Cornuet, J.-M. Marin and C. P. Robert. Adaptive approximate Bayesian computation. Biometrika 96 (2009) 983–990. MR 2767283 | Zbl 05650366

[4] M. A. Beaumont, W. Zhang and D. J. Balding. Approximate Bayesian computation in population genetics. Genetics 162 (2002) 2025–2035.

[5] G. Biau, F. Cérou and A. Guyader. On the rate of convergence of the bagged nearest neighbor estimate. J. Mach. Learn. Res. 11 (2010) 687–712. MR 2600626 | Zbl 1242.62025

[6] M. Blum. Approximate Bayesian computation: A nonparametric perspective. J. Amer. Statist. Assoc. 105 (2010) 1178–1187. MR 2752613

[7] L. Breiman, W. Meisel and E. Purcell. Variable kernel estimates of multivariate densities. Technometrics 19 (1977) 135–144. Zbl 0379.62023

[8] F. Cérou and A. Guyader. Nearest neighbor classification in infinite dimension. ESAIM Probab. Stat. 10 (2006) 340–355. Numdam | MR 2247925 | Zbl 1187.62115

[9] T. M. Cover. Estimation by the nearest neighbor rule. IEEE Trans. Inform. Theory 14 (1968) 50–55. Zbl 0157.49404

[10] M. De Guzmán. Differentiation of Integrals in R^n. Lecture Notes in Mathematics 481. Springer, Berlin, 1975. MR 457661 | Zbl 0327.26010

[11] L. Devroye. Necessary and sufficient conditions for the pointwise convergence of nearest neighbor regression function estimates. Z. Wahrsch. Verw. Gebiete 61 (1982) 467–481. MR 682574 | Zbl 0483.62029

[12] L. Devroye and A. Krzyżak. New multivariate product density estimates. J. Multivariate Anal. 82 (2002) 88–110. MR 1918616 | Zbl 0995.62034

[13] L. Devroye, L. Györfi and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, New York, 1996. MR 1383093 | Zbl 0853.68150

[14] J. Fan and T. H. Yim. A crossvalidation method for estimating conditional densities. Biometrika 91 (2004) 819–834. MR 2126035 | Zbl 1078.62032

[15] O. P. Faugeras. A quantile-copula approach to conditional density estimation. J. Multivariate Anal. 100 (2009) 2083–2099. MR 2543088 | Zbl 1170.62030

[16] P. Fearnhead and D. Prangle. Constructing summary statistics for approximate Bayesian computation: Semi-automatic approximate Bayesian computation. J. Roy. Statist. Soc. Ser. B 74 (2012) 419–474. MR 2925370

[17] E. Fix and J. L. Hodges. Discriminatory analysis – Nonparametric discrimination: Consistency properties. Project 21-49-004, Report Number 4, USAF School of Aviation Medicine, Randolph Field, TX, 1951. Zbl 0715.62080

[18] Y. X. Fu and W. H. Li. Estimating the age of the common ancestor of a sample of DNA sequences. Mol. Biol. Evol. 14 (1997) 195–199.

[19] L. Györfi and M. Kohler. Nonparametric estimation of conditional distributions. IEEE Trans. Inform. Theory 53 (2007) 1872–1879. MR 2317148 | Zbl 05455560

[20] P. Hall and J. S. Marron. Variable window width kernel estimates of probability densities. Probab. Theory Related Fields 80 (1988) 37–49. MR 970470 | Zbl 0637.62036

[21] P. Hall, J. Racine and Q. Li. Cross-validation and the estimation of conditional probability densities. J. Amer. Statist. Assoc. 99 (2004) 1015–1026. MR 2109491 | Zbl 1055.62035

[22] B. E. Hansen. Nonparametric conditional density estimation. Technical report, Univ. Wisconsin, 2004.

[23] G. H. Hardy, J. E. Littlewood and G. Pólya. Inequalities. Cambridge Univ. Press, Cambridge, 1988. MR 944909 | Zbl 0634.26008

[24] W. K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57 (1970) 97–109. Zbl 0219.65008

[25] R. J. Hyndman, D. M. Bashtannyk and G. K. Grunwald. Estimating and visualizing conditional densities. J. Comput. Graph. Statist. 5 (1996) 315–336. MR 1422114

[26] B. Jessen, J. Marcinkiewicz and A. Zygmund. Note on the differentiability of multiple integrals. Fund. Math. 25 (1935) 217–234. JFM 61.0255.01

[27] M. C. Jones. Variable kernel density estimates and variable kernel density estimates. Aust. J. Stat. 32 (1990) 361–371. MR 1098587

[28] P. Joyce and P. Marjoram. Approximately sufficient statistics and Bayesian computation. Stat. Appl. Genet. Mol. Biol. 7 (2008) Art. ID 26. MR 2438407 | Zbl 1276.62077

[29] E. Kaufmann and R.-D. Reiss. On conditional distributions of nearest neighbors. J. Multivariate Anal. 42 (1992) 67–76. MR 1177518 | Zbl 0773.60037

[30] D. O. Loftsgaarden and C. P. Quesenberry. A nonparametric estimate of a multivariate density function. Ann. Math. Statist. 36 (1965) 1049–1051. MR 176567 | Zbl 0132.38905

[31] Y. P. Mack and M. Rosenblatt. Multivariate k-nearest neighbor density estimates. J. Multivariate Anal. 9 (1979) 1–15. MR 530638 | Zbl 0406.62023

[32] J. M. Marin and C. P. Robert. Bayesian Core: A Practical Approach to Computational Bayesian Statistics. Springer, New York, 2007. MR 2289769 | Zbl 1137.62013

[33] J. M. Marin, N. Pillai, C. P. Robert and J. Rousseau. Relevant statistics for Bayesian model choice. J. Roy. Statist. Soc. Ser. B. To appear, 2014. MR 3271169 | Zbl 1137.62013

[34] J. M. Marin, P. Pudlo, C. P. Robert and R. Ryder. Approximate Bayesian computational methods. Stat. Comput. 22 (2012) 1167–1180. MR 2992292 | Zbl 1252.62022

[35] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller and E. Teller. Equation of state calculations by fast computing machines. J. Chem. Phys. 21 (1953) 1087–1091.

[36] D. S. Moore and J. W. Yackel. Consistency properties of nearest neighbor density function estimators. Ann. Statist. 5 (1977) 143–154. MR 426275 | Zbl 0358.60053

[37] D. S. Moore and J. W. Yackel. Large sample properties of nearest neighbor density function estimators. In Statistical Decision Theory and Related Topics II: Proceedings of a Symposium Held at Purdue University, May 17–19, 1976, S. S. Gupta and D. S. Moore (Eds) 269–279. Academic Press, New York, 1977. MR 431497 | Zbl 0419.62036

[38] E. A. Nadaraya. On estimating regression. Theory Probab. Appl. 9 (1964) 141–142. Zbl 0136.40902

[39] E. A. Nadaraya. On nonparametric estimates of density functions and regression curves. Theory Probab. Appl. 10 (1965) 186–190. Zbl 0134.36302

[40] E. Parzen. On the estimation of a probability density function and the mode. Ann. Math. Statist. 33 (1962) 1065–1076. MR 143282 | Zbl 0116.11302

[41] J. K. Pritchard, M. T. Seielstad, A. Perez-Lezaun and M. W. Feldman. Population growth of human Y chromosomes: A study of Y chromosome microsatellites. Mol. Biol. Evol. 16 (1999) 1791–1798.

[42] B. D. Ripley. Stochastic Simulation. Wiley, New York, 1982. MR 2299137 | Zbl 0613.65006

[43] C. P. Robert and G. Casella. Monte Carlo Statistical Methods, 2nd edition. Springer, New York, 2004. Zbl 1096.62003 | MR 2080278 | Zbl 0935.62005

[44] C. P. Robert, J.-M. Cornuet, J.-M. Marin and N. S. Pillai. Lack of confidence in approximate Bayesian computation model choice. Proc. Natl. Acad. Sci. USA 108 (2011) 15112–15117.

[45] M. Rosenblatt. Conditional probability density and regression estimates. In Multivariate Analysis II, P. R. Krishnaiah (Ed.) 25–31. Academic Press, New York, 1969. MR 254987

[46] R. M. Royall. A class of non-parametric estimates of a smooth regression function. Ph.D. thesis, Stanford Univ., 1966. MR 2615964

[47] D. Rubin. Bayesianly justifiable and relevant frequency calculations for the applied statistician. Ann. Statist. 12 (1984) 1151–1172. MR 760681 | Zbl 0555.62010

[48] S. A. Sisson, Y. Fan and M. M. Tanaka. Sequential Monte Carlo without likelihoods. Proc. Natl. Acad. Sci. USA 104 (2007) 1760–1765. MR 2301870 | Zbl 1160.65005

[49] E. M. Stein. Singular Integrals and Differentiability Properties of Functions. Princeton Univ. Press, Princeton, 1970. MR 290095 | Zbl 0207.13501

[50] C. J. Stone. Consistent nonparametric regression. Ann. Statist. 5 (1977) 595–645. MR 443204 | Zbl 0366.62051

[51] S. Tavaré, D. Balding, R. Griffiths and P. Donnelly. Inferring coalescence times from DNA sequence data. Genetics 145 (1997) 505–518.

[52] G. S. Watson. Smooth regression analysis. Sankhya A 26 (1964) 359–372. MR 185765 | Zbl 0137.13002

[53] R. L. Wheeden and A. Zygmund. Measure and Integral. An Introduction to Real Analysis. Marcel Dekker, New York, 1977. MR 492146 | Zbl 0362.26004

[54] R. D. Wilkinson. Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. Stat. Appl. Genet. Mol. Biol. 12 (2008) 129–141. MR 3071024

[55] A. Zygmund. Trigonometric Series, Vol. II. Cambridge Univ. Press, Cambridge, 1959. MR 107776 | Zbl 0085.05601