Statistical estimation of conditional Shannon entropy
ESAIM: Probability and Statistics, Volume 23 (2019), pp. 350-386.

New estimates of the conditional Shannon entropy are introduced in the framework of a model describing a discrete response variable depending on a vector of d factors having a density w.r.t. the Lebesgue measure in ℝ^d. Namely, the mixed-pair model (X, Y) is considered, where X and Y take values in ℝ^d and an arbitrary finite set, respectively. Such models include, for instance, the famous logistic regression. In contrast to the well-known Kozachenko–Leonenko estimates of unconditional entropy, the proposed estimates are constructed by means of certain spatial order statistics (k-nearest neighbor statistics, where k = k_n depends on the number of observations n) and the random numbers of i.i.d. observations contained in balls of specified random radii. The asymptotic unbiasedness and L2-consistency of the new estimates are established under simple conditions. The results obtained can be applied to the feature selection problem, which is important, e.g., for medical and biological investigations.
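To give a concrete feel for k-nearest-neighbor estimation of conditional entropy in this mixed-pair setting, the sketch below combines empirical class frequencies with Kozachenko–Leonenko estimates of h(X) and h(X|Y=y) through the identity H(Y|X) = H(Y) + h(X|Y) − h(X). This is only an illustrative plug-in construction, not the estimator analysed in the paper (which works directly with the k_n-nearest-neighbor radii and the random counts of observations in the corresponding balls); the function names, the fixed choice of k, and the toy logistic-regression check are assumptions made for the example.

import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def kl_entropy(x, k):
    # Kozachenko–Leonenko estimate of the differential entropy h(X), X in R^d.
    n, d = x.shape
    dist, _ = cKDTree(x).query(x, k=k + 1)      # dist[:, 0] == 0 is the point itself
    rho = dist[:, k]                            # distance to the k-th nearest neighbor
    log_unit_ball = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)
    return digamma(n) - digamma(k) + log_unit_ball + d * np.mean(np.log(rho))

def conditional_entropy(x, y, k=5):
    # Plug-in sketch of H(Y|X) = H(Y) + h(X|Y) - h(X); every class must contain
    # more than k observations for the within-class estimates to be defined.
    n = y.shape[0]
    labels, counts = np.unique(y, return_counts=True)
    p = counts / n
    h_y = -np.sum(p * np.log(p))                # Shannon entropy of the labels
    h_x = kl_entropy(x, k)                      # h(X) from the pooled sample
    h_x_given_y = sum(p_j * kl_entropy(x[y == lab], k)
                      for lab, p_j in zip(labels, p))
    return h_y + h_x_given_y - h_x

# Toy check on a logistic-regression-type model: X ~ N(0, I_2), P(Y = 1 | X) = sigmoid(X_1 + X_2).
rng = np.random.default_rng(0)
x = rng.normal(size=(5000, 2))
y = (rng.random(5000) < 1.0 / (1.0 + np.exp(-(x[:, 0] + x[:, 1])))).astype(int)
print(conditional_entropy(x, y, k=5))           # should fall noticeably below H(Y) = log 2

Such a decomposition-based sketch inherits the known bias issues of combining separately estimated entropies; the estimates introduced in the paper instead build the conditional entropy directly from spatial order statistics and random counts of observations in balls of random radii, which is what permits the proofs of asymptotic unbiasedness and L2-consistency.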

DOI: 10.1051/ps/2018026
Classification: 60F25, 62G20, 62H12
Keywords: Shannon entropy, conditional entropy estimates, asymptotic unbiasedness, L2-consistency, logistic regression, Gaussian model
Bulinski, Alexander; Kozhevin, Alexey

@article{PS_2019__23__350_0,
     author = {Bulinski, Alexander and Kozhevin, Alexey},
     title = {Statistical estimation of conditional {Shannon} entropy},
     journal = {ESAIM: Probability and Statistics},
     pages = {350--386},
     publisher = {EDP-Sciences},
     volume = {23},
     year = {2019},
     doi = {10.1051/ps/2018026},
     zbl = {1418.60026},
     mrnumber = {3975702},
     language = {en},
     url = {http://archive.numdam.org/articles/10.1051/ps/2018026/}
}
Bulinski, Alexander; Kozhevin, Alexey. Statistical estimation of conditional Shannon entropy. ESAIM: Probability and Statistics, Volume 23 (2019), pp. 350-386. doi: 10.1051/ps/2018026. http://archive.numdam.org/articles/10.1051/ps/2018026/

[1] P. Alonso-Ruiz and E. Spodarev, Entropy-based inhomogeneity detection in porous media. Preprint (2016). | arXiv | MR

[2] E. Archer, I.M. Park and J.W. Pillow, Bayesian entropy estimation for countable discrete distributions. J. Mach. Learn. Res. 15 (2014) 2833–2868. | MR | Zbl

[3] L. Benguigui, The different paths to entropy. Eur. J. Phys. 34 (2013) 303–321. | DOI | Zbl

[4] M. Bennasar, Y. Hicks and R. Setchi, Feature selection using joint mutual information maximisation. Exp. Syst. Appl. 42 (2014) 8520–8532. | DOI

[5] T.B. Berrett, R.J. Samworth and M. Yuan, Efficient multivariate entropy estimation via k-nearest neighbour distances. Ann. Statist. 47 (2019) 288–318.

[6] G. Biau and L. Devroye, Lectures on the Nearest Neighbor Method. Springer, Cham (2015). | DOI | MR

[7] J. Beirlant, E.J. Dudewicz, L. Györfi and E.C. Van Der Meulen, Nonparametric entropy estimation: an overview. Int. J. Math. Stat. Sci. 6 (1997) 17–39. | MR | Zbl

[8] V.I. Bogachev, Measure Theory. Springer-Verlag, Berlin (2007). | DOI | MR | Zbl

[9] V.S. Borkar, Probability Theory. An Advanced Course. Springer-Verlag, New York (1995). | DOI | MR | Zbl

[10] A. Bulinski and D. Dimitrov, Statistical estimation of the Shannon entropy. Acta Math. Sin. 35 (2019) 17–46. | DOI | MR | Zbl

[11] A. Charzyńska and A. Gambin, Improvement of the k-NN entropy estimator with applications in systems biology. Entropy 18 (2016) 13. | DOI

[12] F. Coelho, A.P. Braga and M. Verleysen, A mutual information estimator for continuous and discrete variables applied to feature selection and classification problems. Int. J. Comput. Intell. Syst. 9 (2016) 726–733. | DOI

[13] T.M. Cover and J.A. Thomas, Elements of Information Theory, 2nd ed. Wiley–Interscience, Hoboken, NJ (2006). | DOI | MR | Zbl

[14] S. Delattre and N. Fournier, On the Kozachenko–Leonenko entropy estimator. J. Stat. Plan. Infer. 185 (2017) 69–93. | DOI | MR | Zbl

[15] G. Doquire and M. Verleysen, A comparison of mutual information estimators for feature selection, in Proc. of the 1st International Conference on Pattern Recognition Applications and Methods (2012) 176–185.

[16] D. Evans, A computationally efficient estimator for mutual information. Proc. R. Soc. Lond. Ser. A 464 (2008) 1203–1215. | MR | Zbl

[17] F. Fleuret, Fast binary feature selection with conditional mutual information. J. Mach. Learn. Res. 5 (2004) 1531–1555. | MR | Zbl

[18] W. Gao, S. Kannan, S. Oh and P. Viswanath, Estimating Mutual Information for Discrete-Continuous Mixtures. Preprint (2017). | arXiv

[19] P. Hall and S.C. Morton, On the estimation of entropy. Ann. Inst. Stat. Math. 45 (1993) 69–88. | DOI | MR | Zbl

[20] Y. Han, J. Jiao, T. Weissman and Y. Wu, Optimal rates of entropy estimation over Lipschitz balls. Preprint (2017). | arXiv | MR

[21] J.M. Hilbe, Practical Guide to Logistic Regression. CRC Press, Boca Raton (2015).

[22] J. Jiao, W. Gao and Y. Han, The nearest neighbor information estimator is adaptively near minimax rate-optimal. Preprint (2017). | arXiv

[23] D.G. Kleinbaum and M. Klein, Logistic Regression. A Self-Learning Text, 3rd ed. with contributions by E.R. Pryor. Springer, New York (2010). | Zbl

[24] L.F. Kozachenko and N.N. Leonenko, Sample estimate of the entropy of a random vector. Probl. Inf. Trans. 23 (1987) 95–101. | MR | Zbl

[25] A. Kraskov, H. Stögbauer and P. Grassberger, Estimating mutual information. Phys. Rev. E 69 (2004) 066138. | DOI | MR

[26] L. Massaron and A. Boschetti, Regression Analysis with Python. Packt Publishing Ltd., Birmingham (2016).

[27] P. Massart, Concentration inequalities and model selection, in École d’Été de Probabilités de Saint-Flour XXXIII – 2003. Springer–Verlag, Berlin (2007). | MR | Zbl

[28] E.G. Miller, A new class of entropy estimators for multidimensional densities, in Proc. of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP03), Hong Kong, China (April 06–10, 2003) 297–300.

[29] J. Montalvão, R. Attux and D. Silva, A pragmatic entropy and differential entropy estimator for small datasets. J. Commun. Inf. Syst. 29 (2014) 29–36.

[30] I. Muqattash and M. Yahdi, Infinite family of approximations of the Digamma function. Math. Comput. Model. 43 (2006) 1329–1336. | DOI | MR | Zbl

[31] C. Nair, B. Prabhakar and D. Shah, On entropy for mixtures of discrete and continuous variables. Preprint (2007). | arXiv

[32] J. Novovicová, P. Somol and P. Pudil, Conditional mutual information based feature selection for classification task, in CIARP 2007, in Vol. 4756 of Lect. Notes Comput. Sci., edited by L. Rueda, D. Mery, J. Kittler. Springer-Verlag, Berlin, Heidelberg (2007) 417–426. | DOI

[33] D. Pál, B. Póczos and C. Szepesvári, Estimation of Rényi entropy and mutual information based on generalized nearest-neighbor graphs, in Proc. of the 23rd International Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada (2010) 1849–1857.

[34] L. Paninski, Estimation of entropy and mutual information. Neural Comput. 15 (2003) 1191–1253. | DOI | Zbl

[35] H. Peng, F. Long and C. Ding, Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 27 (2005) 1226–1238. | DOI

[36] M.D. Penrose and J.E. Yukich, Limit theory for point processes in manifolds. Ann. Appl. Prob. 23 (2013) 2161–2211. | DOI | MR | Zbl

[37] I. Sason and S. Verdú, f-divergence inequalities. IEEE Trans. Inf. Theory 62 (2016) 5973–6006. | DOI | MR | Zbl

[38] C.E. Shannon, A mathematical theory of communication. Bell Syst. Tech. J. 27 (1948) 379–423; 623–656. | DOI | MR | Zbl

[39] A.N. Shiryaev, Probability – 1, 3rd ed. Springer, New York (2016). | MR | Zbl

[40] S. Singh and B. Póczos, Analysis of k-nearest neighbor distances with application to entropy estimation. Preprint (2016). | arXiv

[41] K. Sricharan, D. Wei and A.O. Hero, Ensemble estimators for multivariate entropy estimation. IEEE Trans. Inf. Theory 59 (2013) 4374–4388. | DOI | MR | Zbl

[42] D. Stowell and M.D. Plumbley, Fast multidimensional entropy estimation by k-d partitioning. IEEE Signal Process. Lett. 16 (2009) 537–540. | DOI

[43] A.B. Tsybakov and E.C. Van Der Meulen, Root-n consistent estimators of entropy for densities with unbounded support. Scand. J. Stat. 23 (1996) 75–83. | MR | Zbl

[44] J.R. Vergara and P.A. Estévez, A review of feature selection methods based on mutual information. Neural Comput. Appl. 24 (2014) 175–186. | DOI

[45] J. Yeh, Real Analysis: Theory of Measure and Integration, 3rd ed. World Scientific, Singapore (2014). | DOI | MR | Zbl

[46] A.M. Zubkov and A.A. Serov, A complete proof of universal inequalities for distribution function of binomial law. Theory Probab. Appl. 57 (2013) 539–544. | DOI | MR | Zbl
