The k nearest neighbour learning rule (under the uniform distance tie breaking) is universally consistent in every metric space X that is sigma-finite dimensional in the sense of Nagata. This was pointed out by Cérou and Guyader (2006) as a consequence of the main result by those authors, combined with a theorem in real analysis sketched by D. Preiss (1971) (and elaborated in detail by Assouad and Quentin de Gromard (2006)). We show that it is possible to give a direct proof along the same lines as the original theorem of Charles J. Stone (1977) about the universal consistency of the k-NN classifier in the finite dimensional Euclidean space. The generalization is non-trivial because of the distance ties being more prevalent in the non-Euclidean setting, and on the way we investigate the relevant geometric properties of the metrics and the limitations of the Stone argument, by constructing various examples.
Accepté le :
Première publication :
Publié le :
DOI : 10.1051/ps/2020018
Mots-clés : $$-NN classifier, universal consistency, geometric Stone lemma, distance ties, Nagata dimension, sigma-finite dimensional metric spaces
@article{PS_2020__24_1_914_0, author = {Collins, Beno{\^\i}t and Kumari, Sushma and Pestov, Vladimir G.}, title = {Universal consistency of the {\protect\emph{k}-NN} rule in metric spaces and {Nagata} dimension}, journal = {ESAIM: Probability and Statistics}, pages = {914--934}, publisher = {EDP-Sciences}, volume = {24}, year = {2020}, doi = {10.1051/ps/2020018}, mrnumber = {4178790}, zbl = {1455.62123}, language = {en}, url = {http://archive.numdam.org/articles/10.1051/ps/2020018/} }
TY - JOUR AU - Collins, Benoît AU - Kumari, Sushma AU - Pestov, Vladimir G. TI - Universal consistency of the k-NN rule in metric spaces and Nagata dimension JO - ESAIM: Probability and Statistics PY - 2020 SP - 914 EP - 934 VL - 24 PB - EDP-Sciences UR - http://archive.numdam.org/articles/10.1051/ps/2020018/ DO - 10.1051/ps/2020018 LA - en ID - PS_2020__24_1_914_0 ER -
%0 Journal Article %A Collins, Benoît %A Kumari, Sushma %A Pestov, Vladimir G. %T Universal consistency of the k-NN rule in metric spaces and Nagata dimension %J ESAIM: Probability and Statistics %D 2020 %P 914-934 %V 24 %I EDP-Sciences %U http://archive.numdam.org/articles/10.1051/ps/2020018/ %R 10.1051/ps/2020018 %G en %F PS_2020__24_1_914_0
Collins, Benoît; Kumari, Sushma; Pestov, Vladimir G. Universal consistency of the k-NN rule in metric spaces and Nagata dimension. ESAIM: Probability and Statistics, Tome 24 (2020), pp. 914-934. doi : 10.1051/ps/2020018. http://archive.numdam.org/articles/10.1051/ps/2020018/
[1] Recouvrements, derivation des mesures et dimensions. Rev. Mat. Iberoam. 22 (2006) 893–953. | DOI | MR | Zbl
and ,[2] Nearest neighbor classification in infinite dimension. ESAIM: PS 10 (2006) 340–355. | DOI | Numdam | MR | Zbl
and ,[3] Nearest neighbour pattern classification. IEEE Trans. Info. Theory 13 (1967) 21–27. | DOI | Zbl
and ,[4] On the almost everywhere convergence of nonparametric regression function estimates. Ann. Statist. 9 (1981) 1310–1319. | DOI | MR | Zbl
,[5] A Probabilistic Theory of Pattern Recognition, Springer–Verlag, New York (1996). | DOI | MR | Zbl
, and ,[6] Applying Supervised Learning Algorithms and a New Feature Selection Method to Predict Coronary Artery Disease. M.Sc. thesis, University of Ottawa. Preprint (2014). | arXiv
,[7] General Topology, Sigma Series in Pure Mathematics, 2nd edn. Heldermann Verlag, Berlin (1989) 6. | MR | Zbl
,[8] Consistent nonparametric regression for functional data under the Stone–Besicovitch conditions. IEEE Transac. Inf. Theory 58 (2012) 6697–6708. | DOI | MR | Zbl
, and ,[9] k-Nearest Neighbour Classification of Datasets with a Family of Distances. M.Sc. thesis, University of Ottawa. Preprint (2015). | arXiv
,[10] Topics in Random Matrices and Statistical Machine Learning. Ph.D. thesis, Kyoto University. Preprint (2018). | arXiv
,[11] Lusin’s theorem and Bochner integration. Sci. Math. Japan 60 (2004) 113–120. | MR | Zbl
and ,[12] On a special metric characterizing a metric space of dim ≤ n. Proc. Japan Acad. 39 (1963) 278–282. | MR | Zbl
,[13] On a special metric and dimension. Fund. Math. 55 (1964) 181–194. | DOI | MR | Zbl
,[14] Modern dimension theory. Bibliotheca Mathematica, Interscience Publishers, North–Holland, Amsterdam (1965) 6. | MR | Zbl
,[15] Open problems left in my wake of research. Topol. Appl. 146 (2005) 5–13. | DOI | MR | Zbl
,[16] A conjecture of J. Nagata on dimension and metrization. Bull. Amer. Math. Soc. 71 (1965) 623–625. | DOI | MR | Zbl
,[17] Invalid Vitali theorems, in: Abstracta. 7th Winter School on Abstract Analysis, Czechoslovak Academy of Sciences, Prague, Czechia (1979) 58–60.
,[18] Dimension of metrics and differentiation of measures, General topology and its relations to modern analysis and algebra, V (Prague, 1981) Sigma Series of Pure Mathematics, Heldermann, Berlin (1983) 565–568. | MR | Zbl
,[19] Hubs in space: popular nearest neighbors in high-dimensional data. J. Mach. Learn. Res. 11 (2010) 2487–2531. | MR | Zbl
, and ,[20] Consistent nonparametric regression. Ann. Stat. 5 (1977) 595–645. | DOI | MR | Zbl
,[21] Distance metric learning for large margin classification. J. Mach. Learn. Res. 10 (2009) 207–244. | Zbl
and ,Cité par Sources :
S.K. would like to thank JICA-FRIENDSHIP scholarship (fellowship D-1590283/ J-1593019) for supporting her doctoral study at Kyoto University and Department of Mathematics, Kyoto University for supporting the Brazil research trip under the KTGU project.
V.G.P. wants to acknowledge support from CNPq (bolsa Pesquisador Visitante, processo 310012/2016) and CAPES (bolsa Professor Visitante Estrangeiro Sênior, processo 88881.117018/2016-01).