We consider a problem of multiclass classification, where the training sample $$ is generated from the model ℙ(Y = m|X = x) = η$$(x), 1 ≤ m ≤ M, and η1(x), …, η$$(x) are unknown α-Holder continuous functions. Given a test point X, our goal is to predict its label. A widely used k-nearest-neighbors classifier constructs estimates of η1(X), …, η$$(X) and uses a plug-in rule for the prediction. However, it requires a proper choice of the smoothing parameter k, which may become tricky in some situations. We fix several integers n1, …, n$$, compute corresponding n$$-nearest-neighbor estimates for each m and each n$$ and apply an aggregation procedure. We study an algorithm, which constructs a convex combination of these estimates such that the aggregated estimate behaves approximately as well as an oracle choice. We also provide a non-asymptotic analysis of the procedure, prove its adaptation to the unknown smoothness parameter α and to the margin and establish rates of convergence under mild assumptions.
Accepté le :
Première publication :
Publié le :
DOI : 10.1051/ps/2019021
Mots-clés : Multiclass classification, k nearest neighbors, adaptive procedures
@article{PS_2020__24_1_69_0, author = {Puchkin, Nikita and Spokoiny, Vladimir}, title = {An adaptive multiclass nearest neighbor classifier}, journal = {ESAIM: Probability and Statistics}, pages = {69--99}, publisher = {EDP-Sciences}, volume = {24}, year = {2020}, doi = {10.1051/ps/2019021}, mrnumber = {4069295}, zbl = {1440.62250}, language = {en}, url = {http://archive.numdam.org/articles/10.1051/ps/2019021/} }
TY - JOUR AU - Puchkin, Nikita AU - Spokoiny, Vladimir TI - An adaptive multiclass nearest neighbor classifier JO - ESAIM: Probability and Statistics PY - 2020 SP - 69 EP - 99 VL - 24 PB - EDP-Sciences UR - http://archive.numdam.org/articles/10.1051/ps/2019021/ DO - 10.1051/ps/2019021 LA - en ID - PS_2020__24_1_69_0 ER -
Puchkin, Nikita; Spokoiny, Vladimir. An adaptive multiclass nearest neighbor classifier. ESAIM: Probability and Statistics, Tome 24 (2020), pp. 69-99. doi : 10.1051/ps/2019021. http://archive.numdam.org/articles/10.1051/ps/2019021/
[1] Selective sampling algorithms for cost-sensitive multiclass prediction, in Proceedings of the 30th International Conference on Machine Learning, edited by and , Vol. 28 of Proceedings of Machine Learning Research. Atlanta, Georgia, USA, (2013) 1220–1228.
,[2] Corporate credit rating using multiclass classification models with order information. World Acad. Sci. Eng. Technol. 60 (2011) 95–100.
and ,[3] Reducing multiclass to binary: a unifying approach for margin classifiers. J. Mach. Learn. Res. 1 (2001) 113–141. | MR | Zbl
, and ,[4] Fast learning rates for plug-in classifiers. Ann. Stat. 35 (2007) 608–633. | MR | Zbl
and ,[5] Spatial aggregation of local likelihood estimates with applications to classification. Ann. Stat. 35 (2007) 2287–2311. | DOI | MR | Zbl
and ,[6] Optimal exponential bounds for aggregation of estimators for the Kullback-Leibler loss. Electron. J. Stat. 11 (2017) 2258–2294. | DOI | MR | Zbl
, , and ,[7] Local nearest neighbour classification with applications to semi-supervisedlearning (2017), . | arXiv
, and ,[8] Rates of convergence for nearest neighbor classification, in Vol. 2 of Proceedings of the 27th International Conference on Neural Information Processing Systems, NIPS’14, Cambridge, MA, USA (2014) 3437–3445.
and ,[9] On the algorithmic implementation of multiclass kernel-based vector machines. J. Mach. Learn. Res. 2 (2002) 265–292. | Zbl
and ,[10] Deviation optimal learning using greedy Q-aggregation. Ann. Stat. 40 (2012) 1878–1905. | MR | Zbl
, and ,[11] Multiclass learning approaches: A theoretical comparison with implications, in Advancesin Neural Information Processing Systems 25, edited by , , , and , Curran Associates Inc. (2012), 485–493.
, and . ,[12] Taniskidou UCI machine learning repository, 2017.
and[13] Solving multiclass learning problems via error-correcting output codes. J. Artif. Int. Res. 2 (1995) 263–286. | Zbl
and ,[14] Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics 17 (2001) 349–358.
and ,[15] Theory and Applications of Models of Computation, Vol. 9076 of Lecture Notes in Computer Sciences. Springer, Cham (2015) 375–387. | MR
, , , and , in[16] Rate of convergence of k-nearest-neighbor classification rule. J. Mach. Learn. Res., 18 (2017) 16. | MR | Zbl
, and ,[17] Classification in general finite dimensional spaces with the k-nearest neighbor rule. Ann. Stat. 44 (2016) 982–1009. | MR | Zbl
, and ,[18] Application of support vector machines to speech recognition. IEEE Trans. Signal Process. 52 (2004) 2348–2355.
, and ,[19] Learning by mirror averaging. Ann. Stat. 36 (2008) 2183–2206. | MR | Zbl
, and ,[20] Face verification via error correcting output codes. Image Vis. Comput. 21 (2003) 1163–1169. | DOI
, , and .[21] Optimal rates of aggregation in classification under low noise assumption. Bernoulli 13 (2007) 1000–1022. | DOI | MR | Zbl
,[22] Empirical risk minimization is optimal for the convex aggregation problem. Bernoulli 19 (2013) 2153–2166. | DOI | MR | Zbl
,[23] Optimal learning with Q-aggregation. Ann. Stat. 42 (2014) 211–224. | DOI | MR | Zbl
and ,[24] Smooth discrimination analysis. Ann. Stat. 27 (1999) 1808–1829. | DOI | MR | Zbl
and ,[25] The multi-armed bandit problem with covariates. Ann. Stat. 41 (2013) 693–721. | DOI | MR | Zbl
and .[26] In defense of one-vs-all classification. J. Mach. Learn. Res. 5 (2003/04) 101–141. | MR | Zbl
and ,[27] Kullback-Leibler aggregation and misspecified generalized linear models. Ann. Stat. 40 (2012) 639–665. | DOI | MR | Zbl
,[28] Sparse estimation by exponential weighting. Stat. Sci. 27 (2012) 558–575. | DOI | MR | Zbl
and ,[29] Shifting, one-inclusion mistake bounds and tight multiclass expected risk bounds, in Advances in Neural Information Processing Systems 19, edited by , , and , MIT Press, Cambridge (2007) 1193–1200.
, and ,[30] Optimal weighted nearest neighbour classifiers. Ann. Stat. 40 (2012) 2733–2763. | DOI | MR | Zbl
,[31] Parameter tuning in pointwise adaptation using a propagation approach. Ann. Stat. 37 (2009) 2783–2807. | DOI | MR | Zbl
and ,[32] Class prediction by nearest shrunken centroids, with applications to dna microarrays. Stat. Sci. 18 (2003) 02. | DOI | MR | Zbl
, , and ,[33] Optimal Rates of Aggregation, Springer, Berlin (2003) 303–313. | Zbl
,[34] Recursive aggregation of estimators by the mirror descent method with averaging. Problemy Peredachi Informatsii 41 (2005) 78–96. | Zbl
, , and ,Cité par Sources :
Financial support by the Russian Academic Excellence Project 5-100 and by the German Research Foundation (DFG) through the Collaborative Research Center 1294 is gratefully acknowledged.