We observe an N × M matrix of independent, identically distributed Gaussian random variables which are centered except for the elements of some n × m submatrix, where the mean is larger than some a > 0. The submatrix is sparse in the sense that n/N and m/M tend to 0, whereas n, m, N and M tend to infinity. We consider the problem of selecting the random variables with significantly large mean values, as was also considered by [M. Kolar, S. Balakrishnan, A. Rinaldo and A. Singh, NIPS (2011)]. We give sufficient conditions on a, as a function of n, m, N and M, and construct a uniformly consistent procedure that performs sharp variable selection. We also prove minimax lower bounds under necessary conditions that are complementary to the previous ones. The critical values separating the necessary and sufficient conditions are sharp (we exhibit exact constants), whereas [M. Kolar, S. Balakrishnan, A. Rinaldo and A. Singh, NIPS (2011)] only prove rate optimality and focus on suboptimal but computationally feasible selectors. Note that rate optimality in this problem leaves out a large set of possible parameters for which we do not know whether consistent selection is possible.
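In the notation of the abstract, the observation model can be sketched as follows; this is a minimal sketch in which the index sets I and J are labels introduced here for illustration (the article's own notation may differ), in the spirit of the companion paper by C. Butucea and Yu.I. Ingster, Bernoulli 19 (2013).
\[
X_{ij} \;=\; a\,\mathbf{1}_{\{i \in I\}}\,\mathbf{1}_{\{j \in J\}} \;+\; \xi_{ij},
\qquad \xi_{ij} \ \text{i.i.d.} \ \mathcal{N}(0,1),
\qquad 1 \le i \le N,\ 1 \le j \le M,
\]
\[
I \subset \{1,\dots,N\},\qquad J \subset \{1,\dots,M\},\qquad |I| = n,\qquad |J| = m.
\]
The selection problem is then to recover the signal set I × J from the single noisy observation X = (X_{ij}), and the question addressed in the paper is for which values of a, as a function of n, m, N and M, this can be done consistently.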
Keywords: Estimation, minimax testing, large matrices, selection of sparse signal, sharp selection bounds, variable selection
Butucea, Cristina; Ingster, Yuri I.; Suslina, Irina A. Sharp variable selection of a sparse submatrix in a high-dimensional noisy matrix. ESAIM: Probability and Statistics, Vol. 19 (2015), pp. 115-134. doi: 10.1051/ps/2014017. http://archive.numdam.org/articles/10.1051/ps/2014017/
F. Abramovich, Y. Benjamini, D.L. Donoho and I.M. Johnstone, Adapting to unknown sparsity by controlling the false discovery rate. Ann. Statist. 34 (2006) 584–653. | MR | Zbl
E. Arias-Castro, E.J. Candès and A. Durand, Detection of an anomalous cluster in a network. Ann. Statist. 39 (2011) 278–304. | MR | Zbl
E. Arias-Castro, E.J. Candès and Y. Plan, Global testing and sparse alternatives: ANOVA, multiple comparisons and the higher criticism. Preprint arXiv:1007.1434 (2010). | MR | Zbl
E. Arias-Castro, D.L. Donoho and X. Huo, Near-optimal detection of geometric objects by fast multiscale methods. IEEE Trans. Inform. Theory 51 (2005) 2402–2425. | MR | Zbl
E. Arias-Castro and K. Lounici, Variable selection with exponential weights and ℓ0-penalization. Preprint arXiv:1208.2635 (2012).
Y. Benjamini and Y. Hochberg, Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57 (1995) 289–300. | MR | Zbl
K. Bertin and G. Lecué, Selection of variables and dimension reduction in high-dimensional non-parametric regression. Electron. J. Stat. 2 (2008) 1224–1241. | MR | Zbl
P.J. Bickel, Y. Ritov and A.B. Tsybakov, Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist. 37 (2009) 1705–1732. | MR | Zbl
C. Butucea and Yu.I. Ingster, Detection of a sparse submatrix of a high-dimensional noisy matrix. Bernoulli 19 (2013) 2652–2688. | MR | Zbl
C. Butucea and G. Gayraud, Sharp detection of smooth signals in a high-dimensional sparse matrix with indirect observations. Preprint arXiv:1301.4660 (2013). | MR
T.T. Cai, J. Jin and M.G. Low, Estimation and confidence sets for sparse normal mixtures. Ann. Statist. 35 (2007) 2421–2449. | MR | Zbl
E.J. Candès and Y. Plan, Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements. IEEE Trans. Inform. Theory 57 (2011) 2342–2359. | MR | Zbl
E.J. Candès and B. Recht, Exact matrix completion via convex optimization. Found. Comput. Math. 9 (2009) 717–772. | MR | Zbl
L. Comminges and A.S. Dalalyan, Tight conditions for consistency of variable selection in the context of high dimensionality. Ann. Statist. 40 (2012) 2667–2696. | MR | Zbl
D. Donoho and J. Jin, Higher criticism for detecting sparse heterogeneous mixtures. Ann. Statist. 32 (2004) 962–994. | MR | Zbl
D.L. Donoho, I.M. Johnstone, J.C. Hoch and A.S. Stern, Maximum entropy and the nearly black object. With discussion. J. Roy. Statist. Soc. Ser. B 54 (1992) 41–81. | MR | Zbl
P. Embrechts, C. Klüppelberg and T. Mikosch, Modelling extremal events: for insurance and finance. Springer (1997). | MR | Zbl
D. Gross, Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory 57 (2011) 1548–1566. | MR | Zbl
Yu.I. Ingster, Some problems of hypothesis testing leading to infinitely divisible distributions. Math. Methods Stat. 6 (1997) 47–69. | MR | Zbl
Yu.I. Ingster and N.A. Stepanova, Adaptive selection of sparse regression function components. Zapiski Nauchn. Sem. POMI (2012).
Yu.I. Ingster and I.A. Suslina, Nonparametric goodness-of-fit testing under Gaussian models. Vol. 169 of Lect. Notes in Statist. Springer-Verlag, New York (2003). | MR | Zbl
Yu.I. Ingster and I.A. Suslina, On a detection of a signal of known shape in multichannel system. Zapiski Nauchn. Sem. POMI 294 (2002) 88–112; transl. J. Math. Sci. 127 (2002) 1723–1736. | MR | Zbl
R.H. Keshavan, A. Montanari and S. Oh, Matrix completion from noisy entries. J. Mach. Learn. Res. 11 (2010) 2057–2078. | MR | Zbl
M. Kolar, S. Balakrishnan, A. Rinaldo and A. Singh, Minimax localization of structural information in large noisy matrices. NIPS (2011).
V. Koltchinskii, K. Lounici and A.B. Tsybakov, Nuclear norm penalization and optimal rates for noisy low rank matrix completion. Ann. Statist. 39 (2011) 2302–2329. | MR | Zbl
J. Lafferty and L. Wasserman, Rodeo: sparse, greedy nonparametric regression. Ann. Statist. 36 (2008) 28–63. | MR | Zbl
B. Recht, A simpler approach to matrix completion. J. Mach. Learn. Res. 12 (2011) 3413–3430. | MR | Zbl
A. Rohde and A.B. Tsybakov, Estimation of high-dimensional low-rank matrices. Ann. Statist. 39 (2011) 887–930. | MR | Zbl
X. Sun and A.B. Nobel, On the maximal size of large-average and ANOVA-fit submatrices in a Gaussian random matrix. Bernoulli 19 (2013) 275–294. | MR | Zbl
A.A. Shabalin, V.J. Weigman, C.M. Perou and A.B. Nobel, Finding large average submatrices in high dimensional data. Ann. Appl. Statist. 3 (2009) 985–1012. | MR | Zbl
A.B. Tsybakov, Introduction to nonparametric estimation. Springer Ser. Stat. Springer, New York (2009). | MR | Zbl
N. Verzelen, Minimax risks for sparse regressions: ultra-high dimensional phenomenons. Electron. J. Stat. 6 (2012) 38–90. | MR | Zbl