Seeking relevant information from a statistical model
ESAIM: Probability and Statistics, Tome 20 (2016), pp. 463-479.

We introduce a general variable selection procedure that can be applied to several parametric multivariate problems, including principal components analysis and regression, among others. The aim is to identify a small subset of the original variables that can ‘better explain’ the model through nonparametric relationships. The method typically yields some noisy, uninformative variables and some variables that are strongly related because of their general dependence; our aim is to help understand the underlying structures in a given data set. The asymptotic behaviour of the proposed method is studied, and several real and simulated data sets are analysed as examples.
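The abstract describes the procedure only in outline. The sketch below is a rough illustration of the idea and is not the authors' algorithm. It rests on several assumptions. The "model" is PCA, and its output is taken to be the vector of first principal component scores. The "nonparametric relationship" is a leave-one-out k-nearest-neighbour regression of those scores on a candidate subset of variables. The subset size d is fixed in advance and searched exhaustively. The function names, k = 5 and the simulated data are all hypothetical choices. It uses only the Python standard library.

```python
import itertools
import math
import random


def first_pc_scores(columns, iters=200):
    """Scores of each observation on the first principal component.

    columns: one list per variable. The leading eigenvector of the sample
    covariance matrix is found by power iteration.
    """
    cols = [[v - sum(c) / len(c) for v in c] for c in columns]  # centre each variable
    p, n = len(cols), len(cols[0])
    cov = [[sum(a * b for a, b in zip(cols[i], cols[j])) / (n - 1) for j in range(p)]
           for i in range(p)]
    vec = [1.0] * p
    for _ in range(iters):
        w = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        vec = [x / norm for x in w]
    return [sum(vec[j] * cols[j][r] for j in range(p)) for r in range(n)]


def loo_knn_error(columns, target, subset, k=5):
    """Leave-one-out mean squared error of a k-NN regression of target on the variables in subset."""
    n = len(target)
    total = 0.0
    for i in range(n):
        # squared Euclidean distances from observation i, using only the chosen variables
        dists = sorted(
            (sum((columns[c][i] - columns[c][j]) ** 2 for c in subset), j)
            for j in range(n) if j != i
        )
        pred = sum(target[j] for _, j in dists[:k]) / k
        total += (target[i] - pred) ** 2
    return total / n


def select_variables(columns, target, d, k=5):
    """Exhaustive search for the d variables whose k-NN fit best reproduces target."""
    return min(itertools.combinations(range(len(columns)), d),
               key=lambda s: loo_knn_error(columns, target, s, k))


# Hypothetical simulated data: x0, x1, x2 share a latent factor z; x3, x4 are pure noise.
rng = random.Random(0)
n = 100
z = [rng.gauss(0, 1) for _ in range(n)]
columns = [[zi + 0.1 * rng.gauss(0, 1) for zi in z] for _ in range(3)]
columns += [[rng.gauss(0, 1) for _ in range(n)] for _ in range(2)]

scores = first_pc_scores(columns)  # assumed "model output": first PC scores
chosen = select_variables(columns, scores, d=2)
print("selected variables:", chosen)
```

Exhaustive search over all subsets of size d is only practical when the number of variables is small. Larger problems would likely need a heuristic search, such as a genetic algorithm. Under the same assumptions, the search could be reused for regression by passing a fitted regression model's predicted values as `target`.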

DOI: 10.1051/ps/2016022
Classification: 62H30, 68T10, 62G20
Keywords: Variable selection, regression, principal components analysis
Fraiman, Ricardo (1); Gimenez, Yanina (2); Svarc, Marcela (2)

1 Centro de Matemática, Universidad de la República, Iguá 4225, Malvín Norte 11400, Montevideo, Uruguay.
2 Universidad de San Andrés and Conicet, Vito Dumas 284, Victoria 1644, Buenos Aires, Argentina.
@article{PS_2016__20__463_0,
     author = {Fraiman, Ricardo and Gimenez, Yanina and Svarc, Marcela},
     title = {Seeking relevant information from a statistical model},
     journal = {ESAIM: Probability and Statistics},
     pages = {463--479},
     publisher = {EDP-Sciences},
     volume = {20},
     year = {2016},
     doi = {10.1051/ps/2016022},
     zbl = {1353.62070},
     language = {en},
     url = {http://archive.numdam.org/articles/10.1051/ps/2016022/}
}