Special issue: functional data analysis
Regression on functional data: methodological approach with application to near-infrared spectrometry
Journal de la société française de statistique, Volume 155 (2014), no. 2, pp. 100-120.

We consider the situation where one observes a scalar response and a functional variable as predictor. For instance, in our petroleum industry problem, the response is the octane number of a gasoline sample and the functional predictor is a curve representing its near-infrared spectrum. The statistical community has developed numerous models for handling such datasets, and we focus here on four regression models: two standard ones, the functional linear model and functional nonparametric regression, and two recently developed ones, functional projection pursuit regression and a parsimonious model involving a nonparametric variable selection method. Each of these models is implemented on two datasets containing near-infrared spectrometric curves. A comparative study of these models is provided in order to highlight their possible advantages and drawbacks. Finally, a simple but useful methodological approach is proposed to boost the two most recent regression models by combining the most relevant information obtained from each of the studied models. We show on the spectrometric data how such an approach may lead to substantial improvements.
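Functional nonparametric regression, one of the two standard models above, predicts the response attached to a new curve χ as a kernel-weighted average of the training responses, r̂(χ) = Σᵢ K(d(χ, Xᵢ)/h) Yᵢ / Σᵢ K(d(χ, Xᵢ)/h), where d is a semi-metric between curves and h a bandwidth. The following is a minimal Python sketch of that estimator, assuming curves sampled on a common grid, an L2 semi-metric approximated by the trapezoidal rule, an asymmetric quadratic kernel on [0, 1], and a single global bandwidth chosen by leave-one-out cross-validation from a user-supplied list. All function names are hypothetical; this is not the author's implementation, and spectrometric applications often rely on other semi-metrics (for instance, ones built on derivatives of the spectra).

```python
import math
from typing import List, Optional, Sequence

Curve = Sequence[float]  # one curve (e.g. a spectrum) sampled on a common grid


def l2_semimetric(x: Curve, y: Curve, grid: Sequence[float]) -> float:
    """L2 distance between two discretized curves; the integral is
    approximated by the trapezoidal rule on the (possibly non-uniform) grid."""
    sq = [(a - b) ** 2 for a, b in zip(x, y)]
    integral = sum(
        0.5 * (sq[j] + sq[j - 1]) * (grid[j] - grid[j - 1])
        for j in range(1, len(grid))
    )
    return math.sqrt(integral)


def quadratic_kernel(u: float) -> float:
    """Asymmetric quadratic kernel supported on [0, 1] (distances are >= 0)."""
    return 1.5 * (1.0 - u * u) if 0.0 <= u <= 1.0 else 0.0


def nw_predict(curves: List[Curve], responses: List[float],
               grid: Sequence[float], new_curve: Curve, h: float) -> float:
    """Kernel (Nadaraya-Watson type) estimate of the response at new_curve."""
    weights = [quadratic_kernel(l2_semimetric(c, new_curve, grid) / h)
               for c in curves]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no training curve within distance h; increase h")
    return sum(w * y for w, y in zip(weights, responses)) / total


def loo_cv_bandwidth(curves: List[Curve], responses: List[float],
                     grid: Sequence[float],
                     candidates: Sequence[float]) -> Optional[float]:
    """Return the candidate bandwidth with the smallest leave-one-out squared
    error, or None if every candidate leaves some curve without neighbours."""
    best_h: Optional[float] = None
    best_err = math.inf
    for h in candidates:
        err = 0.0
        for i in range(len(curves)):
            others = curves[:i] + curves[i + 1:]  # drop curve i
            ys = responses[:i] + responses[i + 1:]
            try:
                pred = nw_predict(others, ys, grid, curves[i], h)
            except ValueError:
                err = math.inf  # empty neighbourhood: reject this bandwidth
                break
            err += (pred - responses[i]) ** 2
        if err < best_err:
            best_h, best_err = h, err
    return best_h


if __name__ == "__main__":
    # Toy data (hypothetical, not spectrometric): curves a*t on [0, 1],
    # scalar response 2a + 1.
    grid = [j / 10 for j in range(11)]
    amps = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
    curves = [[a * t for t in grid] for a in amps]
    ys = [2 * a + 1 for a in amps]
    h = loo_cv_bandwidth(curves, ys, grid, [0.1, 0.5, 1.0, 2.0])
    print("selected h:", h)
    print("prediction:", nw_predict(curves, ys, grid, [1.75 * t for t in grid], h))
```

A global bandwidth keeps the sketch short; with unevenly spread spectra, a neighbour-based (local) bandwidth is a common alternative that avoids empty neighbourhoods.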

Keywords: boosting, functional data, functional linear regression, functional nonparametric regression, functional projection pursuit regression, nonparametric variable selection, near-infrared spectrometry
@article{JSFS_2014__155_2_100_0,
     author = {Ferraty, Fr\'ed\'eric},
     title = {Regression on functional data: methodological approach with application to near-infrared spectrometry},
     journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique},
     pages = {100--120},
     publisher = {Soci\'et\'e fran\c{c}aise de statistique},
     volume = {155},
     number = {2},
     year = {2014},
     zbl = {1316.62005},
     language = {en},
     url = {http://archive.numdam.org/item/JSFS_2014__155_2_100_0/}
}
