Joint selection of wavenumber regions for MidIR and RAMAN spectra and variables in PLS regression using Genetic Algorithms
Journal de la société française de statistique, Volume 154 (2013) no. 3, pp. 80-94.

Many methods exist for variable selection in PLS regression when the explanatory variables are too numerous. Fewer methods are available for selecting wavenumber regions in MidIR or RAMAN spectra. In this work, PLS is coupled with genetic algorithms to select intervals in spectra. The work was motivated by a regression problem concerning the transformation of cassava. The data consist of three tables: RAMAN spectra, MidIR spectra and physico-chemical variables. The aim is to adapt to this regression context a strategy previously designed to select intervals in NIR spectra for classification. A new algorithm is proposed to handle such multiblock data in the PLS1 regression context. Illustrations on simulated data are given before the application to the real dataset.

Many methods suited to PLS regression address the choice of explanatory variables when these are too numerous. When it comes to selecting intervals in spectra, the range of techniques is narrower. In this work, PLS has been combined with genetic algorithms to allow the selection of intervals in spectra. The work originated in a regression problem for data on the transformation of cassava. These data consist of three tables: RAMAN spectra, MidIR spectra and physico-chemical variables. The aim is to adapt to the regression context a strategy previously developed for interval selection in NIR spectra only, in discrimination. We developed a genetic algorithm specifically adapted to this type of (multi-table) data, for the PLS1 regression case. Illustrations on simulated data are given before the application to the real dataset.
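To make the coupling of PLS1 with a genetic algorithm concrete, here is a minimal Python sketch. It is not the authors' algorithm: the fixed-width intervals, binary tournament selection, uniform crossover, bit-flip mutation, elitism, cross-validated RMSE fitness and all function names (`pls1_fit`, `cv_rmse`, `make_groups`, `ga_select`) are assumptions chosen for illustration, and the data are synthetic. Each gene switches on one spectral interval or one physico-chemical variable, so a single chromosome selects jointly across the three tables.

```python
import numpy as np

def pls1_fit(X, y, n_comp):
    """PLS1 via NIPALS; returns (beta, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        norm = np.linalg.norm(w)
        if norm < 1e-12:            # nothing left to explain
            break
        w = w / norm
        t = Xr @ w                  # scores
        tt = t @ t
        p = Xr.T @ t / tt           # X loadings
        c = (yr @ t) / tt           # y loading
        Xr = Xr - np.outer(t, p)    # deflate X
        yr = yr - c * t             # deflate y
        W.append(w); P.append(p); q.append(c)
    if not W:
        return np.zeros(X.shape[1]), x_mean, y_mean
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q), x_mean, y_mean

def cv_rmse(X, y, n_comp=2, n_folds=5):
    """Cross-validated RMSE of PLS1 with interleaved folds."""
    n = len(y)
    folds = np.arange(n) % n_folds
    sq = 0.0
    for k in range(n_folds):
        tr, te = folds != k, folds == k
        beta, xm, ym = pls1_fit(X[tr], y[tr], min(n_comp, X.shape[1]))
        sq += np.sum((y[te] - ((X[te] - xm) @ beta + ym)) ** 2)
    return np.sqrt(sq / n)

def make_groups(blocks):
    """blocks: list of (n_columns, interval_width); width 1 means single variables."""
    groups, start = [], 0
    for size, width in blocks:
        for lo in range(0, size, width):
            groups.append(np.arange(start + lo, start + min(lo + width, size)))
        start += size
    return groups

def ga_select(X, y, groups, pop_size=30, n_gen=40, p_mut=0.05, seed=0):
    """Assumed GA: one boolean gene per interval/variable, lower fitness is better."""
    rng = np.random.default_rng(seed)
    n_genes = len(groups)

    def fitness(chrom):
        if not chrom.any():
            return np.inf           # an empty selection cannot be fitted
        cols = np.concatenate([groups[g] for g in np.flatnonzero(chrom)])
        return cv_rmse(X[:, cols], y)

    def tournament(pop, scores):
        a, b = rng.integers(pop_size, size=2)
        return pop[a] if scores[a] < scores[b] else pop[b]

    pop = rng.random((pop_size, n_genes)) < 0.3
    scores = np.array([fitness(c) for c in pop])
    history = [scores.min()]
    for _ in range(n_gen):
        children = []
        for _ in range(pop_size - 1):
            p1, p2 = tournament(pop, scores), tournament(pop, scores)
            child = np.where(rng.random(n_genes) < 0.5, p1, p2)  # uniform crossover
            child ^= rng.random(n_genes) < p_mut                 # bit-flip mutation
            children.append(child)
        children.append(pop[np.argmin(scores)].copy())           # elitism
        pop = np.array(children)
        scores = np.array([fitness(c) for c in pop])
        history.append(scores.min())
    best = np.argmin(scores)
    return pop[best], scores[best], history

# Synthetic three-table example (hypothetical sizes and signal positions).
rng = np.random.default_rng(1)
n = 60
raman = rng.normal(size=(n, 40))
midir = rng.normal(size=(n, 30))
physchem = rng.normal(size=(n, 5))
X = np.hstack([raman, midir, physchem])
y = (raman[:, 10:15].sum(axis=1) + midir[:, 20:25].sum(axis=1)
     + 2 * physchem[:, 2] + 0.1 * rng.normal(size=n))
groups = make_groups([(40, 5), (30, 5), (5, 1)])
best, score, history = ga_select(X, y, groups, pop_size=20, n_gen=15)
```

Encoding intervals rather than individual wavenumbers keeps the chromosome short and respects the local correlation of spectra. Because the best individual is copied unchanged into each new generation, the best fitness recorded in `history` can never get worse.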

Keywords: PLS Regression, Genetic Algorithm, MidIR and RAMAN spectra, Variable Selection, Selection of wavenumber regions
Keywords (French version): PLS method, Genetic algorithm, MidIR and RAMAN spectra, Variable selection, Interval selection
Grosmaire, Lidwine; Reynès, Christelle; Sabatier, Robert. Joint selection of wavenumber regions for MidIR and RAMAN spectra and variables in PLS regression using Genetic Algorithms. Journal de la société française de statistique, Volume 154 (2013) no. 3, pp. 80-94.
