Gene selection via BPSO and Backward generation for cancer classification
RAIRO - Operations Research - Recherche Opérationnelle, Volume 53 (2019) no. 1, pp. 269-288.

Gene expression data (DNA microarrays) enable researchers to measure the expression levels of several thousand genes simultaneously. These expression levels are very important for classifying different types of tumors. In this work, we are interested in gene selection, an essential data pre-processing step for cancer classification. Gene selection retains a small subset of genes from a large set and eliminates redundant, irrelevant, or noisy genes. The combinatorial nature of the selection problem calls for specific techniques such as filters and wrappers, or hybrids that combine several optimization processes. In this context, we propose two hybrid approaches (RBPSO-1NN and FBPSO-SVM) for the gene selection problem. They combine filter methods (the Fisher criterion and the ReliefF algorithm), the binary particle swarm optimization (BPSO) metaheuristic, and the Backward algorithm, and they use classifiers (SVM and 1NN) to evaluate the relevance of candidate subsets. To verify the performance of our methods, we tested them on eight well-known high-dimensional microarray datasets, with dimensions ranging from 2308 to 11225 genes. The experiments on these datasets show that our methods are very competitive with existing work.
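The two building blocks named in the abstract, a filter ranking and a BPSO move, can be illustrated with a minimal sketch. It is not the authors' implementation: the multi-class form of the Fisher score, the function names (`fisher_scores`, `top_k_genes`, `bpso_step`), and the parameter values (`w`, `c1`, `c2`, `vmax`) are assumptions chosen for illustration. The classifier-based fitness evaluation and the Backward step are left out.

```python
import math
import random


def fisher_scores(X, y):
    """Score each gene (column of X) by between-class over within-class scatter.

    Assumed form: sum_c n_c (mu_c - mu)^2 / sum_c n_c var_c, one common
    definition of the Fisher criterion; the paper may use a variant.
    """
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mu = sum(col) / len(col)
        between, within = 0.0, 0.0
        for c in classes:
            vals = [v for v, label in zip(col, y) if label == c]
            mu_c = sum(vals) / len(vals)
            var_c = sum((v - mu_c) ** 2 for v in vals) / len(vals)
            between += len(vals) * (mu_c - mu) ** 2
            within += len(vals) * var_c
        # Small constant avoids division by zero for constant-within-class genes.
        scores.append(between / (within + 1e-12))
    return scores


def top_k_genes(X, y, k):
    """Filter stage: indices of the k genes with the highest Fisher score."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def bpso_step(position, velocity, pbest, gbest, rng,
              w=0.9, c1=2.0, c2=2.0, vmax=6.0):
    """One binary PSO move for a single particle (hypothetical parameters).

    Each bit says whether a gene is selected. Velocities are updated as in
    continuous PSO, clamped to [-vmax, vmax], then mapped through a sigmoid
    to the probability that the bit is set to 1.
    """
    new_pos, new_vel = [], []
    for x, v, p, g in zip(position, velocity, pbest, gbest):
        v = w * v + c1 * rng.random() * (p - x) + c2 * rng.random() * (g - x)
        v = max(-vmax, min(vmax, v))
        prob_one = 1.0 / (1.0 + math.exp(-v))
        new_vel.append(v)
        new_pos.append(1 if rng.random() < prob_one else 0)
    return new_pos, new_vel


if __name__ == "__main__":
    # Toy data: 4 samples, 2 genes; only gene 0 separates the two classes.
    X = [[1.0, 5.0], [1.2, 4.0], [3.0, 5.1], [3.1, 4.1]]
    y = [0, 0, 1, 1]
    print(top_k_genes(X, y, 1))
    rng = random.Random(0)
    print(bpso_step([0, 1, 0], [0.0, 0.0, 0.0], [1, 1, 0], [1, 0, 0], rng))
```

In a full wrapper, each particle's bit vector would be scored by the cross-validated accuracy of a classifier such as SVM or 1NN trained on the selected genes, and that score would drive the updates of `pbest` and `gbest`.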

DOI: 10.1051/ro/2018059
Classification: 62F07, 62H30, 62P10, 68T20, 90C06
Keywords: Gene selection, cancer classification, BPSO, backward generation, SVM, 1NN, ReliefF, Fisher criterion, DNA microarray
Bir-Jmel, Ahmed; Mohamed Douiri, Sidi; Elbernoussi, Souad

@article{RO_2019__53_1_269_0,
     author = {Bir-Jmel, Ahmed and Mohamed Douiri, Sidi and Elbernoussi, Souad},
     title = {Gene selection via {BPSO} and {Backward} generation for cancer classification},
     journal = {RAIRO - Operations Research - Recherche Op\'erationnelle},
     pages = {269--288},
     publisher = {EDP-Sciences},
     volume = {53},
     number = {1},
     year = {2019},
     doi = {10.1051/ro/2018059},
     zbl = {1418.62388},
     mrnumber = {3912473},
     language = {en},
     url = {http://archive.numdam.org/articles/10.1051/ro/2018059/}
}

Bir-Jmel, Ahmed; Mohamed Douiri, Sidi; Elbernoussi, Souad. Gene selection via BPSO and Backward generation for cancer classification. RAIRO - Operations Research - Recherche Opérationnelle, Volume 53 (2019) no. 1, pp. 269-288. doi: 10.1051/ro/2018059. http://archive.numdam.org/articles/10.1051/ro/2018059/

