The problem of assessing the reliability of clusters of patients identified by clustering algorithms is crucial for estimating the significance of disease subclasses detectable at the bio-molecular level and, more generally, for supporting the bio-medical discovery of patterns in gene expression data. In this paper we present an experimental analysis of the reliability of clusters discovered in lung tumor patients using DNA microarray data. In particular, we investigate whether subclasses of lung adenocarcinoma can be detected with high reliability at the bio-molecular level. To this end, we apply cluster validity measures based on random projections recently proposed by Bertoni and coworkers. The results show that at least two subclasses of lung adenocarcinoma can be detected with relatively high reliability, confirming and extending previous findings reported in the literature.
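The abstract does not spell out how the random-projection validity measures are computed. As a rough illustration only, the Python sketch below shows the general idea behind random-projection-based stability assessment: cluster the original data once as a reference, project the data many times into a lower-dimensional space with a Gaussian random matrix (a Johnson-Lindenstrauss-style map), re-cluster each projection, and score each reference cluster by how often its pairs of samples stay together. Everything here is an assumption for illustration: the function names (`random_projection`, `kmeans`, `cluster_stability`, `stability_scores`) are hypothetical, k-means with farthest-first seeding is a choice made for this sketch, and the pair-agreement score is a simple stand-in, not necessarily the exact index defined by Bertoni and coworkers.

```python
import math
import random


def random_projection(data, target_dim, rng):
    """Map each row of `data` to `target_dim` coordinates using a Gaussian random
    matrix with entries of variance 1/target_dim (a JL-style random map)."""
    source_dim = len(data[0])
    scale = 1.0 / math.sqrt(target_dim)
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(target_dim)]
              for _ in range(source_dim)]
    return [[sum(row[i] * matrix[i][j] for i in range(source_dim))
             for j in range(target_dim)]
            for row in data]


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(data, k, rng, iterations=50):
    """Lloyd's k-means with farthest-first seeding; returns one label per row."""
    centers = [list(rng.choice(data))]
    while len(centers) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        far = max(data, key=lambda p: min(squared_distance(p, c) for c in centers))
        centers.append(list(far))
    labels = []
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                      for p in data]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_stability(reference, projected_labelings, cluster):
    """Average, over projections, of the fraction of sample pairs sharing `cluster`
    in `reference` that are still co-clustered after projection (assumed score)."""
    members = [i for i, lab in enumerate(reference) if lab == cluster]
    pairs = [(i, j) for a, i in enumerate(members) for j in members[a + 1:]]
    if not pairs:
        return 1.0  # singleton or empty cluster: treated as trivially stable
    scores = [sum(labels[i] == labels[j] for i, j in pairs) / len(pairs)
              for labels in projected_labelings]
    return sum(scores) / len(scores)


def stability_scores(data, k, target_dim, n_projections=20, seed=0):
    """Hypothetical helper: reference labels plus one stability score per cluster."""
    rng = random.Random(seed)
    reference = kmeans(data, k, rng)
    projected = [kmeans(random_projection(data, target_dim, rng), k, rng)
                 for _ in range(n_projections)]
    return reference, [cluster_stability(reference, projected, c) for c in range(k)]


if __name__ == "__main__":
    gen = random.Random(1)
    # Synthetic data: two well-separated groups of 20 "patients" x 50 "genes".
    data = [[gen.gauss(0.0, 1.0) for _ in range(50)] for _ in range(20)]
    data += [[gen.gauss(10.0, 1.0) for _ in range(50)] for _ in range(20)]
    reference, scores = stability_scores(data, k=2, target_dim=10)
    print(scores)
```

Because the score compares only whether pairs of samples share a cluster, it does not depend on how clusters happen to be numbered in each projected run; scores close to 1 suggest that a cluster survives the distortions introduced by the random projections.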
Keywords: cluster validity, clustering algorithms, bio-molecular taxonomy of tumors, DNA microarray data analysis
@article{ITA_2006__40_2_163_0, author = {Valentini, Giorgio and Ruffino, Francesca}, title = {Characterization of lung tumor subtypes through gene expression cluster validity assessment}, journal = {RAIRO - Theoretical Informatics and Applications - Informatique Th\'eorique et Applications}, pages = {163--176}, publisher = {EDP-Sciences}, volume = {40}, number = {2}, year = {2006}, doi = {10.1051/ita:2006011}, mrnumber = {2252634}, zbl = {1108.62122}, language = {en}, url = {http://archive.numdam.org/articles/10.1051/ita:2006011/} }
Valentini, Giorgio; Ruffino, Francesca. Characterization of lung tumor subtypes through gene expression cluster validity assessment. RAIRO - Theoretical Informatics and Applications - Informatique Théorique et Applications, Vol. 40 (2006) no. 2, pp. 163-176. doi: 10.1051/ita:2006011. http://archive.numdam.org/articles/10.1051/ita:2006011/
[1] Towards a novel classification of human malignancies based on gene expression. J. Pathol. 195 (2001) 41-52.
[2] R. Anbazhagan et al., Classification of small cell lung cancer and pulmonary carcinoid by gene expression profiles. Cancer Research 59 (1999) 5119-5122.
[3] A cluster validity framework for genome expression data. Bioinformatics 18 (2002) 319-320.
[4] Assessment of clusters reliability for high dimensional genomic data, in BITS 2005, Bioinformatics Italian Society Meeting, Milano, Italy (2005).
[5] Random projections for assessing gene expression cluster stability, in IJCNN 2005, The IEEE-INNS International Joint Conference on Neural Networks, Montreal (2005).
[6] Randomized maps for assessing the reliability of patients clusters in DNA microarray data analyses. Artif. Intell. Med. (in press).
[7] Some new indexes of cluster validity. IEEE Trans. Systems, Man and Cybernetics Part B 28 (1998) 301-315.
[8] Classification of human lung carcinoma by mRNA expression profiling reveals distinct adenocarcinoma subclasses. PNAS 98 (2001) 13790-13795.
[9] An integrated tool for microarray data clustering and cluster validity assessment. Bioinformatics 21 (2005) 451-455.
[10] Clinical features of patients with stage IIIB and IV bronchioloalveolar carcinoma of the lung. Cancer 86 (1999) 1165-1173.
[11] Bayesian classification (AutoClass): Theory and results, in Advances in Knowledge Discovery and Data Mining, edited by U. Fayyad, G. Piatetsky-Shapiro, P. Smyth and R. Uthurusamy, MIT Press, Cambridge, MA 2 (1996) 153-180.
[12] Analysis of variance components in gene expression data. Bioinformatics 20 (2004) 1436-1446.
[13] A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence 1 (1979) 224-227.
[14] A prediction-based method for estimating the number of clusters in a dataset. Genome Biology 3 (2002) 1-21.
[15] Bagging to improve the accuracy of a clustering procedure. Bioinformatics 19 (2003) 1090-1099.
[16] Well separated clusters and optimal fuzzy partitions. J. Cybernetics 4 (1974) 95-104.
[17] Diversity of gene expression in adenocarcinoma of the lung. PNAS 98 (2001) 13784-13789.
[18] A k-means clustering algorithm. Appl. Stat. 28 (1979) 100-108.
[19] The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (1998) 832-844.
[20] Data Clustering: a Review. ACM Computing Surveys 31 (1999) 264-323.
[21] Extensions of Lipschitz mappings into a Hilbert space, in Conference in modern analysis and probability, Contemporary Mathematics. Amer. Math. Soc. 26 (1984) 189-206.
[22] Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York (1990).
[23] Bootstrapping cluster analysis: assessing the reliability of conclusions from microarray experiments. PNAS 98 (2001) 8961-8965.
[24] Step-wise clustering procedures. J. Am. Stat. Assoc. 62 (1967) 86-101.
[25] Method for assessing reproducibility of clustering patterns observed in analyses of microarray data. Bioinformatics 18 (2002) 1462-1469.
[26] Consensus Clustering: A Resampling-based Method for Class Discovery and Visualization of Gene Expression Microarray Data. Machine Learning 52 (2003) 91-118.
[27] Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comp. App. Math. 20 (1987) 53-65.
[28] Cluster stability scores for microarray data in cancer studies. BMC Bioinformatics 4 (2003) 36.
[29] Interobserver variability in histopathologic subtyping and grading of pulmonary adenocarcinoma. Cancer 71 (1993) 2971-2976.
[30] Clusterv: a tool for assessing the reliability of clusters discovered in DNA microarray data. Bioinformatics 22 (2006) 369-370.
[31] Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 58 (1963) 236-244.