Characterizing genomic copy number alterations (CNAs) in cancer is of major importance for developing personalized medicine. Single nucleotide polymorphism (SNP) arrays are still in use to measure CNA profiles. Among the methods for SNP-array analysis, the Genome Alteration Print (GAP) method of Popova et al., which relies on a preliminary segmentation of SNP-array profiles, uses a deterministic approach to infer the absolute copy number profile. We develop a probabilistic model for GAP: a Gaussian mixture model whose centers are constrained to lie on a frame that depends on unknown parameters such as the proportion of normal tissue. Estimation is performed with an expectation-maximization (EM) algorithm, which recovers the parameters characterizing the genomic alterations, the most probable copy number change of each segment, and the unknown proportion of normal tissue. We propose to deduce the tumor ploidy from a penalized model selection criterion. The model is tested on simulated data and applied to a colon cancer dataset.
Characterizing copy number alterations in the genome is of major importance for developing personalized medicine in oncology. SNP (Single Nucleotide Polymorphism) arrays, a variant of DNA microarrays, are still used to measure copy number alteration profiles. Among the methods for analyzing SNP profiles, the GAP (Genome Alteration Print) method of Popova et al., based on a preliminary segmentation of profiles obtained from SNP arrays, uses a deterministic approach to determine the absolute copy number profile. We develop a probabilistic model for the GAP method and define a Gaussian mixture model whose centers are constrained to belong to a lattice depending on unknown parameters such as the proportion of tumor tissue in the sample. Estimation is carried out with an EM (expectation-maximization) algorithm, which gives access not only to the parameters but also to the most probable altered copy number on each segment and to the unknown tumor proportion. We propose to deduce the tumor ploidy using a penalized model selection criterion. Our model is tested on simulated data and applied to an example of colon cancer data.
Keywords: mixture model, EM algorithm, BIC criterion, slope heuristics, cancer, GAP method, SNP-array
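To make the idea of a mixture with constrained centers concrete, here is a minimal one-dimensional sketch in Python. It is a toy illustration and not the authors' model, which works on segmented SNP-array profiles. The center formula `2 + p * (c - 2)` (normal cells carrying 2 copies, tumor cells carrying `c` copies, tumor proportion `p`), the shared variance, the multi-start strategy, and all function names are assumptions made for this example.

```python
import math
import random


def centers(p, max_copy):
    # Constrained centers (assumed form): a sample mixing normal cells (2 copies,
    # fraction 1 - p) with tumor cells (c copies, fraction p) has mean 2 + p * (c - 2).
    return [2.0 + p * (c - 2) for c in range(max_copy + 1)]


def log_joint(xi, mu, weights, sigma):
    # log(weight_c) + log N(xi; mu_c, sigma^2) for every copy number c
    const = -math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
    return [math.log(w) + const - 0.5 * ((xi - m) / sigma) ** 2
            for w, m in zip(weights, mu)]


def log_sum_exp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def em(x, max_copy, p, sigma=0.3, n_iter=100):
    k = max_copy + 1
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        mu = centers(p, max_copy)
        # E-step: posterior probability of each copy number for each observation
        resp = []
        for xi in x:
            lj = log_joint(xi, mu, weights, sigma)
            norm = log_sum_exp(lj)
            resp.append([math.exp(v - norm) for v in lj])
        # M-step: mixing proportions (floored so that log() stays finite)
        weights = [max(sum(r[c] for r in resp) / len(x), 1e-12) for c in range(k)]
        # M-step: p by weighted least squares, because the centers are linear in p
        num = sum(r[c] * (xi - 2.0) * (c - 2) for xi, r in zip(x, resp) for c in range(k))
        den = sum(r[c] * (c - 2) ** 2 for r in resp for c in range(k))
        if den > 0.0:
            p = min(max(num / den, 1e-3), 1.0)
        # M-step: shared variance around the updated centers
        mu = centers(p, max_copy)
        var = sum(r[c] * (xi - mu[c]) ** 2
                  for xi, r in zip(x, resp) for c in range(k)) / len(x)
        sigma = max(math.sqrt(var), 1e-3)
    mu = centers(p, max_copy)
    joints = [log_joint(xi, mu, weights, sigma) for xi in x]
    loglik = sum(log_sum_exp(lj) for lj in joints)
    # Most probable copy number of each observation under the fitted model
    labels = [max(range(k), key=lambda c: lj[c]) for lj in joints]
    return {"p": p, "sigma": sigma, "weights": weights,
            "loglik": loglik, "labels": labels}


def fit(x, max_copy=4, starts=(0.3, 0.5, 0.7, 0.9)):
    # EM only finds local optima, so keep the best of several starting values of p
    runs = [em(x, max_copy, p0) for p0 in starts]
    return max(runs, key=lambda run: run["loglik"])


def simulate(p=0.6, copies=(1, 2, 3, 4), n_per=50, sigma=0.05, seed=0):
    rng = random.Random(seed)
    x, truth = [], []
    for c in copies:
        for _ in range(n_per):
            x.append(rng.gauss(2.0 + p * (c - 2), sigma))
            truth.append(c)
    return x, truth


x, truth = simulate()
best = fit(x)
print(round(best["p"], 2), round(best["sigma"], 2))
```

In this toy setting a single one-dimensional signal cannot always separate the tumor proportion from the copy numbers: for instance, `p = 0.6` with copies 1, 2, 3 produces the same centers as `p = 0.3` with copies 0, 2, 4. The simulation therefore includes four adjacent copy numbers, and `max_copy` bounds the frame so that only one value of `p` fits all of them.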
@article{JSFS_2019__160_1_130_0,
  author = {Keribin, Christine and Liu, Yi and Popova, Tatiana and Rozenholc, Yves},
  title = {A mixture model to characterize genomic alterations of tumors},
  journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique},
  pages = {130--148},
  publisher = {Soci\'et\'e fran\c{c}aise de statistique},
  volume = {160},
  number = {1},
  year = {2019},
  mrnumber = {3928543},
  zbl = {1417.62316},
  language = {en},
  url = {http://archive.numdam.org/item/JSFS_2019__160_1_130_0/}
}
Keribin, Christine; Liu, Yi; Popova, Tatiana; Rozenholc, Yves. A mixture model to characterize genomic alterations of tumors. Journal de la société française de statistique, Special issue on mixture analysis, Volume 160 (2019) no. 1, pp. 130-148. http://archive.numdam.org/item/JSFS_2019__160_1_130_0/
[1] Minimal penalties and the slope heuristics: a survey, arXiv preprint arXiv:1901.07277 (2019)
[2] Sélection de modèle pour la classification non supervisée. Choix du nombre de classes., Université Paris Sud-Paris XI (2009) (Ph. D. Thesis)
[3] Gaussian model selection, Journal of the European Mathematical Society, Volume 3 (2001) no. 3, pp. 203-268
[4] Minimal penalties for Gaussian model selection, Probability Theory and Related Fields, Volume 138 (2007) no. 1-2, pp. 33-73
[5] Slope heuristics: overview and implementation, Statistics and Computing, Volume 22 (2012) no. 2, pp. 455-470
[6] Phylogeographic genomics of mitochondrial DNA: highly-resolved patterns of intraspecific evolution and a multi-species, microarray-based DNA sequencing strategy for biodiversity studies, Comparative Biochemistry and Physiology Part D: Genomics and Proteomics, Volume 3 (2008) no. 1, pp. 1-11
[7] A new algorithm for fixed design regression and denoising, Annals of the Institute of Statistical Mathematics, Volume 56 (2004) no. 3, pp. 449-473
[8] Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society. Series B (Methodological) (1977), pp. 1-38
[9] Consistent estimation of the order of mixture models, Sankhyā: The Indian Journal of Statistics, Series A (2000), pp. 49-66
[10] GPHMM: an integrated hidden Markov model for identification of copy number alteration and loss of heterozygosity in complex tumor samples using whole genome SNP arrays, Nucleic Acids Research, Volume 39 (2011) no. 12, pp. 4928-4941
[11] Comparison of methods to detect copy number alterations in cancer using simulated and real genotyping data, BMC Bioinformatics, Volume 13 (2012) no. 1
[12] Genome Alteration Print (GAP): a tool to visualize and mine complex cancer genomic profiles obtained by SNP arrays, Genome Biology, Volume 10 (2009) no. 11, p. R128
[13] High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays, Nature Genetics, Volume 20 (1998) no. 2, pp. 207-211
[14] Segmentation-based detection of allelic imbalance and loss-of-heterozygosity in cancer cells using whole genome SNP arrays, Genome Biology, Volume 9 (2008) no. 9 http://genomebiology.com/2008/9/9/R136
[15] Matrix-based comparative genomic hybridization: biochips to screen for genomic imbalances, Genes, Chromosomes and Cancer, Volume 20 (1997) no. 4, pp. 399-407
[16] Integrated study of copy number states and genotype calls using high-density SNP arrays, Nucleic Acids Research, Volume 37 (2009) no. 16, pp. 5365-5377
[17] Allele-specific copy number analysis of tumors, Proceedings of the National Academy of Sciences, Volume 107 (2010) no. 39, pp. 16910-16915
[18] A statistical approach for detecting genomic aberrations in heterogeneous tumor samples from single nucleotide polymorphism genotyping data, Genome Biology, Volume 11 (2010) no. 9, p. R92