Defining a robust biological prior from Pathway Analysis to drive Network Inference
Journal de la société française de statistique, Tome 152 (2011) no. 2, pp. 97-110.

Inferring genetic networks from gene expression data is one of the most challenging tasks of the post-genomic era, partly because of the vast space of possible networks and the relatively small amount of available data. In this context, Gaussian Graphical Models (GGMs) provide a convenient framework for reconstructing biological networks.
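
In a GGM, gene expression levels are modeled as a multivariate Gaussian vector, and an edge between two genes corresponds to a nonzero partial correlation, i.e. a nonzero off-diagonal entry of the precision (inverse covariance) matrix $K$, with $\rho_{ij} = -K_{ij}/\sqrt{K_{ii}K_{jj}}$. The minimal NumPy sketch below uses a hypothetical three-gene chain to show why this differs from thresholding ordinary correlations: genes 0 and 2 are marginally correlated, yet conditionally independent given gene 1.

```python
import numpy as np


def partial_correlations(precision: np.ndarray) -> np.ndarray:
    """Partial correlations rho_ij = -K_ij / sqrt(K_ii * K_jj) from a precision matrix K."""
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


# Hypothetical 3-gene chain 0 - 1 - 2: K[0, 2] = 0 encodes that genes 0 and 2
# are conditionally independent given gene 1 (no 0-2 edge in the graph).
K = np.array([[2.0, -0.8, 0.0],
              [-0.8, 2.0, -0.8],
              [0.0, -0.8, 2.0]])
Sigma = np.linalg.inv(K)  # covariance matrix implied by K

# The marginal correlation between genes 0 and 2 is nonzero ...
sd = np.sqrt(np.diag(Sigma))
corr = Sigma / np.outer(sd, sd)
# ... but their partial correlation vanishes, so the GGM has no 0-2 edge.
pcor = partial_correlations(np.linalg.inv(Sigma))

print(f"correlation(0, 2) = {corr[0, 2]:.3f}")
print(f"partial correlation(0, 2) = {abs(pcor[0, 2]):.3f}")
```

In practice $K$ is unknown and must be estimated from far fewer samples than genes, which is why sparsity-inducing ($\ell_1$) penalties are used.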

In this paper, we propose an original approach for inferring gene regulatory networks that relies on a robust biological prior on their structure in order to limit the set of candidate networks.
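
One common way to let a structural prior restrict the candidate networks in an $\ell_1$-penalized GGM is to use edge-specific penalty weights, so that edges supported by prior knowledge are penalized less than the others. The sketch below assumes that scheme, combined with neighborhood selection (one weighted lasso regression per gene, fitted by coordinate descent, with an AND rule to symmetrize the graph). It is an illustration only, not necessarily the exact formulation used in this paper; the penalty level `lam`, the weight values 0.1 and 1.0, and the simulated four-gene chain are hypothetical.

```python
import numpy as np


def soft_threshold(z: float, t: float) -> float:
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return float(np.sign(z) * max(abs(z) - t, 0.0))


def weighted_lasso(X, y, lam, weights, n_iter=200):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * sum_j weights[j] * |b_j|."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()  # residual y - X @ beta, with beta = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]  # remove feature j from the current fit
            z = X[:, j] @ resid / n
            beta[j] = soft_threshold(z, lam * weights[j]) / col_sq[j]
            resid -= X[:, j] * beta[j]  # add it back with its updated coefficient
    return beta


def neighborhood_selection(data, lam, penalty_weights):
    """Regress each gene on all others; keep edge i-j only if both regressions select it."""
    p = data.shape[1]
    coef = np.zeros((p, p))
    for i in range(p):
        others = [j for j in range(p) if j != i]
        coef[i, others] = weighted_lasso(data[:, others], data[:, i], lam,
                                         penalty_weights[i, others])
    return (coef != 0) & (coef.T != 0)


# Simulated data from a hypothetical 4-gene chain 0 - 1 - 2 - 3.
p = 4
K = np.eye(p) - 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
rng = np.random.default_rng(0)
data = rng.multivariate_normal(np.zeros(p), np.linalg.inv(K), size=500)
data -= data.mean(axis=0)

# Hypothetical prior: pathway knowledge supports exactly the chain edges.
prior = (np.eye(p, k=1) + np.eye(p, k=-1)) > 0
uniform = np.ones((p, p))
informed = np.where(prior, 0.1, 1.0)  # smaller penalty on prior-supported edges

lam = 1.2
print("edges, uniform penalty:", neighborhood_selection(data, lam, uniform).sum() // 2)
print("edges, prior-weighted penalty:", neighborhood_selection(data, lam, informed).sum() // 2)
```

At the same $\lambda$, one would expect the uniform penalty to remove every edge in this simulation, whereas the prior-weighted penalty keeps the pathway-supported edges. In practice $\lambda$ and the weights would need to be tuned, for example by cross-validation or an information criterion.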

Pathways, which represent biological knowledge about regulatory networks, are used as informative prior knowledge to drive network inference. This approach is based on the selection of a relevant set of genes, called the “molecular signature”, associated with a condition of interest (for instance, the genes involved in disease development). In this context, differential expression analysis is a well-established strategy. However, outcome signatures are often inconsistent and show little overlap between studies. We therefore dedicate the first part of our work to improving the standard biomarker identification process, in order to guarantee the robustness and reproducibility of the molecular signature.
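
As a rough illustration of the reproducibility issue, the sketch below ranks genes by a Welch t-statistic (used here as a simple stand-in; moderated statistics such as those of limma are common alternatives) and repeats the selection on random subsamples of the patients, recording how often each gene enters the top-k list. Genes selected in nearly every subsample are more stable signature candidates. The subsampling scheme, the 80% subsample fraction, `top_k`, the 0.9 frequency cutoff and the simulated data are all assumptions for illustration, not the procedure developed in the paper.

```python
import numpy as np


def welch_t(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Per-gene Welch t-statistics; x and y are (samples, genes) matrices."""
    vx = x.var(axis=0, ddof=1) / x.shape[0]
    vy = y.var(axis=0, ddof=1) / y.shape[0]
    return (x.mean(axis=0) - y.mean(axis=0)) / np.sqrt(vx + vy)


def selection_frequency(x, y, top_k, n_rep=100, frac=0.8, seed=0):
    """Fraction of random subsamples in which each gene ranks in the top_k by |t|."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(x.shape[1])
    for _ in range(n_rep):
        ix = rng.choice(x.shape[0], int(frac * x.shape[0]), replace=False)
        iy = rng.choice(y.shape[0], int(frac * y.shape[0]), replace=False)
        t = np.abs(welch_t(x[ix], y[iy]))
        counts[np.argsort(t)[-top_k:]] += 1  # indices of the top_k genes
    return counts / n_rep


# Simulated expression: 200 genes, 30 samples per group; only genes 0-4 differ.
rng = np.random.default_rng(1)
cases = rng.normal(size=(30, 200))
controls = rng.normal(size=(30, 200))
cases[:, :5] += 2.0

freq = selection_frequency(cases, controls, top_k=5)
stable = np.flatnonzero(freq >= 0.9)  # genes kept in at least 90% of subsamples
print("stable signature genes:", stable)
```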

Our approach makes it possible to compare the networks inferred under two conditions of interest (for instance, case and control networks) and facilitates the biological interpretation of the results. In particular, it allows us to identify differential regulations between these conditions. We illustrate the proposed approach with an application to a study of breast cancer response to treatment.
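
Once a network has been inferred in each condition, the simplest way to list candidate differential regulations is to compare the two edge sets. The sketch below does this for two hypothetical adjacency matrices over hypothetical genes G1 to G4. It is a naive set difference that ignores estimation uncertainty in either graph, and it is not presented as the comparison procedure used in the paper.

```python
import numpy as np


def differential_edges(adj_a, adj_b, genes):
    """Return (edges present only in network A, edges present only in network B)."""
    only_a, only_b = [], []
    rows, cols = np.triu_indices(len(genes), k=1)  # each undirected pair once
    for i, j in zip(rows, cols):
        if adj_a[i, j] and not adj_b[i, j]:
            only_a.append((genes[i], genes[j]))
        elif adj_b[i, j] and not adj_a[i, j]:
            only_b.append((genes[i], genes[j]))
    return only_a, only_b


def adjacency(edges, genes):
    """Symmetric boolean adjacency matrix built from a list of undirected edges."""
    index = {g: k for k, g in enumerate(genes)}
    adj = np.zeros((len(genes), len(genes)), dtype=bool)
    for u, v in edges:
        adj[index[u], index[v]] = adj[index[v], index[u]] = True
    return adj


genes = ["G1", "G2", "G3", "G4"]  # hypothetical gene names
case = adjacency([("G1", "G2"), ("G2", "G3")], genes)
control = adjacency([("G1", "G2"), ("G3", "G4")], genes)

gained, lost = differential_edges(case, control, genes)
print("edges only in case:", gained)
print("edges only in control:", lost)
```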

Keywords: Network Inference, Gaussian Graphical Model, $\ell_1$ penalization, Prior information, Pathway Analysis
@article{JSFS_2011__152_2_97_0,
     author = {Jeanmougin, Marine and Guedj, Mickael and Ambroise, Christophe},
     title = {Defining a robust biological prior from {Pathway} {Analysis} to drive {Network} {Inference}},
     journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique},
     pages = {97--110},
     publisher = {Soci\'et\'e fran\c{c}aise de statistique},
     volume = {152},
     number = {2},
     year = {2011},
     mrnumber = {2821224},
     zbl = {1316.92050},
     language = {en},
     url = {http://archive.numdam.org/item/JSFS_2011__152_2_97_0/}
}