A review of statistical models for clustering networks with an application to a PPI network
[Modèles Statistiques pour la classification non-supervisée des sommets d’un graphe et application à un réseau d’interactions de protéines]
Journal de la société française de statistique, Tome 152 (2011) no. 2, pp. 111-125.

La classification non-supervisée des noeuds d’un graphe donne des éléments essentiels sur l’architecture d’un réseau. Il existe des différences d’approche entre les communautés scientifiques (informaticiens, physiciens et statisticiens) qui se sont attaqués à cette question. Nous présentons ici les travaux récents de la communauté des statisticiens, basés sur des modèles de graphes aléatoires hétérogènes et nous analysons un grand réseau d’interactions de protéines avec un modèle de ce type.

Clustering the nodes of a graph allows to analyze the topology of a network. At least three scientific communities (Computer science, Physics and Statistics) proposed some methods to go ahead. We give here an overview about the last developments about heterogeneous random graph models proposed by the statisticians. The Stochastic Block Model is applied to analyze a large Protein-Protein Interaction network

Keywords: Biological Networks, Clustering, Random Graph, Mixture Model, Variational Estimation
Mot clés : Classification non supervisée, Estimation Variationnelle, Modèle de Mélange, Graphes aléatoires, Réseaux Biologiques
@article{JSFS_2011__152_2_111_0,
     author = {Daudin, Jean-Jacques},
     title = {A review of statistical models for clustering networks with an application to a {PPI} network},
     journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique},
     pages = {111--125},
     publisher = {Soci\'et\'e fran\c{c}aise de statistique},
     volume = {152},
     number = {2},
     year = {2011},
     mrnumber = {2821225},
     zbl = {1316.62085},
     language = {en},
     url = {http://archive.numdam.org/item/JSFS_2011__152_2_111_0/}
}
TY  - JOUR
AU  - Daudin, Jean-Jacques
TI  - A review of statistical models for clustering networks with an application to a PPI network
JO  - Journal de la société française de statistique
PY  - 2011
SP  - 111
EP  - 125
VL  - 152
IS  - 2
PB  - Société française de statistique
UR  - http://archive.numdam.org/item/JSFS_2011__152_2_111_0/
LA  - en
ID  - JSFS_2011__152_2_111_0
ER  - 
%0 Journal Article
%A Daudin, Jean-Jacques
%T A review of statistical models for clustering networks with an application to a PPI network
%J Journal de la société française de statistique
%D 2011
%P 111-125
%V 152
%N 2
%I Société française de statistique
%U http://archive.numdam.org/item/JSFS_2011__152_2_111_0/
%G en
%F JSFS_2011__152_2_111_0
Daudin, Jean-Jacques. A review of statistical models for clustering networks with an application to a PPI network. Journal de la société française de statistique, Tome 152 (2011) no. 2, pp. 111-125. http://archive.numdam.org/item/JSFS_2011__152_2_111_0/

[1] Airoldi, EM.; Blei, DM.; Fienberg, S.; Xing, EP. Mixed Membership Stochastic Blockmodels, Journal of Machine Learning Research, Volume 9 (2008), pp. 1981-2014 | Zbl

[2] Ambroise, C.; Matias, C. New consistent and asymptotically normal estimators for random graph mixture models, http://arxiv.org/abs/1003.5165 (2010) | MR

[3] Allman, E.S.; Matias, C.; Rhodes, J.A. Identifiability of parameters in latent structure models with many observed variables, Annals of Statistics, Volume 37 (2010), pp. 3099-3132 | MR | Zbl

[4] Allman, E.S.; Matias, C.; Rhodes, J.A. Parameter identifiability in a class of random graph mixture models, Journal of Statistical Planning and Inference, Volume doi:10.1016/j.jspi.2010.11.022 (2010) | MR | Zbl

[5] Bickel, P.J.; Chen, A. A nonparametric view of network models and Newman-Girvan and other modularities, PNAS (2010), pp. 1-6

[6] Celisse, A.; Daudin, J.J.; Pierre, L. Consistency of maximum likelihood and variational estimators in mixture models for random graphs (2011) http://hal.archives−ouvertes.fr/hal−00593644/fr/ | MR

[7] Choi, D.S.; Wolfe, P.J.; Airoldi, E.M. Stochastic blockmodels with growing number of classes, http://arxiv.org/abs/1011.4644/ (2010) | MR

[8] Daudin, J.-J.; Picard, F.; Robin, S. A mixture model for random graphs, Stat Comput, Volume 18 (2008), pp. 173-183 | MR

[9] Daudin, J.-J.; Pierre, L.; Vacher, C. Model for Heterogeneous Random Networks Using Continuous Latent Variables and an Application to a Tree-Fungus Network, Biometrics, Volume 66(4) (2010), p. 1043-51 | MR | Zbl

[10] Erosheva, E. Comparing latent structures of the grade of membership, Rasch and latent class model, Psychometrika, Volume 70(4) (2005), pp. 619-628 | MR | Zbl

[11] Ewing, R. Large-scale mapping of human protein-protein interactions by mass spectrometry, Mol. Syst. Biol., Volume 3(89) (2007), pp. 1-17

[12] Gazal, S.; Daudin, J.-J.; Robin, S. Accuracy of variational estimates for some random graph models, CSDA, Volume to appear (2011)

[13] Girvan, M.; Newman, M. E. J. Community structure in social and biological networks, PNAS, Volume 99 (2002), pp. 7821-7826 | Zbl

[14] Harshman, RA. Model for analysis of asymmetrical relationships among N objects or stimuli, First Joint Meeting of the Psychometric Society and the Society of Mathematical Psychology (1978)

[15] Handcock, MS.; Raftery, AE.; Tantrum, JM. Model-based clustering for social networks, JRSSA, Volume 54 (2007), pp. 301-354

[16] Kiers, HAL.; ten Berge, JMF.; Takane, Y.; De Leeuw, J. A generalization of Takane’s algorithm for DEDICOM, Psychometrika (2002), pp. 151-158 | Zbl

[17] Latouche, P.; Birmele, E.; Ambroise, C. Bayesian Methods for Graph Clustering, Advances in Data Analysis, Data Handling and Business Intelligence, Springer (2009)

[18] Latouche, P.; Birmele, E.; Ambroise, C. Overlapping Stochastic Block Models, arXiv:0910.2098v2 [stat.ME] (2010)

[19] Marchette, DJ.; Priebe, CE. Predicting unobserved links in incompletely observed networks, CSDA, Volume 52 (2008), pp. 1373-1386 | Zbl

[20] Mariadassou, M.; Robin, S.; Vacher, C. Uncovering structure in valued graphs: a variational approach., Ann. Appl. Statist., Volume 4(2) (2010), p. 715-42 | Zbl

[21] Marras, E.; Travaglioney, A.; Capobiancoz, E. Sub-Modular Resolution Analysis by Network Mixture Models, Statistical Applications in Genetics and Molecular Biology, Volume 9 (2010) | Zbl

[22] Manton, KG.; Woodbury, MA.; Tolley, HD. Statistical Applications Using Fuzzy Sets (Interscience, Wiley, ed.), 1994, pp. 331-336 | Zbl

[23] Nowicki, K.; Snijders, T. Estimation and prediction for stochastic block-structures, J. Am. Stat. Assoc., Volume 96 (2001), pp. 1077-1087 | Zbl

[24] Picard, F.; Miele, V.; Daudin, J.-J.; Cottret, L.; Robin, S. Deciphering the connectivity structure of biological networks using MixNet, BMC Bioinformatics, Volume 10 (2009)

[25] Rohe, K.; Chatterjee, S.; Yu, Bin Spectral Clustering and the high-dimensional SBM, Technical report Berkeley 791, Volume http://www.stat.berkeley.edu/25 (2010)

[26] Snijders, T. A. B.; Nowicki, K. Estimation and Prediction for Stochastic Blockmodels for Graphs with Latent Block Structure, Journal of Classification, Volume 14 (1997), pp. 75-100 | Zbl

[27] Trendafilov, NT. GIPSCAL revisited. A projected gradient approach., Statistics and Computing, Volume 12 (2002), pp. 135-145

[28] van Dongen, S. Graph Clustering Via a Discrete Uncoupling Process, SIAM Journal on Matrix Analysis and Applications, Volume 30(1) (2008), pp. 121-141 | Zbl

[29] van der Vaart, A. W.; Wellner, J. A. Weak Convergence and Empirical Processes With Applications to Statistics, Springer Series in Statistics, 1996 | Zbl

[30] Von Luxburg, U. A tutorial on spectral clustering, Statistics and Computing, Volume 17 (2007), pp. 395-416

[31] Zanghi, H.; Picard, F.; Miele, V.; Ambroise, C. Strategies for Online Inference of Network Mixture, Annals of Applied Statistics, Volume to appear (2010)