Représentation d’un grand réseau à partir d’une classification hiérarchique de ses sommets
Journal de la société française de statistique, Tome 152 (2011) no. 3, pp. 34-65.

Les graphes (ou réseaux) sont devenus des outils courants de modélisation des données relationnelles dans de nombreuses applications (réseaux sociaux, biologiques, informatiques...). Or, lorsque le nombre de sommets dépasse quelques centaines, la visualisation du graphe dans son ensemble, qui est un outil important de compréhension du réseau, est un problème complexe : les approches traditionnelles, basées sur des algorithmes de forces, s’avèrent coûteuses en temps de calcul et ne mettent pas bien en valeur la structure du réseau en parties denses (souvent appelées « communautés »). Dans cet article, nous proposons une méthode de visualisation basée sur une classification hiérarchique des sommets : cette approche permet d’obtenir des représentations de graphes de plusieurs milliers de sommets en quelques secondes, en produisant des représentations avec des niveaux de simplification plus ou moins grossiers. L’utilisateur a donc accès à des visualisations lui permettant de comprendre la structuration macroscopique du réseau puis, par zooms successifs à des détails de plus en plus fins dans chacune des communautés. La finesse maximale est contrôlée par simulation. La qualité des partitions considérées est évaluée par la mesure classique de modularité et comparée à la qualité obtenue par la méthode proposée sur des graphes aléatoires dont la distribution des degrés est identique à celle du graphe étudié : on obtient ainsi une distribution de la modularité dans le cas sans structure, ce qui permet de ne montrer que les structures significatives. Cette approche est illustrée sur plusieurs jeux de données publics et comparée à d’autres méthodes de visualisation destinées à mettre en valeur les communautés du réseau. Elle est également testée sur un grand réseau issu d’un corpus d’archives du Moyen-Âge.

Graphs (or networks) are widely used to model relational data in various application fields (e.g., social network, biological network, Internet network...). Visualization is an important tool to understand the main features of the network but, when the number of nodes in the graph is greater than a few hundreds, standard visualization methods, such as force directed algorithms, are computationally expensive and almost unworkable. Moreover, force directed algorithms do not help the understanding of the structure of the network into dense communities of nodes, which is often a natural way to better understand a network. In this paper, a new visualization method is proposed, based on a hierarchical clustering of the nodes of the graph. This approach can handle the visualization of graphs having several thousands nodes in a few seconds. Several simplified representations of the graph are accessible, giving the user the opportunity to understand the macroscopic organization of the network and then, to focus with more details on some particular parts of the graph. This refining process is controlled by means of Monte Carlo simulation. Partitions under consideration are evaluated via the classical modularity quality measure. A distribution of the quality measure in the case of graphs without structure is obtained by applying the proposed method to random graphs with the same degree distribution as the graph under study. Then only significant partitions (with respect to this random level) are shown during the refining process. This approach is illustrated on several public datasets and compared with other visualization methods meant to emphasize the graph communities. It is also tested on a large network built from a corpus of medieval land charters.

Mot clés : réseau, graphe, visualisation, classification, modularité
Keywords: network, graph, visualization, clustering, modularity
@article{JSFS_2011__152_3_34_0,
     author = {Rossi, Fabrice and Villa-Vialaneix, Nathalie},
     title = {Repr\'esentation d{\textquoteright}un grand r\'eseau \`a partir d{\textquoteright}une classification hi\'erarchique de ses sommets},
     journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique},
     pages = {34--65},
     publisher = {Soci\'et\'e fran\c{c}aise de statistique},
     volume = {152},
     number = {3},
     year = {2011},
     mrnumber = {2871176},
     zbl = {1316.62008},
     language = {fr},
     url = {http://archive.numdam.org/item/JSFS_2011__152_3_34_0/}
}
TY  - JOUR
AU  - Rossi, Fabrice
AU  - Villa-Vialaneix, Nathalie
TI  - Représentation d’un grand réseau à partir d’une classification hiérarchique de ses sommets
JO  - Journal de la société française de statistique
PY  - 2011
SP  - 34
EP  - 65
VL  - 152
IS  - 3
PB  - Société française de statistique
UR  - http://archive.numdam.org/item/JSFS_2011__152_3_34_0/
LA  - fr
ID  - JSFS_2011__152_3_34_0
ER  - 
%0 Journal Article
%A Rossi, Fabrice
%A Villa-Vialaneix, Nathalie
%T Représentation d’un grand réseau à partir d’une classification hiérarchique de ses sommets
%J Journal de la société française de statistique
%D 2011
%P 34-65
%V 152
%N 3
%I Société française de statistique
%U http://archive.numdam.org/item/JSFS_2011__152_3_34_0/
%G fr
%F JSFS_2011__152_3_34_0
Rossi, Fabrice; Villa-Vialaneix, Nathalie. Représentation d’un grand réseau à partir d’une classification hiérarchique de ses sommets. Journal de la société française de statistique, Tome 152 (2011) no. 3, pp. 34-65. http://archive.numdam.org/item/JSFS_2011__152_3_34_0/

[1] Auber, D.; Chiricota, Y.; Jourdan, F.; Melançon, G. Multiscale visualization of small world networks., INFOVIS’03 (2003)

[2] Auber, D.; Jourdan, F. Interactive refinement of multi-scale network clusterings, International Conference on Information Visualisation, International Conference, IEEE Computer Society, Los Alamitos, CA, USA (2005), pp. 703-709 | DOI

[3] Bourqui, R.; Auber, D.; Mary, P. How to Draw Clustered Weighted Graphs using a Multilevel Force-Directed Graph Drawing Algorithm, Proceedings of the 11th International Conference Information Visualization, 2007. IV’07. (2007), pp. 757-764 | DOI

[4] Boulet, R.; Jouve, B.; Rossi, F.; Villa, N. Batch kernel SOM and related Laplacian methods for social network analysis, Neurocomputing, Volume 71 (2008) no. 7-9, pp. 1257-1273 | DOI

[5] Bayati, M.; Kim, J.; Saberi, A. A Sequential Algorithm for Generating Random Graphs, Algorithmica, Volume 58 (2010), pp. 860-910 | DOI | MR | Zbl

[6] Clémençon, S.; De Arazoza, H.; Rossi, F.; Tran, V.C. Hierarchical clustering for graph visualization, Proceedings of XVIIIth European Symposium on Artificial Neural Networks (ESANN 2011), Bruges, Belgique (2011), pp. 227-232

[7] Clémençon, S.; De Arazoza, H.; Rossi, F.; Tran, V.C. Visual mining of epidemic networks, Advances in Computational Intelligence, Proceedings of 11th International Work-Conference on Artificial Neural Networks (IWANN 2011) (Cabestany, Joan; Rojas, Ignacio; Joya, Gonzalo, eds.) (Lecture Notes in Computer Science), Springer Berlin / Heidelberg, Malaga, Spain, 2011 no. 6692, pp. 276-283 | DOI

[8] Chung, F. Random Graphs, A Whirlwind Tour of, Encyclopedia of Complexity and Systems Science (Meyers, Robert A., ed.), Springer New York, 2009, pp. 7493-7505 | DOI

[9] Csardi, G.; Nepusz, T. The igraph software package for complex network research, InterJournal, Volume Complex Systems (2006) http://igraph.sf.net

[10] Di Battista, G.; Eades, P.; Tamassia, R.; Tollis, I.G. Graph Drawing : Algorithms for the Visualization of Graphs, Prentice Hall, 1999 | MR | Zbl

[11] Daudin, J.J.; Picard, F.; Robin, S. A mixture model for random graphs, Statistics and Computing, Volume 18 (2008), pp. 173-183 | MR

[12] Eades, P. A heuristic for graph drawing, Congressus Numerantium, Volume 42 (1984), pp. 149-160 | MR

[13] Eades, P.; Feng, Q.W. Multilevel Visualization of Clustered Graphs, Graph Drawing, Symposium on Graph Drawing, GD ’96, Berkeley, California, USA, September 18-20, Proceedings (North, Stephen C., ed.) (Lecture Notes in Computer Science), Volume 1190, Springer (1996), pp. 101-112

[14] Eades, P.; Huang, M.L. Navigating clustered graphs using force-directed methods, Journal of Graph Algorithms and Applications, Volume 4 (2000) no. 3, pp. 157-181 | Zbl

[15] Fortunato, S.; Barthélémy, M. Resolution limit in community detection, Proceedings of the National Academy of Sciences, Volume 104 (2007) no. 1, pp. 36-41 (doi :10.1073/pnas.0605965104 ; URL : http://www.pnas.org/content/104/1/36.abstract )

[16] Fortunato, S. Community detection in graphs, Physics Reports, Volume 486 (2010), pp. 75-174 | arXiv | MR

[17] Fruchterman, T.; Reingold, B. Graph drawing by force-directed placement, Software-Practice and Experience, Volume 21 (1991), pp. 1129-1164

[18] Handcock, M.S.; Hunter, D.R.; Butts, C.T.; Goodreau, S.M.; Morris, M. statnet : Software tools for the Statistical Modeling of Network Data (2003) http://statnetproject.org (Version 2.0)

[19] Harel, D.; Koren, Y. Drawing graphs with non-uniform vertices, Proceedings of the Working Conference on Advanced Visual Interfaces (AVI ’02), ACM, New York, NY, USA (2002), pp. 157-166 | DOI

[20] Herman, I.; Melançon, G.; Scott Marshall, M. Graph visualization and navigation in information visualisation, IEEE Transactions on Visualization and Computer Graphics, Volume 6 (2000) no. 1, pp. 24-43

[21] Hoff, P. D.; Raftery, A. E.; Handcock, M. S. Latent space approaches to social network analysis, Journal of the American Statistical Association, Volume 97 (2002) no. 460, pp. 1090-1098 http://www.stat.washington.edu/raftery/Research/PDF/hoff2002.pdf | MR | Zbl

[22] Krivitsky, P.N.; Handcock, M.S. Fitting Latent Cluster Models for Networks with latentnet, Journal of Statistical Software, Volume 24 (2008) no. 5, pp. 1-23 http://www.jstatsoft.org/v24/i05

[23] Milo, R.; Kashtan, N.; Itzkovitz, S.; Newman, M. E. J.; Alon, U. On the uniform generation of random graphs with prescribed degree sequences, eprint (2003) | arXiv

[24] Newman, M.E.J. The structure and function of complex networks, SIAM Review, Volume 45 (2003), pp. 167-256 | MR | Zbl

[25] Newman, M.E.J. Finding community structure in networks using the eigenvectors of matrices, Physical Review, E, Volume 74 (2006) no. 036104 | arXiv | MR

[26] Newman, M.E.J.; Girvan, M. Finding and evaluating community structure in networks, Physical Review, E, Volume 69 (2004) http://www.citebase.org/abstract?id=oai%3AarXiv.org%3Acond-mat%2F0308217 | DOI

[27] Noack, A. Energy models for graph clustering, Journal of Graph Algorithms and Applications, Volume 11 (2007) no. 2, pp. 453-480 | MR | Zbl

[28] Noack, A. Modularity Clustering is Force-Directed Layout, Physical Review E, Volume 79 (2009) no. 026102

[29] Noack, A.; Rotta, R. Multi-level Algorithms for Modularity Clustering, SEA ’09 : Proceedings of the 8th International Symposium on Experimental Algorithms, Springer-Verlag, Berlin, Heidelberg (2009), pp. 257-268 | DOI

[30] Pons, P.; Latapy, M. Post-processing hierarchical community structures : Quality improvements and multi-scale view, Theoretical Computer Science, Volume 412 (2011) no. 8-10, pp. 892 -900 | DOI | MR | Zbl

[31] R Development Core Team R : A Language and Environment for Statistical Computing (2011) http://www.R-project.org (ISBN 3-900051-07-0)

[32] Reichardt, J.; Bornholdt, S. Partitioning and modularity of graphs with arbitrary degree distribution, Phys. Rev. E, Volume 76 (2007) no. 1 | DOI

[33] Roberts Jr., J. M. Simple methods for simulating sociomatrices with given marginal totals, Social Networks, Volume 22 (2000) no. 3, pp. 273 -283 | DOI

[34] Ramachandra Rao, A.; Jana, R.; Bandyopadhyay, S. A Markov Chain Monte Carlo Method for Generating Random (0, 1)-Matrices with Given Marginals, Sankhya : The Indian Journal of Statistics, Series A, Volume 58 (1996) no. 2, pp. 225-242 | MR | Zbl

[35] Rossi, F.; Villa-Vialaneix, N. Optimizing an organized modularity measure for topographic graph clustering : a deterministic annealing approach, Neurocomputing, Volume 73 (2010) no. 7-9, pp. 1142-1163 | DOI

[36] Schaeffer, S.E. Graph Clustering, Computer Science Review, Volume 1 (2007) no. 1, pp. 27-64 | Zbl

[37] Seifi, M.; Guillaume, J.L.; Latapy, M.; Le Grand, B. Visualisation interactive multi-échelle des grands graphes : application à un réseau de blogs, Atelier EGC 2010, Visualisation et Extraction de Connaissances, Hammamet, Tunisie (2010)

[38] Truong, Q.D.; Dkaki, T.; Charrel, P.J. An energy model for the drawing of clustered graphs, Proceedings of Vème colloque international VSST, Marrakech, Maroc (2007)

[39] Tunkelang, D. A Numerical Optimization Approach to General Graph Drawing, School of Computer Science, Carnegie Mellon University, January (1999) http://reports-archive.adm.cs.cmu.edu/anon/1998/abstracts/98-189.html (Ph. D. Thesis CMU-CS-98-189)

[40] von Luxburg, U. A tutorial on spectral clustering, Statistics and Computing, Volume 17 (2007) no. 4, pp. 395-416 http://www.kyb.mpg.de/publications/attachments/luxburg06_TR_v2_4139%5B1%5D.pdf | MR

[41] Wang, X.; Miyamoto, I. Generating customized layouts, Graph Drawing (Brandenburg, Franz, ed.) (Lecture Notes in Computer Science), Volume 1027, Springer Berlin / Heidelberg, 1996, pp. 504-515

[42] Zanghi, H.; Ambroise, C.; V., Miele Fast online graph clustering via Erdös-Rényi mixture, Pattern Recognition, Volume 41 (2008), pp. 3592-3599 | Zbl