Numéro spécial : Special Issue on Networks and Statistics
Using SOMbrero for clustering and visualizing graphs
[Utiliser SOMbrero pour la classification et la visualisation de graphes]
Journal de la société française de statistique, Tome 156 (2015) no. 3, pp. 95-119.

L’analyse de graphes a connu un intérêt croissant dans les dernières années, avec des applications en sciences sociales, biologie, informatique, ... Dans cet article, nous illustrons comment les cartes auto-organisatrices (SOM) peuvent être utilisées pour mettre en lumière la structure d’un graphe en combinant la classification de ses sommets avec une visualisation simplifiée de celui-ci. En particulier, nous présentons le package R SOMbrero dans lequel est implémentée une version stochastique de l’approche dite « relationnelle » de l’algorithme de cartes auto-organisatrices. Cette méthode permet d’utiliser les cartes auto-organisatrices avec des données décrites par des mesures de dissimilarité et nous discutons et comparons ici plusieurs types de dissimilarités adaptées aux graphes. L’utilisation du package est illustrée sur deux jeux de données réelles : le premier, inclus dans le package lui-même, est suffisamment petit pour permettre l’analyse complète de l’influence du choix de la mesure de dissimilarité sur les résultats. Le second exemple provient d’une application en biologie et est basé sur un graphe biparti de grande taille, issu de réactions chimiques et qui contient plusieurs milliers de nœuds.

Graphs have attracted a burst of attention in the last years, with applications to social science, biology, computer science... In the present paper, we illustrate how self-organizing maps (SOM) can be used to enlighten the structure of the graph, performing clustering of the graph together with visualization of a simplified graph. In particular, we present the R package SOMbrero which implements a stochastic version of the so-called relational algorithm: the method is able to process any dissimilarity data and several dissimilarities adapted to graphs are described and compared. The use of the package is illustrated on two real-world datasets: one, included in the package itself, is small enough to allow for a full investigation of the influence of the choice of a dissimilarity to measure the proximity between the vertices on the results. The other example comes from an application in biology and is based on a large bipartite graph of chemical reactions with several thousands vertices.

Keywords: graph, R package, self-organizing map, clustering, visualization
Mot clés : graphe, package R, carte auto-organisatrice, classification, visualisation
@article{JSFS_2015__156_3_95_0,
     author = {Olteanu, Madalina and Villa-Vialaneix, Nathalie},
     title = {Using {SOMbrero} for clustering and visualizing graphs},
     journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique},
     pages = {95--119},
     publisher = {Soci\'et\'e fran\c{c}aise de statistique},
     volume = {156},
     number = {3},
     year = {2015},
     zbl = {1336.62034},
     language = {en},
     url = {http://archive.numdam.org/item/JSFS_2015__156_3_95_0/}
}
TY  - JOUR
AU  - Olteanu, Madalina
AU  - Villa-Vialaneix, Nathalie
TI  - Using SOMbrero for clustering and visualizing graphs
JO  - Journal de la société française de statistique
PY  - 2015
SP  - 95
EP  - 119
VL  - 156
IS  - 3
PB  - Société française de statistique
UR  - http://archive.numdam.org/item/JSFS_2015__156_3_95_0/
LA  - en
ID  - JSFS_2015__156_3_95_0
ER  - 
%0 Journal Article
%A Olteanu, Madalina
%A Villa-Vialaneix, Nathalie
%T Using SOMbrero for clustering and visualizing graphs
%J Journal de la société française de statistique
%D 2015
%P 95-119
%V 156
%N 3
%I Société française de statistique
%U http://archive.numdam.org/item/JSFS_2015__156_3_95_0/
%G en
%F JSFS_2015__156_3_95_0
Olteanu, Madalina; Villa-Vialaneix, Nathalie. Using SOMbrero for clustering and visualizing graphs. Journal de la société française de statistique, Tome 156 (2015) no. 3, pp. 95-119. http://archive.numdam.org/item/JSFS_2015__156_3_95_0/

[Ambroise and Govaert, 1996] Ambroise, C. and Govaert, G. (1996). Analyzing dissimilarity matrices via Kohonen maps. In Proceedings of 5th Conference of the International Federation of Classification Societies (IFCS 1996), volume 2, pages 96–99, Kobe (Japan).

[Anderson, 2001] Anderson, M. (2001). A new method for non-parametric multivariate analysis of variance. Austral Ecology, 26:32–46.

[Andras, 2002] Andras, P. (2002). Kernel-Kohonen networks. International Journal of Neural Systems, 12:117–135.

[Aronszajn, 1950] Aronszajn, N. (1950). Theory of reproducing kernels. Transactions of the American Mathematical Society, 68(3):337–404. | Zbl

[Ben-Hur and Weston, 2010] Ben-Hur, A. and Weston, J. (2010). Data Mining Techniques for the Life Sciences, volume 609 of Methods in Molecular Biology, chapter A user’s guide to support vector machine, pages 223–239. Springer-Verlag.

[Boulet et al., 2008] Boulet, R., Jouve, B., Rossi, F., and Villa, N. (2008). Batch kernel SOM and related Laplacian methods for social network analysis. Neurocomputing, 71(7-9):1257–1273.

[Bourgeois et al., 2015] Bourgeois, N., Cottrell, M., Lamassé, S., and Olteanu, M. (2015). Search for meaning through the study of co-occurrences in texts. Submitted for publication in proceedings of IWANN 2015.

[Conan-Guez et al., 2006] Conan-Guez, B., Rossi, F., and El Golli, A. (2006). Fast algorithm and implementation of dissimilarity self-organizing maps. Neural Networks, 19(6-7):855–863. | Zbl

[Cottrell et al., 1993] Cottrell, M., Letrémy, P., and Roy, E. (1993). Analyzing a contingency table with Kohonen maps: a factorial correspondence analysis. In Cabestany, J., Mary, J., and Prieto, A. E., editors, Proceedings of International Workshop on Artificial Neural Networks (IWANN 93), Lecture Notes in Computer Science, pages 305–311. Springer Verlag.

[Csardi and Nepusz, 2006] Csardi, G. and Nepusz, T. (2006). The igraph software package for complex network research. InterJournal, Complex Systems.

[Danon et al., 2005] Danon, L., Diaz-Guilera, A., Duch, J., and Arenas, A. (2005). Comparing community structure identification. Journal of Statistical Mechanics, page P09008.

[Eades and Huang, 2000] Eades, P. and Huang, M. (2000). Navigating clustered graphs using force-directed methods. Journal of Graph Algorithms and Applications, 4(3):157–181. | Zbl

[Fort et al., 2002] Fort, J., Letremy, P., and Cottrell, M. (2002). Advantages and drawbacks of the batch kohonen algorithm. In Verleysen, M., editor, Proceedings of 10th European Symposium on Artificial Neural Networks (ESANN 2002), pages 223–230, Bruges, Belgium.

[Fouss et al., 2007] Fouss, F., Pirotte, A., Renders, J., and Saerens, M. (2007). Random-walk computation of similarities between nodes of a graph, with application to collaborative recommendation. IEEE Transactions on Knowledge and Data Engineering, 19(3):355–369.

[Fruchterman and Reingold, 1991] Fruchterman, T. and Reingold, B. (1991). Graph drawing by force-directed placement. Software, Practice and Experience, 21:1129–1164.

[Goldfarb, 1984] Goldfarb, L. (1984). A unified approach to pattern recognition. Pattern Recognition, 17(5):575–582. | Zbl

[Graepel and Obermayer, 1999] Graepel, T. and Obermayer, K. (1999). A stochastic self-organizing map for proximity data. Neural Computation, 11(1):139–155.

[Hammer and Hasenfuss, 2010] Hammer, B. and Hasenfuss, A. (2010). Topographic mapping of large dissimilarity data sets. Neural Computation, 22(9):2229–2284. | MR | Zbl

[Hammer et al., 2014] Hammer, B., Hoffman, D., Schleif, F., and Zhu, X. (2014). Learning vector quantization for (dis-)similarities. Neurocomputing, 131.

[Hanisch et al., 2002] Hanisch, D., Zien, A., Zimmer, R., and Lengauer, T. (2002). Co-clustering of biological networks and gene expression data. Bioinformatics, 18(Suppl. 1):S145–S154.

[Harel and Koren, 2002] Harel, D. and Koren, Y. (2002). Drawing graphs with non-uniform vertices. In Proceedings of the Working Conference on Advanced Visualization Interfaces (AVI’02), pages 157–166, New York, NY, USA. ACM Press.

[Herman et al., 2000] Herman, I., Melançon, G., and Scott Marshall, M. (2000). Graph visualization and navigation in information visualisation. IEEE Transactions on Visualization and Computer Graphics, 6(1):24–43.

[Heskes, 1999] Heskes, T. (1999). Energy functions for self-organizing maps. In Oja, E. and Kaski, S., editors, Kohonen Maps, pages 303–315. Elsevier, Amsterdam.

[Knuth, 1993] Knuth, D. (1993). The Stanford GraphBase: A Platform for Combinatorial Computing. Addison-Wesley, Reading, MA. | Zbl

[Kohohen and Somervuo, 1998] Kohohen, T. and Somervuo, P. (1998). Self-organizing maps of symbol strings. Neurocomputing, 21:19–30. | Zbl

[Kohonen, 2001] Kohonen, T. (2001). Self-Organizing Maps, 3rd Edition, volume 30. Springer, Berlin, Heidelberg, New York. | MR | Zbl

[Kondor and Lafferty, 2002] Kondor, R. and Lafferty, J. (2002). Diffusion kernels on graphs and other discrete structures. In Proceedings of the 19th International Conference on Machine Learning, pages 315–322.

[Krislock and Wolkowicz, 2012] Krislock, N. and Wolkowicz, H. (2012). Handbook on Semidefinite, Conic and Polynomial Optimization, volume 166 of International Series in Operations Research & Management Science, chapter Euclidean distance matrices and applications, pages 879–914. Springer, New York, Dordrecht, Heidelberg, London. | MR

[Mac Donald and Fyfe, 2000] Mac Donald, D. and Fyfe, C. (2000). The kernel self organising map. In Proceedings of 4th International Conference on knowledge-based Intelligence Engineering Systems and Applied Technologies, pages 317–320.

[RStudio and Inc., 2013] RStudio and Inc. (2013). shiny: Web Application Framework for R. R package version 0.6.0.

[Newman, 2006] Newman, M. (2006). Finding community structure in networks using the eigenvectors of matrices. Physical Review, E, 74(036104). | MR

[Newman and Girvan, 2004] Newman, M. and Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review, E, 69:026113.

[Noack, 2007] Noack, A. (2007). Energy models for graph clustering. Journal of Graph Algorithms and Applications, 11(2):453–480. | MR | Zbl

[Olteanu and Villa-Vialaneix, 2015] Olteanu, M. and Villa-Vialaneix, N. (2015). On-line relational and multiple relational SOM. Neurocomputing, 147:15–30.

[Polzlbauer, 2004] Polzlbauer, G. (2004). Survey and comparison of quality measures for self-organizing maps. In Paralic, J., Polzlbauer, G., and Rauber, A., editors, Proceedings of the Fifth Workshop on Data Analysis (WDA’04), pages 67–82, Sliezsky dom, Vysoke Tatry, Slovakia. Elfa Academic Press.

[Rossi, 2012] Rossi, F. (2012). yasomi: Yet Another Self Organising Map Implementation. R package version 0.3/r39.

[Rossi, 2014] Rossi, F. (2014). How many dissimilarity/kernel self organizing map variants do we need? In Villmann, T., Schleif, F., Kaden, M., and Lange, M., editors, Advances in Self-Organizing Maps and Learning Vector Quantization (Proceedings of WSOM 2014), volume 295 of Advances in Intelligent Systems and Computing, pages 3–23, Mittweida, Germany. Springer Verlag, Berlin, Heidelberg.

[Rossi and Villa-Vialaneix, 2010] Rossi, F. and Villa-Vialaneix, N. (2010). Optimizing an organized modularity measure for topographic graph clustering: a deterministic annealing approach. Neurocomputing, 73(7-9):1142–1163.

[Rossi and Villa-Vialaneix, 2011] Rossi, F. and Villa-Vialaneix, N. (2011). Représentation d’un grand réseau à partir d’une classification hiérarchique de ses sommets. Journal de la Société Française de Statistique, 152(3):34–65. | Numdam | MR

[Schellenberger et al., 2010] Schellenberger, J., Park, O., Conrad, T., and Palsson, B. (2010). BiGG: a biochemical genetic and genomic knowledgebase of large scale metabolic reconstructions. BMC Bioinformatics, 11:213.

[Schoenberg, 1935] Schoenberg, I. (1935). Remarks to Maurice Fréchet’s article “Sur la définition axiomatique d’une classe d’espace distanciés vectoriellement applicable sur l’espace de Hilbert”. Annals of Mathematics, 36:724–732. | JFM | MR

[Schulz et al., 2008] Schulz, H., John, M., Unger, A., and Schumann, H. (2008). Visual analysis of bipartite biological networks. In Botha, C., Kindlmann, G., Niessen, W., and Preim, B., editors, Proceedings Eurographics Workshop on Computing for Biomedicine, VCBM 2008, Delft, The Nederlands.

[Smola and Kondor, 2003] Smola, A. and Kondor, R. (2003). Kernels and regularization on graphs. In Warmuth, M. and Schölkopf, B., editors, Proceedings of the Conference on Learning Theory (COLT) and Kernel Workshop, Lecture Notes in Computer Science, pages 144–158. | Zbl

[Tunkelang, 1999] Tunkelang, D. (1999). A Numerical Optimization Approach to General Graph Drawing. PhD thesis, School of Computer Science, Carnegie Mellon University. CMU-CS-98-189.

[von Luxburg, 2007] von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416. | MR

[Wang and Miyamoto, 1996] Wang, X. and Miyamoto, I. (1996). Generating customized layouts. In Brandenburg, F., editor, Graph Drawing, volume 1027 of Lecture Notes in Computer Science, pages 504–515. Springer (Berlin/Heidelberg).

[Young and Householder, 1938] Young, G. and Householder, A. (1938). Discussion of a set of points in terms of their mutual distances. Psychometrika, 3:19–22. | JFM