Special Issue on Networks and Statistics
Co-clustering through Latent Bloc Model: a Review
[A bibliographic review of co-clustering through the latent block model]
Journal de la société française de statistique, Volume 156 (2015) no. 3, pp. 120-139.

We present co-clustering methods, with an emphasis on latent block models (LBM) and on the parallels between the LBM and the Stochastic Block Model (SBM), notably for the analysis of bipartite graphs. We introduce several variants of the LBM (standard, sparse, Bayesian) and present identifiability results. We show how the complex dependency structure induced by the LBM makes maximum likelihood estimation of the parameters impossible in practice, and we review alternative inference methods. These methods rely on iterative procedures combined with approximations of the likelihood that are easy to maximize, which makes them hard to analyze theoretically. Consistency results nevertheless exist; they are partial in that they rely on a reasonable but still unproved condition. Likewise, the model selection tools currently available for choosing the number of clusters rely on a conjecture. We briefly place the LBM in the context of co-clustering methods that do not rely on a generative model, in particular those based on matrix factorization. We conclude with a case study that illustrates the advantages of co-clustering over simple clustering.
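To see why exact maximum likelihood is impractical, consider the binary (Bernoulli) LBM. With g row groups and m column groups, the marginal likelihood of an n × d matrix is a sum over all g^n row labellings and all m^d column labellings, and because each entry x_ij depends on both its row label z_i and its column label w_j, the sum does not factorize over rows or columns as it would in an ordinary mixture. The Python sketch below assumes that binary specification, with row proportions `pi`, column proportions `rho` and block probabilities `alpha` (illustrative names, not taken from the article). It computes the likelihood by brute force and is only usable on toy matrices.

```python
import itertools
import math

import numpy as np


def lbm_loglik_bruteforce(x, pi, rho, alpha):
    """Exact log-likelihood of a binary latent block model (illustrative sketch).

    Sums over every row labelling z (g**n choices) and every column
    labelling w (m**d choices). Assumes all pi, rho > 0 and 0 < alpha < 1.
    Returns (log-likelihood, number of terms summed).
    """
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    n, d = x.shape
    g, m = len(pi), len(rho)
    log_terms = []
    for z in itertools.product(range(g), repeat=n):
        log_pz = sum(math.log(pi[k]) for k in z)
        for w in itertools.product(range(m), repeat=d):
            log_pw = sum(math.log(rho[l]) for l in w)
            a = alpha[np.ix_(z, w)]  # a[i, j] = alpha[z_i, w_j]
            log_px = np.sum(x * np.log(a) + (1 - x) * np.log(1 - a))
            log_terms.append(log_pz + log_pw + log_px)
    log_terms = np.array(log_terms)
    top = log_terms.max()  # log-sum-exp shift for numerical stability
    return top + math.log(np.exp(log_terms - top).sum()), len(log_terms)
```

For instance, with n = 100 rows, d = 50 columns and g = m = 3, the sum already has 3^150 (about 3.7 × 10^71) terms, which is why practical estimation relies on approximations.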

We present model-based co-clustering methods, with a focus on the latent block model (LBM). We introduce several specifications of the LBM (standard, sparse, Bayesian) and review some identifiability results. We show how its complex dependency structure prevents standard maximum likelihood estimation, and we present popular alternative inference methods. These methods apply iterative procedures to a tractable approximation of the likelihood, which makes them difficult to analyze theoretically. We nevertheless present some asymptotic consistency results. These results are partial, as they rely on a reasonable but still unproved condition. Likewise, the model selection tools available for choosing the numbers of row and column groups are only valid up to a conjecture. We also briefly discuss co-clustering procedures that are not model-based. Finally, we show how the LBM can be used for bipartite graph analysis, and we highlight throughout the review its connection to the Stochastic Block Model.

Keywords: Latent variable model, Latent block model, Variational approximation, Model selection, ICL, BIC, Bipartite graphs
@article{JSFS_2015__156_3_120_0,
     author = {Brault, Vincent and Mariadassou, Mahendra},
     title = {Co-clustering through {Latent} {Bloc} {Model:} a {Review}},
     journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique},
     pages = {120--139},
     publisher = {Soci\'et\'e fran\c{c}aise de statistique},
     volume = {156},
     number = {3},
     year = {2015},
     mrnumber = {3432606},
     zbl = {1341.62172},
     language = {en},
     url = {http://archive.numdam.org/item/JSFS_2015__156_3_120_0/}
}
Brault, Vincent; Mariadassou, Mahendra. Co-clustering through Latent Bloc Model: a Review. Journal de la société française de statistique, Volume 156 (2015) no. 3, pp. 120-139. http://archive.numdam.org/item/JSFS_2015__156_3_120_0/
