We study the scenario of graph-based clustering algorithms such as spectral clustering. Given a set of data points, one first has to construct a graph on the data points and then apply a graph clustering algorithm to find a suitable partition of the graph. Our main question is if and how the construction of the graph (choice of the graph, choice of parameters, choice of weights) influences the outcome of the final clustering result. To this end we study the convergence of cluster quality measures such as the normalized cut or the Cheeger cut on various kinds of random geometric graphs as the sample size tends to infinity. It turns out that the limit values of the same objective function are systematically different on different types of graphs. This implies that clustering results systematically depend on the graph and can be very different for different types of graph. We provide examples to illustrate the implications on spectral clustering.
Mots-clés : random geometric graph, clustering, graph cuts
@article{PS_2013__17__370_0, author = {Maier, Markus and von Luxburg, Ulrike and Hein, Matthias}, title = {How the result of graph clustering methods depends on the construction of the graph}, journal = {ESAIM: Probability and Statistics}, pages = {370--418}, publisher = {EDP-Sciences}, volume = {17}, year = {2013}, doi = {10.1051/ps/2012001}, mrnumber = {3066385}, zbl = {1284.62382}, language = {en}, url = {http://archive.numdam.org/articles/10.1051/ps/2012001/} }
TY - JOUR AU - Maier, Markus AU - von Luxburg, Ulrike AU - Hein, Matthias TI - How the result of graph clustering methods depends on the construction of the graph JO - ESAIM: Probability and Statistics PY - 2013 SP - 370 EP - 418 VL - 17 PB - EDP-Sciences UR - http://archive.numdam.org/articles/10.1051/ps/2012001/ DO - 10.1051/ps/2012001 LA - en ID - PS_2013__17__370_0 ER -
%0 Journal Article %A Maier, Markus %A von Luxburg, Ulrike %A Hein, Matthias %T How the result of graph clustering methods depends on the construction of the graph %J ESAIM: Probability and Statistics %D 2013 %P 370-418 %V 17 %I EDP-Sciences %U http://archive.numdam.org/articles/10.1051/ps/2012001/ %R 10.1051/ps/2012001 %G en %F PS_2013__17__370_0
Maier, Markus; von Luxburg, Ulrike; Hein, Matthias. How the result of graph clustering methods depends on the construction of the graph. ESAIM: Probability and Statistics, Tome 17 (2013), pp. 370-418. doi : 10.1051/ps/2012001. http://archive.numdam.org/articles/10.1051/ps/2012001/
[1] Fast probabilistic algorithms for Hamiltonian circuits. J. Comput. Syst. Sci. 18 (1979) 155-193. | MR | Zbl
and ,[2] A graph-based estimator of the number of clusters. ESAIM: PS 11 (2007) 272-280. | Numdam | MR | Zbl
, and ,[3] Connectivity of the mutual k-nearest-neighbor graph in clustering and outlier detection. Stat. Probab. Lett. 35 (1997) 33-42. | MR | Zbl
, , and ,[4] Nearest neighbor clustering: a baseline method for consistent clustering with arbitrary objective functions. J. Mach. Learn. Res. 10 (2009) 657-698. | MR | Zbl
and ,[5] Handbook of Mathematics and Computational Science. Springer (1998). | MR | Zbl
and ,[6] Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58 (1963) 13-30. | MR | Zbl
,[7] A nonparametric estimate of a multivariate density function. Ann. Math. Stat. 36 (1965) 1049-1051. | MR | Zbl
and ,[8] Optimal construction of k-nearest neighbor graphs for identifying noisy clusters. Theoret. Comput. Sci. 410 (2009) 1749-1764. | MR | Zbl
, and ,[9] Influence of graph construction on graph-based clustering measures, in Advances in Neural Information Processing Systems, vol. 21, edited by D. Koller, D. Schuurmans, Y. Bengio and L. Bottou. MIT Press (2009) 1025-1032.
, and ,[10] Separators for sphere-packings and nearest neighbor graphs. J. ACM 44 (1997) 1-29. | MR | Zbl
, , and ,[11] On the relation between low density separation, spectral clustering and graph cuts, in Advances in Neural Information Processing Systems, vol. 19, edited by B. Schölkopf, J. Platt and T. Hoffman. MIT Press (2007) 1025-1032.
, and ,[12] Algorithmic Chernoff-Hoeffding inequalities in integer programming. Random Struct. Algorithms 8 (1996) 27-58. | MR | Zbl
and ,[13] A tutorial on spectral clustering. Stat. Comput. 17 (2007) 395-416. | MR
,[14] Consistency of spectral clustering. Ann. Stat. 36 (2008) 555-586. | MR | Zbl
, and ,Cité par Sources :