TY - JOUR
T1 - Reconstruction and estimation in the planted partition model
AU - Mossel, Elchanan
AU - Neeman, Joe
AU - Sly, Allan
N1 - Publisher Copyright:
© 2014, Springer-Verlag Berlin Heidelberg.
PY - 2015/8/18
Y1 - 2015/8/18
N2 - The planted partition model (also known as the stochastic blockmodel) is a classical cluster-exhibiting random graph model that has been extensively studied in statistics, physics, and computer science. In its simplest form, the planted partition model is a model for random graphs on $$n$$n nodes with two equal-sized clusters, with an between-class edge probability of $$q$$q and a within-class edge probability of $$p$$p. Although most of the literature on this model has focused on the case of increasing degrees (ie. $$pn, qn \rightarrow \infty $$pn,qn→∞ as $$n \rightarrow \infty $$n→∞), the sparse case $$p, q = O(1/n)$$p,q=O(1/n) is interesting both from a mathematical and an applied point of view. A striking conjecture of Decelle, Krzkala, Moore and Zdeborová based on deep, non-rigorous ideas from statistical physics gave a precise prediction for the algorithmic threshold of clustering in the sparse planted partition model. In particular, if $$p = a/n$$p=a/n and $$q = b/n$$q=b/n, then Decelle et al. conjectured that it is possible to cluster in a way correlated with the true partition if $$(a - b)^2 > 2(a + b)$$(a-b)2>2(a+b), and impossible if $$(a - b)^2 < 2(a + b)$$(a-b)2<2(a+b). By comparison, the best-known rigorous result is that of Coja-Oghlan, who showed that clustering is possible if $$(a - b)^2 > C (a + b)$$(a-b)2>C(a+b) for some sufficiently large $$C$$C. We prove half of their prediction, showing that it is indeed impossible to cluster if $$(a - b)^2 < 2(a + b)$$(a-b)2<2(a+b). Furthermore we show that it is impossible even to estimate the model parameters from the graph when $$(a - b)^2 < 2(a + b)$$(a-b)2<2(a+b); on the other hand, we provide a simple and efficient algorithm for estimating $$a$$a and $$b$$b when $$(a - b)^2 > 2(a + b)$$(a-b)2>2(a+b). Following Decelle et al, our work establishes a rigorous connection between the clustering problem, spin-glass models on the Bethe lattice and the so called reconstruction problem. This connection points to fascinating applications and open problems.
AB - The planted partition model (also known as the stochastic blockmodel) is a classical cluster-exhibiting random graph model that has been extensively studied in statistics, physics, and computer science. In its simplest form, the planted partition model is a model for random graphs on $$n$$n nodes with two equal-sized clusters, with an between-class edge probability of $$q$$q and a within-class edge probability of $$p$$p. Although most of the literature on this model has focused on the case of increasing degrees (ie. $$pn, qn \rightarrow \infty $$pn,qn→∞ as $$n \rightarrow \infty $$n→∞), the sparse case $$p, q = O(1/n)$$p,q=O(1/n) is interesting both from a mathematical and an applied point of view. A striking conjecture of Decelle, Krzkala, Moore and Zdeborová based on deep, non-rigorous ideas from statistical physics gave a precise prediction for the algorithmic threshold of clustering in the sparse planted partition model. In particular, if $$p = a/n$$p=a/n and $$q = b/n$$q=b/n, then Decelle et al. conjectured that it is possible to cluster in a way correlated with the true partition if $$(a - b)^2 > 2(a + b)$$(a-b)2>2(a+b), and impossible if $$(a - b)^2 < 2(a + b)$$(a-b)2<2(a+b). By comparison, the best-known rigorous result is that of Coja-Oghlan, who showed that clustering is possible if $$(a - b)^2 > C (a + b)$$(a-b)2>C(a+b) for some sufficiently large $$C$$C. We prove half of their prediction, showing that it is indeed impossible to cluster if $$(a - b)^2 < 2(a + b)$$(a-b)2<2(a+b). Furthermore we show that it is impossible even to estimate the model parameters from the graph when $$(a - b)^2 < 2(a + b)$$(a-b)2<2(a+b); on the other hand, we provide a simple and efficient algorithm for estimating $$a$$a and $$b$$b when $$(a - b)^2 > 2(a + b)$$(a-b)2>2(a+b). Following Decelle et al, our work establishes a rigorous connection between the clustering problem, spin-glass models on the Bethe lattice and the so called reconstruction problem. This connection points to fascinating applications and open problems.
KW - 90B15
KW - 91D30
KW - Primary 05C80
KW - Secondary 60J85
UR - http://www.scopus.com/inward/record.url?scp=84939262924&partnerID=8YFLogxK
U2 - 10.1007/s00440-014-0576-6
DO - 10.1007/s00440-014-0576-6
M3 - Article
SN - 0178-8051
VL - 162
SP - 431
EP - 461
JO - Probability Theory and Related Fields
JF - Probability Theory and Related Fields
IS - 3-4
ER -