TY - JOUR

T1 - Reconstruction of Markov random fields from samples

T2 - Some observations and algorithms

AU - Bresler, Guy

AU - Mossel, Elchanan

AU - Sly, Allan

PY - 2013

Y1 - 2013

N2 - Markov random fields are used to model high dimensional distributions in a number of applied areas. Much recent interest has been devoted to the reconstruction of the dependency structure from independent samples from the Markov random fields. We analyze a simple algorithm for reconstructing the underlying graph defining a Markov random field on n nodes and maximum degree d given observations. We show that under mild nondegeneracy conditions it reconstructs the generating graph with high probability using Θ(dε-2 δ-4 log n) samples, where e, δ depend on the local interactions. For most local interactions ε,δ are of order exp(- O(d)). Our results are optimal as a function of n up to a multiplicative constant depending on d and the strength of the local interactions. Our results seem to be the first results for general models that guarantee that the generating model is reconstructed. Furthermore, we provide explicit O(nd+2 ε-2 δ-4 log n) running-time bound. In cases where the measure on the graph has correlation decay, the running time is O(n2 log n) for all fixed d. We also discuss the effect of observing noisy samples and show that as long as the noise level is low, our algorithm is effective. On the other hand, we construct an example where large noise implies nonidentifiability even for generic noise and interactions. Finally, we briefly show that in some simple cases, models with hidden nodes can also be recovered.

AB - Markov random fields are used to model high dimensional distributions in a number of applied areas. Much recent interest has been devoted to the reconstruction of the dependency structure from independent samples from the Markov random fields. We analyze a simple algorithm for reconstructing the underlying graph defining a Markov random field on n nodes and maximum degree d given observations. We show that under mild nondegeneracy conditions it reconstructs the generating graph with high probability using Θ(dε-2 δ-4 log n) samples, where e, δ depend on the local interactions. For most local interactions ε,δ are of order exp(- O(d)). Our results are optimal as a function of n up to a multiplicative constant depending on d and the strength of the local interactions. Our results seem to be the first results for general models that guarantee that the generating model is reconstructed. Furthermore, we provide explicit O(nd+2 ε-2 δ-4 log n) running-time bound. In cases where the measure on the graph has correlation decay, the running time is O(n2 log n) for all fixed d. We also discuss the effect of observing noisy samples and show that as long as the noise level is low, our algorithm is effective. On the other hand, we construct an example where large noise implies nonidentifiability even for generic noise and interactions. Finally, we briefly show that in some simple cases, models with hidden nodes can also be recovered.

KW - Algorithms

KW - Correlation decay

KW - Markov random fields

UR - http://www.scopus.com/inward/record.url?scp=84880110852&partnerID=8YFLogxK

U2 - 10.1137/100796029

DO - 10.1137/100796029

M3 - Article

SN - 0097-5397

VL - 42

SP - 563

EP - 578

JO - SIAM Journal on Computing

JF - SIAM Journal on Computing

IS - 2

ER -