Bayesian Spanning Tree: Estimating the Backbone of the Dependence Graph


Abstract

In multivariate data analysis, it is often important to estimate a graph characterizing dependence among $p$ variables. A popular strategy uses the non-zero entries in a $p \times p$ covariance or precision matrix, typically requiring restrictive modeling assumptions for accurate graph recovery. To improve model robustness, we instead focus on estimating the \emph{backbone} of the dependence graph. We use a spanning tree likelihood, based on a minimalist graphical model that is purposely overly simplified. Taking a Bayesian approach, we place a prior on the space of trees and quantify uncertainty in the graphical model. In both theory and experiments, we show that this model does not require the population graph to be a spanning tree or the covariance to satisfy assumptions beyond positive-definiteness. The model accurately recovers the backbone of the population graph at a rate competitive with existing approaches but with better robustness. We show combinatorial properties of the spanning tree, which may be of independent interest, and develop an efficient Gibbs sampler for Bayesian inference. Analyzing electroencephalography data using a hidden Markov model with each latent state modeled by a spanning tree, we show that results are much more interpretable compared with popular alternatives.
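The backbone idea can be illustrated with a simple non-Bayesian analogue: treat absolute pairwise dependence strengths as edge weights and extract a maximum-weight spanning tree. The sketch below is not the paper's method (it omits the tree prior, the spanning tree likelihood, and the Gibbs sampler entirely); it only shows, on hypothetical weights, what a spanning-tree backbone of a dependence graph looks like, using Kruskal's algorithm with union-find.

```python
# Hedged sketch: a maximum spanning tree over hypothetical absolute
# correlations, as a rough frequentist analogue of the "backbone".
# This is NOT the paper's Bayesian estimator.

def max_spanning_tree(weights):
    """weights: dict mapping (i, j) -> nonnegative weight, i < j.
    Returns the edge set of a maximum-weight spanning tree (Kruskal)."""
    parent = {}

    def find(x):
        # Union-find with path halving.
        while parent.setdefault(x, x) != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = set()
    # Greedily add the heaviest edge that does not create a cycle.
    for (i, j), w in sorted(weights.items(), key=lambda kv: -kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.add((i, j))
    return tree

# Toy dependence structure on p = 4 variables; weights are made up
# for illustration, not taken from the paper.
w = {(0, 1): 0.9, (1, 2): 0.8, (0, 2): 0.3, (2, 3): 0.7, (0, 3): 0.1}
print(sorted(max_spanning_tree(w)))  # -> [(0, 1), (1, 2), (2, 3)]
```

The weak edges (0, 2) and (0, 3) are dropped, leaving a tree that keeps only the strongest dependence path among the variables, which is the intuition behind estimating a backbone rather than a full graph.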
