The spectral gap of a Markov chain can be bounded by the spectral gaps of constituent restriction chains and a projection chain, and the strength of such a bound is the content of various decomposition theorems. In this paper, we introduce a new parameter that allows us to improve upon these bounds. We further define a notion of orthogonality between the restriction chains and complementary restriction chains. This leads to a new Complementary Decomposition theorem, which does not require analyzing the projection chain. For $\epsilon$-orthogonal chains, this theorem may be iterated $O(1/\epsilon)$ times while only giving away a constant multiplicative factor on the overall spectral gap. As an application, we provide a $1/n$-orthogonal decomposition of the nearest-neighbor Markov chain over $k$-class biased monotone permutations on $[n]$, as long as the number of particles in each class is at least $C\log n$. This allows us to apply the Complementary Decomposition theorem iteratively $n$ times to prove the first polynomial bound on the spectral gap when $k$ is as large as $\Theta(n/\log n)$. The previous best known bound assumed $k$ was at most a constant.
We present a new lower bound on the spectral gap of the Glauber dynamics for the Gibbs distribution of a spectrally independent $q$-spin system on a graph $G = (V,E)$ with maximum degree $\Delta$. Notably, for several interesting examples, our bound covers the entire regime of $\Delta$ excluded by arguments based on coupling with the stationary distribution. As concrete applications, by combining our new lower bound with known spectral independence computations and known coupling arguments: (1) We show that for a triangle-free graph $G = (V,E)$ with maximum degree $\Delta \geq 3$, the Glauber dynamics for the uniform distribution on proper $k$-colorings with $k \geq (1.763\dots + \delta)\Delta$ colors has spectral gap $\tilde{\Omega}_{\delta}(|V|^{-1})$. Previously, such a result was known either if the girth of $G$ is at least $5$ [Dyer et al., FOCS 2004], or under restrictions on $\Delta$ [Chen et al., STOC 2021; Hayes-Vigoda, FOCS 2003]. (2) We show that for a regular graph $G = (V,E)$ with degree $\Delta \geq 3$ and girth at least $6$, and for any $\varepsilon, \delta > 0$, the partition function of the hardcore model with fugacity $\lambda \leq (1-\delta)\lambda_{c}(\Delta)$ may be approximated within a $(1+\varepsilon)$-multiplicative factor in time $\tilde{O}_{\delta}(n^{2}\varepsilon^{-2})$. Previously, such a result was known if the girth is at least $7$ [Efthymiou et al., SICOMP 2019]. (3) We show for the binomial random graph $G(n,d/n)$ with $d = O(1)$, with high probability, an approximately uniformly random matching may be sampled in time $O_{d}(n^{2+o(1)})$. This improves the corresponding running time of $\tilde{O}_{d}(n^{3})$ due to [Jerrum-Sinclair, SICOMP 1989; Jerrum, 2003].
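To make the dynamics in question concrete, the following is a minimal sketch of a single Glauber update for proper $k$-colorings: pick a uniformly random vertex and resample its color uniformly from the colors not appearing among its neighbors. The adjacency-list representation and helper names are illustrative assumptions, not the paper's notation.

```python
import random

def glauber_coloring_step(adj, coloring, k, rng=random):
    """One Glauber update: choose a uniform vertex and resample its color
    uniformly from the colors unused by its neighbors."""
    v = rng.randrange(len(adj))
    forbidden = {coloring[u] for u in adj[v]}
    allowed = [c for c in range(k) if c not in forbidden]
    if allowed:  # always nonempty when k exceeds the maximum degree
        coloring[v] = rng.choice(allowed)
    return coloring

def is_proper(adj, coloring):
    """Check that no edge is monochromatic."""
    return all(coloring[u] != coloring[v]
               for v in range(len(adj)) for u in adj[v])
```

Starting from any proper coloring with $k \geq \Delta + 1$, every update preserves properness, so the chain stays on its state space.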
In this paper we introduce a notion of spectral approximation for directed graphs. While there are many potential ways one might define approximation for directed graphs, most of them are too strong to allow sparse approximations in general. In contrast, we prove that for our notion of approximation, such sparsifiers do exist, and we show how to compute them in almost linear time. Using this notion of approximation, we provide a general framework for solving asymmetric linear systems that is broadly inspired by the work of [Peng-Spielman, STOC '14]. Applying this framework in conjunction with our sparsification algorithm, we obtain an almost linear time algorithm for solving directed Laplacian systems associated with Eulerian graphs. Using this solver in the recent framework of [Cohen-Kelner-Peebles-Peng-Sidford-Vladu, FOCS '16], we obtain almost linear time algorithms for solving a directed Laplacian linear system, computing the stationary distribution of a Markov chain, computing expected commute times in a directed graph, and more. For each of these problems, our algorithms improve the previous best running times of $O((nm^{3/4} + n^{2/3} m) \log^{O(1)} (n \kappa \epsilon^{-1}))$ to $O((m + n2^{O(\sqrt{\log{n}\log\log{n}})}) \log^{O(1)} (n \kappa \epsilon^{-1}))$, where $n$ is the number of vertices in the graph, $m$ is the number of edges, $\kappa$ is a natural condition number associated with the problem, and $\epsilon$ is the desired accuracy. We hope these results open the door for further studies into directed spectral graph theory, and will serve as a stepping stone for designing a new generation of fast algorithms for directed graphs.
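For intuition on one of the downstream problems, the stationary distribution of a Markov chain satisfies $\pi(I - P) = 0$ together with the normalization $\sum_x \pi(x) = 1$. For small chains this linear system can be solved directly with dense algebra; this is only an illustrative sketch, not the almost linear time solver of the paper.

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi (I - P) = 0 with sum(pi) = 1 by replacing one redundant
    equation with the normalization constraint (dense linear algebra,
    suitable only for small irreducible chains)."""
    n = P.shape[0]
    A = (np.eye(n) - P).T   # rows encode the left-eigenvector condition
    A[-1, :] = 1.0          # replace the last equation by sum(pi) = 1
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)
```

For the two-state chain with transition matrix $\begin{pmatrix} 0.7 & 0.3 \\ 0.2 & 0.8 \end{pmatrix}$, this returns $\pi = (0.4, 0.6)$.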
We study the following learning problem with dependent data: Observing a trajectory of length $n$ from a stationary Markov chain with $k$ states, the goal is to predict the next state. For $3 \leq k \leq O(\sqrt{n})$, using techniques from universal compression, the optimal prediction risk in Kullback-Leibler divergence is shown to be $\Theta(\frac{k^2}{n}\log \frac{n}{k^2})$, in contrast to the optimal rate of $\Theta(\frac{\log \log n}{n})$ for $k=2$ previously shown by Falahatgar et al. (2016). These rates, slower than the parametric rate of $O(\frac{k^2}{n})$, can be attributed to the memory in the data, as the spectral gap of the Markov chain can be arbitrarily small. To quantify the memory effect, we study irreducible reversible chains with a prescribed spectral gap. In addition to characterizing the optimal prediction risk for two states, we show that, as long as the spectral gap is not excessively small, the prediction risk in the Markov model is $O(\frac{k^2}{n})$, which coincides with that of an i.i.d. model with the same number of parameters.
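As a baseline for the prediction problem, one can smooth the empirical transition counts and read off the estimated distribution of the next state. The add-constant estimator below is only an illustration; the optimal procedures in the paper are based on universal compression and are more refined.

```python
import numpy as np

def predict_next_state(trajectory, k, alpha=0.5):
    """Estimate P(next | current) by add-alpha smoothing of the transition
    counts, then return the predicted distribution of the state following
    the last observation."""
    counts = np.zeros((k, k))
    for i, j in zip(trajectory[:-1], trajectory[1:]):
        counts[i, j] += 1
    P_hat = (counts + alpha) / (counts.sum(axis=1, keepdims=True) + alpha * k)
    return P_hat[trajectory[-1]]
```

On the trajectory $0,1,0,1,0$ with $k=2$ and $\alpha = 1/2$, the predicted distribution for the next state is $(1/6,\, 5/6)$.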
The spectral gap $\gamma$ of an ergodic and reversible Markov chain is an important parameter measuring the asymptotic rate of convergence. In applications, the transition matrix $P$ may be unknown, yet one sample of the chain up to a fixed time $t$ may be observed. Hsu, Kontorovich, and Szepesvari (2015) considered the problem of estimating $\gamma$ from this data. Let $\pi$ be the stationary distribution of $P$, and $\pi_\star = \min_x \pi(x)$. They showed that, if $t = \tilde{O}\bigl(\frac{1}{\gamma^3 \pi_\star}\bigr)$, then $\gamma$ can be estimated to within multiplicative constants with high probability. They also proved that $\tilde{\Omega}\bigl(\frac{n}{\gamma}\bigr)$ steps are required for precise estimation of $\gamma$. We show that $\tilde{O}\bigl(\frac{1}{\gamma \pi_\star}\bigr)$ steps of the chain suffice to estimate $\gamma$ up to multiplicative constants with high probability. When $\pi$ is uniform, this matches (up to logarithmic corrections) the lower bound of Hsu, Kontorovich, and Szepesvari.
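When $P$ is known, the quantity being estimated can be computed directly from the eigenvalues; the difficulty addressed above is doing so from a single trajectory with $P$ unknown. A small numpy sketch of the direct computation (using the absolute spectral gap, one common convention):

```python
import numpy as np

def spectral_gap(P):
    """Absolute spectral gap 1 - max_{i>=2} |lambda_i| of a transition
    matrix P; for reversible chains the eigenvalues are real."""
    eigs = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]  # eigs[0] == 1
    return 1.0 - eigs[1]
```

For the two-state chain with flip probabilities $a$ and $b$, the eigenvalues are $1$ and $1 - a - b$, so the gap is $a + b$.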
Convergence rates of Markov chains have been widely studied in recent years. In particular, quantitative bounds on convergence rates have been studied in various forms by Meyn and Tweedie [Ann. Appl. Probab. 4 (1994) 981-1101], Rosenthal [J. Amer. Statist. Assoc. 90 (1995) 558-566], Roberts and Tweedie [Stochastic Process. Appl. 80 (1999) 211-229], Jones and Hobert [Statist. Sci. 16 (2001) 312-334] and Fort [Ph.D. thesis (2001) Univ. Paris VI]. In this paper, we extend a result of Rosenthal [J. Amer. Statist. Assoc. 90 (1995) 558-566] that concerns quantitative convergence rates for time-homogeneous Markov chains. Our extension allows us to consider $f$-total variation distance (instead of total variation) and time-inhomogeneous Markov chains. We apply our results to simulated annealing.
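Simulated annealing is a canonical time-inhomogeneous chain: a Metropolis step whose acceptance probability depends on a temperature that decreases with time. The objective, neighborhood structure, and geometric cooling schedule below are illustrative assumptions, not taken from the paper.

```python
import math
import random

def simulated_annealing(f, x0, steps=1000, t0=1.0, rng=random):
    """Time-inhomogeneous Metropolis chain on the integers: at step s,
    propose a neighbor and accept with prob min(1, exp(-(f(y)-f(x))/T_s)),
    where T_s = t0 * 0.99**s shrinks over time (geometric cooling)."""
    x, best = x0, x0
    for s in range(steps):
        temp = t0 * 0.99 ** s
        y = x + rng.choice([-1, 1])          # propose a neighboring state
        if f(y) <= f(x) or rng.random() < math.exp(-(f(y) - f(x)) / temp):
            x = y                            # accept the proposal
        if f(x) < f(best):
            best = x                         # track the best state seen
    return best
```

Because the transition kernel changes with the step index $s$, bounding the convergence of such a chain requires the time-inhomogeneous machinery developed above rather than standard homogeneous results.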