Motivated by storage applications, we study the following data structure problem: An encoder wishes to store a collection of jointly-distributed files $\overline{X}:=(X_1,X_2,\ldots, X_n) \sim \mu$ which are \emph{correlated} ($H_\mu(\overline{X}) \ll \sum_i H_\mu(X_i)$), using as little (expected) memory as possible, such that each individual file $X_i$ can be recovered quickly with few (ideally constant) memory accesses. In the case of independent random files, a dramatic result by Pătraşcu (FOCS'08) and subsequently by Dodis, Pătraşcu and Thorup (STOC'10) shows that it is possible to store $\overline{X}$ using just a \emph{constant} number of extra bits beyond the information-theoretic minimum space, while at the same time decoding each $X_i$ in constant time. However, in the (realistic) case where the files are correlated, much weaker results are known, requiring at least $\Omega(n/\mathrm{poly}\lg n)$ extra bits for constant decoding time, even for simple joint distributions $\mu$. We focus on the natural case of compressing \emph{Markov chains}, i.e., storing a length-$n$ random walk on any (possibly directed) graph $G$. Denoting by $\kappa(G,n)$ the number of length-$n$ walks on $G$, we show that there is a succinct data structure storing a random walk using $\lg_2 \kappa(G,n) + O(\lg n)$ bits of space, such that any vertex along the walk can be decoded in $O(1)$ time on a word-RAM. For the harder task of matching the \emph{point-wise} optimal space of the walk, i.e., the empirical entropy $\sum_{i=1}^{n-1} \lg (\deg(v_i))$, we present a data structure with $O(1)$ extra bits at the price of $O(\lg n)$ decoding time, and show that any improvement on this would lead to an improved solution on the long-standing Dictionary problem. All of our data structures support the \emph{online} version of the problem with constant update and query time.
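To make the two space benchmarks in this abstract concrete, the following minimal Python sketch computes the number of walks kappa(G,n) by dynamic programming and the empirical entropy of one given walk. It is only an illustration of the quantities, not the succinct data structure itself. The function names, the adjacency-list representation, and the convention that a length-n walk is a sequence of n vertices (hence n-1 steps, matching the sum over i = 1..n-1) are assumptions made for this example.

```python
from math import log2

def count_walks(adj, n):
    """Number of vertex sequences v_1..v_n in which every (v_i, v_{i+1}) is an edge.

    `adj` maps each vertex to the list of its out-neighbours (assumed format).
    """
    # ways[v] = number of walks of the current length that end at v
    ways = {v: 1 for v in adj}
    for _ in range(n - 1):
        nxt = {v: 0 for v in adj}
        for u, outs in adj.items():
            for v in outs:
                nxt[v] += ways[u]
        ways = nxt
    return sum(ways.values())

def empirical_entropy(adj, walk):
    """Sum of lg(out-degree) over every vertex of the walk except the last."""
    return sum(log2(len(adj[v])) for v in walk[:-1])

# Hypothetical example graph: vertex 0 has edges to 0 and 1, vertex 1 only to 0.
adj = {0: [0, 1], 1: [0]}
walk = [0, 1, 0, 0]
kappa = count_walks(adj, len(walk))
print(kappa, log2(kappa), empirical_entropy(adj, walk))
```

On this graph the walk counts follow the Fibonacci numbers, so lg kappa(G,4) = lg 8 = 3 bits, while the particular walk above has empirical entropy 2 bits, which shows how the point-wise benchmark can be smaller than the uniform one.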
In this note, we design a discrete random walk on the real line which takes steps $0, \pm 1$ (and one with steps in $\{\pm 1, 2\}$) where at least $96\%$ of the signs are $\pm 1$ in expectation, and which has $\mathcal{N}(0,1)$ as a stationary distribution. As an immediate corollary, we obtain an online version of Banaszczyk's discrepancy result for partial colorings and $\pm 1, 2$ signings. Additionally, we recover linear time algorithms for logarithmic bounds for the Komlós conjecture in an oblivious online setting.
Random graph generation is an important tool for studying large complex networks. Despite the abundance of random graph models, constructing models with application-driven constraints is poorly understood. In order to advance the state of the art in this area, we focus on random graphs without short cycles as a stylized family of graphs, and propose the RandGraph algorithm for randomly generating them. For any constant k, when m=O(n^{1+1/[2k(k+3)]}), RandGraph generates an asymptotically uniform random graph with n vertices, m edges, and no cycle of length at most k using O(n^2m) operations. We also characterize the approximation error for finite values of n. To the best of our knowledge, this is the first polynomial-time algorithm for the problem. RandGraph works by sequentially adding m edges to an empty graph with n vertices. Recently, such sequential algorithms have been successful for random sampling problems. Our main contributions to this line of research include introducing a new approach for sequentially approximating edge-specific probabilities at each step of the algorithm, and providing a new method for analyzing such algorithms.
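The sequential template described in this abstract can be sketched in Python as follows. This is explicitly not RandGraph: the abstract does not give its edge-specific probabilities, so the sketch swaps in the naive rule of picking uniformly among all edges that do not close a cycle of length at most k, which is not claimed to produce an asymptotically uniform output. All function names are hypothetical.

```python
import random
from collections import deque

def within_distance(adj, src, dst, limit):
    """True if dst can be reached from src using at most `limit` edges (BFS)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return True
        if dist[u] == limit:
            continue  # do not explore beyond the distance budget
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return False

def naive_sequential_graph(n, m, k, rng):
    """Add m edges one at a time, never creating a cycle of length <= k.

    Assumption: the next edge is chosen uniformly among admissible edges,
    a simple stand-in for RandGraph's edge-specific probabilities.
    Returns None if no admissible edge remains before m edges are placed.
    """
    adj = {v: set() for v in range(n)}
    edges = []
    while len(edges) < m:
        # Adding (u, v) closes a cycle of length <= k exactly when
        # u and v are already within distance k - 1 of each other.
        candidates = [(u, v) for u in range(n) for v in range(u + 1, n)
                      if v not in adj[u]
                      and not within_distance(adj, u, v, k - 1)]
        if not candidates:
            return None
        u, v = rng.choice(candidates)
        adj[u].add(v)
        adj[v].add(u)
        edges.append((u, v))
    return edges

edges = naive_sequential_graph(n=8, m=7, k=4, rng=random.Random(1))
print(edges)
```

Recomputing the admissible set from scratch at every step is far slower than the O(n^2m) bound quoted above; the sketch favours readability over efficiency.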
In this paper, we study the average case complexity of the Unique Games problem. We propose a natural semi-random model, in which a unique game instance is generated in several steps. First an adversary selects a completely satisfiable instance of Unique Games, then she chooses an epsilon-fraction of all edges, and finally replaces (corrupts) the constraints corresponding to these edges with new constraints. If all steps are adversarial, the adversary can obtain any (1-epsilon) satisfiable instance, so then the problem is as hard as in the worst case. In our semi-random model, one of the steps is random, and all other steps are adversarial. We show that known algorithms for unique games (in particular, all algorithms that use the standard SDP relaxation) fail to solve semi-random instances of Unique Games. We present an algorithm that with high probability finds a solution satisfying a (1-delta) fraction of all constraints in semi-random instances (we require that the average degree of the graph is Omega(log k)). To this end, we consider a new non-standard SDP program for Unique Games, which is not a relaxation for the problem, and show how to analyze it. We present a new rounding scheme that simultaneously uses SDP and LP solutions, which we believe is of independent interest. Our result holds only for epsilon less than some absolute constant. We prove that if epsilon > 1/2, then the problem is hard in one of the models; this result assumes the 2-to-2 conjecture. Finally, we study semi-random instances of Unique Games that are at most (1-epsilon) satisfiable. We present an algorithm that with high probability distinguishes between the case when the instance is a semi-random instance and the case when the instance is an (arbitrary) (1-delta) satisfiable instance if epsilon > c delta.
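The three generation steps of the semi-random model can be illustrated with a short Python sketch. It shows one variant only, in which the choice of corrupted edges is the random step. Since there is no real adversary in code, the satisfiable instance and the replacement constraints are drawn at random as stand-ins, which is an assumption of this example. Constraints are represented as permutations of the k labels, and all function names are hypothetical.

```python
import random

def planted_instance(edges, k, rng):
    """Step 1: a completely satisfiable instance with a hidden labeling.

    The constraint on edge (u, v) is a permutation perm of range(k),
    satisfied when perm[label[u]] == label[v].
    """
    labels = {}
    for u, v in edges:
        labels.setdefault(u, rng.randrange(k))
        labels.setdefault(v, rng.randrange(k))
    constraints = {}
    for u, v in edges:
        perm = list(range(k))
        rng.shuffle(perm)
        # Swap two entries so that the planted labels satisfy the constraint.
        i, j = labels[u], perm.index(labels[v])
        perm[i], perm[j] = perm[j], perm[i]
        constraints[(u, v)] = perm
    return labels, constraints

def corrupt(constraints, eps, rng):
    """Steps 2 and 3: pick an eps-fraction of edges at random and replace
    their constraints (here by random permutations, standing in for the
    adversary's choice)."""
    chosen = rng.sample(sorted(constraints), int(eps * len(constraints)))
    noisy = dict(constraints)
    for (u, v) in chosen:
        perm = list(range(len(constraints[(u, v)])))
        rng.shuffle(perm)
        noisy[(u, v)] = perm
    return noisy, set(chosen)

def satisfied_fraction(constraints, labels):
    """Fraction of constraints satisfied by the given labeling."""
    good = sum(perm[labels[u]] == labels[v]
               for (u, v), perm in constraints.items())
    return good / len(constraints)

rng = random.Random(0)
edges = [(i, (i + 1) % 10) for i in range(10)]  # hypothetical 10-cycle
labels, clean = planted_instance(edges, k=4, rng=rng)
noisy, corrupted = corrupt(clean, eps=0.2, rng=rng)
print(satisfied_fraction(clean, labels), satisfied_fraction(noisy, labels))
```

The planted labeling satisfies every constraint before corruption and at least a (1-epsilon) fraction afterwards, since only the chosen edges can become violated.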
We consider a fundamental algorithmic question in spectral graph theory: Compute a spectral sparsifier of the random-walk matrix polynomial $$L_\alpha(G)=D-\sum_{r=1}^d\alpha_rD(D^{-1}A)^r$$ where $A$ is the adjacency matrix of a weighted, undirected graph, $D$ is the diagonal matrix of weighted degrees, and $\alpha=(\alpha_1,\ldots,\alpha_d)$ are nonnegative coefficients with $\sum_{r=1}^d\alpha_r=1$. Recall that $D^{-1}A$ is the transition matrix of random walks on the graph. The sparsification of $L_\alpha(G)$ appears to be algorithmically challenging as the matrix power $(D^{-1}A)^r$ is defined by all paths of length $r$, whose precise calculation would be prohibitively expensive. In this paper, we develop the first nearly linear time algorithm for this sparsification problem: For any $G$ with $n$ vertices and $m$ edges, $d$ coefficients $\alpha$, and $\epsilon > 0$, our algorithm runs in time $O(d^2m\log^2n/\epsilon^{2})$ to construct a Laplacian matrix $\tilde{L}=D-\tilde{A}$ with $O(n\log n/\epsilon^{2})$ non-zeros such that $\tilde{L}\approx_{\epsilon}L_\alpha(G)$. Matrix polynomials arise in mathematical analysis of matrix functions as well as numerical solutions of matrix equations. Our work is particularly motivated by the algorithmic problems for speeding up the classic Newton's method in applications such as computing the inverse square-root of the precision matrix of a Gaussian random field, as well as computing the $q$th-root transition (for $q\geq1$) in a time-reversible Markov model. The key algorithmic step for both applications is the construction of a spectral sparsifier of the constant-degree random-walk matrix polynomials introduced by Newton's method. Our algorithm can also be used to build efficient data structures for effective resistances for multi-step time-reversible Markov models, and we anticipate that it could be useful for other tasks in network analysis.
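For intuition about the object being sparsified, the following Python sketch evaluates the matrix polynomial D - sum_r alpha_r D (D^{-1}A)^r exactly on a tiny graph by dense matrix powering. This is the expensive brute-force computation that the abstract says a nearly linear time algorithm must avoid, not the sparsification algorithm itself. The function names are hypothetical, and the sketch assumes integer edge weights and no isolated vertices so that exact rational arithmetic applies.

```python
from fractions import Fraction

def matmul(X, Y):
    """Dense product of two matrices given as lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def walk_polynomial_laplacian(A, alpha):
    """Return D - sum_{r>=1} alpha[r-1] * D * (D^{-1} A)^r as exact fractions.

    A is a symmetric matrix of integer weights; alpha lists alpha_1..alpha_d.
    """
    n = len(A)
    deg = [sum(row) for row in A]                      # diagonal of D
    P = [[Fraction(A[i][j], deg[i]) for j in range(n)]
         for i in range(n)]                            # D^{-1} A
    L = [[Fraction(deg[i]) if i == j else Fraction(0) for j in range(n)]
         for i in range(n)]                            # start from D
    power = [[Fraction(int(i == j)) for j in range(n)]
             for i in range(n)]                        # (D^{-1} A)^0
    for a in alpha:
        power = matmul(power, P)                       # now (D^{-1} A)^r
        for i in range(n):
            for j in range(n):
                L[i][j] -= Fraction(a) * deg[i] * power[i][j]
    return L

# Hypothetical example: unweighted path on three vertices.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
L1 = walk_polynomial_laplacian(A, [1])                 # reduces to D - A
L2 = walk_polynomial_laplacian(A, [Fraction(1, 2), Fraction(1, 2)])
for row in L2:
    print([str(x) for x in row])
```

With alpha = (1) the result is the ordinary Laplacian D - A. For general alpha the output stays symmetric with zero row sums, because D(D^{-1}A)^r = A(D^{-1}A)^{r-1} is symmetric and the coefficients sum to one.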
We prove distributional limit theorems for the length of the largest convex minorant of a one-dimensional random walk with independent identically distributed increments. Depending on the increment law, there are several regimes with different limit distributions for this length. Among other tools, a representation of the convex minorant of a random walk in terms of uniform random permutations is utilized.
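As an illustration of the object studied here, the sketch below computes the convex minorant of a finite walk path as the lower convex hull of the points (i, S_i), and then measures its length. Taking "length" to mean the arc length of the minorant's graph is an assumption about the intended definition, and the function names are hypothetical. The permutation-based representation mentioned in the abstract is not used.

```python
from math import hypot

def cross(o, a, b):
    """Cross product of vectors o->a and o->b; positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_minorant(S):
    """Vertices (i, S[i]) of the greatest convex function lying below the path."""
    hull = []
    for p in enumerate(S):
        # Drop the last vertex while it lies on or above the chord to p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull

def minorant_length(S):
    """Arc length of the convex minorant (assumed meaning of 'length')."""
    hull = convex_minorant(S)
    return sum(hypot(x1 - x0, y1 - y0)
               for (x0, y0), (x1, y1) in zip(hull, hull[1:]))

# Hypothetical path S_0..S_5 of a walk started at 0.
S = [0, 1, -1, 0, -2, 1]
print(convex_minorant(S), minorant_length(S))
```

The scan is the lower-hull half of the standard monotone chain method and runs in linear time, because the points are already sorted by their time index.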