Random graph generation is an important tool for studying large complex networks. Despite the abundance of random graph models, constructing models with application-driven constraints remains poorly understood. To advance the state of the art in this area, we focus on random graphs without short cycles as a stylized family of graphs, and propose the RandGraph algorithm for randomly generating them. For any constant $k$, when $m = O(n^{1+1/[2k(k+3)]})$, RandGraph generates an asymptotically uniform random graph with $n$ vertices, $m$ edges, and no cycle of length at most $k$, using $O(n^2 m)$ operations. We also characterize the approximation error for finite values of $n$. To the best of our knowledge, this is the first polynomial-time algorithm for the problem. RandGraph works by sequentially adding $m$ edges to an empty graph on $n$ vertices. Recently, such sequential algorithms have proved successful for random sampling problems. Our main contributions to this line of research include a new approach for sequentially approximating edge-specific probabilities at each step of the algorithm, and a new method for analyzing such algorithms.
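To make the sequential skeleton concrete, the following Python sketch adds edges one at a time and rejects any edge that would close a cycle of length at most $k$. The uniform choice of candidate edges is a simplification: the actual RandGraph algorithm approximates edge-specific probabilities at each step, and that weighting is what yields asymptotic uniformity.

```python
import random
from collections import deque

def creates_short_cycle(adj, u, v, k):
    """True if adding edge (u, v) would close a cycle of length <= k,
    i.e. if u and v are already within distance k - 1 of each other."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if dist[x] >= k - 1:          # too far: any cycle would exceed k
            continue
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                if y == v:
                    return True
                queue.append(y)
    return False

def sequential_girth_sampler(n, m, k, rng=random):
    """Add m edges sequentially, skipping edges that close a short cycle.
    Assumes m is feasible for the girth constraint; candidate edges are
    uniform here, unlike RandGraph's reweighted per-step probabilities."""
    adj = {i: set() for i in range(n)}
    edges = set()
    while len(edges) < m:
        u, v = rng.sample(range(n), 2)
        e = (min(u, v), max(u, v))
        if e in edges or creates_short_cycle(adj, u, v, k):
            continue
        edges.add(e)
        adj[u].add(v)
        adj[v].add(u)
    return edges
```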
Minimal-interval semantics associates with each query over a document a set of intervals, called witnesses, that are incomparable with respect to inclusion (i.e., they form an antichain): witnesses define the minimal regions of the document satisfying the query. Minimal-interval semantics makes it easy to define and compute several sophisticated proximity operators, provides snippets for user presentation, and can be used to rank documents. In this paper we provide algorithms for computing conjunction and disjunction that are linear in the number of intervals and logarithmic in the number of operands; for additional operators, such as ordered conjunction and Brouwerian difference, we provide linear algorithms. In all cases, space is linear in the number of operands. More importantly, we define a formal notion of optimal laziness, and either prove it, or prove its impossibility, for each algorithm. We cast our results in a general framework of antichains of intervals on total orders, making our algorithms directly applicable to other domains.
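As an illustration of the simplest case, the sketch below computes the antichain of minimal witnesses for a plain conjunction of terms, assuming each operand is a sorted list of single positions, distinct across terms. The paper's algorithms handle general antichain operands and come with laziness guarantees, neither of which this sketch attempts.

```python
import heapq

def minimal_witnesses(position_lists):
    """All minimal intervals [l, r] containing at least one occurrence of
    every term, given one sorted position list per term (positions assumed
    distinct across terms). A candidate window [min-of-heads, max-of-heads]
    is minimal exactly when the next window's right endpoint is strictly
    larger, which is the emission rule used here."""
    iters = [iter(p) for p in position_lists]
    heads = []
    for i, it in enumerate(iters):
        p = next(it, None)
        if p is None:              # some term never occurs: no witnesses
            return []
        heads.append((p, i))
    heapq.heapify(heads)
    r = max(p for p, _ in heads)
    out, prev = [], None
    while True:
        l, i = heads[0]
        if prev is not None and r > prev[1]:
            out.append(prev)       # prev contains no later, nested window
        prev = (l, r)
        nxt = next(iters[i], None)
        if nxt is None:            # cannot shrink further: prev is minimal
            out.append(prev)
            return out
        heapq.heapreplace(heads, (nxt, i))
        r = max(r, nxt)

# example: term A at positions [1, 5], term B at [3] -> [(1, 3), (3, 5)]
print(minimal_witnesses([[1, 5], [3]]))
```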
In the budgeted learning problem, we are allowed to experiment on a set of alternatives (given a fixed experimentation budget) with the goal of picking a single alternative with the largest possible expected payoff. Approximation algorithms for this problem were developed by Guha and Munagala by rounding a linear program that couples the various alternatives together. In this paper we present an index for this problem, the ratio index, which also guarantees a constant-factor approximation. Index-based policies have the advantage that a single number (the index) can be computed for each alternative irrespective of all other alternatives, and the alternative with the highest index is experimented upon. This is analogous to the famous Gittins index for the discounted multi-armed bandit problem. The ratio index has several interesting structural properties. First, we show that it can be computed in strongly polynomial time. Second, we show that with the appropriate discount factor, the Gittins index and our ratio index are constant-factor approximations of each other, and hence the Gittins index also gives a constant-factor approximation to the budgeted learning problem. Finally, we show that the ratio index can be used to create an index-based policy that achieves an O(1)-approximation for the finite-horizon version of the multi-armed bandit problem. Moreover, the policy does not require any knowledge of the horizon (whereas we compare its performance against an optimal strategy that is aware of the horizon). This yields the following surprising result: there is an index-based policy that achieves an O(1)-approximation for the multi-armed bandit problem, oblivious to the underlying discount factor.
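The skeleton below illustrates what "index-based" means operationally: each arm's index depends on that arm alone. The Beta-Bernoulli arm model and the placeholder index (posterior mean) are hypothetical stand-ins, since computing the actual ratio index is the subject of the paper.

```python
import random

class BernoulliArm:
    """Bernoulli arm with a Beta(a, b) posterior over its payoff."""
    def __init__(self, true_p):
        self.true_p, self.a, self.b = true_p, 1, 1
    def pull(self):
        r = random.random() < self.true_p
        self.a += r
        self.b += not r
    def mean(self):
        return self.a / (self.a + self.b)

def index_policy(arms, budget, index_fn):
    """Generic index-based experimentation: probe the arm with the
    highest index, then commit to the best-looking arm at the end.
    `index_fn` is a hypothetical stand-in for the paper's ratio index."""
    for _ in range(budget):
        max(arms, key=index_fn).pull()
    return max(arms, key=lambda arm: arm.mean())

# toy run, with the posterior mean as a placeholder index
arms = [BernoulliArm(p) for p in (0.2, 0.5, 0.8)]
chosen = index_policy(arms, budget=30, index_fn=BernoulliArm.mean)
```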
We study the minimum backlog problem (MBP). This online problem arises, e.g., in the context of sensor networks. We focus on two main variants of MBP. The discrete MBP is a 2-person game played on a graph $G=(V,E)$. The player is initially located at a vertex of the graph. In each time step, the adversary pours a total of one unit of water into cups that are located on the vertices of the graph, arbitrarily distributing the water among the cups. The player then moves from her current vertex to an adjacent vertex and empties the cup at that vertex. The player's objective is to minimize the backlog, i.e., the maximum amount of water in any cup at any time. The geometric MBP is a continuous-time version of the MBP: the cups are points in the two-dimensional plane, the adversary pours water continuously at a constant rate, and the player moves in the plane with unit speed. Again, the player's objective is to minimize the backlog. We show that the competitive ratio of any algorithm for the MBP has a lower bound of $\Omega(D)$, where $D$ is the diameter of the graph (for the discrete MBP) or the diameter of the point set (for the geometric MBP). Therefore we focus on determining a strategy for the player that guarantees a uniform upper bound on the absolute value of the backlog. For the absolute value of the backlog there is a trivial lower bound of $\Omega(D)$, and the deamortization analysis of Dietz and Sleator gives an upper bound of $O(D \log N)$ for $N$ cups. Our main result is a tight upper bound for the geometric MBP: we show that there is a strategy for the player that guarantees a backlog of $O(D)$, independently of the number of cups.
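The following sketch simulates one round structure of the discrete game. The greedy move rule used here (step toward the fullest adjacent cup) is only a natural baseline, not the strategy achieving the paper's bounds.

```python
import random

def simulate_greedy_player(adj, steps, adversary, start=0):
    """Discrete minimum backlog game: each round the adversary distributes
    one unit of water over the cups, then the player moves to an adjacent
    vertex and empties its cup. Returns the worst backlog observed."""
    cups = [0.0] * len(adj)
    pos, backlog = start, 0.0
    for _ in range(steps):
        for v, amount in adversary(cups):        # pours totalling one unit
            cups[v] += amount
        pos = max(adj[pos], key=lambda v: cups[v])  # greedy move
        cups[pos] = 0.0
        backlog = max(backlog, max(cups))
    return backlog

# toy adversary on a 4-cycle: pour the whole unit into a random cup
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
adv = lambda cups: [(random.randrange(4), 1.0)]
print(simulate_greedy_player(adj, steps=100, adversary=adv))
```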
Shahar Dobzinski, Ami Mor (2015)
The problem of maximizing a non-negative submodular function was introduced by Feige, Mirrokni, and Vondrak [FOCS07], who provided a deterministic local-search based algorithm that guarantees an approximation ratio of $\frac{1}{3}$, as well as a randomized $\frac{2}{5}$-approximation algorithm. An extensive line of research followed, and various algorithms with improving approximation ratios were developed, all of which are randomized. Finally, Buchbinder et al. [FOCS12] presented a randomized $\frac{1}{2}$-approximation algorithm, which is the best possible. This paper gives the first deterministic algorithm for maximizing a non-negative submodular function that achieves an approximation ratio better than $\frac{1}{3}$. The approximation ratio of our algorithm is $\frac{2}{5}$. Our algorithm is based on recursive composition of solutions obtained by the local search algorithm of Feige et al. We show that the $\frac{2}{5}$ approximation ratio can be guaranteed when the recursion depth is $2$, and leave open the question of whether the approximation ratio improves as the recursion depth increases.
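A sketch of the Feige et al. local-search building block that the recursive algorithm composes: add or remove single elements while the value improves, then return the better of the local optimum and its complement. For polynomial running time the original algorithm only accepts improvements by a $(1+\epsilon/n^2)$ factor, a detail omitted here.

```python
def local_search_max(f, ground_set):
    """Local search for unconstrained non-negative submodular maximization:
    start from the best singleton, apply improving single-element flips,
    and return the better of the local optimum and its complement."""
    S = {max(ground_set, key=lambda e: f({e}))}
    improved = True
    while improved:
        improved = False
        for e in ground_set:
            T = S - {e} if e in S else S | {e}
            if f(T) > f(S):            # exact improvement; the paper uses
                S, improved = T, True  # a (1 + eps/n^2) threshold instead
                break
    comp = set(ground_set) - S
    return S if f(S) >= f(comp) else comp

# toy example: cut function of a path a-b-c (non-monotone submodular)
edges = [("a", "b"), ("b", "c")]
cut = lambda S: sum((u in S) != (v in S) for u, v in edges)
print(local_search_max(cut, ["a", "b", "c"]))   # {'b'}, with cut value 2
```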
We introduce and analyze stochastic optimization methods where the input to each gradient update is perturbed by bounded noise. We show that this framework forms the basis of a unified approach to analyze asynchronous implementations of stochastic optimization algorithms. In this framework, asynchronous stochastic optimization algorithms can be thought of as serial methods operating on noisy inputs. Using our perturbed iterate framework, we provide new analyses of the Hogwild! algorithm and asynchronous stochastic coordinate descent that are simpler than earlier analyses, remove many assumptions of previous models, and in some cases yield improved upper bounds on the convergence rates. We proceed to apply our framework to develop and analyze KroMagnon: a novel, parallel, sparse stochastic variance-reduced gradient (SVRG) algorithm. We demonstrate experimentally on a 16-core machine that the sparse and parallel version of SVRG is in some cases more than four orders of magnitude faster than the standard SVRG algorithm.
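In code, the perturbed iterate view is just serial SGD whose gradient is evaluated at a noisy copy of the iterate. The sketch below uses bounded uniform noise as an illustrative stand-in for the staleness a lock-free worker would see when reading shared state.

```python
import numpy as np

def perturbed_sgd(grad, x0, steps, lr, noise_scale, rng):
    """Serial SGD with each gradient evaluated at a perturbed iterate
    x_hat instead of x: the paper's lens on asynchrony, where x_hat plays
    the role of the stale value read by a lock-free worker."""
    x = x0.copy()
    for _ in range(steps):
        x_hat = x + rng.uniform(-noise_scale, noise_scale, x.shape)
        x -= lr * grad(x_hat)          # update uses the noisy read
    return x

# toy quadratic: minimize ||x - 1||^2
rng = np.random.default_rng(0)
grad = lambda x: 2 * (x - 1.0)
print(perturbed_sgd(grad, np.zeros(3), 500, 0.05, 0.01, rng))
```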
It is well known that Sparse PCA (Sparse Principal Component Analysis) is NP-hard to solve exactly on worst-case instances. What is the complexity of solving Sparse PCA approximately? Our contributions include: 1) a simple and efficient algorithm that achieves an $n^{-1/3}$-approximation; 2) NP-hardness of approximation to within $(1-\varepsilon)$, for some small constant $\varepsilon > 0$; 3) SSE-hardness of approximation to within any constant factor; and 4) an $\exp\exp\left(\Omega\left(\sqrt{\log \log n}\right)\right)$ (quasi-quasi-polynomial) gap for the standard semidefinite program.
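For concreteness, the following sketch evaluates the Sparse PCA objective $\max_{\|x\|=1,\ \|x\|_0 \leq k} x^T A x$ with a common truncation heuristic; it is not claimed to be the paper's $n^{-1/3}$-approximation algorithm.

```python
import numpy as np

def truncated_pca(A, k):
    """Common heuristic for Sparse PCA on a symmetric matrix A: take the
    leading eigenvector, keep its k largest-magnitude coordinates,
    renormalize, and evaluate the quadratic form."""
    _, vecs = np.linalg.eigh(A)
    v = vecs[:, -1]                      # leading eigenvector
    idx = np.argsort(np.abs(v))[-k:]     # k largest coordinates
    x = np.zeros_like(v)
    x[idx] = v[idx]
    x /= np.linalg.norm(x)
    return x, x @ A @ x                  # k-sparse unit vector and value
```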
Given a similarity graph between items, correlation clustering (CC) groups similar items together and dissimilar ones apart. One of the most popular CC algorithms is KwikCluster: an algorithm that serially clusters neighborhoods of vertices and obtains a 3-approximation ratio. Unfortunately, KwikCluster in practice requires a large number of clustering rounds, a potential bottleneck for large graphs. We present C4 and ClusterWild!, two algorithms for parallel correlation clustering that run in a polylogarithmic number of rounds and achieve nearly linear speedups, provably. C4 uses concurrency control to enforce serializability of a parallel clustering process, and guarantees a 3-approximation ratio. ClusterWild! is a coordination-free algorithm that abandons consistency for the benefit of better scaling; this leads to a provably small loss in the 3-approximation ratio. We provide extensive experimental results for both algorithms, where we outperform the state of the art, both in terms of clustering accuracy and running time. We show that our algorithms can cluster billion-edge graphs in under 5 seconds on 32 cores, while achieving a 15x speedup.
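For reference, the serial baseline is only a few lines; the contribution of C4 and ClusterWild! lies in parallelizing exactly this pivot loop.

```python
import random

def kwik_cluster(vertices, neighbors, rng=random):
    """Serial KwikCluster: pick a random unclustered pivot, form a cluster
    from the pivot and its unclustered neighbors, remove them, repeat.
    `neighbors` maps each vertex to its set of similar vertices."""
    remaining = set(vertices)
    clusters = []
    order = list(vertices)
    rng.shuffle(order)                   # random pivot order
    for pivot in order:
        if pivot not in remaining:
            continue
        cluster = {pivot} | (neighbors[pivot] & remaining)
        clusters.append(cluster)
        remaining -= cluster
    return clusters
```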
We consider the convex hull $P_{\varphi}(G)$ of all satisfying assignments of a given MSO formula $\varphi$ on a given graph $G$. We show that there exists an extended formulation of the polytope $P_{\varphi}(G)$ that can be described by $f(|\varphi|,\tau)\cdot n$ inequalities, where $n$ is the number of vertices in $G$, $\tau$ is the treewidth of $G$, and $f$ is a computable function depending only on $\varphi$ and $\tau$. In other words, we prove that the extension complexity of $P_{\varphi}(G)$ is linear in the size of the graph $G$, with a constant depending on the treewidth of $G$ and the formula $\varphi$. This provides a very general yet very simple meta-theorem about the extension complexity of polytopes related to a wide class of problems and graphs. As a corollary of our main result, we obtain an analogous result for the weaker MSO$_1$ logic on the wider class of graphs of bounded cliquewidth. Furthermore, we study our main geometric tool, which we term the glued product of polytopes. While the glued product of polytopes has been known since the 90s, we are the first to show that it preserves decomposability and boundedness of treewidth of the constraint matrix. This implies that our extension of $P_\varphi(G)$ is decomposable and has a constraint matrix of bounded treewidth; so far only a few classes of polytopes are known to be decomposable. These properties make our extension useful in the construction of algorithms.
We study a statistical model for the tensor principal component analysis problem introduced by Montanari and Richard: given an order-$3$ tensor $T$ of the form $T = \tau \cdot v_0^{\otimes 3} + A$, where $\tau \geq 0$ is a signal-to-noise ratio, $v_0$ is a unit vector, and $A$ is a random noise tensor, the goal is to recover the planted vector $v_0$. For the case that $A$ has iid standard Gaussian entries, we give an efficient algorithm to recover $v_0$ whenever $\tau \geq \omega(n^{3/4} \log(n)^{1/4})$, and certify that the recovered vector is close to a maximum likelihood estimator, all with high probability over the random choice of $A$. The previous best algorithms with provable guarantees required $\tau \geq \Omega(n)$. In the regime $\tau \leq o(n)$, natural tensor-unfolding-based spectral relaxations for the underlying optimization problem break down (in the sense that their integrality gap is large). To go beyond this barrier, we use convex relaxations based on the sum-of-squares method. Our recovery algorithm proceeds by rounding a degree-$4$ sum-of-squares relaxation of the maximum-likelihood-estimation problem for the statistical model. To complement our algorithmic results, we show that degree-$4$ sum-of-squares relaxations break down for $\tau \leq O(n^{3/4}/\log(n)^{1/4})$, which demonstrates that improving our current guarantees (by more than logarithmic factors) would require new techniques or might even be intractable. Finally, we show how to exploit additional problem structure in order to solve our sum-of-squares relaxations, up to some approximation, very efficiently. Our fastest algorithm runs in nearly-linear time using shifted (matrix) power iteration and has similar guarantees as above. The analysis of this algorithm also confirms a variant of a conjecture of Montanari and Richard about singular vectors of tensor unfoldings.
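The simplest recovery baseline is plain tensor power iteration, sketched below on a synthetic instance of the model. The paper's nearly-linear-time algorithm instead performs a shifted matrix power iteration on a tensor unfolding, with similar guarantees.

```python
import numpy as np

def tensor_power_iteration(T, iters=50, seed=0):
    """Plain tensor power iteration x <- T(x, x, .) / ||T(x, x, .)|| for
    recovering the planted spike in T = tau * v0^{(x)3} + A. A simple
    baseline, not the paper's shifted matrix power iteration."""
    n = T.shape[0]
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    for _ in range(iters):
        x = np.einsum('ijk,j,k->i', T, x, x)   # contract two modes
        x /= np.linalg.norm(x)
    return x

# toy instance: planted spike plus iid Gaussian noise
n, tau = 50, 30.0
rng = np.random.default_rng(1)
v0 = rng.standard_normal(n); v0 /= np.linalg.norm(v0)
T = tau * np.einsum('i,j,k->ijk', v0, v0, v0) + rng.standard_normal((n, n, n))
v = tensor_power_iteration(T)
print(abs(v @ v0))   # close to 1 when the signal is strong
```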