Sorted Top-k in Rounds

63 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Jieming Mao

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Mark Braverman - Jieming Mao - Yuval Peres

بنى وهياكل البيانات والخوارزميات التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We consider the sorted top-$k$ problem whose goal is to recover the top-$k$ items with the correct order out of $n$ items using pairwise comparisons. In many applications, multiple rounds of interaction can be costly. We restrict our attention to algorithms with a constant number of rounds $r$ and try to minimize the sample complexity, i.e. the number of comparisons. When the comparisons are noiseless, we characterize how the optimal sample complexity depends on the number of rounds (up to a polylogarithmic factor for general $r$ and up to a constant factor for $r=1$ or 2). In particular, the sample complexity is $Theta(n^2)$ for $r=1$, $Theta(nsqrt{k} + n^{4/3})$ for $r=2$ and $tilde{Theta}left(n^{2/r} k^{(r-1)/r} + nright)$ for $r geq 3$. We extend our results of sorted top-$k$ to the noisy case where each comparison is correct with probability $2/3$. When $r=1$ or 2, we show that the sample complexity gets an extra $Theta(log(k))$ factor when we transition from the noiseless case to the noisy case. We also prove new results for top-$k$ and sorting in the noisy case. We believe our techniques can be generally useful for understanding the trade-off between round complexities and sample complexities of rank aggregation problems.

قيم البحث

192 - Riccardo Dondi , Pietro Hiram Guzzi , 2020

Networks are largely used for modelling and analysing data and relations among them. Recently, it has been shown that the use of a single network may not be the optimal choice, since a single network may misses some aspects. Consequently, it has been proposed to use a pair of networks to better model all the aspects, and the main approach is referred to as dual networks (DNs). DNs are two related graphs (one weighted, the other unweighted) that share the same set of vertices and two different edge sets. In DNs is often interesting to extract common subgraphs among the two networks that are maximally dense in the conceptual network and connected in the physical one. The simplest instance of this problem is finding a common densest connected subgraph (DCS), while we here focus on the detection of the Top-k Densest Connected subgraphs, i.e. a set k subgraphs having the largest density in the conceptual network which are also connected in the physical network. We formalise the problem and then we propose a heuristic to find a solution, since the problem is computationally hard. A set of experiments on synthetic and real networks is also presented to support our approach.

بنى وهياكل البيانات والخوارزميات

Parallel Graph Connectivity in Log Diameter Rounds

202 - Alexandr Andoni , Clifford Stein , Zhao Song 2018

We study graph connectivity problem in MPC model. On an undirected graph with $n$ nodes and $m$ edges, $O(log n)$ round connectivity algorithms have been known for over 35 years. However, no algorithms with better complexity bounds were known. In thi s work, we give fully scalable, faster algorithms for the connectivity problem, by parameterizing the time complexity as a function of the diameter of the graph. Our main result is a $O(log D loglog_{m/n} n)$ time connectivity algorithm for diameter-$D$ graphs, using $Theta(m)$ total memory. If our algorithm can use more memory, it can terminate in fewer rounds, and there is no lower bound on the memory per processor. We extend our results to related graph problems such as spanning forest, finding a DFS sequence, exact/approximate minimum spanning forest, and bottleneck spanning forest. We also show that achieving similar bounds for reachability in directed graphs would imply faster boolean matrix multiplication algorithms. We introduce several new algorithmic ideas. We describe a general technique called double exponential speed problem size reduction which roughly means that if we can use total memory $N$ to reduce a problem from size $n$ to $n/k$, for $k=(N/n)^{Theta(1)}$ in one phase, then we can solve the problem in $O(loglog_{N/n} n)$ phases. In order to achieve this fast reduction for graph connectivity, we use a multistep algorithm. One key step is a carefully constructed truncated broadcasting scheme where each node broadcasts neighbor sets to its neighbors in a way that limits the size of the resulting neighbor sets. Another key step is random leader contraction, where we choose a smaller set of leaders than many previous works do.

بنى وهياكل البيانات والخوارزميات النظم الموزعة والتوازية والحوسبة العنقودية

Correlation Clustering in Constant Many Parallel Rounds

90 - Vincent Cohen-Addad , Silvio Lattanzi , Slobodan Mitrovic 2021

Correlation clustering is a central topic in unsupervised learning, with many applications in ML and data mining. In correlation clustering, one receives as input a signed graph and the goal is to partition it to minimize the number of disagreements. In this work we propose a massively parallel computation (MPC) algorithm for this problem that is considerably faster than prior work. In particular, our algorithm uses machines with memory sublinear in the number of nodes in the graph and returns a constant approximation while running only for a constant number of rounds. To the best of our knowledge, our algorithm is the first that can provably approximate a clustering problem on graphs using only a constant number of MPC rounds in the sublinear memory regime. We complement our analysis with an experimental analysis of our techniques.

بنى وهياكل البيانات والخوارزميات النظم الموزعة والتوازية والحوسبة العنقودية التعلم الآلي

Spatio-Temporal Top-k Similarity Search for Trajectories in Graphs

85 - Lutz Oettershagen , Anne Driemel , Petra Mutzel 2020

We study the problem of finding the $k$ most similar trajectories to a given query trajectory. Our work is inspired by the work of Grossi et al. [6] that considers trajectories as walks in a graph. Each visited vertex is accompanied by a time-interva l. Grossi et al. define a similarity function that captures temporal and spatial aspects. We improve this similarity function to derive a new spatio-temporal distance function for which we can show that a specific type of triangle inequality is satisfied. This distance function is the basis for our index structures, which can be constructed efficiently, need only linear memory, and can quickly answer queries for the top-$k$ most similar trajectories. Our evaluation on real-world and synthetic data sets shows that our algorithms outperform the baselines with respect to indexing time by several orders of magnitude while achieving similar or better query time and quality of results.

بنى وهياكل البيانات والخوارزميات

Top-k Overlapping Densest Subgraphs: Approximation and Complexity

179 - Riccardo Dondi , Mohammad Mehdi Hosseinzadeh , Giancarlo Mauri 2018

A central problem in graph mining is finding dense subgraphs, with several applications in different fields, a notable example being identifying communities. While a lot of effort has been put on the problem of finding a single dense subgraph, only r ecently the focus has been shifted to the problem of finding a set of densest subgraphs. Some approaches aim at finding disjoint subgraphs, while in many real-world networks communities are often overlapping. An approach introduced to find possible overlapping subgraphs is the Top-k Overlapping Densest Subgraphs problem. For a given integer k >= 1, the goal of this problem is to find a set of k densest subgraphs that may share some vertices. The objective function to be maximized takes into account both the density of the subgraphs and the distance between subgraphs in the solution. The Top-k Overlapping Densest Subgraphs problem has been shown to admit a 1/10-factor approximation algorithm. Furthermore, the computational complexity of the problem has been left open. In this paper, we present contributions concerning the approximability and the computational complexity of the problem. For the approximability, we present approximation algorithms that improves the approximation factor to 1/2 , when k is bounded by the vertex set, and to 2/3 when k is a constant. For the computational complexity, we show that the problem is NP-hard even when k = 3.

بنى وهياكل البيانات والخوارزميات التعقيد الحسابي الشبكات الاجتماعية والمعلومات