Approximate Modularity Revisited

Published by: Inbal Talgam-Cohen

Publication date: 2016

Research field: Informatics Engineering

Paper language: English

Authors: Uriel Feige - Michal Feldman - Inbal Talgam-Cohen

Data Structures and Algorithms

Abstract

Set functions with convenient properties (such as submodularity) appear in application areas of current interest, such as algorithmic game theory, and allow for improved optimization algorithms. It is natural to ask (e.g., in the context of data-driven optimization) how robust such properties are, and whether small deviations from them can be tolerated. We consider two such questions in the important special case of linear set functions. The first question is whether any set function that approximately satisfies the modularity equation (linear functions satisfy it exactly) is close to a linear function. The answer is positive (in a precise formal sense), as shown by Kalton and Roberts [1983] (and further improved by Bondarenko, Prymak, and Radchenko [2013]). We revisit their proof idea, which is based on expander graphs, and provide significantly stronger upper bounds by combining it with new techniques. Furthermore, we provide improved lower bounds for this problem. The second question is how to learn a linear function $h$ that is close to an approximately linear function $f$ while querying the value of $f$ on only a small number of sets. We present a deterministic algorithm that makes only linearly many (in the number of items) nonadaptive queries, thereby improving on a previous algorithm of Chierichetti, Das, Dasgupta, and Kumar [2015] that is randomized and makes more than a quadratic number of queries. Our learning algorithm is based on a Hadamard transform.
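As a small illustration of the first question, the modularity equation in its standard form reads $f(S) + f(T) = f(S \cup T) + f(S \cap T)$ for all sets $S$ and $T$, and a function is approximately modular when the two sides differ by at most some small $\epsilon$. The sketch below is not the paper's algorithm: it is a brute-force check, feasible only for a handful of items, that measures the largest violation of this equation. The names `modularity_gap`, `weights`, `linear`, and `perturbed` are hypothetical and chosen only for this example.

```python
from itertools import combinations


def all_subsets(n):
    """Yield every subset of {0, ..., n-1} as a frozenset."""
    for k in range(n + 1):
        for combo in combinations(range(n), k):
            yield frozenset(combo)


def modularity_gap(f, n):
    """Return the largest |f(S) + f(T) - f(S | T) - f(S & T)| over all pairs S, T.

    Exhaustive over all 4**n pairs, so only usable for very small n.
    """
    subsets = list(all_subsets(n))
    return max(
        abs(f(s) + f(t) - f(s | t) - f(s & t))
        for s in subsets
        for t in subsets
    )


# Hypothetical integer item weights defining a linear set function.
weights = [3, -1, 4, 2]


def linear(s):
    """Linear set function: the sum of the weights of the items in s."""
    return sum(weights[i] for i in s)


def perturbed(s):
    """The linear function plus a bounded perturbation (1 on odd-sized sets)."""
    return linear(s) + (len(s) % 2)


gap_linear = modularity_gap(linear, len(weights))
gap_perturbed = modularity_gap(perturbed, len(weights))
print(gap_linear, gap_perturbed)
```

A linear function has gap zero, while the perturbed function violates the equation by a bounded amount, which is the setting in which the abstract asks how close such a function must be to a linear one.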

Read also

Approximate Graph Propagation

Hanzhi Wang, Mingguo He, Zhewei Wei, 2021

Efficient computation of node proximity queries such as transition probabilities, Personalized PageRank, and Katz is of fundamental importance in various graph mining and learning tasks. In particular, several recent works leverage fast node proximity computation to improve the scalability of Graph Neural Networks (GNNs). However, prior studies on proximity computation and GNN feature propagation proceed on a case-by-case basis, with each paper focusing on a particular proximity measure. In this paper, we propose Approximate Graph Propagation (AGP), a unified randomized algorithm that computes various proximity queries and GNN feature propagation, including transition probabilities, Personalized PageRank, heat kernel PageRank, Katz, SGC, GDC, and APPNP. Our algorithm provides a theoretical bounded error guarantee and runs in almost optimal time complexity. We conduct an extensive experimental study to demonstrate AGP's effectiveness in two concrete applications: local clustering with heat kernel PageRank and node classification with GNNs. Most notably, we present an empirical study on a billion-edge graph, Papers100M, the largest publicly available GNN dataset so far. The results show that AGP can significantly improve the scalability of various existing GNN models without sacrificing prediction accuracy.

Data Structures and Algorithms

On approximate equivalence of modularity, D and non-negative matrix factorization

Zhenhai Chang, Hui-Min Cheng, Chao Yan, 2018

Community structure detection is one of the fundamental problems in complex network analysis, aimed at understanding the topological structure of a network and its functions. Nonnegative matrix factorization (NMF) is a widely used method for community detection, and modularity Q and modularity density D are criteria for evaluating the quality of community structures. In this paper, we establish the connections between Q, D and NMF for the first time. Q maximization can be approximately reformulated under the framework of NMF with Frobenius norm, especially when $n$ is large, and D maximization can also be reformulated under the framework of NMF. Q minimization can be reformulated under the framework of NMF with Kullback-Leibler divergence. We propose new methods for community structure detection based on the above findings, and the experimental results on synthetic networks demonstrate their effectiveness.

Social and Information Networks; Physics and Society

Massively Parallel Approximate Distance Sketches

Michael Dinitz, Yasamin Nazari, 2018

Data structures that allow efficient distance estimation (distance oracles, distance sketches, etc.) have been extensively studied, particularly in centralized models and classical distributed models such as CONGEST. We initiate their study in newer (and arguably more realistic) models of distributed computation: the Congested Clique model and the Massively Parallel Computation (MPC) model. We provide efficient constructions in both of these models, but our core results are for MPC. In MPC we give two main results: an algorithm that constructs stretch/space optimal distance sketches but takes a (small) polynomial number of rounds, and an algorithm that constructs distance sketches with worse stretch but takes only polylogarithmic rounds. Along the way, we show that other useful combinatorial structures can also be computed in MPC. In particular, one key component we use to construct distance sketches is an MPC construction of the hopsets of Elkin and Neiman (2016). This result has additional applications, such as the first polylogarithmic time algorithm for constant approximate single-source shortest paths for weighted graphs in the low memory MPC setting.

Data Structures and Algorithms; Distributed, Parallel, and Cluster Computing

Approximate Aggregate Queries Under Additive Inequalities

Mahmoud Abo-Khamis, Sungjin Im, Benjamin Moseley, 2020

We consider the problem of evaluating certain types of functional aggregation queries on relational data subject to additive inequalities. Such aggregation queries, with a small number of additive inequalities, arise naturally in many applications, particularly in learning applications. We give a relatively complete categorization of the computational complexity of such problems. We first show that the problem is NP-hard, even in the case of one additive inequality. Thus we turn to approximating the query. Our main result is an efficient algorithm for approximating, with arbitrarily small relative error, many natural aggregation queries with one additive inequality. We give examples of natural queries that can be efficiently solved using this algorithm. In contrast, we show that the situation with two additive inequalities is quite different: it is NP-hard to evaluate simple aggregation queries with two additive inequalities to within any bounded relative error.

Data Structures and Algorithms; Databases; Machine Learning

Approximate Hamming distance in a stream

Raphael Clifford, Tatiana Starikovskaya, 2016

We consider the problem of computing a $(1+\epsilon)$-approximation of the Hamming distance between a pattern of length $n$ and successive substrings of a stream. We first look at the one-way randomised communication complexity of this problem, giving Alice the first half of the stream and Bob the second half. We show the following: (1) If Alice and Bob both share the pattern then there is an $O(\epsilon^{-4} \log^2 n)$ bit randomised one-way communication protocol. (2) If only Alice has the pattern then there is an $O(\epsilon^{-2} \sqrt{n} \log n)$ bit randomised one-way communication protocol. We then go on to develop small space streaming algorithms for $(1+\epsilon)$-approximate Hamming distance which give worst case running time guarantees per arriving symbol. (1) For binary input alphabets there is an $O(\epsilon^{-3} \sqrt{n} \log^{2} n)$ space and $O(\epsilon^{-2} \log n)$ time streaming $(1+\epsilon)$-approximate Hamming distance algorithm. (2) For general input alphabets there is an $O(\epsilon^{-5} \sqrt{n} \log^{4} n)$ space and $O(\epsilon^{-4} \log^3 n)$ time streaming $(1+\epsilon)$-approximate Hamming distance algorithm.

Data Structures and Algorithms
