ترغب بنشر مسار تعليمي؟ اضغط هنا

Approximate Modularity Revisited

86   0   0.0 ( 0 )
 نشر من قبل Inbal Talgam-Cohen
 تاريخ النشر 2016
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Set functions with convenient properties (such as submodularity) appear in application areas of current interest, such as algorithmic game theory, and allow for improved optimization algorithms. It is natural to ask (e.g., in the context of data driven optimization) how robust such properties are, and whether small deviations from them can be tolerated. We consider two such questions in the important special case of linear set functions. One question that we address is whether any set function that approximately satisfies the modularity equation (linear functions satisfy the modularity equation exactly) is close to a linear function. The answer to this is positive (in a precise formal sense) as shown by Kalton and Roberts [1983] (and further improved by Bondarenko, Prymak, and Radchenko [2013]). We revisit their proof idea that is based on expander graphs, and provide significantly stronger upper bounds by combining it with new techniques. Furthermore, we provide improved lower bounds for this problem. Another question that we address is that of how to learn a linear function $h$ that is close to an approximately linear function $f$, while querying the value of $f$ on only a small number of sets. We present a deterministic algorithm that makes only linearly many (in the number of items) nonadaptive queries, by this improving over a previous algorithm of Chierichetti, Das, Dasgupta and Kumar [2015] that is randomized and makes more than a quadratic number of queries. Our learning algorithm is based on a Hadamard transform.



قيم البحث

اقرأ أيضاً

Efficient computation of node proximity queries such as transition probabilities, Personalized PageRank, and Katz are of fundamental importance in various graph mining and learning tasks. In particular, several recent works leverage fast node proximi ty computation to improve the scalability of Graph Neural Networks (GNN). However, prior studies on proximity computation and GNN feature propagation are on a case-by-case basis, with each paper focusing on a particular proximity measure. In this paper, we propose Approximate Graph Propagation (AGP), a unified randomized algorithm that computes various proximity queries and GNN feature propagation, including transition probabilities, Personalized PageRank, heat kernel PageRank, Katz, SGC, GDC, and APPNP. Our algorithm provides a theoretical bounded error guarantee and runs in almost optimal time complexity. We conduct an extensive experimental study to demonstrate AGPs effectiveness in two concrete applications: local clustering with heat kernel PageRank and node classification with GNNs. Most notably, we present an empirical study on a billion-edge graph Papers100M, the largest publicly available GNN dataset so far. The results show that AGP can significantly improve various existing GNN models scalability without sacrificing prediction accuracy.
Community structures detection is one of the fundamental problems in complex network analysis towards understanding the topology structures of the network and the functions of it. Nonnegative matrix factorization (NMF) is a widely used method for com munity detection, and modularity Q and modularity density D are criteria to evaluate the quality of community structures. In this paper, we establish the connections between Q, D and NMF for the first time. Q maximization can be approximately reformulated under the framework of NMF with Frobenius norm, especially when $n$ is large, and D maximization can also be reformulated under the framework of NMF. Q minimization can be reformulated under the framework of NMF with Kullback-Leibler divergence. We propose new methods for community structures detection based on the above findings, and the experimental results on synthetic networks demonstrate their effectiveness.
Data structures that allow efficient distance estimation (distance oracles, distance sketches, etc.) have been extensively studied, and are particularly well studied in centralized models and classical distributed models such as CONGEST. We initiate their study in newer (and arguably more realistic) models of distributed computation: the Congested Clique model and the Massively Parallel Computation (MPC) model. We provide efficient constructions in both of these models, but our core results are for MPC. In MPC we give two main results: an algorithm that constructs stretch/space optimal distance sketches but takes a (small) polynomial number of rounds, and an algorithm that constructs distance sketches with worse stretch but that only takes polylogarithmic rounds. Along the way, we show that other useful combinatorial structures can also be computed in MPC. In particular, one key component we use to construct distance sketches are an MPC construction of the hopsets of Elkin and Neiman (2016). This result has additional applications such as the first polylogarithmic time algorithm for constant approximate single-source shortest paths for weighted graphs in the low memory MPC setting.
We consider the problem of evaluating certain types of functional aggregation queries on relational data subject to additive inequalities. Such aggregation queries, with a smallish number of additive inequalities, arise naturally/commonly in many app lications, particularly in learning applications. We give a relatively complete categorization of the computational complexity of such problems. We first show that the problem is NP-hard, even in the case of one additive inequality. Thus we turn to approximating the query. Our main result is an efficient algorithm for approximating, with arbitrarily small relative error, many natural aggregation queries with one additive inequality. We give examples of natural queries that can be efficiently solved using this algorithm. In contrast, we show that the situation with two additive inequalities is quite different, by showing that it is NP-hard to evaluate simple aggregation queries, with two additive inequalities, with any bounded relative error.
We consider the problem of computing a $(1+epsilon)$-approximation of the Hamming distance between a pattern of length $n$ and successive substrings of a stream. We first look at the one-way randomised communication complexity of this problem, giving Alice the first half of the stream and Bob the second half. We show the following: (1) If Alice and Bob both share the pattern then there is an $O(epsilon^{-4} log^2 n)$ bit randomised one-way communication protocol. (2) If only Alice has the pattern then there is an $O(epsilon^{-2}sqrt{n}log n)$ bit randomised one-way communication protocol. We then go on to develop small space streaming algorithms for $(1+epsilon)$-approximate Hamming distance which give worst case running time guarantees per arriving symbol. (1) For binary input alphabets there is an $O(epsilon^{-3} sqrt{n} log^{2} n)$ space and $O(epsilon^{-2} log{n})$ time streaming $(1+epsilon)$-approximate Hamming distance algorithm. (2) For general input alphabets there is an $O(epsilon^{-5} sqrt{n} log^{4} n)$ space and $O(epsilon^{-4} log^3 {n})$ time streaming $(1+epsilon)$-approximate Hamming distance algorithm.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا