External memory bisimulation reduction of big graphs

50 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Yongming Luo

تاريخ النشر 2012

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Yongming Luo - George H. L. Fletcher - Jan Hidders

قواعد البيانات بنى وهياكل البيانات والخوارزميات

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

In this paper, we present, to our knowledge, the first known I/O efficient solutions for computing the k-bisimulation partition of a massive directed graph, and performing maintenance of such a partition upon updates to the underlying graph. Ubiquitous in the theory and application of graph data, bisimulation is a robust notion of node equivalence which intuitively groups together nodes in a graph which share fundamental structural features. k-bisimulation is the standard variant of bisimulation where the topological features of nodes are only considered within a local neighborhood of radius $kgeqslant 0$. The I/O cost of our partition construction algorithm is bounded by $O(kcdot mathit{sort}(|et|) + kcdot scan(| t|) + mathit{sort}(| t|))$, while our maintenance algorithms are bounded by $O(kcdot mathit{sort}(|et|) + kcdot mathit{sort}(| t|))$. The space complexity bounds are $O(| t|+|et|)$ and $O(kcdot| t|+kcdot|et|)$, resp. Here, $|et|$ and $| t|$ are the number of disk pages occupied by the input graphs edge set and node set, resp., and $mathit{sort}(n)$ and $mathit{scan}(n)$ are the cost of sorting and scanning, resp., a file occupying $n$ pages in external memory. Empirical analysis on a variety of massive real-world and synthetic graph datasets shows that our algorithms perform efficiently in practice, scaling gracefully as graphs grow in size.

قيم البحث

84 - Xinghao Pan , Dimitris Papailiopoulos , Samet Oymak 2015

Given a similarity graph between items, correlation clustering (CC) groups similar items together and dissimilar ones apart. One of the most popular CC algorithms is KwikCluster: an algorithm that serially clusters neighborhoods of vertices, and obta ins a 3-approximation ratio. Unfortunately, KwikCluster in practice requires a large number of clustering rounds, a potential bottleneck for large graphs. We present C4 and ClusterWild!, two algorithms for parallel correlation clustering that run in a polylogarithmic number of rounds and achieve nearly linear speedups, provably. C4 uses concurrency control to enforce serializability of a parallel clustering process, and guarantees a 3-approximation ratio. ClusterWild! is a coordination free algorithm that abandons consistency for the benefit of better scaling; this leads to a provably small loss in the 3-approximation ratio. We provide extensive experimental results for both algorithms, where we outperform the state of the art, both in terms of clustering accuracy and running time. We show that our algorithms can cluster billion-edge graphs in under 5 seconds on 32 cores, while achieving a 15x speedup.

النظم الموزعة والتوازية والحوسبة العنقودية بنى وهياكل البيانات والخوارزميات التعلم الالي

Aggregation by Provenance Types: A Technique for Summarising Provenance Graphs

155 - Luc Moreau 2015

As users become confronted with a deluge of provenance data, dedicated techniques are required to make sense of this kind of information. We present Aggregation by Provenance Types, a provenance graph analysis that is capable of generating provenance graph summaries. It proceeds by converting provenance paths up to some length k to attributes, referred to as provenance types, and by grouping nodes that have the same provenance types. The summary also includes numeric values representing the frequency of nodes and edges in the original graph. A quantitative evaluation and a complexity analysis show that this technique is tractable; with small values of k, it can produce useful summaries and can help detect outliers. We illustrate how the generated summaries can further be used for conformance checking and visualization.

قواعد البيانات بنى وهياكل البيانات والخوارزميات

An Efficient and Wear-Leveling-Aware Frequent-Pattern Mining on Non-Volatile Memory

135 - Jiaqi Dong , Runyu Zhang , Chaoshu Yang 2020

Frequent-pattern mining is a common approach to reveal the valuable hidden trends behind data. However, existing frequent-pattern mining algorithms are designed for DRAM, instead of persistent memories (PMs), which can lead to severe performance and energy overhead due to the utterly different characteristics between DRAM and PMs when they are running on PMs. In this paper, we propose an efficient and Wear-leveling-aware Frequent-Pattern Mining scheme, WFPM, to solve this problem. The proposed WFPM is evaluated by a series of experiments based on realistic datasets from diversified application scenarios, where WFPM achieves 32.0% performance improvement and prolongs the NVM lifetime of header table by 7.4x over the EvFP-Tree.

قواعد البيانات بنى وهياكل البيانات والخوارزميات

GraphMP: An Efficient Semi-External-Memory Big Graph Processing System on a Single Machine

83 - Peng Sun , Yonggang Wen , Ta Nguyen Binh Duong 2017

Recent studies showed that single-machine graph processing systems can be as highly competitive as cluster-based approaches on large-scale problems. While several out-of-core graph processing systems and computation models have been proposed, the hig h disk I/O overhead could significantly reduce performance in many practical cases. In this paper, we propose GraphMP to tackle big graph analytics on a single machine. GraphMP achieves low disk I/O overhead with three techniques. First, we design a vertex-centric sliding window (VSW) computation model to avoid reading and writing vertices on disk. Second, we propose a selective scheduling method to skip loading and processing unnecessary edge shards on disk. Third, we use a compressed edge cache mechanism to fully utilize the available memory of a machine to reduce the amount of disk accesses for edges. Extensive evaluations have shown that GraphMP could outperform state-of-the-art systems such as GraphChi, X-Stream and GridGraph by 31.6x, 54.5x and 23.1x respectively, when running popular graph applications on a billion-vertex graph.

النظم الموزعة والتوازية والحوسبة العنقودية

Iteration Algebras for UnQL Graphs and Completeness for Bisimulation

93 - Makoto Hamana 2015

This paper shows an application of Bloom and Esiks iteration algebras to model graph data in a graph database query language. About twenty years ago, Buneman et al. developed a graph database query language UnQL on the top of a functional meta-langua ge UnCAL for describing and manipulating graphs. Recently, the functional programming community has shown renewed interest in UnCAL, because it provides an efficient graph transformation language which is useful for various applications, such as bidirectional computation. However, no mathematical semantics of UnQL/UnCAL graphs has been developed. In this paper, we give an equational axiomatisation and algebraic semantics of UnCAL graphs. The main result of this paper is to prove that completeness of our equational axioms for UnCAL for the original bisimulation of UnCAL graphs via iteration algebras. Another benefit of algebraic semantics is a clean characterisation of structural recursion on graphs using free iteration algebra.

المنطق في علوم الحاسوب قواعد البيانات

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

الجامعة الافتراضية السورية

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

External memory bisimulation reduction of big graphs

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً