TRUST: Triangle Counting Reloaded on GPUs

Added by Santosh Pandey

Publication date 2021

fields Informatics Engineering

Research language English

Authors Santosh Pandey - Zhibin Wang - Sheng Zhong

Triangle counting is a building block for a wide range of graph applications. Traditional wisdom suggests that i) hashing is not suitable for triangle counting, ii) edge-centric triangle counting beats vertex-centric design, and iii) communication-free and workload-balanced graph partitioning is a grand challenge for triangle counting. On the contrary, we advocate that i) hashing can help the key operations for scalable triangle counting on Graphics Processing Units (GPUs), i.e., list intersection and graph partitioning, and ii) vertex-centric design reduces both hash table construction cost and memory consumption, which is limited on GPUs. In addition, iii) we exploit graph and workload collaborative, and hashing-based 2D partitioning to scale vertex-centric triangle counting over 1,000 GPUs with sustained scalability. In this work, we present TRUST, which performs triangle counting with the hash operation and vertex-centric mechanism at the core. To the best of our knowledge, TRUST is the first work that achieves a rate of over one trillion Traversed Edges Per Second (TEPS) for triangle counting.
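To make the idea of hashing-based list intersection in a vertex-centric design more concrete, the following is a minimal CPU sketch in Python. It is not the TRUST implementation: the function name `count_triangles`, the use of Python sets as per-vertex hash tables, and the degree-based edge orientation (a common way to count each triangle exactly once) are all assumptions made for illustration, and the GPU kernels and 2D partitioning described in the abstract are not modeled here.

```python
from collections import defaultdict


def count_triangles(edges):
    """Count triangles in an undirected graph given as (u, v) pairs.

    Illustrative sketch only: Python sets stand in for per-vertex hash tables.
    """
    # Build undirected adjacency sets, ignoring self-loops and duplicate edges.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    # Assumed orientation: direct each edge from the lower-ranked endpoint
    # to the higher-ranked one (rank = (degree, id)), so that every triangle
    # is discovered exactly once.
    rank = {v: (len(adj[v]), v) for v in adj}
    out = {v: {w for w in adj[v] if rank[v] < rank[w]} for v in adj}

    total = 0
    # Vertex-centric loop: one hash table (the set out[u]) is built per vertex u
    # and reused for all of u's outgoing edges.
    for u, table in out.items():
        for v in table:
            # List intersection by hashing: probe u's table with each
            # out-neighbor w of v; a hit means (u, v, w) is a triangle.
            total += sum(1 for w in out[v] if w in table)
    return total
```

In this sketch the hash table is built once per vertex and shared across that vertex's edges, which is the intuition behind the abstract's claim that a vertex-centric design lowers hash table construction cost compared with building a table per edge.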

Fast BFS-Based Triangle Counting on GPUs

77 - Leyuan Wang , John D. Owens 2019

In this paper, we propose a novel method to compute triangle counting on GPUs. Unlike previous formulations of graph matching, our approach is BFS-based by traversing the graph in an all-source-BFS manner and thus can be mapped onto GPUs in a massively parallel fashion. Our implementation uses the Gunrock programming model and we evaluate our implementation in runtime and memory consumption compared with previous state-of-the-art work. We sustain a peak traversed-edges-per-second (TEPS) rate of nearly 10 GTEPS. Our algorithm is the most scalable and parallel among all existing GPU implementations and also outperforms all existing CPU distributed implementations. This work specifically focuses on leveraging our implementation on the triangle counting problem for the Subgraph Isomorphism Graph Challenge 2019, demonstrating a geometric mean speedup over the 2018 champion of 3.84x.

Distributed Parallel and Cluster Computing

K-Clique Counting on GPUs

309 - Mohammad Almasri , Izzat El Hajj , Rakesh Nagi 2021

Counting k-cliques in a graph is an important problem in graph analysis with many applications. Counting k-cliques is typically done by traversing search trees starting at each vertex in the graph. An important optimization is to eliminate search tree branches that discover the same clique redundantly. Eliminating redundant clique discovery is typically done via graph orientation or pivoting. Parallel implementations for both of these approaches have demonstrated promising performance on CPUs. In this paper, we present our GPU implementations of k-clique counting for both the graph orientation and pivoting approaches. Our implementations explore both vertex-centric and edge-centric parallelization schemes, and replace recursive search tree traversal with iterative traversal based on an explicitly-managed shared stack. We also apply various optimizations to reduce memory consumption and improve the utilization of parallel execution resources. Our evaluation shows that our best GPU implementation outperforms the best state-of-the-art parallel CPU implementation by a geometric mean speedup of 12.39x, 6.21x, and 18.99x for k = 4, 7, and 10, respectively. We also evaluate the impact of the choice of parallelization scheme and the incremental speedup of each optimization. Our code will be open-sourced to enable further research on parallelizing k-clique counting on GPUs.

Distributed Parallel and Cluster Computing Data Structures and Algorithms

N-Body Simulations on GPUs

357 - Erich Elsen , V. Vishal , Mike Houston 2007

Commercial graphics processors (GPUs) have high compute capacity at very low cost, which makes them attractive for general purpose scientific computing. In this paper we show how graphics processors can be used for N-body simulations to obtain improvements in performance over current generation CPUs. We have developed a highly optimized algorithm for performing the O(N^2) force calculations that constitute the major part of stellar and molecular dynamics simulations. In some of the calculations, we achieve sustained performance of nearly 100 GFlops on an ATI X1900XTX. The performance on GPUs is comparable to specialized processors such as GRAPE-6A and MDGRAPE-3, but at a fraction of the cost. Furthermore, the wide availability of GPUs has significant implications for cluster computing and distributed computing efforts like Folding@Home.

Computational Engineering Distributed Parallel and Cluster Computing

Accelerating QDP++/Chroma on GPUs

504 - Frank Winter 2011

Extensions to the C++ implementation of the QCD Data Parallel Interface are provided, enabling acceleration of expression evaluation on NVIDIA GPUs. Single expressions are off-loaded to the device memory and execution domain, leveraging the Portable Expression Template Engine and using Just-in-Time compilation techniques. Memory management is automated by a software implementation of a cache controlling the GPU's memory. Interoperability with existing Krylov space solvers is demonstrated, and special attention is paid to Chroma readiness. Non-kernel routines in lattice QCD calculations, which are typically not subject to hand-tuned optimisations, are accelerated, which can reduce the effects otherwise suffered from Amdahl's Law.

High Energy Physics - Lattice Distributed Parallel and Cluster Computing

Counting generalized Reed-Solomon codes

109 - Peter Beelen , David Glynn , Tom Høholdt 2016

In this article we count the number of generalized Reed-Solomon (GRS) codes of dimension k and length n, including the codes coming from a non-degenerate conic plus nucleus. We compare our results with known formulae for the number of 3-dimensional MDS codes of length n=6,7,8,9.

Information Theory Discrete Mathematics
