A Comparative Study on Exact Triangle Counting Algorithms on the GPU

205 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Leyuan Wang

تاريخ النشر 2018

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Leyuan Wang - Yangzihao Wang - Carl Yang

النظم الموزعة والتوازية والحوسبة العنقودية

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We implement exact triangle counting in graphs on the GPU using three different methodologies: subgraph matching to a triangle pattern; programmable graph analytics, with a set-intersection approach; and a matrix formulation based on sparse matrix-matrix multiplies. All three deliver best-of-class performance over CPU implementations and over comparable GPU implementations, with the graph-analytic approach achieving the best performance due to its ability to exploit efficient filtering steps to remove unnecessary work and its high-performance set-intersection core.

قيم البحث

77 - Leyuan Wang , John D. Owens 2019

In this paper, we propose a novel method to compute triangle counting on GPUs. Unlike previous formulations of graph matching, our approach is BFS-based by traversing the graph in an all-source-BFS manner and thus can be mapped onto GPUs in a massive ly parallel fashion. Our implementation uses the Gunrock programming model and we evaluate our implementation in runtime and memory consumption compared with previous state-of-the-art work. We sustain a peak traversed-edges-per-second (TEPS) rate of nearly 10 GTEPS. Our algorithm is the most scalable and parallel among all existing GPU implementations and also outperforms all existing CPU distributed implementations. This work specifically focuses on leveraging our implementation on the triangle counting problem for the Subgraph Isomorphism Graph Challenge 2019, demonstrating a geometric mean speedup over the 2018 champion of 3.84x.

النظم الموزعة والتوازية والحوسبة العنقودية

A Comparative Study of 2D Numerical Methods with GPU Computing

88 - Ben J. Zimmerman , Jonathan D. Regele , Bong Wie 2017

Graphics Processing Unit (GPU) computing is becoming an alternate computing platform for numerical simulations. However, it is not clear which numerical scheme will provide the highest computational efficiency for different types of problems. To this end, numerical accuracies and computational work of several numerical methods are compared using a GPU computing implementation. The Correction Procedure via Reconstruction (CPR), Discontinuous Galerkin (DG), Nodal Discontinuous Galerkin (NDG), Spectral Difference (SD), and Finite Volume (FV) methods are investigated using various reconstruction orders. Both smooth and discontinuous cases are considered for two-dimensional simulations. For discontinuous problems, MUSCL schemes are employed with FV, while CPR, DG, NDG, and SD use slope limiting. The computation time to reach a set error criteria and total time to complete solutions are compared across the methods. It is shown that while FV methods can produce solutions with low computational times, they produce larger errors than high-order methods for smooth problems at the same order of accuracy. For discontinuous problems, the methods show good agreement with one another in terms of solution profiles, and the total computational times between FV, CPR, and SD are comparable.

النظم الموزعة والتوازية والحوسبة العنقودية التحليل العددي الفيزياء الحسابية

Large Cuts with Local Algorithms on Triangle-Free Graphs

275 - Juho Hirvonen , Joel Rybicki , Stefan Schmid 2014

We study the problem of finding large cuts in $d$-regular triangle-free graphs. In prior work, Shearer (1992) gives a randomised algorithm that finds a cut of expected size $(1/2 + 0.177/sqrt{d})m$, where $m$ is the number of edges. We give a simpler algorithm that does much better: it finds a cut of expected size $(1/2 + 0.28125/sqrt{d})m$. As a corollary, this shows that in any $d$-regular triangle-free graph there exists a cut of at least this size. Our algorithm can be interpreted as a very efficient randomised distributed algorithm: each node needs to produce only one random bit, and the algorithm runs in one synchronous communication round. This work is also a case study of applying computational techniques in the design of distributed algorithms: our algorithm was designed by a computer program that searched for optimal algorithms for small values of $d$.

النظم الموزعة والتوازية والحوسبة العنقودية الرياضيات المتقطعة بنى وهياكل البيانات والخوارزميات

TRUST: Triangle Counting Reloaded on GPUs

197 - Santosh Pandey , Zhibin Wang , Sheng Zhong 2021

Triangle counting is a building block for a wide range of graph applications. Traditional wisdom suggests that i) hashing is not suitable for triangle counting, ii) edge-centric triangle counting beats vertex-centric design, and iii) communication-fr ee and workload balanced graph partitioning is a grand challenge for triangle counting. On the contrary, we advocate that i) hashing can help the key operations for scalable triangle counting on Graphics Processing Units (GPUs), i.e., list intersection and graph partitioning, ii)vertex-centric design reduces both hash table construction cost and memory consumption, which is limited on GPUs. In addition, iii) we exploit graph and workload collaborative, and hashing-based 2D partitioning to scale vertex-centric triangle counting over 1,000 GPUswith sustained scalability. In this work, we present TRUST which performs triangle counting with the hash operation and vertex-centric mechanism at the core. To the best of our knowledge, TRUSTis the first work that achieves over one trillion Traversed Edges Per Second (TEPS) rate for triangle counting.

نظرية المعلومات النظم الموزعة والتوازية والحوسبة العنقودية نظرية المعلومات

GPU Algorithms for Efficient Exascale Discretizations

272 - Ahmad Abdelfattah , Valeria Barra , Natalie Beams 2021

In this paper we describe the research and development activities in the Center for Efficient Exascale Discretization within the US Exascale Computing Project, targeting state-of-the-art high-order finite-element algorithms for high-order application s on GPU-accelerated platforms. We discuss the GPU developments in several components of the CEED software stack, including the libCEED, MAGMA, MFEM, libParanumal, and Nek projects. We report performance and capability improvements in several CEED-enabled applications on both NVIDIA and AMD GPU systems.

النظم الموزعة والتوازية والحوسبة العنقودية البرمجيات الرياضية التحليل العددي

سجل دخول لتتمكن من نشر تعليقات