Single Machine Graph Analytics on Massive Datasets Using Intel Optane DC Persistent Memory

74 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Gurbinder Gill

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Gurbinder Gill

النظم الموزعة والتوازية والحوسبة العنقودية

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Intel Optane DC Persistent Memory (Optane PMM) is a new kind of byte-addressable memory with higher density and lower cost than DRAM. This enables the design of affordable systems that support up to 6TB of randomly accessible memory. In this paper, we present key runtime and algorithmic principles to consider when performing graph analytics on extreme-scale graphs on large-memory platforms of this sort. To demonstrate the importance of these principles, we evaluate four existing shared-memory graph frameworks on large real-world web-crawls, using a machine with 6TB of Optane PMM. Our results show that frameworks based on the runtime and algorithmic principles advocated in this paper (i) perform significantly better than the others, and (ii) are competitive with graph analytics frameworks running on large production clusters.

قيم البحث

253 - Yinjun Wu , Kwanghyun Park , Rathijit Sen 2020

Non-volatile memory (NVM) is an emerging technology, which has the persistence characteristics of large capacity storage devices(e.g., HDDs and SSDs), while providing the low access latency and byte-addressablity of traditional DRAM memory. This uniq ue combination of features open up several new design considerations when building database management systems (DBMSs), such as replacing DRAM (as the main working space memory) or block devices (as the persistent storage), or complementing both at the same time for several DBMS components (such as access methods,storage engine, buffer management, logging/recovery, etc). However, interacting with NVM requires changes to application software to best use the device (e.g. mmap and clflush of small cache lines instead of write and fsync of large page buffers). Before introducing (potentially major) code changes to the DBMS for NVM, developers need a clear understanding of NVM performance in various conditions to help make better design choices. In this paper, we provide extensive performance evaluations conducted with a recently released NVM device, Intel Optane DC Persistent Memory (PMem), under different configurations with several micro-benchmark tools. Further, we evaluate OLTP and OLAP database workloads (i.e., TPC-C and TPC-H) with Microsoft SQL Server 2019 when using the NVM device as an in-memory buffer pool or persistent storage. From the lessons learned we share some recommendations for future DBMS design with PMem, e.g.simple hardware or software changes are not enough for the best use of PMem in DBMSs.

قواعد البيانات

GraphMP: I/O-Efficient Big Graph Analytics on a Single Commodity Machine

65 - Peng Sun , Yonggang Wen , Ta Nguyen Binh Duong 2018

Recent studies showed that single-machine graph processing systems can be as highly competitive as cluster-based approaches on large-scale problems. While several out-of-core graph processing systems and computation models have been proposed, the hig h disk I/O overhead could significantly reduce performance in many practical cases. In this paper, we propose GraphMP to tackle big graph analytics on a single machine. GraphMP achieves low disk I/O overhead with three techniques. First, we design a vertex-centric sliding window (VSW) computation model to avoid reading and writing vertices on disk. Second, we propose a selective scheduling method to skip loading and processing unnecessary edge shards on disk. Third, we use a compressed edge cache mechanism to fully utilize the available memory of a machine to reduce the amount of disk accesses for edges. Extensive evaluations have shown that GraphMP could outperform existing single-machine out-of-core systems such as GraphChi, X-Stream and GridGraph by up to 51, and can be as highly competitive as distributed graph engines like Pregel+, PowerGraph and Chaos.

النظم الموزعة والتوازية والحوسبة العنقودية

System Evaluation of the Intel Optane Byte-addressable NVM

136 - Ivy B. Peng , Maya B. Gokhale , Eric W. Green 2019

Byte-addressable non-volatile memory (NVM) features high density, DRAM comparable performance, and persistence. These characteristics position NVM as a promising new tier in the memory hierarchy. Nevertheless, NVM has asymmetric read and write perfor mance, and considerably higher write energy than DRAM. Our work provides an in-depth evaluation of the first commercially available byte-addressable NVM -- the Intel Optane DC persistent memory. The first part of our study quantifies the latency, bandwidth, power efficiency, and energy consumption under eight memory configurations. We also evaluate the real impact on in-memory graph processing workloads. Our results show that augmenting NVM with DRAM is essential, and the combination can effectively bridge the performance gap and provide reasonable performance with higher capacity. We also identify NUMA-related performance characteristics for accesses to memory on a remote socket. In the second part, we employ two fine-grained allocation policies to control traffic distribution between DRAM and NVM. Our results show that bandwidth spilling between DRAM and NVM could provide 2.0x bandwidth and enable $20%$ larger problems than using DRAM as a cache. Also, write isolation between DRAM and NVM could save up to 3.9x energy and improves bandwidth by 3.1x compared to DRAM-cached NVM. We establish a roofline model to explore power and energy efficiency at various distributions of read-only traffic. Our results show that NVM requires 1.8x lower power than DRAM for data-intensive workloads. Overall, applications can significantly optimize performance and power efficiency by adapting traffic distribution to NVM and DRAM through memory configurations and fine-grained policies to fully exploit the new memory device.

النظم الموزعة والتوازية والحوسبة العنقودية

Metall: A Persistent Memory Allocator For Data-Centric Analytics

190 - Keita Iwabuchi 2021

Data analytics applications transform raw input data into analytics-specific data structures before performing analytics. Unfortunately, such data ingestion step is often more expensive than analytics. In addition, various types of NVRAM devices are already used in many HPC systems today. Such devices will be useful for storing and reusing data structures beyond a single process life cycle. We developed Metall, a persistent memory allocator built on top of the memory-mapped file mechanism. Metall enables applications to transparently allocate custom C++ data structures into various types of persistent memories. Metall incorporates a concise and high-performance memory management algorithm inspired by Supermalloc and the rich C++ interface developed by Boost.Interprocess library. On a dynamic graph construction workload, Metall achieved up to 11.7x and 48.3x performance improvements over Boost.Interprocess and memkind (PMEM kind), respectively. We also demonstrate Metalls high adaptability by integrating Metall into a graph processing framework, GraphBLAS Template Library. This studys outcomes indicate that Metall will be a strong tool for accelerating future large-scale data analytics by allowing applications to leverage persistent memory efficiently.

النظم الموزعة والتوازية والحوسبة العنقودية

GraphMP: An Efficient Semi-External-Memory Big Graph Processing System on a Single Machine

83 - Peng Sun , Yonggang Wen , Ta Nguyen Binh Duong 2017

Recent studies showed that single-machine graph processing systems can be as highly competitive as cluster-based approaches on large-scale problems. While several out-of-core graph processing systems and computation models have been proposed, the hig h disk I/O overhead could significantly reduce performance in many practical cases. In this paper, we propose GraphMP to tackle big graph analytics on a single machine. GraphMP achieves low disk I/O overhead with three techniques. First, we design a vertex-centric sliding window (VSW) computation model to avoid reading and writing vertices on disk. Second, we propose a selective scheduling method to skip loading and processing unnecessary edge shards on disk. Third, we use a compressed edge cache mechanism to fully utilize the available memory of a machine to reduce the amount of disk accesses for edges. Extensive evaluations have shown that GraphMP could outperform state-of-the-art systems such as GraphChi, X-Stream and GridGraph by 31.6x, 54.5x and 23.1x respectively, when running popular graph applications on a billion-vertex graph.

النظم الموزعة والتوازية والحوسبة العنقودية

سجل دخول لتتمكن من نشر تعليقات