System Evaluation of the Intel Optane Byte-addressable NVM

137 0 0.0 ( 0 )

Download Cite

Added by Ivy Peng

Publication date 2019

fields Informatics Engineering

and research's language is English

Authors Ivy B. Peng - Maya B. Gokhale - Eric W. Green

Distributed Parallel and Cluster Computing

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Byte-addressable non-volatile memory (NVM) features high density, DRAM comparable performance, and persistence. These characteristics position NVM as a promising new tier in the memory hierarchy. Nevertheless, NVM has asymmetric read and write performance, and considerably higher write energy than DRAM. Our work provides an in-depth evaluation of the first commercially available byte-addressable NVM -- the Intel Optane DC persistent memory. The first part of our study quantifies the latency, bandwidth, power efficiency, and energy consumption under eight memory configurations. We also evaluate the real impact on in-memory graph processing workloads. Our results show that augmenting NVM with DRAM is essential, and the combination can effectively bridge the performance gap and provide reasonable performance with higher capacity. We also identify NUMA-related performance characteristics for accesses to memory on a remote socket. In the second part, we employ two fine-grained allocation policies to control traffic distribution between DRAM and NVM. Our results show that bandwidth spilling between DRAM and NVM could provide 2.0x bandwidth and enable $20%$ larger problems than using DRAM as a cache. Also, write isolation between DRAM and NVM could save up to 3.9x energy and improves bandwidth by 3.1x compared to DRAM-cached NVM. We establish a roofline model to explore power and energy efficiency at various distributions of read-only traffic. Our results show that NVM requires 1.8x lower power than DRAM for data-intensive workloads. Overall, applications can significantly optimize performance and power efficiency by adapting traffic distribution to NVM and DRAM through memory configurations and fine-grained policies to fully exploit the new memory device.

rate research

System measurement of Intel AEP Optane DIMM

70 - Tianyue Lu , Haiyang Pan , Mingyu Chen 2020

In recent years, memory wall has been a great performance bottleneck of computer system. To overcome it, Non-Volatile Main Memory (NVMM) technology has been discussed widely to provide a much larger main memory capacity. Last year, Intel released AEP Optane DIMM, which provides hundreds of GB capacity as a promising replacement of traditional DRAM memory. But as most key parameters of AEP is not open to users, there is a need to get to know them because they will guide a direction of further NVMM research. In this paper, we focus on measuring performance and architecture features of AEP DIMM. Together, we explore the design of DRAM cache which is an important part of DRAM-AEP hybrid memory system. As a result, we estimate the write latency of AEP DIMM which has not been measured accurately. And, we discover the current design parameters of DRAM cache, such as tag organization, cache associativity and set index mapping. All of these features are first published on academic paper which are greatly helpful to future NVMM optimizations.

Hardware Architecture

Single Machine Graph Analytics on Massive Datasets Using Intel Optane DC Persistent Memory

73 - Gurbinder Gill 2019

Intel Optane DC Persistent Memory (Optane PMM) is a new kind of byte-addressable memory with higher density and lower cost than DRAM. This enables the design of affordable systems that support up to 6TB of randomly accessible memory. In this paper, we present key runtime and algorithmic principles to consider when performing graph analytics on extreme-scale graphs on large-memory platforms of this sort. To demonstrate the importance of these principles, we evaluate four existing shared-memory graph frameworks on large real-world web-crawls, using a machine with 6TB of Optane PMM. Our results show that frameworks based on the runtime and algorithmic principles advocated in this paper (i) perform significantly better than the others, and (ii) are competitive with graph analytics frameworks running on large production clusters.

Distributed Parallel and Cluster Computing

Lessons learned from the early performance evaluation of Intel Optane DC Persistent Memory in DBMS

253 - Yinjun Wu , Kwanghyun Park , Rathijit Sen 2020

Non-volatile memory (NVM) is an emerging technology, which has the persistence characteristics of large capacity storage devices(e.g., HDDs and SSDs), while providing the low access latency and byte-addressablity of traditional DRAM memory. This unique combination of features open up several new design considerations when building database management systems (DBMSs), such as replacing DRAM (as the main working space memory) or block devices (as the persistent storage), or complementing both at the same time for several DBMS components (such as access methods,storage engine, buffer management, logging/recovery, etc). However, interacting with NVM requires changes to application software to best use the device (e.g. mmap and clflush of small cache lines instead of write and fsync of large page buffers). Before introducing (potentially major) code changes to the DBMS for NVM, developers need a clear understanding of NVM performance in various conditions to help make better design choices. In this paper, we provide extensive performance evaluations conducted with a recently released NVM device, Intel Optane DC Persistent Memory (PMem), under different configurations with several micro-benchmark tools. Further, we evaluate OLTP and OLAP database workloads (i.e., TPC-C and TPC-H) with Microsoft SQL Server 2019 when using the NVM device as an in-memory buffer pool or persistent storage. From the lessons learned we share some recommendations for future DBMS design with PMem, e.g.simple hardware or software changes are not enough for the best use of PMem in DBMSs.

Databases

Energy-efficiency evaluation of Intel KNL for HPC workloads

103 - E. Calore , A. Gabbana , S.F. Schifano 2018

Energy consumption is increasingly becoming a limiting factor to the design of faster large-scale parallel systems, and development of energy-efficient and energy-aware applications is today a relevant issue for HPC code-developer communities. In this work we focus on energy performance of the Knights Landing (KNL) Xeon Phi, the latest many-core architecture processor introduced by Intel into the HPC market. We take into account the 64-core Xeon Phi 7230, and analyze its energy performance using both the on-chip MCDRAM and the regular DDR4 system memory as main storage for the application data-domain. As a benchmark application we use a Lattice Boltzmann code heavily optimized for this architecture and implemented using different memory data layouts to store its lattice. We assessthen the energy consumption using different memory data-layouts, kind of memory (DDR4 or MCDRAM) and number of threads per core.

Distributed Parallel and Cluster Computing

QPACE 2 and Domain Decomposition on the Intel Xeon Phi

454 - Paul Arts , Jacques Bloch , Peter Georg 2015

We give an overview of QPACE 2, which is a custom-designed supercomputer based on Intel Xeon Phi processors, developed in a collaboration of Regensburg University and Eurotech. We give some general recommendations for how to write high-performance code for the Xeon Phi and then discuss our implementation of a domain-decomposition-based solver and present a number of benchmarks.

Distributed Parallel and Cluster Computing High Energy Physics - Lattice Computational Physics