System measurement of Intel AEP Optane DIMM

71 0 0.0 ( 0 )

Download Cite

Added by Tianyue Lu

Publication date 2020

fields Informatics Engineering

and research's language is English

Authors Tianyue Lu - Haiyang Pan - Mingyu Chen

Hardware Architecture

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

In recent years, memory wall has been a great performance bottleneck of computer system. To overcome it, Non-Volatile Main Memory (NVMM) technology has been discussed widely to provide a much larger main memory capacity. Last year, Intel released AEP Optane DIMM, which provides hundreds of GB capacity as a promising replacement of traditional DRAM memory. But as most key parameters of AEP is not open to users, there is a need to get to know them because they will guide a direction of further NVMM research. In this paper, we focus on measuring performance and architecture features of AEP DIMM. Together, we explore the design of DRAM cache which is an important part of DRAM-AEP hybrid memory system. As a result, we estimate the write latency of AEP DIMM which has not been measured accurately. And, we discover the current design parameters of DRAM cache, such as tag organization, cache associativity and set index mapping. All of these features are first published on academic paper which are greatly helpful to future NVMM optimizations.

rate research

System Evaluation of the Intel Optane Byte-addressable NVM

136 - Ivy B. Peng , Maya B. Gokhale , Eric W. Green 2019

Byte-addressable non-volatile memory (NVM) features high density, DRAM comparable performance, and persistence. These characteristics position NVM as a promising new tier in the memory hierarchy. Nevertheless, NVM has asymmetric read and write performance, and considerably higher write energy than DRAM. Our work provides an in-depth evaluation of the first commercially available byte-addressable NVM -- the Intel Optane DC persistent memory. The first part of our study quantifies the latency, bandwidth, power efficiency, and energy consumption under eight memory configurations. We also evaluate the real impact on in-memory graph processing workloads. Our results show that augmenting NVM with DRAM is essential, and the combination can effectively bridge the performance gap and provide reasonable performance with higher capacity. We also identify NUMA-related performance characteristics for accesses to memory on a remote socket. In the second part, we employ two fine-grained allocation policies to control traffic distribution between DRAM and NVM. Our results show that bandwidth spilling between DRAM and NVM could provide 2.0x bandwidth and enable $20%$ larger problems than using DRAM as a cache. Also, write isolation between DRAM and NVM could save up to 3.9x energy and improves bandwidth by 3.1x compared to DRAM-cached NVM. We establish a roofline model to explore power and energy efficiency at various distributions of read-only traffic. Our results show that NVM requires 1.8x lower power than DRAM for data-intensive workloads. Overall, applications can significantly optimize performance and power efficiency by adapting traffic distribution to NVM and DRAM through memory configurations and fine-grained policies to fully exploit the new memory device.

Distributed Parallel and Cluster Computing

Single Machine Graph Analytics on Massive Datasets Using Intel Optane DC Persistent Memory

73 - Gurbinder Gill 2019

Intel Optane DC Persistent Memory (Optane PMM) is a new kind of byte-addressable memory with higher density and lower cost than DRAM. This enables the design of affordable systems that support up to 6TB of randomly accessible memory. In this paper, we present key runtime and algorithmic principles to consider when performing graph analytics on extreme-scale graphs on large-memory platforms of this sort. To demonstrate the importance of these principles, we evaluate four existing shared-memory graph frameworks on large real-world web-crawls, using a machine with 6TB of Optane PMM. Our results show that frameworks based on the runtime and algorithmic principles advocated in this paper (i) perform significantly better than the others, and (ii) are competitive with graph analytics frameworks running on large production clusters.

Distributed Parallel and Cluster Computing

Lessons learned from the early performance evaluation of Intel Optane DC Persistent Memory in DBMS

253 - Yinjun Wu , Kwanghyun Park , Rathijit Sen 2020

Non-volatile memory (NVM) is an emerging technology, which has the persistence characteristics of large capacity storage devices(e.g., HDDs and SSDs), while providing the low access latency and byte-addressablity of traditional DRAM memory. This unique combination of features open up several new design considerations when building database management systems (DBMSs), such as replacing DRAM (as the main working space memory) or block devices (as the persistent storage), or complementing both at the same time for several DBMS components (such as access methods,storage engine, buffer management, logging/recovery, etc). However, interacting with NVM requires changes to application software to best use the device (e.g. mmap and clflush of small cache lines instead of write and fsync of large page buffers). Before introducing (potentially major) code changes to the DBMS for NVM, developers need a clear understanding of NVM performance in various conditions to help make better design choices. In this paper, we provide extensive performance evaluations conducted with a recently released NVM device, Intel Optane DC Persistent Memory (PMem), under different configurations with several micro-benchmark tools. Further, we evaluate OLTP and OLAP database workloads (i.e., TPC-C and TPC-H) with Microsoft SQL Server 2019 when using the NVM device as an in-memory buffer pool or persistent storage. From the lessons learned we share some recommendations for future DBMS design with PMem, e.g.simple hardware or software changes are not enough for the best use of PMem in DBMSs.

Databases

System-level optimization of Network-on-Chips for heterogeneous 3D System-on-Chips

237 - Jan Moritz Joseph , Dominik Ermel , Lennart Bamberg 2019

For a system-level design of Networks-on-Chip for 3D heterogeneous System-on-Chip (SoC), the locations of components, routers and vertical links are determined from an application model and technology parameters. In conventional methods, the two inputs are accounted for separately; here, we define an integrated problem that considers both application model and technology parameters. We show that this problem does not allow for exact solution in reasonable time, as common for many design problems. Therefore, we contribute a heuristic by proposing design steps, which are based on separation of intralayer and interlayer communication. The advantage is that this new problem can be solved with well-known methods. We use 3D Vision SoC case studies to quantify the advantages and the practical usability of the proposed optimization approach. We achieve up to 18.8% reduced white space and up to 12.4% better network performance in comparison to conventional approaches.

Hardware Architecture

SNS Timing System