MODC: Resilience for disaggregated memory architectures using task-based programming

66 0 0.0 ( 0 )

Download Cite

Added by Kimberly Keeton

Publication date 2021

fields Informatics Engineering

and research's language is English

Authors Kimberly Keeton - Sharad Singhal - Haris Volos

Distributed Parallel and Cluster Computing

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Disaggregated memory architectures provide benefits to applications beyond traditional scale out environments, such as independent scaling of compute and memory resources. They also provide an independent failure model, where computations or the compute nodes they run on may fail independently of the disaggregated memory; thus, data thats resident in the disaggregated memory is unaffected by the compute failure. Blind application of traditional techniques for resilience (e.g., checkpoints or data replication) does not take advantage of these architectures. To demonstrate the potential benefit of these architectures for resilience, we develop Memory-Oriented Distributed Computing (MODC), a framework for programming disaggregated architectures that borrows and adapts ideas from task-based programming models, concurrent programming techniques, and lock-free data structures. This framework includes a task-based application programming model and a runtime system that provides scheduling, coordination, and fault tolerance mechanisms. We present highlights of our MODC prototype and experimental results demonstrating that MODC-style resilience outperforms a checkpoint-based approach in the face of failures.

rate research

Disaggregated Memory at the Edge

58 - Luis M Vaquero , Yehia Elkhatib , Felix Cuadrado 2021

This paper describes how to augment techniques such as Distributed Shared Memory with recent trends on disaggregated Non Volatile Memory in the data centre so that the combination can be used in an edge environment with potentially volatile and mobile resources. This article identifies the main advantages and challenges, and offers an architectural evolution to incorporate recent research trends into production-ready disaggregated edges. We also present two prototypes showing the feasibility of this proposal.

Distributed Parallel and Cluster Computing

Memtrade: A Disaggregated-Memory Marketplace for Public Clouds

77 - Hasan Al Maruf , Yuhong Zhong , Hongyi Wang 2021

We present Memtrade, the first memory disaggregation system for public clouds. Public clouds introduce a set of unique challenges for resource disaggregation across different tenants, including security, isolation and pricing. Memtrade allows producer virtual machines (VMs) to lease both their unallocated memory and allocated-but-idle application memory to remote consumer VMs for a limited period of time. Memtrade does not require any modifications to host-level system software or support from the cloud provider. It harvests producer memory using an application-aware control loop to form a distributed transient remote memory pool with minimal performance impact; it employs a broker to match producers with consumers while satisfying performance constraints; and it exposes the matched memory to consumers as a secure KV cache. Our evaluation using real-world cluster traces shows that Memtrade provides significant performance benefit for consumers (improving average read latency up to 2.8x) while preserving confidentiality and integrity, with little impact on producer applications (degrading performance by less than 2.1%).

Distributed Parallel and Cluster Computing

MIND: In-Network Memory Management for Disaggregated Data Centers

63 - Seung-seob Lee , Yanpeng Yu , Yupeng Tang 2021

Memory-compute disaggregation promises transparent elasticity, high utilization and balanced usage for resources in data centers by physically separating memory and compute into network-attached resource blades. However, existing designs achieve performance at the cost of resource elasticity, restricting memory sharing to a single compute blade to avoid costly memory coherence traffic over the network. In this work, we show that emerging programmable network switches can enable an efficient shared memory abstraction for disaggregated architectures by placing memory management logic in the network fabric. We find that centralizing memory management in the network permits bandwidth and latency-efficient realization of in-network cache coherence protocols, while programmable switch ASICs support other memory management logic at line-rate. We realize these insights into MIND, an in-network memory management unit for rack-scale memory disaggregation. MIND enables transparent resource elasticity while matching the performance of prior memory disaggregation proposals for real-world workloads.

Distributed Parallel and Cluster Computing

Clio: A Hardware-Software Co-Designed Disaggregated Memory System

164 - Zhiyuan Guo , Yizhou Shan , Xuhao Luo 2021

Memory disaggregation has attracted great attention recently because of its benefits in efficient memory utilization and ease of management. So far, memory disaggregation research has all taken one of two approaches, building/emulating memory nodes with either regular servers or raw memory devices with no processing power. The former incurs higher monetary cost and face tail latency and scalability limitations, while the latter introduce performance, security, and management problems. Server-based memory nodes and memory nodes with no processing power are two extreme approaches. We seek a sweet spot in the middle by proposing a hardware-based memory disaggregation solution that has the right amount of processing power at memory nodes. Furthermore, we take a clean-slate approach by starting from the requirements of memory disaggregation and designing a memory-disaggregation-native system. We propose a hardware-based disaggregated memory system, Clio, that virtualizes and manages disaggregated memory at the memory node. Clio includes a new hardware-based virtual memory system, a customized network system, and a framework for computation offloading. In building Clio, we not only co-design OS functionalities, hardware architecture, and the network system, but also co-design the compute node and memory node. We prototyped Clios memory node with FPGA and implemented its client-node functionalities in a user-space library. Clio achieves 100 Gbps throughput and an end-to-end latency of 2.5 us at median and 3.2 us at the 99th percentile. Clio scales much better and has orders of magnitude lower tail latency than RDMA, and it has 1.1x to 3.4x energy saving compared to CPU-based and SmartNIC-based disaggregated memory systems and is 2.7x faster than software-based SmartNIC solutions.

Distributed Parallel and Cluster Computing

Building Atomic, Crash-Consistent Data Stores with Disaggregated Persistent Memory

69 - Shin-Yeh Tsai , Yiying Zhang 2019

Byte-addressable persistent memories (PM) has finally made their way into production. An important and pressing problem that follows is how to deploy them in existing datacenters. One viable approach is to attach PM as self-contained devices to the network as disaggregated persistent memory, or DPM. DPM requires no changes to existing servers in datacenters; without the need to include a processor, DPM devices are cheap to build; and by sharing DPM across compute servers, they offer great elasticity and efficient resource packing. This paper explores different ways to organize DPM and to build data stores with DPM. Specifically, we propose three architectures of DPM: 1) compute nodes directly access DPM (DPM-Direct); 2) compute nodes send requests to a coordinator server, which then accesses DPM to complete a request (DPM-Central); and 3) compute nodes directly access DPM for data operations and communicate with a global metadata server for the control plane (DPM-Sep). Based on these architectures, we built three atomic, crash-consistent data stores. We evaluated their performance, scalability, and CPU cost with micro-benchmarks and YCSB. Our evaluation results show that DPM-Direct has great small-size read but poor write performance; DPM-Central has the best write performance when the scale of the cluster is small but performs poorly when the scale increases; and DPM-Sep performs well overall.

Distributed Parallel and Cluster Computing Networking and Internet Architecture