
Revisiting Network Support for RDMA

 Added by Radhika Mittal
 Publication date 2018
Research language: English





The advent of RoCE (RDMA over Converged Ethernet) has led to a significant increase in the use of RDMA in datacenter networks. To achieve good performance, RoCE requires a lossless network which is in turn achieved by enabling Priority Flow Control (PFC) within the network. However, PFC brings with it a host of problems such as head-of-the-line blocking, congestion spreading, and occasional deadlocks. Rather than seek to fix these issues, we instead ask: is PFC fundamentally required to support RDMA over Ethernet? We show that the need for PFC is an artifact of current RoCE NIC designs rather than a fundamental requirement. We propose an improved RoCE NIC (IRN) design that makes a few simple changes to the RoCE NIC for better handling of packet losses. We show that IRN (without PFC) outperforms RoCE (with PFC) by 6-83% for typical network scenarios. Thus not only does IRN eliminate the need for PFC, it improves performance in the process! We further show that the changes that IRN introduces can be implemented with modest overheads of about 3-10% to NIC resources. Based on our results, we argue that research and industry should rethink the current trajectory of network support for RDMA.




Read More

152 - Shuo Liu 2020
We present NetReduce, a novel RDMA-compatible in-network reduction architecture to accelerate distributed DNN training. Compared to existing designs, NetReduce maintains a reliable connection between end-hosts in the Ethernet and does not terminate the connection in the network. The advantage of doing so is that we can fully reuse the congestion control and reliability designs of RoCE. Meanwhile, we do not need to implement a high-cost network protocol processing stack in the switch, as IB does. The prototype, implemented on an FPGA, is an out-of-the-box solution that requires no modification to commodity devices such as NICs or switches. For coordination between the end-host and the switch, NetReduce customizes the transport protocol only on the first packet of a data message to comply with RoCE v2. A special status monitoring module is designed to reuse the reliability mechanism of RoCE v2 for dealing with packet loss. A message-level credit-based flow control algorithm is also proposed to fully utilize bandwidth and avoid buffer overflow. We study the effects of intra bandwidth on training performance in multi-machine, multi-GPU scenarios and give sufficient conditions for hierarchical NetReduce to outperform other algorithms. We also extend the design from rack-level aggregation to the more general spine-leaf topology in the data center. NetReduce accelerates training by up to 1.7x and 1.5x for CNN-based CV and transformer-based NLP tasks, respectively. Simulations on large-scale systems indicate that NetReduce scales better than state-of-the-art ring all-reduce.
In this paper, we propose Virtuoso, a purely software-based multi-path RDMA solution for data center networks (DCNs) that effectively utilizes the rich multi-path topology for load balancing and reliability. As a middleware library operating in user space, Virtuoso employs three innovative mechanisms to achieve its goal. In contrast to the existing hardware-based MP-RDMA solution, Virtuoso can be readily deployed in DCNs with existing RDMA NICs. It also decouples path selection and load balancing mechanisms from hardware features, allowing DCN operators and applications to make flexible decisions by employing the best mechanisms (as plug-in software library modules) as needed. Our experiments show that Virtuoso is capable of fully utilizing multiple paths with negligible CPU overheads.
In this paper, we argue that existing concepts for the design and implementation of network stacks for constrained devices do not comply with the requirements of current and upcoming Internet of Things (IoT) use cases. The IoT requires not only a lightweight but also a modular network stack, based on standards. We discuss functional and non-functional requirements for the software architecture of the network stack on constrained IoT devices. Then, revisiting concepts from the early Internet as well as current implementations, we propose a future-proof alternative to existing IoT network stack architectures, and provide an initial evaluation of this proposal based on its implementation running on top of a state-of-the-art IoT operating system and hardware.
Wireless sensor networks (WSNs) can be a valuable decision-support tool for farmers. This motivated our deployment of a WSN system to support rain-fed agriculture in India. We defined promising use cases and resolved technical challenges throughout a two-year deployment of our COMMON-Sense Net system, which provided farmers with environment data. However, the direct use of this technology in the field did not foster the expected participation of the population. This made it difficult to develop the intended decision-support system. Based on this experience, we take the following position in this paper: currently, the deployment of WSN technology in developing regions is more likely to be effective if it targets scientists and technical personnel as users, rather than the farmers themselves. We base this claim on the lessons learned from the COMMON-Sense system deployment and the results of an extensive user experiment with agriculture scientists, which we describe in this paper.
Fast handover for Proxy Mobile IPv6 (FPMIPv6) can reduce handover delay and packet loss compared with Proxy Mobile IPv6 (PMIPv6). However, FPMIPv6 still cannot handle heterogeneous handovers because it lacks a unified Layer 2 triggering mechanism, a gap that matters more as emerging wireless technologies proliferate. Media Independent Handover (MIH) can provide heterogeneous handover support, and many integration solutions have been proposed for it. However, most of them focus on the integration of MIH and PMIPv6 and require additional mechanisms that are outside the scope of MIH, which makes their operations difficult to standardize. Therefore, in this paper, we propose an integration solution for FPMIPv6 and MIH that extends the existing MIH standards, and we adopt the city section mobility model to analyze its performance under different scenarios. The analytical results show that the proposed solution is capable of reducing the handover delay and the signaling cost compared with the standard as well as the fast handover solutions.