
Performance Characteristics of the BlueField-2 SmartNIC

Published by Jianshen Liu
Publication date: 2021
Research field: Informatics Engineering
Language: English





High-performance computing (HPC) researchers have long envisioned scenarios where application workflows could be improved through the use of programmable processing elements embedded in the network fabric. Recently, vendors have introduced programmable Smart Network Interface Cards (SmartNICs) that enable computations to be offloaded to the edge of the network. There is great interest in both the HPC and high-performance data analytics communities in understanding the roles these devices may play in the data paths of upcoming systems. This paper focuses on characterizing both the networking and computing aspects of NVIDIA's new BlueField-2 SmartNIC when used in an Ethernet environment. For the networking evaluation, we conducted multiple transfer experiments between processors located at the host, the SmartNIC, and a remote host. These tests illuminate how much processing headroom is available on the SmartNIC during transfers. For the computing evaluation, we used the stress-ng benchmark to compare the BlueField-2 to other servers and place realistic bounds on the types of offload operations that are appropriate for the hardware. Our findings indicate that while the BlueField-2 provides a flexible means of processing data at the network's edge, great care must be taken not to overwhelm the hardware. While the host can easily saturate the network link, the SmartNIC's embedded processors may not have enough computing resources to sustain more than half the expected bandwidth when using kernel-space packet processing. From a computational perspective, encryption operations, memory operations under contention, and on-card IPC operations on the SmartNIC perform significantly better than on the general-purpose servers used for comparison in our experiments. Therefore, applications that mainly focus on these operations may be good candidates for offloading to the SmartNIC.




Read also

In September 2020, the Broadband Forum published a new industry standard for measuring network quality. The standard centers on the notion of quality attenuation. Quality attenuation is a measure of the distribution of latency and packet loss between two points connected by a network path. A vital feature of the quality attenuation idea is that we can express detailed application requirements and network performance measurements in the same mathematical framework. Performance requirements and measurements are both modeled as latency distributions. To the best of our knowledge, existing models of the 802.11 WiFi protocol do not permit the calculation of complete latency distributions without assuming steady-state operation. We present a novel model of the WiFi protocol. Instead of computing throughput numbers from a steady-state analysis of a Markov chain, we explicitly model latency and packet loss. Explicitly modeling latency and loss allows for both transient and steady-state analysis of latency distributions, and we can derive throughput numbers from the latency results. Our model is, therefore, more general than the standard Markov chain methods. We reproduce several known results with this method. Using transient analysis, we derive bounds on WiFi throughput under the requirement that latency and packet loss must be bounded.
Internet routing can often be sub-optimal, with the chosen routes providing worse performance than other available policy-compliant routes. This stems from the lack of visibility into route performance at the network layer. While this is an old problem, we argue that recent advances in programmable hardware finally open up the possibility of performance-aware routing in a deployable, BGP-compatible manner. We introduce ROUTESCOUT, a hybrid hardware/software system supporting performance-based routing at ISP scale. In the data plane, ROUTESCOUT leverages P4-enabled hardware to monitor performance across policy-compliant route choices for each destination, at line rate and with a small memory footprint. ROUTESCOUT's control plane then asynchronously pulls aggregated performance metrics to synthesize a performance-aware forwarding policy. We show that ROUTESCOUT can monitor performance across most of an ISP's traffic, using only 4 MB of memory. Further, its control can flexibly satisfy a variety of operator objectives, with sub-second operating times.
In this paper, we study the stability of light traffic achieved by a scheduling algorithm that is suitable for heterogeneous traffic networks. Since analyzing a scheduling algorithm is intractable using conventional mathematical tools, our goal is to minimize the largest queue-overflow probability achieved by the algorithm. In the large deviation setting, this problem is equivalent to maximizing the asymptotic decay rate of the largest queue-overflow probability. We first derive an upper bound on the decay rate of the queue-overflow probability as the queue-overflow threshold approaches infinity. Then, we study several structural properties of the minimum-cost path to overflow of the queue with the largest length, which is basically equivalent to the decay rate of the largest queue-overflow probability. Given these properties, we prove that the queue with the largest length follows a sample path with linear increment. For certain parameter values, the scheduling algorithm is asymptotically optimal in reducing the largest queue length. Through numerical results, we show the large deviation properties of the queue length typically used in practice while varying one parameter of the algorithm.
Understanding network and application performance is essential for debugging, improving user experience, and performance comparison. Meanwhile, modern mobile systems are optimized for energy-efficient computation and communications, which may limit the performance of network and applications. In recent years, several tools have emerged that analyze the network performance of mobile applications in situ with the help of the VPN service. There is a limited understanding of how these measurement tools and system optimizations affect network and application performance. In this study, we first demonstrate that mobile systems employ energy-aware system hardware tuning, which affects application performance and network throughput. We next show that VPN-based application performance measurement tools, such as Lumen, PrivacyGuard, and Video Optimizer, contribute to ambiguous network performance measurements and degrade application performance. Our findings suggest that sound application and network performance measurement on Android devices requires a good understanding of the device, networks, measurement tools, and applications.