High-performance rack-scale offerings package disaggregated pools of compute, memory, and storage hardware in a single rack to run diverse workloads with varying requirements, including applications that need low and predictable latency. The intra-rack network is typically high-speed Ethernet, which can suffer from congestion leading to packet drops and may not satisfy the stringent tail-latency requirements of some workloads (including remote memory/storage accesses). In this paper, we design a Predictable Low Latency (PL2) network architecture for rack-scale systems with Ethernet as the interconnecting fabric. PL2 leverages programmable Ethernet switches to carefully schedule packets such that they incur no loss, with NIC and switch queues maintained at small, near-zero occupancies. In our 100 Gbps rack prototype, PL2 keeps 99th-percentile memcached RPC latencies under 60 us even when the RPCs compete with extreme offered loads of 400%, without losing traffic. Network transfers for a machine-learning training task complete 30% faster than a receiver-driven scheme implementation modeled after Homa (222 ms vs. 321 ms 99th-percentile latency per iteration).
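As a rough illustration of the kind of switch-coordinated scheduling this abstract describes, the sketch below serializes competing senders into per-receiver transmission slots so that receiver queues never build up. It is a minimal, hypothetical model; the SlotScheduler class and its reserve method are illustrative and are not taken from the PL2 paper.

```python
# Hypothetical sketch of switch-coordinated packet scheduling in the spirit of PL2:
# senders reserve a transmission slot before sending, and a central scheduler
# (standing in for the programmable switch) grants slots so that each receiver
# accepts at most one packet per slot, keeping queues near zero.
# All names here are illustrative; they do not come from the PL2 paper.

from collections import defaultdict

class SlotScheduler:
    """Grants the earliest slot that is free at both the sender and the receiver."""
    def __init__(self):
        self.next_free_tx = defaultdict(int)   # earliest free slot per sender
        self.next_free_rx = defaultdict(int)   # earliest free slot per receiver

    def reserve(self, sender, receiver):
        slot = max(self.next_free_tx[sender], self.next_free_rx[receiver])
        self.next_free_tx[sender] = slot + 1
        self.next_free_rx[receiver] = slot + 1
        return slot

scheduler = SlotScheduler()
# Two senders targeting the same receiver are serialized into consecutive slots,
# so no more than one packet is ever in flight toward that receiver per slot.
print(scheduler.reserve("A", "R"))  # slot 0
print(scheduler.reserve("B", "R"))  # slot 1
print(scheduler.reserve("A", "S"))  # slot 1 (sender A is busy in slot 0)
```

In this toy model, the incast that would normally fill a switch queue is instead spread across slots before any packet leaves the NIC, which is the intuition behind keeping queues near zero.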
Ultra-Reliable Low-Latency Communications (URLLC) is an important challenge for next-generation wireless networks, as it imposes very strict requirements on delay and packet loss ratio. Satisfying these requirements is hardly possible without adding functionality to existing communication technologies. In this paper, we propose and study an approach to enable URLLC in Wi-Fi networks by exploiting an additional radio similar to that of IEEE 802.11ba. Through extensive simulation, we show that our approach decreases delay by orders of magnitude, while the throughput of non-URLLC devices is reduced only insignificantly.
With the emergence of the Internet of Things (IoT) and the ever-increasing demand of newly connected devices, more effective storage and processing paradigms are needed to cope with the data generated by these devices. In this study, we discuss different paradigms for data processing and storage, including the Cloud, Fog, and Edge computing models, and their suitability for integration with the IoT. Moreover, a detailed discussion of the low-latency and massive-connectivity requirements of future cellular networks in the context of machine-type communication (MTC) is presented. Furthermore, the need to bring IoT devices to Internet connectivity, along with a standardized protocol stack to regulate data transmission between these devices, is addressed while keeping in view the resource-constrained nature of IoT devices.
Many network applications, e.g., industrial control, demand Ultra-Low Latency (ULL). However, traditional packet networks can only reduce the end-to-end latencies to the order of tens of milliseconds. The IEEE 802.1 Time Sensitive Networking (TSN) standard and related research studies have sought to provide link layer support for ULL networking, while the emerging IETF Deterministic Networking (DetNet) standards seek to provide the complementary network layer ULL support. This article provides an up-to-date comprehensive survey of the IEEE TSN and IETF DetNet standards and the related research studies. The survey of these standards and research studies is organized according to the main categories of flow concept, flow synchronization, flow management, flow control, and flow integrity. ULL networking mechanisms play a critical role in the emerging fifth generation (5G) network access chain from wireless devices via access, backhaul, and core networks. We survey the studies that specifically target the support of ULL in 5G networks, with the main categories of fronthaul, backhaul, and network management. Throughout, we identify the pitfalls and limitations of the existing standards and research studies. This survey can thus serve as a basis for the development of standards enhancements and future ULL research studies that address the identified pitfalls and limitations.
We propose that clusters interconnected with network topologies having minimal mean path length achieve higher overall performance across a variety of applications. We test this heuristic by constructing clusters of up to 36 nodes with Dragonfly, torus, ring, Chvatal, Wagner, Bidiakis, and several other topologies with minimal mean path lengths, and by simulating the performance of 256-node clusters with the same network topologies. The optimal (or sub-optimal) low-latency network topologies are found by minimizing the mean path length of regular graphs. The selected topologies are benchmarked using ping-pong messaging, the MPI collective communications, and standard parallel applications including effective bandwidth, FFTE, Graph 500, and the NAS parallel benchmarks. We establish strong correlations between cluster performance and network topology, especially the mean path length, for a wide range of applications. In communication-intensive benchmarks, clusters with optimal network topologies outperform those with mainstream topologies severalfold. It is striking that a mere adjustment of the network topology suffices to reclaim performance from the same computing hardware.
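The quantity minimized here, the mean path length of a candidate topology, can be computed directly by breadth-first search over all node pairs. The sketch below shows that computation in plain Python; the 8-node ring input is purely illustrative, and a topology search in the spirit of the abstract would perturb the edges of a regular graph and keep changes that lower this value.

```python
# Minimal sketch of the metric being minimized: the mean path length
# (average shortest-path hop count over all ordered node pairs) of a topology
# given as an adjacency list. The ring example is illustrative only.

from collections import deque

def mean_path_length(adj):
    """Average shortest-path length over all ordered pairs of distinct nodes."""
    n = len(adj)
    total = 0
    for src in range(n):
        dist = {src: 0}
        queue = deque([src])
        while queue:                      # BFS from src
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total / (n * (n - 1))

# 8-node ring: each node is connected to its two neighbours.
ring = [[(i - 1) % 8, (i + 1) % 8] for i in range(8)]
print(mean_path_length(ring))  # ~2.29 hops on average
```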
A broadcast mode may augment peer-to-peer overlay networks with an efficient, scalable data replication function, but may also give rise to a virtual link layer in VPN-type solutions. We introduce a simple broadcasting mechanism that operates in the prefix space of distributed hash tables without signaling. This paper concentrates on the performance analysis of this prefix flooding scheme. Starting from simple models of recursive $k$-ary trees, we analytically derive distributions of hop counts and of the replication load. Extensive simulation results are then presented, based on an implementation within the OverSim framework. Comparisons are drawn to Scribe, taken as a general reference model for group communication according to the shared, rendezvous-point-centered distribution paradigm. The prefix flooding scheme thereby confirmed its widely predictable performance and consistently outperformed Scribe in all metrics. Reverse path selection in overlays is identified as a major cause of performance degradation.
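As a rough, idealized model of the recursive $k$-ary dissemination trees mentioned above, the sketch below tabulates the hop-count distribution of a complete $k$-ary tree in which the initiator is the root and a node's hop count equals its depth. The function name and parameters are illustrative assumptions, not the paper's analytical derivation.

```python
# Idealized hop-count model for a complete k-ary dissemination tree:
# the initiator sits at the root, every node forwards to up to k children,
# and a node's hop count is its depth. This is a sketch, not the paper's model.

def hop_count_distribution(k, depth):
    """Return {hops: fraction of nodes} for a complete k-ary tree of given depth."""
    counts = {d: k ** d for d in range(depth + 1)}   # k^d nodes reached at hop d
    total = sum(counts.values())
    return {d: c / total for d, c in counts.items()}

# With k = 4 and depth 3, most receivers sit at the deepest hop level,
# illustrating why the maximum hop count grows only logarithmically in group size.
print(hop_count_distribution(4, 3))
# {0: 0.0118..., 1: 0.047..., 2: 0.188..., 3: 0.752...}
```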