
Faster than Flash: An In-Depth Study of System Challenges for Emerging Ultra-Low Latency SSDs

Published by Myoungsoo Jung
Publication date: 2019
Research field: Informatics Engineering
Language: English





Emerging storage systems with new flash exhibit ultra-low latency (ULL) that can address performance disparities between DRAM and conventional solid state drives (SSDs) in the memory hierarchy. Given these low-latency characteristics, different I/O completion methods (polling and hybrid polling) and a new storage stack architecture (SPDK) have been proposed. While these techniques are expected to take costly software interventions off the critical path in ULL-applied systems, no study has yet quantitatively analyzed the system-level characteristics and challenges of combining them with real ULL SSDs. In this work, we perform comprehensive empirical evaluations with 800GB ULL SSD prototypes and characterize ULL behaviors across a wide range of I/O path parameters, such as different queues and access patterns. We then analyze the efficiencies and challenges of the polled-mode and hybrid polling I/O completion methods (added in Linux kernels 4.4 and 4.10, respectively) and compare them with the efficiency of a conventional interrupt-based I/O path. In addition, we revisit the common expectations of SPDK by examining all the system resources and parameters. Finally, we demonstrate the challenges of ULL SSDs in a real SPDK-enabled server-client system. Based on the performance behaviors that this study uncovers, we also discuss several system implications that are required to take full advantage of ULL SSDs in the future.
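As a rough illustration of the trade-off between the three completion methods named in the abstract, the following is a toy cost model, not the kernel's implementation and not the paper's methodology. The function name `complete_io`, the interrupt overhead, the poll tick, and the "sleep for half the mean latency" rule for hybrid polling are all assumptions chosen for illustration.

```python
import math


def complete_io(device_latency_us, mode, mean_latency_us=10.0,
                poll_tick_us=0.5, irq_overhead_us=2.0):
    """Return (observed_latency_us, cpu_busy_us) for one I/O in a toy model.

    All parameters are hypothetical illustration values, not measurements.
    """
    if mode == "interrupt":
        # CPU is free while the device works; pays a fixed wake-up cost.
        return device_latency_us + irq_overhead_us, irq_overhead_us
    if mode == "poll":
        sleep_us = 0.0                      # spin from submission onward
    elif mode == "hybrid":
        sleep_us = mean_latency_us / 2.0    # assumed heuristic: sleep, then spin
    else:
        raise ValueError(f"unknown mode: {mode}")

    # After the optional sleep, check the completion queue once per tick
    # until the device has finished (at least one check is always made).
    remaining_us = max(0.0, device_latency_us - sleep_us)
    polls = max(1, math.ceil(remaining_us / poll_tick_us))
    spin_us = polls * poll_tick_us
    return sleep_us + spin_us, spin_us


if __name__ == "__main__":
    for mode in ("interrupt", "poll", "hybrid"):
        print(mode, complete_io(10.0, mode))
```

In this model, pure polling gives the lowest observed latency but keeps a core busy for the whole I/O, while hybrid polling trades some of that CPU time for the risk of oversleeping when the device finishes faster than the assumed mean.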




Read also

For modern flash-based SSDs, the performance overhead of internal data migrations is dominated by the data transfer time, not by the flash program time as in older SSDs. To mitigate the performance impact of data migrations, we propose rCopyback, a restricted version of copyback. rCopyback works like the original copyback except that only n consecutive copybacks are allowed. By limiting the number of successive copybacks, it guarantees that no data reliability problem occurs when data is internally migrated using rCopyback. To take full advantage of rCopyback, we developed an rCopyback-aware FTL, rcFTL, which intelligently decides whether rCopyback should be used by exploiting varying host workloads. Our evaluation results show that rcFTL can improve the overall I/O throughput by 54% on average over an existing FTL that does not use copybacks.
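The "only n consecutive copybacks" rule can be sketched as a per-page counter. This is a minimal sketch under stated assumptions, not rcFTL itself: the `Page` type, the `migrate` function, and the choice to reset the counter after an off-chip copy (on the assumption that such a copy passes through the controller's error correction) are hypothetical, and the workload-aware decision logic described in the abstract is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Page:
    data: bytes
    copyback_count: int = 0  # consecutive copybacks since the last off-chip copy


def migrate(page: Page, n_max: int):
    """Migrate a page, allowing at most n_max consecutive copybacks.

    Returns (method, new_page). Names and reset policy are illustrative.
    """
    if page.copyback_count < n_max:
        # Still within the limit: move the data inside the flash chip.
        return "copyback", Page(page.data, page.copyback_count + 1)
    # Limit reached: fall back to a regular off-chip copy and reset the count
    # (assumed: the off-chip path lets the controller correct accumulated errors).
    return "off-chip copy", Page(page.data, 0)


if __name__ == "__main__":
    page = Page(b"payload")
    for _ in range(4):
        method, page = migrate(page, n_max=2)
        print(method, page.copyback_count)
```

With `n_max=2`, the sequence of migrations alternates two copybacks with one off-chip copy, which is the bounded behavior the abstract describes.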
Ahmed Ibrahim, Ebrahim Bedeer, 2021
Faster-than-Nyquist (FTN) signaling is a promising non-orthogonal pulse modulation technique that can improve the spectral efficiency (SE) of next generation communication systems at the expense of higher detection complexity to remove the introduced inter-symbol interference (ISI). In this paper, we investigate the detection problem of ultra high-order quadrature-amplitude modulation (QAM) FTN signaling, where we exploit a mathematical programming technique based on the alternating direction method of multipliers (ADMM). The proposed ADMM sequence estimation (ADMMSE) FTN signaling detector demonstrates an excellent trade-off between performance and computational effort, enabling, for the first time in the FTN signaling literature, successful detection and SE gains for QAM modulation orders as high as 64K (65,536). The complexity of the proposed ADMMSE detector is polynomial in the length of the transmit symbol sequence, and its sensitivity to the modulation order increases only logarithmically. Simulation results show that for 16-QAM, the proposed ADMMSE FTN signaling detector achieves SE gains comparable to those of the generalized approach semidefinite relaxation-based sequence estimation (GASDRSE) FTN signaling detector, but with a much lower computational time, as evaluated experimentally. Simulation results additionally show SE gains for modulation orders starting from 4-QAM, or quadrature phase shift keying (QPSK), up to and including 64K-QAM when compared to conventional Nyquist signaling. The very low computational effort required makes the proposed ADMMSE detector a practically promising FTN signaling detector for both low-order and ultra high-order QAM FTN signaling systems.
Modern mobile systems use a single input-to-display path to serve all applications. In meeting the visual goals of all applications, the path has a latency that is inadequate for many important interactions. To accommodate the different latency requirements and visual constraints of different interactions, we present POLYPATH, a system design in which application developers (and users) can choose from multiple path designs for their application at any time. Because a POLYPATH system requires two or more path designs, we present a novel fast path design, called Presto. Presto reduces latency by judiciously allowing frame drops and tearing. We report an Android 5-based prototype of POLYPATH with two path designs: Android legacy and Presto. Using this prototype, we quantify the effectiveness, overhead, and user experience of POLYPATH, especially Presto, through both objective measurements and subjective user assessment. We show that Presto reduces the latency of legacy touchscreen drawing applications by almost half; more importantly, this reduction is orthogonal to that of other popular approaches and is achieved without any user-noticeable negative visual effect. When combined with touch prediction, Presto is able to reduce the touch latency below 10 ms, a remarkable achievement without any hardware support.
M. Seminara, T. Nawaz, S. Caputo, 2020
This paper reports a detailed experimental characterization of the optical performance of a Visible Light Communication (VLC) system using a real traffic light for ultra-low latency, infrastructure-to-vehicle (I2V) communications for intelligent transportation systems (ITS) protocols. Although the implementation of long-sought ITS protocols makes it crucial to detail how the features of the optical stages influence the overall performance of a VLC system in realistic configurations, such a characterization has rarely been addressed so far. We carried out an experimental investigation in a realistic configuration where a regular traffic light (TX), enabled for VLC transmission, sends digital information towards a receiving stage (RX) composed of an optical condenser and a dedicated amplified photodiode stage. We performed a detailed measurement campaign of VLC performance encompassing a broad set of optical condensers, for TX-RX distances in the range 3 - 50 m, in terms of both effective field of view (EFOV) and packet error rate (PER). The results show several nontrivial behaviors for different lens sets as a function of position on the measurement grid, highlighting critical aspects as well as identifying the most suitable optical configurations depending on the specific application and on the required EFOV. In this paper we also provide a theoretical model for both the signal intensity and the EFOV as a function of several parameters, such as distance, RX orientation and focal length of the specific condenser. Our results could be very relevant in the near future for identifying the most suitable solution in terms of acceptance angle when designing a VLC system for real applications, where angle-dependent misalignment effects play a non-negligible role, and we argue that they could have more general implications beyond the specific I2V case considered here.
Deep neural network (DNN)-based AI applications on the edge require both low-cost computing platforms and high-quality services. However, the limited memory, computing resources, and power budget of edge devices constrain the effectiveness of DNN algorithms. Developing edge-oriented AI algorithms and implementations (e.g., accelerators) is challenging. In this paper, we summarize our recent efforts for efficient on-device AI development from three aspects, including both training and inference. First, we present on-device training with ultra-low memory usage. We propose a novel rank-adaptive tensorized neural network model, which offers orders-of-magnitude memory reduction during training. Second, we introduce an ultra-low bitwidth quantization method for DNN model compression, achieving state-of-the-art accuracy under the same compression ratio. Third, we introduce an ultra-low latency DNN accelerator design, practicing the software/hardware co-design methodology. This paper emphasizes the importance and efficacy of training, quantization and accelerator design, and calls for more research breakthroughs in the area of AI on the edge.