ترغب بنشر مسار تعليمي؟ اضغط هنا

Revamping Storage Class Memory With Hardware Automated Memory-Over-Storage Solution

122   0   0.0 ( 0 )
 نشر من قبل Myoungsoo Jung
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Large persistent memories such as NVDIMM have been perceived as a disruptive memory technology, because they can maintain the state of a system even after a power failure and allow the system to recover quickly. However, overheads incurred by a heavy software-stack intervention seriously negate the benefits of such memories. First, to significantly reduce the software stack overheads, we propose HAMS, a hardware automated Memory-over-Storage (MoS) solution. Specifically, HAMS aggregates the capacity of NVDIMM and ultra-low latency flash archives (ULL-Flash) into a single large memory space, which can be used as a working or persistent memory expansion, in an OS-transparent manner. HAMS resides in the memory controller hub and manages its MoS address pool over conventional DDR and NVMe interfaces; it employs a simple hardware cache to serve all the memory requests from the host MMU after mapping the storage space of ULL-Flash to the memory space of NVDIMM. Second, to make HAMS more energy-efficient and reliable, we propose an advanced HAMS which removes unnecessary data transfers between NVDIMM and ULL-Flash after optimizing the datapath and hardware modules of HAMS. This approach unleashes the ULL-Flash and its NVMe controller from the storage box and directly connects the HAMS datapath to NVDIMM over the conventional DDR4 interface. Our evaluations show that HAMS and advanced HAMS can offer 97% and 119% higher system performance than a software-based hybrid NVDIMM design, while consuming 41% and 45% lower system energy, respectively.



قيم البحث

اقرأ أيضاً

83 - Fei Wen , Mian Qin , Paul Gratz 2020
The current mobile applications have rapidly growing memory footprints, posing a great challenge for memory system design. Insufficient DRAM main memory will incur frequent data swaps between memory and storage, a process that hurts performance, cons umes energy and deteriorates the write endurance of typical flash storage devices. Alternately, a larger DRAM has higher leakage power and drains the battery faster. Further, DRAM scaling trends make further growth of DRAMin the mobile space prohibitive due to cost. Emerging non-volatile memory (NVM) has the potential to alleviate these issues due to its higher capacity per cost than DRAM and mini-mal static power. Recently, a wide spectrum of NVM technologies, including phase-change memories (PCM), memristor, and 3D XPoint have emerged. Despite the mentioned advantages, NVM has longer access latency compared to DRAMand NVM writes can incur higher latencies and wear costs. Therefore integration of these new memory technologies in the memory hierarchy requires a fundamental rearchitect-ing of traditional system designs. In this work, we propose a hardware-accelerated memory manager (HMMU) that addresses both types of memory in a flat space address space. We design a set of data placement and data migration policies within this memory manager, such that we may exploit the advantages of each memory technology. By augmenting the system with this HMMU, we reduce the overall memory latency while also reducing writes to the NVM. Experimental results show that our design achieves a 39% reduction in energy consumption with only a 12% performance degradation versus an all-DRAM baseline that is likely untenable in the future.
134 - Lei Wang , Baowen Li 2008
Memory is an indispensable element for computer besides logic gates. In this Letter we report a model of thermal memory. We demonstrate via numerical simulation that thermal (phononic) information stored in the memory can be retained for a long time without being lost and more importantly can be read out without being destroyed. The possibility of experimental realization is also discussed.
Deep Neural Networks (DNNs) have achieved tremendous success for cognitive applications. The core operation in a DNN is the dot product between quantized inputs and weights. Prior works exploit the weight/input repetition that arises due to quantizat ion to avoid redundant computations in Convolutional Neural Networks (CNNs). However, in this paper we show that their effectiveness is severely limited when applied to Fully-Connected (FC) layers, which are commonly used in state-of-the-art DNNs, as it is the case of modern Recurrent Neural Networks (RNNs) and Transformer models. To improve energy-efficiency of FC computation we present CREW, a hardware accelerator that implements Computation Reuse and an Efficient Weight Storage mechanism to exploit the large number of repeated weights in FC layers. CREW first performs the multiplications of the unique weights by their respective inputs and stores the results in an on-chip buffer. The storage requirements are modest due to the small number of unique weights and the relatively small size of the input compared to convolutional layers. Next, CREW computes each output by fetching and adding its required products. To this end, each weight is replaced offline by an index in the buffer of unique products. Indices are typically smaller than the quantized weights, since the number of unique weights for each input tends to be much lower than the range of quantized weights, which reduces storage and memory bandwidth requirements. Overall, CREW greatly reduces the number of multiplications and provides significant savings in model memory footprint and memory bandwidth usage. We evaluate CREW on a diverse set of modern DNNs. On average, CREW provides 2.61x speedup and 2.42x energy savings over a TPU-like accelerator. Compared to UCNN, a state-of-art computation reuse technique, CREW achieves 2.10x speedup and 2.08x energy savings on average.
Quantum memories for light will be essential elements in future long-range quantum communication networks. These memories operate by reversibly mapping the quantum state of light onto the quantum transitions of a material system. For networks, the qu antum coherence times of these transitions must be long compared to the network transmission times, approximately 100 ms for a global communication network. Due to a lack of a suitable storage material, a quantum memory that operates in the 1550 nm optical fiber communication band with a storage time greater than 1 us has not been demonstrated. Here we describe the spin dynamics of $^{167}$Er$^{3+}:$Y$_{2}$SiO$_{5}$ in a high magnetic field and demonstrate that this material has the characteristics for a practical quantum memory in the 1550 nm communication band. We observe a hyperfine coherence time of 1.3 seconds. Further, we demonstrate efficient optical pumping of the entire ensemble into a single hyperfine state, the first such demonstration in a rare-earth system and a requirement for broadband spin-wave storage. With an absorption of 70 dB/cm at 1538 nm and $Lambda$-transitions enabling spin-wave storage, this material is the first candidate identified for an efficient, broadband quantum memory at telecommunication wavelengths.
153Eu3+:Y2SiO5 is a very attractive candidate for a long lived, multimode quantum memory due to the long spin coherence time (~15 ms), the relatively large hyperfine splitting (100 MHz) and the narrow optical homogeneous linewidth (~100 Hz). Here we show an atomic frequency comb memory with spin wave storage in a promising material 153Eu3+:Y2SiO5, reaching storage times slightly beyond 10 {mu}s. We analyze the efficiency of the storage process and discuss ways of improving it. We also measure the inhomogeneous spin linewidth of 153Eu3+:Y2SiO5, which we find to be 69 pm 3 kHz. These results represent a further step towards realising a long lived multi mode solid state quantum memory.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا