ترغب بنشر مسار تعليمي؟ اضغط هنا

Read Mapping Near Non-Volatile Memory

87   0   0.0 ( 0 )
 نشر من قبل S. Karen Khatamifard
 تاريخ النشر 2017
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

DNA sequencing is the physical/biochemical process of identifying the location of the four bases (Adenine, Guanine, Cytosine, Thymine) in a DNA strand. As semiconductor technology revolutionized computing, modern DNA sequencing technology (termed Next Generation Sequencing, NGS)revolutionized genomic research. As a result, modern NGS platforms can sequence hundreds of millions of short DNA fragments in parallel. The sequenced DNA fragments, representing the output of NGS platforms, are termed reads. Besides genomic variations, NGS imperfections induce noise in reads. Mapping each read to (the most similar portion of) a reference genome of the same species, i.e., read mapping, is a common critical first step in a diverse set of emerging bioinformatics applications. Mapping represents a search-heavy memory-intensive similarity matching problem, therefore, can greatly benefit from near-memory processing. Intuition suggests using fast associative search enabled by Ternary Content Addressable Memory (TCAM) by construction. However, the excessive energy consumption and lack of support for similarity matching (under NGS and genomic variation induced noise) renders direct application of TCAM infeasible, irrespective of volatility, where only non-volatile TCAM can accommodate the large memory footprint in an area-efficient way. This paper introduces GeNVoM, a scalable, energy-efficient and high-throughput solution. Instead of optimizing an algorithm developed for general-purpose computers or GPUs, GeNVoM rethinks the algorithm and non-volatile TCAM-based accelerator design together from the ground up. Thereby GeNVoM can improve the throughput by up to 113.5 times (3.6); the energy consumption, by up to 210.9 times (1.36), when compared to a GPU (accelerator) baseline, which represents one of the highest-throughput implementations known.

قيم البحث

اقرأ أيضاً

Personalized recommendation systems leverage deep learning models and account for the majority of data center AI cycles. Their performance is dominated by memory-bound sparse embedding operations with unique irregular memory access patterns that pose a fundamental challenge to accelerate. This paper proposes a lightweight, commodity DRAM compliant, near-memory processing solution to accelerate personalized recommendation inference. The in-depth characterization of production-grade recommendation models shows that embedding operations with high model-, operator- and data-level parallelism lead to memory bandwidth saturation, limiting recommendation inference performance. We propose RecNMP which provides a scalable solution to improve system throughput, supporting a broad range of sparse embedding models. RecNMP is specifically tailored to production environments with heavy co-location of operators on a single server. Several hardware/software co-optimization techniques such as memory-side caching, table-aware packet scheduling, and hot entry profiling are studied, resulting in up to 9.8x memory latency speedup over a highly-optimized baseline. Overall, RecNMP offers 4.2x throughput improvement and 45.8% memory energy savings.
Flat combining (FC) is a synchronization paradigm in which a single thread, holding a global lock, collects requests by multiple threads for accessing a concurrent data structure and applies their combined requests to it. Although FC is sequential, i t significantly reduces synchronization overheads and cache invalidations and thus often provides better performance than that of lock-free implementations. The recent emergence of non-volatile memory (NVM) technologies increases the interest in the development of persistent (a.k.a. durable or recoverable) objects. These are objects that are able to recover from system failures and ensure consistency by retaining their state in NVM and fixing it, if required, upon recovery. Of particular interest are detectable objects that, in addition to ensuring consistency, allow recovery code to infer if a failed operation took effect before the crash and, if it did, obtain its response. In this work, we present the first FC-based persistent object. Specifically, we introduce a detectable FC-based implementation of a concurrent LIFO stack object. Our empirical evaluation establishes that thanks to the usage of flat combining, the novel stack algorithm requires a much smaller number of costly persistence instructions than competing algorithms and is therefore able to significantly outperform them.
78 - Haikun Liu , Di Chen , Hai Jin 2020
Non-Volatile Main Memories (NVMMs) have recently emerged as promising technologies for future memory systems. Generally, NVMMs have many desirable properties such as high density, byte-addressability, non-volatility, low cost, and energy efficiency, at the expense of high write latency, high write power consumption and limited write endurance. NVMMs have become a competitive alternative of Dynamic Random Access Memory (DRAM), and will fundamentally change the landscape of memory systems. They bring many research opportunities as well as challenges on system architectural designs, memory management in operating systems (OSes), and programming models for hybrid memory systems. In this article, we first revisit the landscape of emerging NVMM technologies, and then survey the state-of-the-art studies of NVMM technologies. We classify those studies with a taxonomy according to different dimensions such as memory architectures, data persistence, performance improvement, energy saving, and wear leveling. Second, to demonstrate the best practices in building NVMM systems, we introduce our recent work of hybrid memory system designs from the dimensions of architectures, systems, and applications. At last, we present our vision of future research directions of NVMMs and shed some light on design challenges and opportunities.
The never-ending demand for high performance and energy efficiency is pushing designers towards an increasing level of heterogeneity and specialization in modern computing systems. In such systems, creating efficient memory architectures is one of th e major opportunities for optimizing modern workloads (e.g., computer vision, machine learning, graph analytics, etc.) that are extremely data-driven. However, designers demand proper design methods to tackle the increasing design complexity and address several new challenges, like the security and privacy of the data to be elaborated. This paper overviews the current trend for the design of domain-specific memory architectures. Domain-specific architectures are tailored for the given application domain, with the introduction of hardware accelerators and custom memory modules while maintaining a certain level of flexibility. We describe the major components, the common challenges, and the state-of-the-art design methodologies for building domain-specific memory architectures. We also discuss the most relevant research projects, providing a classification based on our main topics.
Plenty of research efforts have been devoted to FPGA-based acceleration, due to its low latency and high energy efficiency. However, using the original low-level hardware description languages like Verilog to program FPGAs requires generally good kno wledge of hardware design details and hand-on experiences. Fortunately, the FPGA community intends to address this low programmability issues. For example, , with the intention that programming FPGAs is just as easy as programming GPUs. Even though Vitis is proven to increase programmability, we cannot directly obtain high performance without careful design regarding hardware pipeline and memory subsystem.In this paper, we focus on the memory subsystem, comprehensively and systematically benchmarking the effect of optimization methods on memory performance. Upon benchmarking, we quantitatively analyze the typical memory access patterns for a broad range of applications, including AI, HPC, and database. Further, we also provide the corresponding optimization direction for each memory access pattern so as to improve overall performance.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا