بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

GRIM-Filter: Fast Seed Location Filtering in DNA Read Mapping Using Processing-in-Memory Technologies

70 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Jeremie Kim

تاريخ النشر 2017

مجال البحث علم الأحياء الهندسة المعلوماتية

والبحث باللغة English

تأليف Jeremie S. Kim - Damla Senol Cali - Hongyi Xin

الجينوم الهندسة الحاسوبية، المالية،العلوم

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Motivation: Seed location filtering is critical in DNA read mapping, a process where billions of DNA fragments (reads) sampled from a donor are mapped onto a reference genome to identify genomic variants of the donor. State-of-the-art read mappers 1) quickly generate possible mapping locations for seeds (i.e., smaller segments) within each read, 2) extract reference sequences at each of the mapping locations, and 3) check similarity between each read and its associated reference sequences with a computationally-expensive algorithm (i.e., sequence alignment) to determine the origin of the read. A seed location filter comes into play before alignment, discarding seed locations that alignment would deem a poor match. The ideal seed location filter would discard all poor match locations prior to alignment such that there is no wasted computation on unnecessary alignments. Results: We propose a novel seed location filtering algorithm, GRIM-Filter, optimized to exploit 3D-stacked memory systems that integrate computation within a logic layer stacked under memory layers, to perform processing-in-memory (PIM). GRIM-Filter quickly filters seed locations by 1) introducing a new representation of coarse-grained segments of the reference genome, and 2) using massively-parallel in-memory operations to identify read presence within each coarse-grained segment. Our evaluations show that for a sequence alignment error tolerance of 0.05, GRIM-Filter 1) reduces the false negative rate of filtering by 5.59x--6.41x, and 2) provides an end-to-end read mapper speedup of 1.81x--3.65x, compared to a state-of-the-art read mapper employing the best previous seed location filtering algorithm. Availability: The code is available online at: https://github.com/CMU-SAFARI/GRIM

قيم البحث

88 - Jeremie S Kim , Damla Senol , Hongyi Xin 2017

Motivation: Seed filtering is critical in DNA read mapping, a process where billions of DNA fragments (reads) sampled from a donor are mapped onto a reference genome to identify genomic variants of the donor. Read mappers 1) quickly generate possible mapping locations (i.e., seeds) for each read, 2) extract reference sequences at each of the mapping locations, and then 3) check similarity between each read and its associated reference sequences with a computationally expensive dynamic programming algorithm (alignment) to determine the origin of the read. Location filters come into play before alignment, discarding seed locations that alignment would have deemed a poor match. The ideal location filter would discard all poor matching locations prior to alignment such that there is no wasted computation on poor alignments. Results: We propose a novel filtering algorithm, GRIM-Filter, optimized to exploit emerging 3D-stacked memory systems that integrate computation within a stacked logic layer, enabling processing-in-memory (PIM). GRIM-Filter quickly filters locations by 1) introducing a new representation of coarse-grained segments of the reference genome and 2) using massively-parallel in-memory operations to identify read presence within each coarse-grained segment. Our evaluations show that for 5% error acceptance rates, GRIM-Filter eliminates 5.59x-6.41x more false negatives and exhibits end-to-end speedups of 1.81x-3.65x compared to mappers employing the best previous filtering algorithm.

الجينوم

Identification of repeats in DNA sequences using nucleotide distribution uniformity

146 - Changchuan Yin 2016

Repetitive elements are important in genomic structures, functions and regulations, yet effective methods in precisely identifying repetitive elements in DNA sequences are not fully accessible, and the relationship between repetitive elements and per iodicities of genomes is not clearly understood. We present an $textit{ab initio}$ method to quantitatively detect repetitive elements and infer the consensus repeat pattern in repetitive elements. The method uses the measure of the distribution uniformity of nucleotides at periodic positions in DNA sequences or genomes. It can identify periodicities, consensus repeat patterns, copy numbers and perfect levels of repetitive elements. The results of using the method on different DNA sequences and genomes demonstrate efficacy and accuracy in identifying repeat patterns and periodicities. The complexity of the method is linear with respect to the lengths of the analyzed sequences.

الجينوم الهندسة الحاسوبية، المالية،العلوم الرؤية الحاسوبية وتمييز الأنماط

Optimal Seed Solver: Optimizing Seed Selection in Read Mapping

544 - Hongyi Xin , Richard Zhu , Sunny Nahar 2015

Motivation: Optimizing seed selection is an important problem in read mapping. The number of non-overlapping seeds a mapper selects determines the sensitivity of the mapper while the total frequency of all selected seeds determines the speed of the m apper. Modern seed-and-extend mappers usually select seeds with either an equal and fixed-length scheme or with an inflexible placement scheme, both of which limit the potential of the mapper to select less frequent seeds to speed up the mapping process. Therefore, it is crucial to develop a new algorithm that can adjust both the individual seed length and the seed placement, as well as derive less frequent seeds. Results: We present the Optimal Seed Solver (OSS), a dynamic programming algorithm that discovers the least frequently-occurring set of x seeds in an L-bp read in $O(x times L)$ operations on average and in $O(x times L^{2})$ operations in the worst case. We compared OSS against four state-of-the-art seed selection schemes and observed that OSS provides a 3-fold reduction of average seed frequency over the best previous seed selection optimizations.

الهندسة الحاسوبية، المالية،العلوم الجينوم

MEEPTOOLS: A maximum expected error based FASTQ read filtering and trimming toolkit

50 - Vishal N. Koparde , Hardik I. Parikh , Steven P. Bradley 2015

Next generation sequencing technology rapidly produces massive volume of data and quality control of this sequencing data is essential to any genomic analysis. Here we present MEEPTOOLS, which is a collection of open-source tools based on maximum exp ected error as a percentage of read length (MEEP score) to filter, trim, truncate and assess next generation DNA sequencing data in FASTQ file format. MEEPTOOLS provides a non-traditional approach towards read filtering/trimming based on maximum error probabilities of the bases in the read on a non-logarithmic scale. This method simultaneously retains more reliable bases and removes more unreliable bases than the traditional quality filtering strategies.

الجينوم

BOOST: A fast approach to detecting gene-gene interactions in genome-wide case-control studies

540 - Xiang Wan , Can Yang , Qiang Yang 2010

Gene-gene interactions have long been recognized to be fundamentally important to understand genetic causes of complex disease traits. At present, identifying gene-gene interactions from genome-wide case-control studies is computationally and methodo logically challenging. In this paper, we introduce a simple but powerful method, named `BOolean Operation based Screening and Testing(BOOST). To discover unknown gene-gene interactions that underlie complex diseases, BOOST allows examining all pairwise interactions in genome-wide case-control studies in a remarkably fast manner. We have carried out interaction analyses on seven data sets from the Wellcome Trust Case Control Consortium (WTCCC). Each analysis took less than 60 hours on a standard 3.0 GHz desktop with 4G memory running Windows XP system. The interaction patterns identified from the type 1 diabetes data set display significant difference from those identified from the rheumatoid arthritis data set, while both data sets share a very similar hit region in the WTCCC report. BOOST has also identified many undiscovered interactions between genes in the major histocompatibility complex (MHC) region in the type 1 diabetes data set. In the coming era of large-scale interaction mapping in genome-wide case-control studies, our method can serve as a computationally and statistically useful tool.

الجينوم الهندسة الحاسوبية، المالية،العلوم الأساليب الكمية

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

الأكاديمية العربية للعلوم والتكنولوجيا والنقل البحري

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

GRIM-Filter: Fast Seed Location Filtering in DNA Read Mapping Using Processing-in-Memory Technologies

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً