SPATA: A Seeding and Patching Algorithm for Hybrid Transcriptome Assembly

Submitted by Tin Nguyen

Publication date: 2013

Research fields: Informatics Engineering, Biology

Paper language: English

Authors: Tin Chi Nguyen, Zhiyu Zhao, Dongxiao Zhu

Subjects: Computational Engineering, Finance, and Science; Genomics

Abstract

Transcriptome assembly from RNA-Seq reads is an active area of bioinformatics research. The ever-declining cost and increasing depth of RNA-Seq have provided unprecedented opportunities to better identify expressed transcripts. However, nonlinear transcript structures and the ultra-high throughput of RNA-Seq pose significant algorithmic and computational challenges to existing transcriptome assembly approaches, whether reference-guided or de novo. While reference-guided approaches offer good sensitivity, they rely on the alignment results of splice-aware aligners and are thus unsuitable for species with incomplete reference genomes. In contrast, de novo approaches do not depend on a reference genome but face a computationally daunting task arising from the complexity of the graph built for the whole transcriptome. In response to these challenges, we present a hybrid approach that exploits an incomplete reference genome without relying on splice-aware aligners. We have designed a split-and-align procedure to efficiently localize reads to individual genomic loci, followed by an accurate de novo assembly of the reads falling into each locus. Using extensive simulation data, we demonstrate high accuracy and precision in transcriptome reconstruction compared with selected transcriptome assembly tools. Our method is implemented in assemblySAM, a GUI software package freely available at http://sammate.sourceforge.net.
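The split-and-align idea in the abstract can be illustrated with a small sketch. It is an assumption-laden illustration, not the authors' SPATA/assemblySAM implementation: each read is cut into non-overlapping k-length pieces, each piece is matched exactly against a hash index of the reference, and the read is assigned to the fixed-size genomic window that collects the most piece hits. The helper names (`build_index`, `localize_read`, `group_reads_by_locus`) and the parameters `k` and `window` are hypothetical. Pieces that straddle a splice junction simply fail to match, which is why this toy version needs no splice-aware aligner.

```python
from __future__ import annotations

import random
from collections import Counter, defaultdict

# Hypothetical sketch of split-and-align read localization (not the SPATA code).


def build_index(genome: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the (possibly incomplete) reference to its positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return dict(index)


def localize_read(read: str, index: dict[str, list[int]], k: int,
                  window: int) -> int | None:
    """Split the read into non-overlapping k-length pieces, align each piece
    exactly, and return the genomic window (locus) with the most hits."""
    votes: Counter[int] = Counter()
    for offset in range(0, len(read) - k + 1, k):
        for pos in index.get(read[offset:offset + k], []):
            votes[pos // window] += 1  # one vote per exact piece hit
    if not votes:
        return None  # read cannot be placed, e.g. it comes from a genome gap
    return votes.most_common(1)[0][0]


def group_reads_by_locus(reads: list[str], index: dict[str, list[int]],
                         k: int, window: int) -> dict[int, list[str]]:
    """Bucket reads by locus; each bucket would then be assembled de novo."""
    groups: dict[int, list[str]] = defaultdict(list)
    for read in reads:
        locus = localize_read(read, index, k, window)
        if locus is not None:
            groups[locus].append(read)
    return dict(groups)


# Toy usage: a random 300 bp reference and a read spliced across a 25 bp intron.
rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(300))
index = build_index(genome, k=10)
spliced = genome[100:115] + genome[140:155]
print(localize_read(spliced, index, k=10, window=50))
```

With 10-mer pieces, the middle piece of `spliced` spans the junction and almost certainly finds no exact match, while the first and last pieces both vote for the window covering positions 100 to 149. A real tool would also have to handle repeats, sequencing errors, and loci that cross window boundaries; fixed windows are only the simplest stand-in for a locus.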

Read also

Caroline Berard, Marie-Laure Martin-Magniette, Veronique Brunaud (2011)

Tiling arrays make possible a large-scale exploration of the genome thanks to probes that cover the whole genome at very high density, up to 2,000,000 probes. The biological questions usually addressed are either the expression difference between two conditions or the detection of transcribed regions. In this work we propose to consider both questions simultaneously as an unsupervised classification problem by modeling the joint distribution of the two conditions. In contrast to previous methods, we account for all available information on the probes as well as biological knowledge such as annotation and spatial dependence between probes. Since probes are not biologically relevant units, we propose a classification rule for non-connected regions covered by several probes. Applications to transcriptomic and ChIP-chip data of Arabidopsis thaliana obtained with a NimbleGen tiling array highlight the importance of precise modeling and of region classification.

Methodology; Genomics; Quantitative Methods

Using Sequence Ensembles for Seeding Alignments of MinION Sequencing Data

Rastislav Rabatin, Broňa Brejová, Tomáš Vinař (2016)

The Oxford Nanopore MinION sequencer is currently the smallest sequencing device available. While it can produce very long reads (reads of up to 100 kbp have been reported), it is prone to high sequencing error rates of up to 30%. Since most of these errors are insertions or deletions, it is very difficult to adapt popular seed-based algorithms designed for aligning data sets with much lower error rates. Base calling of MinION reads is typically done using hidden Markov models. In this paper, we propose to represent each sequencing read by an ensemble of sequences sampled from such a probabilistic model. This approach can improve the sensitivity and false positive rate of seeding an alignment compared to using a single representative base-call sequence for each read.

Data Structures and Algorithms; Genomics
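The ensemble-seeding idea in this abstract can be sketched in a few lines. This is a deliberate simplification, not the authors' method: instead of sampling from a base-calling hidden Markov model, it assumes each read position has an independent distribution over bases (a hypothetical `Profile` type), draws `n` sequences from it, and collects exact k-mer seeds shared with the reference. The function names and the toy read are hypothetical; the read is constructed so that the single most probable call is wrong at two positions, which destroys every seed, while the sampled ensemble recovers them.

```python
from __future__ import annotations

import random

# Hypothetical per-position base distributions; the paper samples from an HMM instead.
Profile = list[dict[str, float]]


def consensus(profile: Profile) -> str:
    """Single representative base call: the most probable base at each position."""
    return "".join(max(dist, key=dist.get) for dist in profile)


def sample_ensemble(profile: Profile, n: int, rng: random.Random) -> list[str]:
    """Draw n sequences, choosing each base independently from its distribution."""
    return [
        "".join(rng.choices(list(dist), weights=list(dist.values()))[0]
                for dist in profile)
        for _ in range(n)
    ]


def kmers(seq: str, k: int) -> set[str]:
    """All k-mers of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def seed_hits(candidates: list[str], reference: str, k: int) -> set[str]:
    """Exact k-mer seeds present both in some candidate and in the reference."""
    ref_kmers = kmers(reference, k)
    hits: set[str] = set()
    for seq in candidates:
        hits |= kmers(seq, k) & ref_kmers
    return hits


# Toy read of `reference`: at positions 2 and 5 the true base "G" is only the
# second most probable call, so the consensus gets both of them wrong.
reference = "ACGTTGCA"
profile: Profile = [{base: 1.0} for base in reference]
for pos in (2, 5):
    profile[pos] = {"A": 0.55, "G": 0.45}

single = seed_hits([consensus(profile)], reference, k=4)
ensemble = seed_hits(sample_ensemble(profile, n=200, rng=random.Random(1)),
                     reference, k=4)
print(len(single), len(ensemble))  # the consensus "ACATTACA" yields no 4-mer seed
```

Sampling trades extra seed lookups for sensitivity. A practical version would also need to keep in check the false-positive seeds that extra sampled k-mers can introduce, which is the other quantity the abstract mentions.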

Binary Particle Swarm Optimization versus Hybrid Genetic Algorithm for Inferring Well Supported Phylogenetic Trees

Bassam AlKindy, Bashar Al-Nuaimi, Christophe Guyeux (2016)

The number of completely sequenced chloroplast genomes increases rapidly every day, making it possible to build large-scale phylogenetic trees of plant species. Considering a subset of closely related plant species defined according to their chloroplasts, the phylogenetic tree that can be inferred from their core genes is not necessarily well supported, due to the possible occurrence of problematic genes (e.g., homoplasy, incomplete lineage sorting, horizontal gene transfer) that may blur the phylogenetic signal. However, a trustworthy phylogenetic tree can still be obtained provided the number of such blurring genes is reduced. The problem is thus to determine the largest subset of core genes that produces the best-supported tree. To discard problematic genes, and given the overwhelming number of possible combinations, this article focuses on how to extract the largest subset of sequences in order to obtain the best-supported species tree. Due to the computational complexity, a Binary Particle Swarm Optimization (BPSO) is proposed in both sequential and distributed versions. Obtained results from bo

Artificial Intelligence; Genomics

Parallel String Graph Construction and Transitive Reduction for De Novo Genome Assembly

Giulia Guidi, Oguz Selvitopi, Marquita Ellis (2020)

One of the most computationally intensive tasks in computational biology is de novo genome assembly, the decoding of the sequence of an unknown genome from redundant and erroneous short sequences. A common assembly paradigm identifies overlapping sequences, simplifies their layout, and creates a consensus. Despite the many algorithms developed in the literature, the efficient assembly of large genomes is still an open problem. In this work, we introduce new distributed-memory parallel algorithms for the overlap detection and layout simplification steps of de novo genome assembly, and implement them in the diBELLA 2D pipeline. Our distributed-memory algorithms for both overlap detection and layout simplification are based on linear-algebra operations over semirings using 2D distributed sparse matrices. Our layout step performs a transitive reduction from the overlap graph to a string graph. We provide a detailed communication analysis of the main stages of our new algorithms. diBELLA 2D achieves near-linear scaling with over 80% parallel efficiency for the human genome, reducing the runtime for overlap detection by 1.2-1.3x for the human genome and 1.5-1.9x for C. elegans compared to the state of the art. Our transitive reduction algorithm outperforms an existing distributed-memory implementation by 10.5-13.3x for the human genome and 18-29x for C. elegans. Our work paves the way for efficient de novo assembly of large genomes using long reads in distributed memory.

Distributed, Parallel, and Cluster Computing; Genomics
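The transitive-reduction step described in this abstract can be illustrated sequentially. This sketch is not diBELLA 2D's distributed algorithm: it stores the overlap graph as adjacency sets, computes the Boolean product A·A (OR over AND, i.e. "reachable in exactly two hops"), and drops every edge u->w that is also such a two-hop path, which mirrors the semiring formulation in miniature. It assumes edges are already oriented along the genome and ignores overlap lengths, read orientation, and the error tolerance used in practical string-graph construction. The function and read names are hypothetical.

```python
from __future__ import annotations

Graph = dict[str, set[str]]  # read -> reads it overlaps further along the genome


def boolean_square(adj: Graph) -> Graph:
    """(A·A)[u] over the Boolean (OR, AND) semiring: nodes exactly two hops from u."""
    return {u: {w for v in nbrs for w in adj.get(v, set())}
            for u, nbrs in adj.items()}


def transitive_reduction(adj: Graph) -> Graph:
    """Drop edges u->w that are implied by some path u->v->w."""
    two_hop = boolean_square(adj)
    return {u: nbrs - two_hop[u] for u, nbrs in adj.items()}


# Four reads tiling a region: r1 overlaps r2 and r3, r2 overlaps r3 and r4, ...
overlaps: Graph = {
    "r1": {"r2", "r3"},
    "r2": {"r3", "r4"},
    "r3": {"r4"},
    "r4": set(),
}
print(transitive_reduction(overlaps))
# {'r1': {'r2'}, 'r2': {'r3'}, 'r3': {'r4'}, 'r4': set()}
```

Checking only two-hop paths is a common shortcut: in an idealized, error-free layout with contained reads removed, any edge implied by a longer path between reads laid out along a line is also implied by a two-hop path.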

Analytical Study of Hexapod miRNAs using Phylogenetic Methods

A.K. Mishra, H. Chandrasekharan (2012)

MicroRNAs (miRNAs) are a class of non-coding RNAs that regulate gene expression. Identifying the total number of miRNAs, even in completely sequenced organisms, is still an open problem. However, researchers have been using techniques that can predict a limited number of miRNAs in an organism. In this paper, we have used a homology-based approach for the comparative analysis of miRNAs of the Hexapoda group. We have used Apis mellifera, Bombyx mori, Anopheles gambiae and Drosophila melanogaster miRNA datasets from the miRBase repository. We have performed pairwise as well as multiple alignments of the available miRNAs in the repository to identify and analyse conserved regions among related species. Unfortunately, to the best of our knowledge, the miRNA literature does not provide an in-depth analysis of hexapods. We have made an attempt to derive the commonality among the miRNAs and to identify conserved regions that are not yet available in miRNA repositories. The results are a good approximation, with a small number of mismatches. However, they are encouraging and may facilitate miRNA biogenesis for

Computational Engineering, Finance, and Science; Genomics
