
SPATA: A Seeding and Patching Algorithm for Hybrid Transcriptome Assembly


Added by Tin Nguyen

Publication date 2013

fields Informatics Engineering Biology

Research language: English

Authors Tin Chi Nguyen - Zhiyu Zhao - Dongxiao Zhu

Computational Engineering Genomics


Transcriptome assembly from RNA-Seq reads is an active area of bioinformatics research. The ever-declining cost and increasing depth of RNA-Seq have provided unprecedented opportunities to better identify expressed transcripts. However, nonlinear transcript structures and the ultra-high throughput of RNA-Seq reads pose significant algorithmic and computational challenges to existing transcriptome assembly approaches, whether reference-guided or de novo. While reference-guided approaches offer good sensitivity, they rely on the alignment results of splice-aware aligners and are thus unsuitable for species with incomplete reference genomes. In contrast, de novo approaches do not depend on a reference genome but face a computationally daunting task arising from the complexity of the graph built for the whole transcriptome. In response to these challenges, we present a hybrid approach that exploits an incomplete reference genome without relying on splice-aware aligners. We have designed a split-and-align procedure to efficiently localize reads to individual genomic loci, followed by an accurate de novo assembly of the reads falling into each locus. Using extensive simulation data, we demonstrate high accuracy and precision in transcriptome reconstruction in comparison with selected transcriptome assembly tools. Our method is implemented in assemblySAM, a GUI software freely available at http://sammate.sourceforge.net.
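The abstract does not spell out the split-and-align procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that each read is cut into fixed-length segments, that segments are matched exactly against the reference, and that a read is assigned to a fixed-width genomic bin that stands in for a locus. The function names and the `seg_len` and `locus_size` parameters are hypothetical, and the per-locus de novo assembly step is left out.

```python
from collections import defaultdict

def build_index(reference, seg_len):
    """Map every substring of length seg_len to its start positions in the reference."""
    index = defaultdict(list)
    for pos in range(len(reference) - seg_len + 1):
        index[reference[pos:pos + seg_len]].append(pos)
    return index

def split_read(read, seg_len):
    """Split a read into consecutive, non-overlapping segments of length seg_len."""
    return [read[i:i + seg_len] for i in range(0, len(read) - seg_len + 1, seg_len)]

def localize_reads(reads, reference, seg_len=4, locus_size=100):
    """Group reads by the genomic bin (hypothetical locus) of their leftmost segment hit."""
    index = build_index(reference, seg_len)
    loci = defaultdict(list)
    for read in reads:
        hits = [pos for seg in split_read(read, seg_len) for pos in index.get(seg, [])]
        if hits:  # reads with no matching segment are left unassigned
            loci[min(hits) // locus_size].append(read)
    return dict(loci)

# Toy reference: exon, intron, exon. The read spans the exon-exon junction,
# so it does not occur contiguously in the reference, but its segments do.
reference = "ACGTTGCA" + "GTAAGTCCCCAG" + "TCGATCGG"
spliced_read = "TGCA" + "TCGA"
loci = localize_reads([spliced_read], reference)
```

In this toy case the spliced read is placed in a locus even though no spliced alignment is computed, which is the property the abstract attributes to the hybrid approach. The reads collected for each locus would then be handed to a separate de novo assembler.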


Unsupervised Classification for Tiling Arrays: ChIP-chip and Transcriptome

495 - Caroline Berard , Marie-Laure Martin-Magniette , Veronique Brunaud 2011

Tiling arrays make possible a large-scale exploration of the genome thanks to probes that cover the whole genome at very high density, up to 2,000,000 probes. The biological questions usually addressed are either the expression difference between two conditions or the detection of transcribed regions. In this work we propose to consider both questions simultaneously as an unsupervised classification problem by modeling the joint distribution of the two conditions. In contrast to previous methods, we account for all available information on the probes as well as biological knowledge such as annotation and spatial dependence between probes. Since probes are not biologically relevant units, we propose a classification rule for non-connected regions covered by several probes. Applications to transcriptomic and ChIP-chip data of Arabidopsis thaliana obtained with a NimbleGen tiling array highlight the importance of precise modeling and of the region classification.

Methodology Genomics Quantitative Methods

Using Sequence Ensembles for Seeding Alignments of MinION Sequencing Data

68 - Rastislav Rabatin , Broňa Brejová , Tomáš Vinař 2016

The Oxford Nanopore MinION sequencer is currently the smallest sequencing device available. While it can produce very long reads (reads of up to 100 kbp have been reported), it is prone to high sequencing error rates of up to 30%. Since most of these errors are insertions or deletions, it is very difficult to adapt popular seed-based algorithms designed for aligning data sets with much lower error rates. Base calling of MinION reads is typically done using hidden Markov models. In this paper, we propose to represent each sequencing read by an ensemble of sequences sampled from such a probabilistic model. This approach can improve the sensitivity and false positive rate of seeding an alignment compared to using a single representative base call sequence for each read.

Data Structures and Algorithms Genomics
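The ensemble idea in the abstract above can be illustrated with a heavily simplified sketch. It assumes an independent base distribution per read position in place of the base-calling hidden Markov model, so it ignores the insertions and deletions that the abstract identifies as the main error source. All names here (`sample_ensemble`, `seed_kmers`, `base_probs`) are hypothetical and are not taken from the paper.

```python
import random

BASES = "ACGT"

def sample_ensemble(base_probs, n_samples, rng):
    """Draw n_samples sequences, picking each base independently from its distribution."""
    return [
        "".join(rng.choices(BASES, weights=weights)[0] for weights in base_probs)
        for _ in range(n_samples)
    ]

def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def seed_kmers(sequences, reference, k):
    """Return the k-mers shared between the reference and any sequence in the collection."""
    ref_kmers = kmers(reference, k)
    found = set()
    for seq in sequences:
        found |= kmers(seq, k) & ref_kmers
    return found

# Hypothetical per-position distributions over A, C, G, T for a 10-base read.
base_probs = [(0.4, 0.2, 0.2, 0.2)] * 10
ensemble = sample_ensemble(base_probs, 8, random.Random(0))
single_seeds = seed_kmers(ensemble[:1], "AAAAACCCCC", 3)
ensemble_seeds = seed_kmers(ensemble, "AAAAACCCCC", 3)
```

By construction, the seeds found with the full ensemble always include those found with any single sampled sequence, which is the sense in which an ensemble can only add seeding opportunities. Whether the extra seeds are true or false positives is the question the paper evaluates.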

Binary Particle Swarm Optimization versus Hybrid Genetic Algorithm for Inferring Well Supported Phylogenetic Trees

67 - Bassam AlKindy , Bashar Al-Nuaimi , Christophe Guyeux 2016

The amount of completely sequenced chloroplast genomes increases rapidly every day, leading to the possibility to build large-scale phylogenetic trees of plant species. Considering a subset of close plant species defined according to their chloroplasts, the phylogenetic tree that can be inferred by their core genes is not necessarily well supported, due to the possible occurrence of problematic genes (i.e., homoplasy, incomplete lineage sorting, horizontal gene transfers, etc.) which may blur the phylogenetic signal. However, a trustworthy phylogenetic tree can still be obtained provided such a number of blurring genes is reduced. The problem is thus to determine the largest subset of core genes that produces the best-supported tree. To discard problematic genes and due to the overwhelming number of possible combinations, this article focuses on how to extract the largest subset of sequences in order to obtain the most supported species tree. Due to computational complexity, a distributed Binary Particle Swarm Optimization (BPSO) is proposed in sequential and distributed fashions. Obtained results from bo

Artificial Intelligence Genomics

Parallel String Graph Construction and Transitive Reduction for De Novo Genome Assembly

320 - Giulia Guidi , Oguz Selvitopi , Marquita Ellis 2020

One of the most computationally intensive tasks in computational biology is de novo genome assembly, the decoding of the sequence of an unknown genome from redundant and erroneous short sequences. A common assembly paradigm identifies overlapping sequences, simplifies their layout, and creates consensus. Despite many algorithms developed in the literature, the efficient assembly of large genomes is still an open problem. In this work, we introduce new distributed-memory parallel algorithms for overlap detection and layout simplification steps of de novo genome assembly, and implement them in the diBELLA 2D pipeline. Our distributed memory algorithms for both overlap detection and layout simplification are based on linear-algebra operations over semirings using 2D distributed sparse matrices. Our layout step consists of performing a transitive reduction from the overlap graph to a string graph. We provide a detailed communication analysis of the main stages of our new algorithms. diBELLA 2D achieves near linear scaling with over 80% parallel efficiency for the human genome, reducing the runtime for overlap detection by 1.2-1.3x for the human genome and 1.5-1.9x for C. elegans compared to the state-of-the-art. Our transitive reduction algorithm outperforms an existing distributed-memory implementation by 10.5-13.3x for the human genome and 18-29x for C. elegans. Our work paves the way for efficient de novo assembly of large genomes using long reads in distributed memory.

Distributed Parallel and Cluster Computing Genomics
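The transitive reduction mentioned in the abstract above can be illustrated on a small scale. The sketch below is a plain sequential version over a dictionary of successor sets and assumes a directed acyclic overlap graph. It is not the semiring-based, distributed sparse-matrix formulation of diBELLA 2D, and it ignores overlap lengths and read orientation, which a real string graph construction has to handle. The function names are hypothetical.

```python
def reachable(graph, start):
    """Return all nodes reachable from start by following one or more edges."""
    seen, stack = set(), list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen

def transitive_reduction(graph):
    """Drop every edge u->w that is implied by a longer path from u to w (DAG only)."""
    reduced = {}
    for u, successors in graph.items():
        implied = set()
        for v in successors:
            implied |= reachable(graph, v)  # nodes u can already reach through v
        reduced[u] = set(successors) - implied
    return reduced

# Three reads where r1 overlaps r2, r2 overlaps r3, and r1 also overlaps r3.
overlaps = {"r1": {"r2", "r3"}, "r2": {"r3"}, "r3": set()}
string_graph = transitive_reduction(overlaps)
```

Here the edge from r1 to r3 is removed because the path through r2 already implies it, leaving a simple chain of reads.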

Analytical Study of Hexapod miRNAs using Phylogenetic Methods

303 - A.K. Mishra , H. Chandrasekharan 2012

MicroRNAs (miRNAs) are a class of non-coding RNAs that regulate gene expression. Identifying the total number of miRNAs, even in completely sequenced organisms, is still an open problem. However, researchers have been using techniques that can predict a limited number of miRNAs in an organism. In this paper, we have used a homology-based approach for comparative analysis of miRNAs of the Hexapoda group. We have used Apis mellifera, Bombyx mori, Anopheles gambiae and Drosophila melanogaster miRNA datasets from the miRBase repository. We have performed pairwise as well as multiple alignments of the available miRNAs in the repository to identify and analyse conserved regions among related species. Unfortunately, to the best of our knowledge, the miRNA-related literature does not provide an in-depth analysis of hexapods. We have made an attempt to derive the commonality among the miRNAs and to identify the conserved regions which are still not available in miRNA repositories. The results are a good approximation with a small number of mismatches. However, they are encouraging and may facilitate miRNA biogenesis for

Computational Engineering Genomics


