Improving Nanopore Reads Raw Signal Alignment


Added by Vladimir Boza

Publication date: 2017

Fields: Biology

Language: English

Authors: Vladimir Boža, Broňa Brejova, Tomáš Vinař

Quantitative Methods


Abstract

We investigate the use of the dynamic time warping (DTW) algorithm for aligning raw signal data from the MinION sequencer. DTW is mostly used for fast alignment in selective sequencing, where it quickly determines whether a read comes from a sequence of interest. We show that the standard usage of DTW has low discriminative power, mainly because the scaling parameters are hard to estimate accurately. We propose a simple variation of the DTW algorithm that does not suffer from scaling problems and has much higher discriminative power.
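The sketch below illustrates the baseline the abstract criticizes. It is standard DTW on raw signal, with each signal first normalized by its own mean and standard deviation, and it is not the authors' proposed variant, which the abstract does not detail. Several parts are assumptions: the normalization step, the squared-difference cost, and a reference given as a sequence of expected signal levels (for example, levels predicted from a pore model). The names `znormalize` and `dtw_distance` are hypothetical.

```python
import math
from typing import List, Sequence


def znormalize(signal: Sequence[float]) -> List[float]:
    """Rescale a signal to zero mean and unit standard deviation.

    Shift and scale are estimated from the signal itself; this is an
    assumed, commonly used normalization, not necessarily the paper's.
    """
    n = len(signal)
    mean = sum(signal) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in signal) / n) or 1.0  # avoid /0
    return [(v - mean) / std for v in signal]


def dtw_distance(query: Sequence[float], reference: Sequence[float],
                 subsequence: bool = False) -> float:
    """Classic O(len(query) * len(reference)) DTW with squared-difference cost.

    With subsequence=True the query may start and end anywhere in the
    reference, which fits asking whether a read comes from a longer region.
    """
    m = len(reference)
    inf = float("inf")
    # prev[j] holds the DP values of the previous query row; row 0 is the boundary.
    prev = [0.0] * (m + 1) if subsequence else [0.0] + [inf] * m
    for q in query:
        cur = [inf] * (m + 1)  # cur[0] stays inf: every query sample must be matched
        for j in range(1, m + 1):
            cost = (q - reference[j - 1]) ** 2
            # allowed steps: stay on reference sample, stay on query sample, advance both
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return min(prev[1:]) if subsequence else prev[m]
```

Suppose a read is a shifted and rescaled copy of the expected levels, such as `[2.0 * v + 5.0 for v in expected]`. After both signals pass through `znormalize`, the distance between them is essentially zero. The catch is that `znormalize` estimates shift and scale from each signal separately. A read whose content differs from the reference window therefore receives different estimates, which is one plausible route to the scaling errors the abstract describes.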


DeepNano: Deep Recurrent Neural Networks for Base Calling in MinION Nanopore Reads

Vladimir Boža, Broňa Brejova, Tomáš Vinař (2016)

Motivation: The MinION device by Oxford Nanopore is the first portable sequencing device. MinION can produce very long reads (reads over 100 kbp have been reported); however, it suffers from a high sequencing error rate. In this paper, we show that the error rate can be reduced by improving the base calling process. Results: We present the first open-source DNA base caller for the MinION sequencing platform by Oxford Nanopore. By employing carefully crafted recurrent neural networks, our tool improves base calling accuracy compared to the default base caller supplied by the manufacturer. This advance may further enhance the applicability of MinION for genome sequencing and various clinical applications. Availability: DeepNano can be downloaded at http://compbio.fmph.uniba.sk/deepnano/.

Genomics

benchNGS: An approach to benchmark short reads alignment tools

Farzana Rahman, Mehedi Hassan, Alona Kryshchenko (2015)

In the last decade, a number of algorithms and associated software have been developed to align next-generation sequencing (NGS) reads to relevant reference genomes. The accuracy of these programs may vary significantly, especially when the NGS reads are quite different from the available reference genome. In this paper, we propose a benchmark to assess the accuracy of short-read mapping based on the pre-computed global alignment of closely related genome sequences. We outline the method and also present a short report of an experiment on five popular alignment tools. The experiment is based on pairwise alignments of the Escherichia coli O6 CFT073 genome with the genomes of seven other bacteria.
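The sketch below shows one way a benchmark of this kind could score a mapper; the abstract does not say how benchNGS actually does it. It rests on one assumption. A simulated read is taken from position p of a related (source) genome. It counts as correctly mapped if the mapper reports the reference position that the global alignment pairs with p. The gapped-string alignment format, the function names, and the `tolerance` parameter are all hypothetical.

```python
from typing import Dict, Optional


def project_coordinates(row_ref: str, row_src: str) -> Dict[int, Optional[int]]:
    """Map each 0-based source-genome position to its aligned reference position.

    Rows are the two gapped rows of a global alignment ('-' marks a gap);
    a source base aligned to a gap maps to None.
    """
    if len(row_ref) != len(row_src):
        raise ValueError("alignment rows must have equal length")
    mapping: Dict[int, Optional[int]] = {}
    pos_ref = pos_src = 0
    for c_ref, c_src in zip(row_ref, row_src):
        if c_src != "-":
            mapping[pos_src] = pos_ref if c_ref != "-" else None
            pos_src += 1
        if c_ref != "-":
            pos_ref += 1
    return mapping


def mapping_accuracy(true_starts: Dict[str, int], reported_starts: Dict[str, int],
                     mapping: Dict[int, Optional[int]], tolerance: int = 0) -> float:
    """Fraction of reads whose reported reference start matches the projected start."""
    if not true_starts:
        return 0.0
    correct = 0
    for read_id, src_start in true_starts.items():
        expected = mapping.get(src_start)
        reported = reported_starts.get(read_id)  # None: the mapper left the read unmapped
        if expected is not None and reported is not None \
                and abs(reported - expected) <= tolerance:
            correct += 1
    return correct / len(true_starts)
```

Reads that start where the source genome faces a gap in the reference are counted as incorrect here. A real benchmark might instead exclude such reads, which is a design choice this sketch does not settle.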

Genomics

Sailfish: Alignment-free Isoform Quantification from RNA-seq Reads using Lightweight Algorithms

Rob Patro, Lane Center for Computational Biology (2013)

RNA-seq has rapidly become the de facto technique to measure gene expression. However, the time required for analysis has not kept up with the pace of data generation. Here we introduce Sailfish, a novel computational method for quantifying the abundance of previously annotated RNA isoforms from RNA-seq data. Sailfish entirely avoids mapping reads, a time-consuming step in all current methods. Sailfish provides quantification estimates much faster than existing approaches (typically 20 times faster) without loss of accuracy.
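The abstract does not explain how isoforms can be quantified without mapping reads. The toy sketch below shows one alignment-free approach. It counts the k-mers in the reads and uses expectation-maximization (EM) to split k-mers shared between isoforms. Everything specific here is an illustrative choice rather than Sailfish's actual index or algorithm: `estimate_abundance`, its parameters, the uniform starting point, and the one-occurrence-per-transcript k-mer model.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterator, List, Set


def kmers(seq: str, k: int) -> Iterator[str]:
    """All overlapping k-mers of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def estimate_abundance(transcripts: Dict[str, str], reads: List[str],
                       k: int = 3, iterations: int = 50) -> Dict[str, float]:
    """Estimate the fraction of read k-mers originating from each transcript."""
    # Index: which transcripts contain each k-mer.
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, seq in transcripts.items():
        for km in kmers(seq, k):
            index[km].add(name)
    # Count read k-mers, ignoring those absent from every transcript.
    counts = Counter(km for read in reads for km in kmers(read, k) if km in index)
    eff_len = {name: max(len(seq) - k + 1, 1) for name, seq in transcripts.items()}
    abundance = {name: 1.0 / len(transcripts) for name in transcripts}
    for _ in range(iterations):
        # E-step: split each k-mer count among its transcripts in proportion
        # to abundance / effective length.
        assigned = dict.fromkeys(transcripts, 0.0)
        for km, c in counts.items():
            weights = {t: abundance[t] / eff_len[t] for t in index[km]}
            total = sum(weights.values())
            if total == 0:
                continue
            for t, w in weights.items():
                assigned[t] += c * w / total
        # M-step: new abundances are the normalized assigned counts.
        grand = sum(assigned.values())
        if grand == 0:
            break
        abundance = {t: v / grand for t, v in assigned.items()}
    return abundance
```

Only k-mers shared between transcripts need EM. Unique k-mers go straight to their single transcript, so in the first check of the tests both transcripts receive their counts directly.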

Genomics Computational Engineering

On subset seeds for protein alignment

Mikhail A. Roytberg, Anna Gambin, Laurent Noe, LIFL (2009)

We apply the concept of subset seeds proposed in [1] to similarity search in protein sequences. The main question studied is the design of efficient seed alphabets to construct seeds with optimal sensitivity/selectivity trade-offs. We propose several different design methods and use them to construct several alphabets. We then perform a comparative analysis of seeds built over those alphabets and compare them with the standard BLASTP seeding method [2], [3], as well as with the family of vector seeds proposed in [4]. While the formalism of subset seeds is less expressive (but less costly to implement) than the cumulative principle used in BLASTP and vector seeds, our seeds show a similar or even better performance than BLASTP on Bernoulli models of proteins compatible with the common BLOSUM62 matrix. Finally, we perform a large-scale benchmarking of our seeds against several main databases of protein alignments. Here again, the results show a comparable or better performance of our seeds vs. BLASTP.
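In a subset seed, each letter of the seed decides which pairs of residues count as a match at that position. The minimal sketch below assumes three letters. `#` requires identical residues and `_` is a joker that accepts any pair. `@` accepts two residues from the same class of an amino-acid partition that is hypothetical, chosen here for illustration rather than taken from the alphabets designed in the paper. Because each letter here is an equivalence relation, every window reduces to a hashable key. Seed hits can then be found by hash lookup instead of comparing every pair of windows. The sketch assumes the 20 standard residues.

```python
from collections import defaultdict
from typing import Dict, List, Tuple, Union

# Hypothetical partition of the 20 standard amino acids, for illustration only.
CLASSES = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG", "C", "P"]
CLASS_OF: Dict[str, int] = {aa: k for k, group in enumerate(CLASSES) for aa in group}


def window_key(seed: str, window: str) -> Tuple[Union[str, int], ...]:
    """Reduce a window to exactly the information the seed inspects.

    '#': residues must be identical   -> keep the residue
    '@': residues in the same class   -> keep the class index
    '_': joker, any pair matches      -> keep nothing
    """
    key: List[Union[str, int]] = []
    for letter, aa in zip(seed, window):
        if letter == "#":
            key.append(aa)
        elif letter == "@":
            key.append(CLASS_OF[aa])
        elif letter != "_":
            raise ValueError(f"unknown seed letter {letter!r}")
    return tuple(key)


def seed_hits(seed: str, s: str, t: str) -> List[Tuple[int, int]]:
    """All (i, j) such that the seed matches s[i:i+w] against t[j:j+w]."""
    w = len(seed)
    index: Dict[tuple, List[int]] = defaultdict(list)
    for i in range(len(s) - w + 1):
        index[window_key(seed, s[i:i + w])].append(i)
    hits = []
    for j in range(len(t) - w + 1):
        for i in index.get(window_key(seed, t[j:j + w]), []):
            hits.append((i, j))
    return sorted(hits)
```

For example, the seed `#@_#` matches the window `KLVE` against `KIAE`. K and E are identical, L and I share a class, and the joker position ignores V against A.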

Quantitative Methods

Probabilistic Approaches to Alignment with Tandem Repeats

Michal Nanasi, Tomáš Vinař (2013)

We propose a simple tractable pair hidden Markov model for pairwise sequence alignment that accounts for the presence of short tandem repeats. Using the framework of gain functions, we design several optimization criteria for decoding this model and describe the resulting decoding algorithms, ranging from the traditional Viterbi and posterior decoding to block-based decoding algorithms specialized for our model. We compare the accuracy of individual decoding algorithms on simulated data and find our approach superior to the classical three-state pair HMM in simulations.
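The abstract compares against the classical three-state pair HMM. The minimal sketch below shows Viterbi decoding for that baseline only; it does not implement the tandem-repeat model or the gain-function decoders described above. It assumes a DNA alphabet, log-space scores, no explicit end state, and hypothetical parameter values. The state names M (aligned pair), X (base of x against a gap) and Y (base of y against a gap) follow common textbook usage.

```python
import math
from typing import Tuple

NEG_INF = float("-inf")


def _log(p: float) -> float:
    return math.log(p) if p > 0 else NEG_INF


def viterbi_pair_hmm(x: str, y: str, delta: float = 0.1, epsilon: float = 0.4,
                     p_match: float = 0.9) -> Tuple[float, str, str]:
    """Most probable alignment of DNA strings x and y under a three-state pair HMM.

    delta: gap-open probability, epsilon: gap-extend probability (hypothetical).
    Returns (log-probability of the best path, aligned x, aligned y).
    """
    t_mm, t_open, t_ext, t_close = (_log(1 - 2 * delta), _log(delta),
                                    _log(epsilon), _log(1 - epsilon))
    e_gap = _log(0.25)  # uniform background over A, C, G, T

    def e_pair(a: str, b: str) -> float:
        # p_match spread over 4 identical pairs, the rest over 12 mismatching pairs
        return _log(p_match / 4) if a == b else _log((1 - p_match) / 12)

    n, m = len(x), len(y)
    V = {s: [[NEG_INF] * (m + 1) for _ in range(n + 1)] for s in "MXY"}
    back = {s: [[""] * (m + 1) for _ in range(n + 1)] for s in "MXY"}
    V["M"][0][0] = 0.0  # the path starts as if leaving state M
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                best, prev = max((V["M"][i - 1][j - 1] + t_mm, "M"),
                                 (V["X"][i - 1][j - 1] + t_close, "X"),
                                 (V["Y"][i - 1][j - 1] + t_close, "Y"))
                V["M"][i][j] = best + e_pair(x[i - 1], y[j - 1])
                back["M"][i][j] = prev
            if i > 0:
                best, prev = max((V["M"][i - 1][j] + t_open, "M"),
                                 (V["X"][i - 1][j] + t_ext, "X"))
                V["X"][i][j], back["X"][i][j] = best + e_gap, prev
            if j > 0:
                best, prev = max((V["M"][i][j - 1] + t_open, "M"),
                                 (V["Y"][i][j - 1] + t_ext, "Y"))
                V["Y"][i][j], back["Y"][i][j] = best + e_gap, prev

    # Trace back from the best final state to (0, 0).
    score, state = max((V[s][n][m], s) for s in "MXY")
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        prev = back[state][i][j]
        if state == "M":
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif state == "X":
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
        state = prev
    return score, "".join(reversed(ax)), "".join(reversed(ay))
```

Posterior decoding would replace the max operations with sums (forward and backward passes). The decoders in the abstract go further and optimize gain functions, which this sketch does not attempt.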

Quantitative Methods Genomics