
Algorithms for normalized multiple sequence alignments

Publication date: 2021
Language: English





Sequence alignment supports numerous tasks in bioinformatics, natural language processing, pattern recognition, the social sciences, and other fields. While the alignment of two sequences can be performed swiftly in many applications, the simultaneous alignment of multiple sequences is inherently more intricate. Although most multiple sequence alignment (MSA) formulations are NP-hard, several approaches have been developed, as they can outperform pairwise alignment methods or are necessary for some applications. Taking into account not only similarities but also the lengths of the compared sequences (i.e., normalization) can provide better alignment results than either unnormalized or post-normalized approaches. While some normalized methods have been developed for pairwise sequence alignment, none have been proposed for MSA. This work is a first effort towards the development of normalized methods for MSA. We discuss multiple aspects of normalized multiple sequence alignment (NMSA). We define three new criteria for computing normalized scores when aligning multiple sequences, show the NP-hardness of NMSA under those criteria, and give exact algorithms for solving it. In addition, we provide approximation algorithms for MSA and NMSA for some classes of scoring matrices.
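As a concrete (and deliberately simple) illustration of the distinction, the Python sketch below computes a classic unnormalized edit distance and then a post-normalized score obtained by dividing it by the combined sequence length. The normalized criteria defined in the paper instead build length into the objective that the alignment itself optimizes, which post-normalization cannot capture; everything here is a toy, not the paper's algorithm.

```python
# Toy illustration (not the paper's NMSA criteria): plain edit distance,
# then a simple post-normalized score. A truly normalized method would
# minimize the ratio over all alignments directly.

def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance with unit costs."""
    n = len(b)
    prev = list(range(n + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * n
        for j, cb in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1,            # deletion
                         cur[j - 1] + 1,         # insertion
                         prev[j - 1] + (ca != cb))  # (mis)match
        prev = cur
    return prev[n]

def post_normalized(a: str, b: str) -> float:
    """Post-normalization: divide the optimal unnormalized score by a
    length term after the fact."""
    return edit_distance(a, b) / max(len(a) + len(b), 1)

print(post_normalized("ACGT", "ACGGT"))  # 1 edit / 9 symbols ~ 0.111
```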



Related research

The Oxford Nanopore MinION sequencer is currently the smallest sequencing device available. While it is able to produce very long reads (reads of up to 100 kbp have been reported), it is prone to high sequencing error rates of up to 30%. Since most of these errors are insertions or deletions, it is very difficult to adapt popular seed-based algorithms designed for aligning data sets with much lower error rates. Base calling of MinION reads is typically done using hidden Markov models. In this paper, we propose to represent each sequencing read by an ensemble of sequences sampled from such a probabilistic model. This approach can improve the sensitivity and false positive rate of seeding an alignment compared to using a single representative base-call sequence for each read.
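The ensemble idea can be sketched in a few lines of Python; all names below are illustrative rather than from the paper. Pooling the $k$-mers of several sampled base calls of one read lets a seed missed by one noisy call be recovered from another sample.

```python
# Hedged sketch of ensemble seeding: pool k-mers from several sampled
# base calls of a single read and seed wherever any pooled k-mer occurs
# in the reference. Function names are illustrative, not from the paper.

from collections import defaultdict

def kmers(seq: str, k: int):
    return (seq[i:i + k] for i in range(len(seq) - k + 1))

def reference_index(ref: str, k: int):
    """Map each k-mer of the reference to its start positions."""
    index = defaultdict(list)
    for i, km in enumerate(kmers(ref, k)):
        index[km].append(i)
    return index

def ensemble_seeds(samples, ref_index):
    """Union of seed hits over all sampled base calls of one read."""
    pooled = set()
    for s in samples:
        pooled.update(kmers(s, 5))
    return sorted(pos for km in pooled for pos in ref_index.get(km, []))

ref = "ACGTACGTTACGGA"
samples = ["ACGTTAC", "ACGTACG", "AGGTTAC"]  # e.g. drawn from the HMM
print(ensemble_seeds(samples, reference_index(ref, 5)))  # [0,1,2,4,5,6]
```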
Sequence classification algorithms, such as SVMs, require a definition of a distance (similarity) measure between two sequences. A commonly used notion of similarity is the number of matches between $k$-mers ($k$-length subsequences) in the two sequences. Extending this definition by considering two $k$-mers to match if their distance is at most $m$ yields better classification performance. This, however, makes the problem computationally much more complex. Known algorithms to compute this similarity have a computational complexity that renders them applicable only for small values of $k$ and $m$. In this work, we develop novel techniques to efficiently and accurately estimate the pairwise similarity score, which enables us to use much larger values of $k$ and $m$ and obtain higher predictive accuracy. This opens up a broad avenue for applying this classification approach to audio, image, and text sequences. Our algorithm achieves excellent approximation performance with theoretical guarantees. In the process, we solve an open combinatorial problem that was posed as a major hindrance to the scalability of existing solutions. We give analytical bounds on the quality and runtime of our algorithm and report its empirical performance on real-world biological and music sequence datasets.
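For intuition, here is a brute-force version of the similarity being estimated, assuming the distance between $k$-mers is Hamming distance, as is standard for mismatch kernels. Its cost is what makes large $k$ and $m$ infeasible and motivates the paper's estimator.

```python
# Naive exact (k, m)-mismatch similarity, practical only for small
# instances; the paper's contribution is an efficient *estimator*.

def hamming(u: str, v: str) -> int:
    return sum(c1 != c2 for c1, c2 in zip(u, v))

def mismatch_similarity(x: str, y: str, k: int, m: int) -> int:
    """Count pairs of k-mers, one from each sequence, whose Hamming
    distance is at most m. Brute force: O(|x| * |y| * k) time."""
    xk = [x[i:i + k] for i in range(len(x) - k + 1)]
    yk = [y[j:j + k] for j in range(len(y) - k + 1)]
    return sum(hamming(u, v) <= m for u in xk for v in yk)

print(mismatch_similarity("ACGTAC", "ACCTAC", k=3, m=1))  # -> 4
```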
We consider the distributed version of the Multiple Knapsack Problem (MKP), where $m$ items are to be distributed amongst $n$ processors, each with a knapsack. We propose different distributed approximation algorithms with a tradeoff between time and message complexities. The algorithms are based on the greedy approach of assigning the best item to the knapsack with the largest capacity. These algorithms obtain a solution with a bound of $\frac{1}{n+1}$ times the optimum solution, with either $\mathcal{O}\left(m\log n\right)$ time and $\mathcal{O}\left(mn\right)$ messages, or $\mathcal{O}\left(m\right)$ time and $\mathcal{O}\left(mn^{2}\right)$ messages.
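The greedy rule can be simulated sequentially in a short sketch; the distributed message passing, which is the point of the paper, is omitted here.

```python
# Sequential simulation of the greedy rule behind the distributed
# algorithms: repeatedly give the most valuable remaining item to the
# knapsack with the largest remaining capacity.

import heapq

def greedy_mkp(values, weights, capacities):
    """Return assignment[i] = knapsack index for item i, or None."""
    items = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    # Max-heap of (remaining capacity, knapsack id) via negation.
    heap = [(-c, j) for j, c in enumerate(capacities)]
    heapq.heapify(heap)
    assignment = [None] * len(values)
    for i in items:
        neg_cap, j = heapq.heappop(heap)
        cap = -neg_cap
        if weights[i] <= cap:     # item fits: place it, shrink capacity
            assignment[i] = j
            cap -= weights[i]
        heapq.heappush(heap, (-cap, j))
    return assignment

print(greedy_mkp([9, 7, 5], [4, 3, 6], [5, 6]))  # -> [1, 0, None]
```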
This paper considers the $k$-sink location problem in dynamic path networks. In our model, a dynamic path network consists of an undirected path with positive edge lengths, uniform edge capacity, and positive vertex supplies, where each vertex supply corresponds to a set of evacuees. The problem then asks for the optimal location of $k$ sinks in a given path such that each evacuee is sent to one of the $k$ sinks. Let $x$ denote a $k$-sink location. Under the optimal evacuation for a given $x$, there exists a $(k-1)$-dimensional vector $d$, called a $(k-1)$-divider, such that each component represents the boundary dividing all evacuees between two adjacent sinks into two groups, i.e., all supplies in one group evacuate to the left sink and all supplies in the other group evacuate to the right sink. Therefore, the goal is to find $x$ and $d$ minimizing either the maximum cost or the total cost, giving the minimax and the minisum problem, respectively. We study the $k$-sink location problem in dynamic path networks in the continuous model, and prove that the minimax problem can be solved in $O(kn)$ time and the minisum problem in $O(n^2 \min\{k, 2^{\sqrt{\log k \log\log n}}\})$ time, where $n$ is the number of vertices in the given network. These results improve upon the previous results of [6].
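The divider structure can be illustrated with a toy brute force over all ways to split the path into $k$ contiguous groups. This is emphatically not the paper's $O(kn)$ algorithm, and the cost model below ignores edge capacities and congestion entirely; it only shows why searching over $(k-1)$-dividers suffices once groups are known to be contiguous.

```python
# Toy illustration of the (k-1)-divider idea: brute-force all contiguous
# splits of the path, with a simplified distance-only cost (no capacity
# or congestion terms, unlike the actual dynamic-flow model).

from itertools import combinations

def group_minimax_cost(pos, w, lo, hi):
    """Best minimax cost for vertices lo..hi served by one sink, with
    candidate sink locations restricted to vertex positions."""
    return min(max(wt * abs(p - s) for p, wt in zip(pos[lo:hi+1], w[lo:hi+1]))
               for s in pos[lo:hi+1])

def minimax_k_sink(pos, w, k):
    """Try every (k-1)-divider, i.e., every split of the path into k
    contiguous groups, and return the best achievable maximum cost."""
    n = len(pos)
    best = float("inf")
    for cuts in combinations(range(1, n), k - 1):
        bounds = [0, *cuts, n]
        cost = max(group_minimax_cost(pos, w, bounds[t], bounds[t+1] - 1)
                   for t in range(k))
        best = min(best, cost)
    return best

print(minimax_k_sink([0, 2, 5, 9, 10], [1, 1, 1, 1, 1], k=2))  # -> 3
```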
Variational quantum algorithms (VQAs) are promising methods that leverage noisy quantum computers and classical computing techniques for practical applications. In VQAs, classical optimizers such as gradient-based optimizers are used to adjust the parameters of the quantum circuit so that the objective function is minimized. However, they often suffer from the so-called vanishing gradient or barren plateau issue. On the other hand, the normalized gradient descent (NGD) method, which employs the normalized gradient vector to update the parameters, has been successfully applied to several optimization problems. Here, we study the performance of NGD methods in the optimization of VQAs for the first time. Our goal is two-fold. The first is to examine the effectiveness of NGD and its variants for overcoming the vanishing gradient problem. The second is to propose a new NGD that attains faster convergence than ordinary NGD. We performed numerical simulations of these gradient-based optimizers in the context of quantum chemistry, where VQAs are used to find the ground state of a given Hamiltonian. The results show the effective convergence of the NGD methods in VQAs compared to relevant optimizers without normalization. Moreover, we make use of normalized gradient vectors from past iteration steps to propose a novel historical NGD that has a theoretical guarantee of accelerated convergence, which is also observed in the numerical experiments.
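A minimal classical sketch shows why normalization helps on flat landscapes; in an actual VQA the objective $f$ would be the circuit's measured expectation value, not the toy function used here.

```python
# Plain gradient descent vs. normalized gradient descent (NGD) on a toy
# objective with an extremely flat region, mimicking a barren plateau.

import numpy as np

def f(theta):
    """Toy objective: nearly flat for |theta| >> 10 (vanishing gradient)."""
    return 1.0 - np.exp(-0.01 * theta**2)

def grad(theta):
    return 0.02 * theta * np.exp(-0.01 * theta**2)

def descend(theta, eta, steps, normalized):
    for _ in range(steps):
        g = grad(theta)
        if normalized:
            g /= abs(g) + 1e-12   # NGD: unit-length update direction
        theta -= eta * g
    return theta

start = 30.0                      # gradient here is ~7e-5: plain GD stalls
for norm in (False, True):
    t = descend(start, eta=0.5, steps=200, normalized=norm)
    print("NGD" if norm else "GD ", "theta =", round(t, 3),
          "f =", round(float(f(t)), 6))
```

Because the NGD step has constant length regardless of the gradient's magnitude, it crosses the flat region in a bounded number of iterations, whereas the plain update barely moves; this is the behavior the paper examines for VQAs.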
