ترغب بنشر مسار تعليمي؟ اضغط هنا

Algorithms for normalized multiple sequence alignments

154   0   0.0 ( 0 )
 نشر من قبل Fabio Henrique Viduani Martinez
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Sequence alignment supports numerous tasks in bioinformatics, natural language processing, pattern recognition, social sciences, and others fields. While the alignment of two sequences may be performed swiftly in many applications, the simultaneous alignment of multiple sequences proved to be naturally more intricate. Although most multiple sequence alignment (MSA) formulations are NP-hard, several approaches have been developed, as they can outperform pairwise alignment methods or are necessary for some applications. Taking into account not only similarities but also the lengths of the compared sequences (i.e. normalization) can provide better alignment results than both unnormalized or post-normalized approaches. While some normalized methods have been developed for pairwise sequence alignment, none have been proposed for MSA. This work is a first effort towards the development of normalized methods for MSA. We discuss multiple aspects of normalized multiple sequence alignment (NMSA). We define three new criteria for computing normalized scores when aligning multiple sequences, showing the NP-hardness and exact algorithms for solving the NMSA using those criteria. In addition, we provide approximation algorithms for MSA and NMSA for some classes of scoring matrices.



قيم البحث

اقرأ أيضاً

Oxford Nanopore MinION sequencer is currently the smallest sequencing device available. While being able to produce very long reads (reads of up to 100~kbp were reported), it is prone to high sequencing error rates of up to 30%. Since most of these e rrors are insertions or deletions, it is very difficult to adapt popular seed-based algorithms designed for aligning data sets with much lower error rates. Base calling of MinION reads is typically done using hidden Markov models. In this paper, we propose to represent each sequencing read by an ensemble of sequences sampled from such a probabilistic model. This approach can improve the sensitivity and false positive rate of seeding an alignment compared to using a single representative base call sequence for each read.
Sequence classification algorithms, such as SVM, require a definition of distance (similarity) measure between two sequences. A commonly used notion of similarity is the number of matches between $k$-mers ($k$-length subsequences) in the two sequence s. Extending this definition, by considering two $k$-mers to match if their distance is at most $m$, yields better classification performance. This, however, makes the problem computationally much more complex. Known algorithms to compute this similarity have computational complexity that render them applicable only for small values of $k$ and $m$. In this work, we develop novel techniques to efficiently and accurately estimate the pairwise similarity score, which enables us to use much larger values of $k$ and $m$, and get higher predictive accuracy. This opens up a broad avenue of applying this classification approach to audio, images, and text sequences. Our algorithm achieves excellent approximation performance with theoretical guarantees. In the process we solve an open combinatorial problem, which was posed as a major hindrance to the scalability of existing solutions. We give analytical bounds on quality and runtime of our algorithm and report its empirical performance on real world biological and music sequences datasets.
We consider the distributed version of the Multiple Knapsack Problem (MKP), where $m$ items are to be distributed amongst $n$ processors, each with a knapsack. We propose different distributed approximation algorithms with a tradeoff between time and message complexities. The algorithms are based on the greedy approach of assigning the best item to the knapsack with the largest capacity. These algorithms obtain a solution with a bound of $frac{1}{n+1}$ times the optimum solution, with either $mathcal{O}left(mlog nright)$ time and $mathcal{O}left(m nright)$ messages, or $mathcal{O}left(mright)$ time and $mathcal{O}left(mn^{2}right)$ messages.
This paper considers the k-sink location problem in dynamic path networks. In our model, a dynamic path network consists of an undirected path with positive edge lengths, uniform edge capacity, and positive vertex supplies. Here, each vertex supply c orresponds to a set of evacuees. Then, the problem requires to find the optimal location of $k$ sinks in a given path so that each evacuee is sent to one of k sinks. Let x denote a k-sink location. Under the optimal evacuation for a given x, there exists a (k-1)-dimensional vector d, called (k-1)-divider, such that each component represents the boundary dividing all evacuees between adjacent two sinks into two groups, i.e., all supplies in one group evacuate to the left sink and all supplies in the other group evacuate to the right sink. Therefore, the goal is to find x and d which minimize the maximum cost or the total cost, which are denoted by the minimax problem and the minisum problem, respectively. We study the k-sink location problem in dynamic path networks with continuous model, and prove that the minimax problem can be solved in O(kn) time and the minisum problem can be solved in O(n^2 min{k, 2^{sqrt{log k log log n}}}) time, where n is the number of vertices in the given network. Note that these improve the previous results by [6].
Variational quantum algorithms (VQAs) are promising methods that leverage noisy quantum computers and classical computing techniques for practical applications. In VQAs, the classical optimizers such as gradient-based optimizers are utilized to adjus t the parameters of the quantum circuit so that the objective function is minimized. However, they often suffer from the so-called vanishing gradient or barren plateau issue. On the other hand, the normalized gradient descent (NGD) method, which employs the normalized gradient vector to update the parameters, has been successfully utilized in several optimization problems. Here, we study the performance of the NGD methods in the optimization of VQAs for the first time. Our goal is two-fold. The first is to examine the effectiveness of NGD and its variants for overcoming the vanishing gradient problems. The second is to propose a new NGD that can attain the faster convergence than the ordinary NGD. We performed numerical simulations of these gradient-based optimizers in the context of quantum chemistry where VQAs are used to find the ground state of a given Hamiltonian. The results show the effective convergence property of the NGD methods in VQAs, compared to the relevant optimizers without normalization. Moreover, we make use of some normalized gradient vectors at the past iteration steps to propose the novel historical NGD that has a theoretical guarantee to accelerate the convergence speed, which is observed in the numerical experiments as well.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا