
A Fast Randomized Algorithm for Finding the Maximal Common Subsequences


Added by: Jin Cao

Publication date: 2020

Field: Informatics Engineering

Research language: English

Authors: Jin Cao, Dewei Zhong


Abstract

Finding the common subsequences of $L$ strings has many applications in bioinformatics, computational linguistics, and information retrieval. A well-known result states that finding a Longest Common Subsequence (LCS) of $L$ strings is NP-hard, i.e., known exact algorithms have computational complexity exponential in $L$. In this paper, we develop a randomized algorithm, referred to as Random-MCS, for finding a random instance of a Maximal Common Subsequence (MCS) of multiple strings. A common subsequence is maximal if inserting any character into it no longer yields a common subsequence. An LCS is the special case of an MCS with the greatest length. We show that the complexity of our algorithm is linear in $L$, which makes it suitable for large $L$. Furthermore, we study the occurrence probability of a single MCS instance and demonstrate through both theoretical and experimental studies that the longest subsequence over multiple runs of Random-MCS often yields an LCS.
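The abstract does not spell out the steps of Random-MCS, so the Python sketch below is not the authors' algorithm. It is a simple randomized construction, assumed here for illustration, that grows a common subsequence by one random single-character insertion at a time and stops only when no insertion keeps it common, which is exactly the definition of maximality stated above. All function names (`is_common_subsequence`, `is_maximal`, `random_mcs`, `longest_of_runs`) are hypothetical. Each subsequence test is one linear scan per input string, so its cost grows linearly in $L$; the candidate enumeration itself is deliberately naive.

```python
import random
from typing import List, Sequence


def is_subsequence(sub: str, s: str) -> bool:
    """True if `sub` can be obtained from `s` by deleting characters."""
    it = iter(s)
    return all(ch in it for ch in sub)  # each `in` consumes the iterator


def is_common_subsequence(sub: str, strings: Sequence[str]) -> bool:
    # One linear scan per input string: cost grows linearly with L.
    return all(is_subsequence(sub, s) for s in strings)


def single_insertions(sub: str, alphabet: Sequence[str]) -> List[str]:
    """Every string obtained by inserting one alphabet character into `sub`."""
    return [sub[:i] + c + sub[i:] for i in range(len(sub) + 1) for c in alphabet]


def is_maximal(sub: str, strings: Sequence[str]) -> bool:
    # Only characters of strings[0] can appear in a common subsequence.
    alphabet = sorted(set(strings[0]))
    return is_common_subsequence(sub, strings) and not any(
        is_common_subsequence(t, strings) for t in single_insertions(sub, alphabet)
    )


def random_mcs(strings: Sequence[str], rng: random.Random) -> str:
    """Hypothetical randomized construction of one maximal common subsequence."""
    alphabet = sorted(set(strings[0]))
    sub = ""
    while True:
        extensions = [t for t in single_insertions(sub, alphabet)
                      if is_common_subsequence(t, strings)]
        if not extensions:
            return sub  # no single insertion stays common, so `sub` is maximal
        sub = rng.choice(extensions)  # extend by one random valid insertion


def longest_of_runs(strings: Sequence[str], runs: int, seed: int = 0) -> str:
    """Keep the longest MCS found over several runs as an LCS candidate."""
    rng = random.Random(seed)
    return max((random_mcs(strings, rng) for _ in range(runs)), key=len)
```

For example, `longest_of_runs(["ABCBDAB", "BDCABA"], runs=20)` returns some maximal common subsequence of length at most 4, the LCS length of this pair. Keeping the longest result over several runs mirrors the repeated-runs idea in the abstract, but nothing in this sketch guarantees that the returned string is an LCS.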


Randomized algorithms for finding a majority element

Paweł Gawrychowski, Jukka Suomela, Przemysław Uznanski (2016)

Given $n$ colored balls, we want to detect whether more than $\lfloor n/2\rfloor$ of them have the same color and, if so, find one ball with this majority color. We may only choose two balls and compare their colors, and the goal is to minimize the total number of such comparisons. A well-known exercise is to show how to find such a ball with only $2n$ comparisons while using only a logarithmic number of bits for bookkeeping; the resulting algorithm is called the Boyer-Moore majority vote algorithm. It is known that any deterministic method needs $\lceil 3n/2\rceil-2$ comparisons in the worst case, and this bound is tight. However, it is not clear how many comparisons are required if randomization is allowed. We construct a randomized algorithm that always correctly finds a ball of the majority color (or detects that there is none) using, with high probability, only $7n/6+o(n)$ comparisons. We also prove that the expected number of comparisons used by any such randomized method is at least $1.019n$.
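For reference, here is a minimal Python sketch of the classic Boyer-Moore majority vote algorithm mentioned above (not the paper's randomized algorithm). The first pass keeps only a candidate and a counter; the second pass verifies the candidate. The function name `majority_color` and the comparison counter are illustrative additions: the counter shows why at most about $2n$ color comparisons are used (at most $n-1$ in the first pass and $n$ in the verification pass).

```python
from typing import Hashable, Optional, Sequence, Tuple


def majority_color(balls: Sequence[Hashable]) -> Tuple[Optional[Hashable], int]:
    """Boyer-Moore majority vote; returns (majority color or None, comparisons used)."""
    comparisons = 0
    candidate, count = None, 0
    # Pass 1: pair off balls of different colors; a majority color survives.
    for ball in balls:
        if count == 0:
            candidate, count = ball, 1  # adopt a new candidate without comparing
        else:
            comparisons += 1
            count += 1 if ball == candidate else -1
    if count == 0:
        return None, comparisons  # everything cancelled, so no majority exists
    # Pass 2: verify that the candidate occurs more than floor(n/2) times.
    occurrences = 0
    for ball in balls:
        comparisons += 1
        if ball == candidate:
            occurrences += 1
    return (candidate if occurrences > len(balls) // 2 else None), comparisons
```

The verification pass is needed because a surviving candidate is only guaranteed to be the majority color when one exists; for `["a", "b", "c"]` the first pass ends with candidate `"c"`, which the second pass rejects.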

Data Structures and Algorithms

A Randomized Rounding Algorithm for Sparse PCA

Kimon Fountoulakis, Abhisek Kundu, Eugenia-Maria Kontopoulou, Petros Drineas (2015)

We present and analyze a simple, two-step algorithm to approximate the optimal solution of the sparse PCA problem. Our approach first solves an $\ell_1$-penalized version of the NP-hard sparse PCA optimization problem and then uses a randomized rounding strategy to sparsify the resulting dense solution. Our main theoretical result guarantees an additive-error approximation and provides a tradeoff between sparsity and accuracy. Our experimental evaluation indicates that our approach is competitive in practice, even compared to state-of-the-art toolboxes such as Spasm.
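The abstract does not give the rounding rule, so the Python sketch below uses a generic randomized sparsification scheme as a stand-in; it is an assumption, not necessarily the authors' exact probabilities. Coordinate $i$ of the dense vector $x$ is kept with probability $p_i = \min(1, s|x_i|/\|x\|_1)$ and rescaled to $x_i/p_i$, which makes the output unbiased ($\mathbb{E}[y] = x$) while keeping at most $s$ nonzeros in expectation, since $\sum_i p_i \le s$. The sparsity parameter `s` and the function name `randomized_rounding` are hypothetical.

```python
import random
from typing import List, Sequence


def randomized_rounding(x: Sequence[float], s: float, rng: random.Random) -> List[float]:
    """Sparsify a dense vector; unbiased, at most s nonzeros in expectation (assumed scheme)."""
    l1 = sum(abs(v) for v in x)
    if l1 == 0:
        return [0.0] * len(x)
    y: List[float] = []
    for v in x:
        p = min(1.0, s * abs(v) / l1)  # keep probability grows with |x_i|
        # rng.random() is in [0, 1), so p == 0 never keeps and p == 1 always keeps.
        y.append(v / p if rng.random() < p else 0.0)
    return y
```

Large entries (here any $|x_i| \ge \|x\|_1/s$) are kept deterministically and unchanged, while small entries are either dropped or inflated, which is how such schemes trade sparsity against variance.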

Data Structures and Algorithms, Machine Learning

A Weighted Common Subgraph Matching Algorithm

Xu Yang, Hong Qiao (2014)

We propose a weighted common subgraph (WCS) matching algorithm to find the most similar subgraphs in two labeled weighted graphs. WCS matching, as a natural generalization of equal-sized graph matching and subgraph matching, has wide applications in many computer vision and machine learning tasks. In this paper, WCS matching is first formulated as a combinatorial optimization problem over the set of partial permutation matrices. It is then approximately solved by a recently proposed combinatorial optimization framework, the Graduated NonConvexity and Concavity Procedure (GNCCP). Experimental comparisons on both synthetic graphs and real-world images validate its robustness against noise level, problem size, outlier number, and edge density.

Data Structures and Algorithms, Computer Vision and Pattern Recognition

A Linear-Time Algorithm for the Common Refinement of Rooted Phylogenetic Trees on a Common Leaf Set

David Schaller, Marc Hellmuth, Peter F. Stadler (2021)

The problem of finding a common refinement of a set of rooted trees with common leaf set $L$ appears naturally in mathematical phylogenetics whenever poorly resolved information on the same taxa from different sources is to be reconciled. This constitutes a special case of the well-studied supertree problem, where the leaf sets of the input trees may differ. Algorithms that solve the rooted tree compatibility problem are of course applicable to this special case. However, they require sophisticated auxiliary data structures and have a running time of at least $O(k|L|\log^2(k|L|))$ for $k$ input trees. Here, we show that the problem can be solved in $O(k|L|)$ time using a simple bottom-up algorithm called LinCR. An implementation of LinCR in Python is freely available at https://github.com/david-schaller/tralda.
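A standard fact underlies this problem: rooted trees on the same leaf set have a common refinement exactly when the union of their clusters (the leaf sets of their subtrees) is pairwise compatible, i.e., any two clusters are nested or disjoint; the refinement is then the tree whose clusters are exactly that union. The Python sketch below checks this naively, in time quadratic in the number of clusters, so it is a correctness baseline and not LinCR. The nested-tuple tree encoding and all function names are assumptions for illustration, not the API of the linked `tralda` package.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Optional, Set, Union

Tree = Union[str, tuple]  # hypothetical encoding: a leaf label or a tuple of children


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets of all subtrees (including single leaves and the root)."""
    result: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(visit(child) for child in node))
        result.add(leaves)
        return leaves

    visit(tree)
    return result


def compatible(a: FrozenSet[str], b: FrozenSet[str]) -> bool:
    # Two clusters are compatible if one contains the other or they are disjoint.
    return a <= b or b <= a or not (a & b)


def common_refinement_clusters(trees: Iterable[Tree]) -> Optional[Set[FrozenSet[str]]]:
    """Clusters of the common refinement, or None if none exists (naive check)."""
    all_clusters = set().union(*(clusters(t) for t in trees))
    if all(compatible(a, b) for a, b in combinations(all_clusters, 2)):
        return all_clusters
    return None
```

For instance, `(("a", "b"), "c", "d")` and `("a", "b", ("c", "d"))` are refined by the tree `(("a", "b"), ("c", "d"))`, whereas `(("a", "c"), "b", "d")` conflicts with the first tree because the clusters `{a, b}` and `{a, c}` overlap without nesting.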

Data Structures and Algorithms, Computational Complexity, Combinatorics

A Linear Time Algorithm for Finding Minimum Spanning Tree Replacement Edges

David A. Bader, Paul Burkhardt (2019)

For an undirected, weighted graph, a minimum spanning tree (MST) is a tree that connects all of the vertices of the graph with the minimum sum of edge weights. In real-world applications, network designers often seek to quickly find a replacement edge for each edge in the MST. For example, when a traffic accident closes a road in a transportation network, or a line goes down in a communication network, the replacement edge may reconnect the MST at lowest cost. In this paper, we consider the case of finding the lowest-cost replacement edge for each edge of the MST. A previous algorithm by Tarjan takes $O(m\,\alpha(m, n))$ time, where $\alpha(m, n)$ is the inverse Ackermann function. Given the MST and sorted non-tree edges, our algorithm is the first that runs in $O(m+n)$ time and $O(m+n)$ space to find all replacement edges. Moreover, it is easy to implement, and our experimental study demonstrates fast performance on several types of graphs. Additionally, since the most vital edge is the tree edge whose removal causes the highest cost, our algorithm finds it in linear time.
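A naive baseline helps make the problem concrete. For each tree edge, the lowest-cost replacement is the minimum-weight non-tree edge whose endpoints are joined by a tree path through that edge. The Python sketch below is not the authors' linear-time algorithm: it processes non-tree edges in increasing weight and walks each one's tree path, recording the first edge that covers each tree edge, which takes $O(mn)$ time in the worst case. It assumes vertices `0..n-1`, a spanning tree rooted at vertex 0, and edges given as `(u, v, weight)` tuples; the function names are hypothetical. Removing tree edge $e$ with replacement $r$ raises the tree cost by $w(r) - w(e)$, which yields the most vital edge directly.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int, float]


def replacement_edges(n: int, tree_edges: List[Edge],
                      non_tree_edges: List[Edge]) -> Dict[Tuple[int, int], Edge]:
    """Naive O(mn) replacement edges; keys are tree edges as (min, max) pairs."""
    adj: List[List[int]] = [[] for _ in range(n)]
    for u, v, _ in tree_edges:
        adj[u].append(v)
        adj[v].append(u)
    # Root the tree at vertex 0 with BFS to get parent and depth.
    parent, depth = [-1] * n, [0] * n
    seen = [False] * n
    seen[0] = True
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if not seen[v]:
                seen[v], parent[v], depth[v] = True, u, depth[u] + 1
                queue.append(v)
    # Tree edge (parent[c], c) is identified by its child endpoint c.
    best: Dict[int, Edge] = {}
    for edge in sorted(non_tree_edges, key=lambda e: e[2]):
        a, b = edge[0], edge[1]
        while a != b:  # climb both endpoints until they meet at the LCA
            if depth[a] < depth[b]:
                a, b = b, a
            best.setdefault(a, edge)  # lighter edges were processed first
            a = parent[a]
    return {(min(parent[c], c), max(parent[c], c)): e for c, e in best.items()}


def most_vital_edge(tree_edges: List[Edge],
                    replacements: Dict[Tuple[int, int], Edge]) -> Optional[Tuple[int, int]]:
    """Tree edge whose removal increases the spanning-tree cost the most."""
    best_key, best_increase = None, float("-inf")
    for u, v, w in tree_edges:
        key = (min(u, v), max(u, v))
        if key not in replacements:
            return key  # a bridge: removing it disconnects the graph
        increase = replacements[key][2] - w
        if increase > best_increase:
            best_key, best_increase = key, increase
    return best_key
```

On the path `0-1-2-3` with unit-weight tree edges and non-tree edges `(0, 2, 3)` and `(0, 3, 5)`, the first two tree edges are replaced by `(0, 2, 3)` and the last by `(0, 3, 5)`, so `(2, 3)` is the most vital edge.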

Data Structures and Algorithms
