A Fast Randomized Algorithm for Finding the Maximal Common Subsequences

Published by Jin Cao

Publication date: 2020

Research field: Informatics Engineering

Paper language: English

Authors: Jin Cao, Dewei Zhong

Data Structures and Algorithms; Artificial Intelligence; Computational Complexity

Abstract

Finding the common subsequences of $L$ strings has many applications in bioinformatics, computational linguistics, and information retrieval. A well-known result states that finding a Longest Common Subsequence (LCS) of $L$ strings is NP-hard; i.e., the computational complexity of known exact algorithms is exponential in $L$. In this paper, we develop a randomized algorithm, referred to as Random-MCS, for finding a random instance of a Maximal Common Subsequence (MCS) of multiple strings. A common subsequence is maximal if inserting any character into it no longer yields a common subsequence. LCS is the special case of MCS with the greatest length. We show that the complexity of our algorithm is linear in $L$, which makes it suitable for large $L$. Furthermore, we study the occurrence probability of a single instance of MCS and demonstrate via both theoretical and experimental studies that the longest subsequence from multiple runs of Random-MCS often yields a solution to LCS.
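
The definition of maximality above can be made concrete with a deliberately naive sketch. It is not the authors' Random-MCS, whose steps are not given here: it assumes an MCS can be reached by repeatedly inserting a randomly chosen character at a randomly chosen position as long as the result stays a common subsequence, and it stops when no insertion works, which is exactly the maximality condition. The helper names (`naive_random_mcs`, `longest_of_runs`, `lcs_length_two`) are hypothetical, and the exact two-string LCS dynamic program is included only to compare the best of several runs against the true LCS length.

```python
import random


def is_subsequence(t, s):
    """Return True if t is a subsequence of s (greedy left-to-right scan)."""
    it = iter(s)
    return all(c in it for c in t)


def is_common(t, strings):
    """Return True if t is a subsequence of every string in `strings`."""
    return all(is_subsequence(t, s) for s in strings)


def naive_random_mcs(strings, rng):
    """Grow a common subsequence by random single-character insertions.

    Stops when no (position, character) insertion keeps it common, so the
    result is maximal by definition. Illustrative only, not Random-MCS.
    """
    # Only characters occurring in every string can appear in a common subsequence.
    alphabet = sorted(set.intersection(*(set(s) for s in strings)))
    t = ""
    while True:
        candidates = [(i, c) for i in range(len(t) + 1) for c in alphabet]
        rng.shuffle(candidates)
        for i, c in candidates:
            u = t[:i] + c + t[i:]
            if is_common(u, strings):
                t = u
                break
        else:
            return t  # no insertion works: t is a maximal common subsequence


def longest_of_runs(strings, runs, seed=0):
    """Run the randomized search several times and keep the longest result."""
    rng = random.Random(seed)
    return max((naive_random_mcs(strings, rng) for _ in range(runs)), key=len)


def lcs_length_two(a, b):
    """Exact LCS length of two strings via the classic O(|a||b|) dynamic program."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    strs = ["ABCBDAB", "BDCABA"]
    best = longest_of_runs(strs, runs=20)
    print(best, len(best), lcs_length_two(*strs))
```

Each step tries every (position, character) pair, so this sketch is far slower than the linear-in-$L$ complexity claimed for Random-MCS; it only illustrates what a random MCS instance is and why keeping the longest of several runs can recover an LCS.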

Paweł Gawrychowski, Jukka Suomela, Przemysław Uznanski (2016)

Given $n$ colored balls, we want to detect whether more than $\lfloor n/2 \rfloor$ of them have the same color and, if so, find one ball of that majority color. We are only allowed to choose two balls and compare their colors, and the goal is to minimize the total number of such operations. A well-known exercise is to show how to find such a ball with only $2n$ comparisons while using only a logarithmic number of bits for bookkeeping; the resulting algorithm is called the Boyer-Moore majority vote algorithm. It is known that any deterministic method needs $\lceil 3n/2 \rceil - 2$ comparisons in the worst case, and this bound is tight. However, it is not clear how many comparisons are required if we allow randomization. We construct a randomized algorithm which always correctly finds a ball of the majority color (or detects that there is none) using, with high probability, only $7n/6 + o(n)$ comparisons. We also prove that the expected number of comparisons used by any such randomized method is at least $1.019n$.

Data Structures and Algorithms
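
A minimal sketch of the classic Boyer-Moore majority vote mentioned above, written in the equality-only comparison model and counting comparisons to show the $2n$ bound: one pass selects a candidate, a second pass verifies it. This is the deterministic baseline, not the randomized $7n/6 + o(n)$ algorithm, and the function name `majority_with_count` is illustrative.

```python
def majority_with_count(balls):
    """Boyer-Moore majority vote using only equality comparisons.

    Returns (majority color or None, number of comparisons performed).
    """
    if not balls:
        return None, 0
    comparisons = 0
    candidate, count = balls[0], 0
    # Voting pass: adopting a new candidate costs no comparison.
    for b in balls:
        if count == 0:
            candidate, count = b, 1
        else:
            comparisons += 1
            count += 1 if b == candidate else -1
    # Verification pass: the candidate is a majority only if it occurs > floor(n/2) times.
    matches = 0
    for b in balls:
        comparisons += 1
        if b == candidate:
            matches += 1
    return (candidate if matches > len(balls) // 2 else None), comparisons
```

For example, `majority_with_count([1, 2, 1, 1, 3])` returns `(1, 8)`: three comparisons in the voting pass and five in the verification pass.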

A Randomized Rounding Algorithm for Sparse PCA

Kimon Fountoulakis, Abhisek Kundu, Eugenia-Maria Kontopoulou and Petros Drineas (2015)

We present and analyze a simple two-step algorithm to approximate the optimal solution of the sparse PCA problem. Our approach first solves an L1-penalized version of the NP-hard sparse PCA optimization problem and then uses a randomized rounding strategy to sparsify the resulting dense solution. Our main theoretical result guarantees an additive-error approximation and provides a trade-off between sparsity and accuracy. Our experimental evaluation indicates that our approach is competitive in practice, even compared to state-of-the-art toolboxes such as Spasm.

Data Structures and Algorithms; Machine Learning
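
The abstract does not state the rounding rule, so the following sketch assumes a common randomized sparsification scheme: keep entry $x_i$ with probability $p_i = \min(1, s|x_i|/\|x\|_1)$ and rescale kept entries by $1/p_i$, so the output is unbiased and has at most $s$ nonzeros in expectation. The name `randomized_sparsify` and the parameter `s` are hypothetical; the first step (solving the L1-penalized problem) is not shown, and the input vector stands in for its dense solution.

```python
import random


def randomized_sparsify(x, s, rng):
    """Hypothetical randomized rounding of a dense vector x.

    Entry i is kept with probability p_i = min(1, s * |x_i| / ||x||_1) and
    rescaled by 1 / p_i, so E[output_i] = x_i and E[#nonzeros] <= s.
    """
    l1 = sum(abs(v) for v in x)
    if l1 == 0:
        return [0.0] * len(x)
    out = []
    for v in x:
        p = min(1.0, s * abs(v) / l1)
        # Zero entries have p = 0 and are always dropped.
        if p > 0 and rng.random() < p:
            out.append(v / p)
        else:
            out.append(0.0)
    return out
```

Larger `s` keeps more entries (every entry with $s|x_i| \ge \|x\|_1$ is kept deterministically), which is one simple way to expose a sparsity/accuracy trade-off.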

A Weighted Common Subgraph Matching Algorithm

Xu Yang, Hong Qiao (2014)

We propose a weighted common subgraph (WCS) matching algorithm to find the most similar subgraphs in two labeled weighted graphs. WCS matching, as a natural generalization of equal-sized graph matching or subgraph matching, finds wide applications in many computer vision and machine learning tasks. In this paper, WCS matching is first formulated as a combinatorial optimization problem over the set of partial permutation matrices. It is then approximately solved by a recently proposed combinatorial optimization framework, the Graduated NonConvexity and Concavity Procedure (GNCCP). Experimental comparisons on both synthetic graphs and real-world images validate its robustness against noise level, problem size, number of outliers, and edge density.

Data Structures and Algorithms; Computer Vision and Pattern Recognition

A Linear-Time Algorithm for the Common Refinement of Rooted Phylogenetic Trees on a Common Leaf Set

David Schaller, Marc Hellmuth, Peter F. Stadler (2021)

The problem of finding a common refinement of a set of rooted trees with common leaf set $L$ appears naturally in mathematical phylogenetics whenever poorly resolved information on the same taxa from different sources is to be reconciled. This constitutes a special case of the well-studied supertree problem, where the leaf sets of the input trees may differ. Algorithms that solve the rooted tree compatibility problem are of course applicable to this special case. However, they require sophisticated auxiliary data structures and have a running time of at least $O(k|L|\log^2(k|L|))$ for $k$ input trees. Here, we show that the problem can be solved in $O(k|L|)$ time using a simple bottom-up algorithm called LinCR. An implementation of LinCR in Python is freely available at https://github.com/david-schaller/tralda.

Data Structures and Algorithms; Computational Complexity; Combinatorics
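
LinCR itself is not described in the abstract. As a naive point of comparison, the sketch below relies on the standard characterization that rooted trees on a common leaf set have a common refinement if and only if the union of their cluster sets is pairwise compatible (every two clusters are disjoint or nested); that union is then the cluster set of the coarsest common refinement. Trees are assumed to be nested Python tuples with leaf labels at the tips, and `common_refinement` is a hypothetical name. The pairwise check is quadratic in the number of clusters, so it does not achieve the $O(k|L|)$ bound.

```python
from itertools import combinations


def clusters(tree):
    """Return the set of clusters (leaf sets below each node) of a nested-tuple tree."""
    out = set()

    def walk(node):
        if isinstance(node, tuple):
            leaves = frozenset().union(*(walk(child) for child in node))
        else:
            leaves = frozenset([node])  # a leaf is its own singleton cluster
        out.add(leaves)
        return leaves

    walk(tree)
    return out


def compatible(a, b):
    """Two clusters are compatible if they are disjoint or one contains the other."""
    return not (a & b) or a <= b or b <= a


def common_refinement(trees):
    """Return the cluster set of the coarsest common refinement, or None if none exists."""
    systems = [clusters(t) for t in trees]
    # The root cluster is the largest one; all trees must share it (same leaf set).
    roots = {max(h, key=len) for h in systems}
    if len(roots) != 1:
        raise ValueError("all trees must share the same leaf set")
    union = set().union(*systems)
    for a, b in combinations(union, 2):
        if not compatible(a, b):
            return None
    return union
```

For instance, `(("a", "b"), "c", "d")` and `("a", "b", ("c", "d"))` are jointly refined by the tree with clusters `{a, b}` and `{c, d}`, whereas `(("a", "b"), "c", "d")` and `(("a", "c"), "b", "d")` have no common refinement because `{a, b}` and `{a, c}` overlap without nesting.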

A Linear Time Algorithm for Finding Minimum Spanning Tree Replacement Edges

David A. Bader, Paul Burkhardt (2019)

Given an undirected, weighted graph, the minimum spanning tree (MST) is a tree that connects all of the vertices of the graph with minimum sum of edge weights. In real-world applications, network designers often seek to quickly find a replacement edge for each edge in the MST. For example, when a traffic accident closes a road in a transportation network, or a line goes down in a communication network, the replacement edge may reconnect the MST at lowest cost. In this paper, we consider the problem of finding the lowest-cost replacement edge for each edge of the MST. A previous algorithm by Tarjan takes $O(m\,\alpha(m, n))$ time, where $\alpha(m, n)$ is the inverse Ackermann function. Given the MST and sorted non-tree edges, our algorithm is the first that runs in $O(m+n)$ time and $O(m+n)$ space to find all replacement edges. Moreover, it is easy to implement, and our experimental study demonstrates fast performance on several types of graphs. Additionally, since the most vital edge is the tree edge whose removal causes the highest cost, our algorithm finds it in linear time.

Data Structures and Algorithms
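
For intuition about the replacement-edge problem, here is a sketch of a standard near-linear approach, not the $O(m+n)$ algorithm from the abstract: process non-tree edges in increasing weight order, and let each one claim every still-unclaimed tree edge on the tree path between its endpoints, skipping claimed edges with union-find path compression. Vertices are assumed to be labeled $0, \dots, n-1$, the tree is rooted at vertex 0, and `replacement_edges` is a hypothetical name; tree edges without any replacement (bridges of the graph) are simply absent from the result.

```python
from collections import deque


def replacement_edges(n, tree_edges, non_tree_edges):
    """Map each MST edge (min, max) to its cheapest replacement (u, v, w).

    Near-linear union-find approach, not the O(m+n) algorithm from the abstract.
    """
    adj = [[] for _ in range(n)]
    for u, v, _w in tree_edges:
        adj[u].append(v)
        adj[v].append(u)
    # Root the tree at vertex 0 with BFS; vertex x represents tree edge (x, parent[x]).
    parent, depth = list(range(n)), [0] * n
    seen = [False] * n
    seen[0] = True
    queue = deque([0])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if not seen[y]:
                seen[y] = True
                parent[y], depth[y] = x, depth[x] + 1
                queue.append(y)
    # jump[x] == x means edge (x, parent[x]) is still unclaimed (or x is the root).
    jump = list(range(n))

    def find(x):
        root = x
        while jump[root] != root:
            root = jump[root]
        while jump[x] != root:  # path compression
            jump[x], x = root, jump[x]
        return root

    replacement = {}
    for edge in sorted(non_tree_edges, key=lambda e: e[2]):
        a, b = find(edge[0]), find(edge[1])
        while a != b:
            if depth[a] < depth[b]:
                a, b = b, a
            # a lies strictly below the LCA, so its parent edge is on the cycle closed by `edge`.
            replacement[(min(a, parent[a]), max(a, parent[a]))] = edge
            jump[a] = parent[a]
            a = find(a)
    return replacement
```

A non-tree edge reconnects the tree after a tree edge is removed exactly when that tree edge lies on the tree path between its endpoints, so the first (cheapest) non-tree edge to claim a tree edge is its lowest-cost replacement.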