An $O(n^2)$ Algorithm for Computing Longest Common Cyclic Subsequence

723 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Shihabur Rahman Chowdhury

تاريخ النشر 2009

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Shihabur Rahman Chowdhury - Masud Hasan - Sumaiya Iqbal

بنى وهياكل البيانات والخوارزميات

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

The {em longest common subsequence (LCS)} problem is a classic and well-studied problem in computer science. LCS is a central problem in stringology and finds broad applications in text compression, error-detecting codes and biological sequence comparison. However, in numerous contexts, words represent cyclic sequences of symbols and LCS must be generalized to consider all circular shifts of the strings. This occurs especially in computational biology when genetic material is sequenced form circular DNA or RNA molecules. This initiates the problem of {em longest common cyclic subsequence (LCCS)} which finds the longest subsequence between all circular shifts of two strings. In this paper, we give an $O(n^2)$ algorithm for solving LCCS problem where $n$ is the number of symbols in the strings.

قيم البحث

106 - Karl Bringmann , Vincent Cohen-Addad , 2021

We consider the classic problem of computing the Longest Common Subsequence (LCS) of two strings of length $n$. While a simple quadratic algorithm has been known for the problem for more than 40 years, no faster algorithm has been found despite an ex tensive effort. The lack of progress on the problem has recently been explained by Abboud, Backurs, and Vassilevska Williams [FOCS15] and Bringmann and Kunnemann [FOCS15] who proved that there is no subquadratic algorithm unless the Strong Exponential Time Hypothesis fails. This has led the community to look for subquadratic approximation algorithms for the problem. Yet, unlike the edit distance problem for which a constant-factor approximation in almost-linear time is known, very little progress has been made on LCS, making it a notoriously difficult problem also in the realm of approximation. For the general setting, only a naive $O(n^{varepsilon/2})$-approximation algorithm with running time $tilde{O}(n^{2-varepsilon})$ has been known, for any constant $0 < varepsilon le 1$. Recently, a breakthrough result by Hajiaghayi, Seddighin, Seddighin, and Sun [SODA19] provided a linear-time algorithm that yields a $O(n^{0.497956})$-approximation in expectation; improving upon the naive $O(sqrt{n})$-approximation for the first time. In this paper, we provide an algorithm that in time $O(n^{2-varepsilon})$ computes an $tilde{O}(n^{2varepsilon/5})$-approximation with high probability, for any $0 < varepsilon le 1$. Our result (1) gives an $tilde{O}(n^{0.4})$-approximation in linear time, improving upon the bound of Hajiaghayi, Seddighin, Seddighin, and Sun, (2) provides an algorithm whose approximation scales with any subquadratic running time $O(n^{2-varepsilon})$, improving upon the naive bound of $O(n^{varepsilon/2})$ for any $varepsilon$, and (3) instead of only in expectation, succeeds with high probability.

بنى وهياكل البيانات والخوارزميات

Variants of Constrained Longest Common Subsequence

540 - Paola Bonizzoni , Gianluca Della Vedova , Riccardo Dondi 2009

In this work, we consider a variant of the classical Longest Common Subsequence problem called Doubly-Constrained Longest Common Subsequence (DC-LCS). Given two strings s1 and s2 over an alphabet A, a set C_s of strings, and a function Co from A to N , the DC-LCS problem consists in finding the longest subsequence s of s1 and s2 such that s is a supersequence of all the strings in Cs and such that the number of occurrences in s of each symbol a in A is upper bounded by Co(a). The DC-LCS problem provides a clear mathematical formulation of a sequence comparison problem in Computational Biology and generalizes two other constrained variants of the LCS problem: the Constrained LCS and the Repetition-Free LCS. We present two results for the DC-LCS problem. First, we illustrate a fixed-parameter algorithm where the parameter is the length of the solution. Secondly, we prove a parameterized hardness result for the Constrained LCS problem when the parameter is the number of the constraint strings and the size of the alphabet A. This hardness result also implies the parameterized hardness of the DC-LCS problem (with the same parameters) and its NP-hardness when the size of the alphabet is constant.

بنى وهياكل البيانات والخوارزميات الرياضيات المتقطعة

Heuristic algorithms for the Longest Filled Common Subsequence Problem

121 - Radu Stefan Mincu University ofn Bucharest 2019

At CPM 2017, Castelli et al. define and study a new variant of the Longest Common Subsequence Problem, termed the Longest Filled Common Subsequence Problem (LFCS). For the LFCS problem, the input consists of two strings $A$ and $B$ and a multiset of characters $mathcal{M}$. The goal is to insert the characters from $mathcal{M}$ into the string $B$, thus obtaining a new string $B^*$, such that the Longest Common Subsequence (LCS) between $A$ and $B^*$ is maximized. Casteli et al. show that the problem is NP-hard and provide a 3/5-approximation algorithm for the problem. In this paper we study the problem from the experimental point of view. We introduce, implement and test new heuristic algorithms and compare them with the approximation algorithm of Casteli et al. Moreover, we introduce an Integer Linear Program (ILP) model for the problem and we use the state of the art ILP solver, Gurobi, to obtain exact solution for moderate sized instances.

بنى وهياكل البيانات والخوارزميات

Lower Bounds and Improved Algorithms for Asymmetric Streaming Edit Distance and Longest Common Subsequence

154 - Xin Li , Yu Zheng 2021

In this paper, we study edit distance (ED) and longest common subsequence (LCS) in the asymmetric streaming model, introduced by Saks and Seshadhri [SS13]. As an intermediate model between the random access model and the streaming model, this model a llows one to have streaming access to one string and random access to the other string. Our first main contribution is a systematic study of space lower bounds for ED and LCS in the asymmetric streaming model. Previously, there are no explicitly stated results in this context, although some lower bounds about LCS can be inferred from the lower bounds for longest increasing subsequence (LIS) in [SW07][GG10][EJ08]. Yet these bounds only work for large alphabet size. In this paper, we develop several new techniques to handle ED in general and LCS for small alphabet size, thus establishing strong lower bounds for both problems. In particular, our lower bound for ED provides an exponential separation between edit distance and Hamming distance in the asymmetric streaming model. Our lower bounds also extend to LIS and longest non-decreasing sequence (LNS) in the standard streaming model. Together with previous results, our bounds provide an almost complete picture for these two problems. As our second main contribution, we give improved algorithms for ED and LCS in the asymmetric streaming model. For ED, we improve the space complexity of the constant factor approximation algorithms in [FHRS20][CJLZ20] from $tilde{O}(frac{n^delta}{delta})$ to $O(frac{d^delta}{delta};mathsf{polylog}(n))$, where $n$ is the length of each string and $d$ is the edit distance between the two strings. For LCS, we give the first $1/2+epsilon$ approximation algorithm with space $n^{delta}$ for any constant $delta>0$, over a binary alphabet.

بنى وهياكل البيانات والخوارزميات

An $O(n log n)$ time Algorithm for computing the Path-length Distance between Trees

119 - David Bryant , Celine Scornavacca 2018

Tree comparison metrics have proven to be an invaluable aide in the reconstruction and analysis of phylogenetic (evolutionary) trees. The path-length distance between trees is a particularly attractive measure as it reflects differences in tree shape as well as differences between branch lengths. The distance equals the sum, over all pairs of taxa, of the squared differences between the lengths of the unique path connecting them in each tree. We describe an $O(n log n)$ time for computing this distance, making extensive use of tree decomposition techniques introduced by Brodal et al. (2004).

بنى وهياكل البيانات والخوارزميات

سجل دخول لتتمكن من نشر تعليقات