A Note on the Longest Common Compatible Prefix Problem for Partial Words

266 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Jakub Radoszewski

تاريخ النشر 2013

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Maxime Crochemore - Costas S. Iliopoulos - Tomasz Kociumaka

بنى وهياكل البيانات والخوارزميات اللغات الرسمية ونظرية الأتومات

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

For a partial word $w$ the longest common compatible prefix of two positions $i,j$, denoted $lccp(i,j)$, is the largest $k$ such that $w[i,i+k-1]uparrow w[j,j+k-1]$, where $uparrow$ is the compatibility relation of partial words (it is not an equivalence relation). The LCCP problem is to preprocess a partial word in such a way that any query $lccp(i,j)$ about this word can be answered in $O(1)$ time. It is a natural generalization of the longest common prefix (LCP) problem for regular words, for which an $O(n)$ preprocessing time and $O(1)$ query time solution exists. Recently an efficient algorithm for this problem has been given by F. Blanchet-Sadri and J. Lazarow (LATA 2013). The preprocessing time was $O(nh+n)$, where $h$ is the number of holes in $w$. The algorithm was designed for partial words over a constant alphabet and was quite involved. We present a simple solution to this problem with slightly better runtime that works for any linearly-sortable alphabet. Our preprocessing is in time $O(nmu+n)$, where $mu$ is the number of blocks of holes in $w$. Our algorithm uses ideas from alignment algorithms and dynamic programming.

قيم البحث

121 - Radu Stefan Mincu University ofn Bucharest 2019

At CPM 2017, Castelli et al. define and study a new variant of the Longest Common Subsequence Problem, termed the Longest Filled Common Subsequence Problem (LFCS). For the LFCS problem, the input consists of two strings $A$ and $B$ and a multiset of characters $mathcal{M}$. The goal is to insert the characters from $mathcal{M}$ into the string $B$, thus obtaining a new string $B^*$, such that the Longest Common Subsequence (LCS) between $A$ and $B^*$ is maximized. Casteli et al. show that the problem is NP-hard and provide a 3/5-approximation algorithm for the problem. In this paper we study the problem from the experimental point of view. We introduce, implement and test new heuristic algorithms and compare them with the approximation algorithm of Casteli et al. Moreover, we introduce an Integer Linear Program (ILP) model for the problem and we use the state of the art ILP solver, Gurobi, to obtain exact solution for moderate sized instances.

بنى وهياكل البيانات والخوارزميات

On universal partial words

296 - Herman Z.Q. Chen , Sergey Kitaev , Torsten Mutze 2016

A universal word for a finite alphabet $A$ and some integer $ngeq 1$ is a word over $A$ such that every word in $A^n$ appears exactly once as a subword (cyclically or linearly). It is well-known and easy to prove that universal words exist for any $A $ and $n$. In this work we initiate the systematic study of universal partial words. These are words that in addition to the letters from $A$ may contain an arbitrary number of occurrences of a special `joker symbol $Diamond otin A$, which can be substituted by any symbol from $A$. For example, $u=0Diamond 011100$ is a linear partial word for the binary alphabet $A={0,1}$ and for $n=3$ (e.g., the first three letters of $u$ yield the subwords $000$ and $010$). We present results on the existence and non-existence of linear and cyclic universal partial words in different situations (depending on the number of $Diamond$s and their positions), including various explicit constructions. We also provide numerous examples of universal partial words that we found with the help of a computer.

التوافقية اللغات الرسمية ونظرية الأتومات نظرية المعلومات

String Attractors and Combinatorics on Words

73 - Sabrina Mantaci , Antonio Restivo , Giuseppe Romana 2019

The notion of emph{string attractor} has recently been introduced in [Prezza, 2017] and studied in [Kempa and Prezza, 2018] to provide a unifying framework for known dictionary-based compressors. A string attractor for a word $w=w[1]w[2]cdots w[n]$ i s a subset $Gamma$ of the positions ${1,ldots,n}$, such that all distinct factors of $w$ have an occurrence crossing at least one of the elements of $Gamma$. While finding the smallest string attractor for a word is a NP-complete problem, it has been proved in [Kempa and Prezza, 2018] that dictionary compressors can be interpreted as algorithms approximating the smallest string attractor for a given word. In this paper we explore the notion of string attractor from a combinatorial point of view, by focusing on several families of finite words. The results presented in the paper suggest that the notion of string attractor can be used to define new tools to investigate combinatorial properties of the words.

بنى وهياكل البيانات والخوارزميات اللغات الرسمية ونظرية الأتومات

A Linear-Time $n^{0.4}$-Approximation for Longest Common Subsequence

106 - Karl Bringmann , Vincent Cohen-Addad , 2021

We consider the classic problem of computing the Longest Common Subsequence (LCS) of two strings of length $n$. While a simple quadratic algorithm has been known for the problem for more than 40 years, no faster algorithm has been found despite an ex tensive effort. The lack of progress on the problem has recently been explained by Abboud, Backurs, and Vassilevska Williams [FOCS15] and Bringmann and Kunnemann [FOCS15] who proved that there is no subquadratic algorithm unless the Strong Exponential Time Hypothesis fails. This has led the community to look for subquadratic approximation algorithms for the problem. Yet, unlike the edit distance problem for which a constant-factor approximation in almost-linear time is known, very little progress has been made on LCS, making it a notoriously difficult problem also in the realm of approximation. For the general setting, only a naive $O(n^{varepsilon/2})$-approximation algorithm with running time $tilde{O}(n^{2-varepsilon})$ has been known, for any constant $0 < varepsilon le 1$. Recently, a breakthrough result by Hajiaghayi, Seddighin, Seddighin, and Sun [SODA19] provided a linear-time algorithm that yields a $O(n^{0.497956})$-approximation in expectation; improving upon the naive $O(sqrt{n})$-approximation for the first time. In this paper, we provide an algorithm that in time $O(n^{2-varepsilon})$ computes an $tilde{O}(n^{2varepsilon/5})$-approximation with high probability, for any $0 < varepsilon le 1$. Our result (1) gives an $tilde{O}(n^{0.4})$-approximation in linear time, improving upon the bound of Hajiaghayi, Seddighin, Seddighin, and Sun, (2) provides an algorithm whose approximation scales with any subquadratic running time $O(n^{2-varepsilon})$, improving upon the naive bound of $O(n^{varepsilon/2})$ for any $varepsilon$, and (3) instead of only in expectation, succeeds with high probability.

بنى وهياكل البيانات والخوارزميات

Computing the longest common prefix of a context-free language in polynomial time

110 - Michael Luttenberger , Raphaela Palenta , Helmut Seidl 2017

We present two structural results concerning longest common prefixes of non-empty languages. First, we show that the longest common prefix of the language generated by a context-free grammar of size $N$ equals the longest common prefix of the same gr ammar where the heights of the derivation trees are bounded by $4N$. Second, we show that each nonempty language $L$ has a representative subset of at most three elements which behaves like $L$ w.r.t. the longest common prefix as well as w.r.t. longest common prefixes of $L$ after unions or concatenations with arbitrary other languages. From that, we conclude that the longest common prefix, and thus the longest common suffix, of a context-free language can be computed in polynomial time.

اللغات الرسمية ونظرية الأتومات

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة الملك عبد العزيز

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

A Note on the Longest Common Compatible Prefix Problem for Partial Words

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً