Research papers and master's and doctoral theses published by Philip Bille

86 - Philip Bille, Patrick Hagge Cording, Inge Li Gørtz 2013

The Karp-Rabin fingerprint of a string is a type of hash value that, due to its strong properties, has been used in many string algorithms. In this paper we show how to construct a data structure for a string $S$ of size $N$ compressed by a context-free grammar of size $n$ that answers fingerprint queries. That is, given indices $i$ and $j$, the answer to a query is the fingerprint of the substring $S[i,j]$. We present the first $O(n)$ space data structures that answer fingerprint queries without decompressing any characters. For Straight Line Programs (SLP) we get $O(\log N)$ query time, and for Linear SLPs (an SLP derivative that captures LZ78 compression and its variations) we get $O(\log \log N)$ query time. Hence, our data structures have the same time and space complexity as for random access in SLPs. We utilize the fingerprint data structures to solve the longest common extension problem in query time $O(\log N \log \mathit{lce})$ and $O(\log \mathit{lce} \log\log \mathit{lce} + \log\log N)$ for SLPs and Linear SLPs, respectively. Here, $\mathit{lce}$ denotes the length of the LCE.
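To make the notion of a fingerprint query concrete, the following is a minimal Python sketch on an uncompressed string. It is not the paper's compressed data structure: it stores $O(N)$ prefix fingerprints rather than $O(n)$ space over a grammar. The function names, the fixed base, and the modulus are illustrative assumptions; a Karp-Rabin scheme would normally draw the base at random. The `lce` helper shows how fingerprint comparisons can drive an exponential search followed by a binary search, and its answer is correct only if no fingerprint collision occurs.

```python
# Illustrative sketch only: fingerprints over an UNCOMPRESSED string.
# MOD and BASE are assumptions chosen for reproducibility; in practice the
# base would be drawn at random to obtain the usual collision guarantees.
MOD = (1 << 61) - 1  # a Mersenne prime
BASE = 256


def build_prefix_fingerprints(s, base=BASE, mod=MOD):
    """Return (prefix, power): prefix[k] is the fingerprint of the first k
    characters of s, and power[k] is base**k mod `mod`."""
    prefix = [0] * (len(s) + 1)
    power = [1] * (len(s) + 1)
    for k, ch in enumerate(s):
        prefix[k + 1] = (prefix[k] * base + ord(ch)) % mod
        power[k + 1] = (power[k] * base) % mod
    return prefix, power


def fingerprint(prefix, power, i, j, mod=MOD):
    """Fingerprint of S[i, j], 1-indexed and inclusive as in the abstract."""
    length = j - i + 1
    return (prefix[j] - prefix[i - 1] * power[length]) % mod


def lce(prefix, power, n, i, j):
    """Longest common extension of the suffixes starting at 1-indexed
    positions i and j of a string of length n, assuming no collisions."""
    max_len = n - max(i, j) + 1

    def same(length):
        return (fingerprint(prefix, power, i, i + length - 1)
                == fingerprint(prefix, power, j, j + length - 1))

    # Exponential search: double the length while the fingerprints agree.
    hi = 1
    while hi <= max_len and same(hi):
        hi *= 2
    lo = hi // 2               # longest length known to match (possibly 0)
    hi = min(hi, max_len + 1)  # a length known to fail or be out of range
    # Binary search for the largest matching length in [lo, hi).
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if same(mid):
            lo = mid
        else:
            hi = mid
    return lo


s = "abracadabra"
prefix, power = build_prefix_fingerprints(s)
print(fingerprint(prefix, power, 1, 4) == fingerprint(prefix, power, 8, 11))
print(lce(prefix, power, len(s), 1, 8))
```

In this sketch each fingerprint query takes constant time, so the search costs $O(\log \mathit{lce})$ comparisons; the abstract's bounds additionally account for the cost of a fingerprint query on the compressed representation.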

Data Structures and Algorithms

Time-Space Trade-Offs for Longest Common Extensions

82 - Philip Bille, Inge Li Goertz, Benjamin Sach 2012

We revisit the longest common extension (LCE) problem, that is, preprocess a string $T$ into a compact data structure that supports fast LCE queries. An LCE query takes a pair $(i,j)$ of indices in $T$ and returns the length of the longest common prefix of the suffixes of $T$ starting at positions $i$ and $j$. We study the time-space trade-offs for the problem, that is, the space used for the data structure vs. the worst-case time for answering an LCE query. Let $n$ be the length of $T$. Given a parameter $\tau$, $1 \leq \tau \leq n$, we show how to achieve either $O(\frac{n}{\sqrt{\tau}})$ space and $O(\tau)$ query time, or $O(\frac{n}{\tau})$ space and $O(\tau \log(|LCE(i,j)|/\tau))$ query time, where $|LCE(i,j)|$ denotes the length of the LCE returned by the query. These bounds provide the first smooth trade-offs for the LCE problem and almost match the previously known bounds at the extremes when $\tau=1$ or $\tau=n$. We apply the result to obtain improved bounds for several applications where the LCE problem is the computational bottleneck, including approximate string matching and computing palindromes. We also present an efficient technique to reduce LCE queries on two strings to one string. Finally, we give a lower bound on the time-space product for LCE data structures in the non-uniform cell probe model showing that our second trade-off is nearly optimal.
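As a point of reference for the query defined above, here is a minimal Python sketch of the naive baseline: a direct scan that uses no preprocessed structure and takes time proportional to the LCE length. It is not one of the paper's trade-off structures. The function names, the 0-indexed positions, and the separator character are assumptions made for illustration. The second helper shows one common way an LCE query can be used to find a maximal odd-length palindrome, by querying the string concatenated with a separator and its reverse; this plain concatenation is not claimed to be the reduction technique the abstract refers to.

```python
def naive_lce(t, i, j):
    """Length of the longest common prefix of t[i:] and t[j:].
    Positions are 0-indexed here (an assumption of this sketch)."""
    k = 0
    while i + k < len(t) and j + k < len(t) and t[i + k] == t[j + k]:
        k += 1
    return k


def odd_palindrome_length(t, c, sep="\0"):
    """Length of the longest odd-length palindrome centered at 0-indexed c.
    Assumes the separator `sep` does not occur in t."""
    n = len(t)
    u = t + sep + t[::-1]
    # Index k of the reversed copy sits at position n + 1 + k in u, so the
    # reversed suffix starting at n - 1 - c reads t[c], t[c - 1], ..., t[0].
    radius = naive_lce(u, c, n + 1 + (n - 1 - c))
    return 2 * radius - 1


print(naive_lce("abracadabra", 0, 7))
print(odd_palindrome_length("racecar", 3))
```

Any structure with the same query interface could be substituted for `naive_lce` in `odd_palindrome_length`, which is why faster LCE queries carry over to applications of this kind.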

Data Structures and Algorithms
