أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Benjamin Sach

The complexity of computation in bit streams

72 - Raphael Clifford , Markus Jalsenius , Benjamin Sach 2015

We revisit the complexity of online computation in the cell probe model. We consider a class of problems where we are first given a fixed pattern or vector $F$ of $n$ symbols and then one symbol arrives at a time in a stream. After each symbol has ar rived we must output some function of $F$ and the $n$-length suffix of the arriving stream. Cell probe bounds of $Omega(deltalg{n}/w)$ have previously been shown for both convolution and Hamming distance in this setting, where $delta$ is the size of a symbol in bits and $winOmega(lg{n})$ is the cell size in bits. However, when $delta$ is a constant, as it is in many natural situations, these previous results no longer give us non-trivial bounds. We introduce a new lop-sided information transfer proof technique which enables us to prove meaningful lower bounds even for constant size input alphabets. We use our new framework to prove an amortised cell probe lower bound of $Omega(lg^2 n/(wcdot lg lg n))$ time per arriving bit for an online version of a well studied problem known as pattern matching with address errors. This is the first non-trivial cell probe lower bound for any online problem on bit streams that still holds when the cell sizes are large. We also show the same bound for online convolution conditioned on a new combinatorial conjecture related to Toeplitz matrices.

التعقيد الحسابي بنى وهياكل البيانات والخوارزميات

Time Bounds for Streaming Problems

116 - Raphael Clifford , Markus Jalsenius , Benjamin Sach 2014

We give tight cell-probe bounds for the time to compute convolution, multiplication and Hamming distance in a stream. The cell probe model is a particularly strong computational model and subsumes, for example, the popular word RAM model. We first consider online convolution where the task is to output the inner product between a fixed $n$-dimensional vector and a vector of the $n$ most recent values from a stream. One symbol of the stream arrives at a time and the each output must be computed before the next symbols arrives. Next we show bounds for online multiplication where the stream consists of pairs of digits, one from each of two $n$ digit numbers that are to be multiplied. One pair arrives at a time and the task is to output a single new digit from the product before the next pair of digits arrives. Finally we look at the online Hamming distance problem where the Hamming distance is outputted instead of the inner product. For each of these three problems, we give a lower bound of $Omega(frac{delta}{w}log n)$ time on average per output, where $delta$ is the number of bits needed to represent an input symbol and $w$ is the cell or word size. We argue that these bound are in fact tight within the cell probe model.

بنى وهياكل البيانات والخوارزميات التعقيد الحسابي

Cell-Probe Bounds for Online Edit Distance and Other Pattern Matching Problems

96 - Raphael Clifford , Markus Jalsenius , Benjamin Sach 2014

We give cell-probe bounds for the computation of edit distance, Hamming distance, convolution and longest common subsequence in a stream. In this model, a fixed string of $n$ symbols is given and one $delta$-bit symbol arrives at a time in a stream. After each symbol arrives, the distance between the fixed string and a suffix of most recent symbols of the stream is reported. The cell-probe model is perhaps the strongest model of computation for showing data structure lower bounds, subsuming in particular the popular word-RAM model. * We first give an $Omega((delta log n)/(w+loglog n))$ lower bound for the time to give each output for both online Hamming distance and convolution, where $w$ is the word size. This bound relies on a new encoding scheme and for the first time holds even when $w$ is as small as a single bit. * We then consider the online edit distance and longest common subsequence problems in the bit-probe model ($w=1$) with a constant sized input alphabet. We give a lower bound of $Omega(sqrt{log n}/(loglog n)^{3/2})$ which applies for both problems. This second set of results relies both on our new encoding scheme as well as a carefully constructed hard distribution. * Finally, for the online edit distance problem we show that there is an $O((log n)^2/w)$ upper bound in the cell-probe model. This bound gives a contrast to our new lower bound and also establishes an exponential gap between the known cell-probe and RAM model complexities.

بنى وهياكل البيانات والخوارزميات

Time-Space Trade-Offs for Longest Common Extensions

82 - Philip Bille , Inge Li Goertz , Benjamin Sach 2012

We revisit the longest common extension (LCE) problem, that is, preprocess a string $T$ into a compact data structure that supports fast LCE queries. An LCE query takes a pair $(i,j)$ of indices in $T$ and returns the length of the longest common pre fix of the suffixes of $T$ starting at positions $i$ and $j$. We study the time-space trade-offs for the problem, that is, the space used for the data structure vs. the worst-case time for answering an LCE query. Let $n$ be the length of $T$. Given a parameter $tau$, $1 leq tau leq n$, we show how to achieve either $O(infrac{n}{sqrt{tau}})$ space and $O(tau)$ query time, or $O(infrac{n}{tau})$ space and $O(tau log({|LCE(i,j)|}/{tau}))$ query time, where $|LCE(i,j)|$ denotes the length of the LCE returned by the query. These bounds provide the first smooth trade-offs for the LCE problem and almost match the previously known bounds at the extremes when $tau=1$ or $tau=n$. We apply the result to obtain improved bounds for several applications where the LCE problem is the computational bottleneck, including approximate string matching and computing palindromes. We also present an efficient technique to reduce LCE queries on two strings to one string. Finally, we give a lower bound on the time-space product for LCE data structures in the non-uniform cell probe model showing that our second trade-off is nearly optimal.

بنى وهياكل البيانات والخوارزميات

Tight Cell-Probe Bounds for Online Hamming Distance Computation

113 - Raphael Clifford , Markus Jalsenius , Benjamin Sach 2012

We show tight bounds for online Hamming distance computation in the cell-probe model with word size w. The task is to output the Hamming distance between a fixed string of length n and the last n symbols of a stream. We give a lower bound of Omega((d /w)*log n) time on average per output, where d is the number of bits needed to represent an input symbol. We argue that this bound is tight within the model. The lower bound holds under randomisation and amortisation.

بنى وهياكل البيانات والخوارزميات التعقيد الحسابي

Parameterized Matching in the Streaming Model

78 - Markus Jalsenius , Benny Porat , Benjamin Sach 2011

We study the problem of parameterized matching in a stream where we want to output matches between a pattern of length m and the last m symbols of the stream before the next symbol arrives. Parameterized matching is a natural generalisation of exact matching where an arbitrary one-to-one relabelling of pattern symbols is allowed. We show how this problem can be solved in constant time per arriving stream symbol and sublinear, near optimal space with high probability. Our results are surprising and important: it has been shown that almost no streaming pattern matching problems can be solved (not even randomised) in less than Theta(m) space, with exact matching as the only known problem to have a sublinear, near optimal space solution. Here we demonstrate that a similar sublinear, near optimal space solution is achievable for an even more challenging problem. The proof is considerably more complex than that for exact matching.

بنى وهياكل البيانات والخوارزميات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد