No Arabic abstract
In this paper we study the fundamental problem of maintaining a dynamic collection of strings under the following operations: concat - concatenates two strings, split - splits a string into two at a given position, compare - finds the lexicographical order (less, equal, greater) between two strings, LCP - calculates the longest common prefix of two strings. We present an efficient data structure for this problem, where an update requires only $O(log n)$ worst-case time with high probability, with $n$ being the total length of all strings in the collection, and a query takes constant worst-case time. On the lower bound side, we prove that even if the only possible query is checking equality of two strings, either updates or queries take amortized $Omega(log n)$ time; hence our implementation is optimal. Such operations can be used as a basic building block to solve other string problems. We provide two examples. First, we can augment our data structure to provide pattern matching queries that may locate occurrences of a specified pattern $p$ in the strings in our collection in optimal $O(|p|)$ time, at the expense of increasing update time to $O(log^2 n)$. Second, we show how to maintain a history of an edited text, processing updates in $O(log t log log t)$ time, where $t$ is the number of edits, and how to support pattern matching queries against the whole history in $O(|p| log t log log t)$ time. Finally, we note that our data structure can be applied to test dynamic tree isomorphism and to compare strings generated by dynamic straight-line grammars.
We consider a range of simply stated dynamic data structure problems on strings. An update changes one symbol in the input and a query asks us to compute some function of the pattern of length $m$ and a substring of a longer text. We give both conditional and unconditional lower bounds for variants of exact matching with wildcards, inner product, and Hamming distance computation via a sequence of reductions. As an example, we show that there does not exist an $O(m^{1/2-varepsilon})$ time algorithm for a large range of these problems unless the online Boolean matrix-vector multiplication conjecture is false. We also provide nearly matching upper bounds for most of the problems we consider.
A Dynamic Graph Network is a graph in which each edge has an associated travel time and a capacity (width) that limits the number of items that can travel in parallel along that edge. Each vertex in this dynamic graph network begins with the number of items that must be evacuated into designated sink vertices. A $k$-sink evacuation protocol finds the location of $k$ sinks and associated evacuation movement protocol that allows evacuating all the items to a sink in minimum time. The associated evacuation movement must impose a confluent flow, i.e, all items passing through a particular vertex exit that vertex using the same edge. In this paper we address the $k$-sink evacuation problem on a dynamic path network. We provide solutions that run in $O(n log n)$ time for $k=1$ and $O(k n log^2 n)$ for $k >1$ and work for arbitrary edge capacities.
The Karp-Rabin fingerprint of a string is a type of hash value that due to its strong properties has been used in many string algorithms. In this paper we show how to construct a data structure for a string $S$ of size $N$ compressed by a context-free grammar of size $n$ that answers fingerprint queries. That is, given indices $i$ and $j$, the answer to a query is the fingerprint of the substring $S[i,j]$. We present the first O(n) space data structures that answer fingerprint queries without decompressing any characters. For Straight Line Programs (SLP) we get $O(log N)$ query time, and for Linear SLPs (an SLP derivative that captures LZ78 compression and its variations) we get $O(log log N)$ query time. Hence, our data structures has the same time and space complexity as for random access in SLPs. We utilize the fingerprint data structures to solve the longest common extension problem in query time $O(log N log lce)$ and $O(log lce loglog lce + loglog N)$ for SLPs and Linear SLPs, respectively. Here, $lce$ denotes the length of the LCE.
In this paper, we consider the problem of identifying patterns of interest in colored strings. A colored string is a string where each position is assigned one of a finite set of colors. Our task is to find substrings of the colored string that always occur followed by the same color at the same distance. The problem is motivated by applications in embedded systems verification, in particular, assertion mining. The goal there is to automatically find properties of the embedded system from the analysis of its simulation traces. We show that, in our setting, the number of patterns of interest is upper-bounded by $mathcal{O}(n^2)$, where $n$ is the length of the string. We introduce a baseline algorithm, running in $mathcal{O}(n^2)$ time, which identifies all patterns of interest satisfying certain minimality conditions, for all colors in the string. For the case where one is interested in patterns related to one color only, we also provide a second algorithm which runs in $mathcal{O}(n^2log n)$ time in the worst case but is faster than the baseline algorithm in practice. Both solutions use suffix trees, and the second algorithm also uses an appropriately defined priority queue, which allows us to reduce the number of computations. We performed an experimental evaluation of the proposed approaches over both synthetic and real-world datasets, and found that the second algorithm outperforms the first algorithm on all simulated data, while on the real-world data, the performance varies between a slight slowdown (on half of the datasets) and a speedup by a factor of up to 11.
In this thesis, we present new techniques to deal with fundamental algorithmic graph problems where graphs are directed and partially dynamic, i.e. undergo either a sequence of edge insertions or deletions: - Single-Source Reachability (SSR), - Strongly-Connected Components (SCCs), and - Single-Source Shortest Paths (SSSP). These problems have recently received an extraordinary amount of attention due to their role as subproblems in various more complex and notoriously hard graph problems, especially to compute flows, bipartite matchings and cuts. Our techniques lead to the first near-optimal data structures for these problems in various different settings. Letting $n$ denote the number of vertices in the graph and by $m$ the maximum number of edges in any version of the graph, we obtain - the first randomized data structure to maintain SSR and SCCs in near-optimal total update time $tilde{O}(m)$ in a graph undergoing edge deletions. - the first randomized data structure to maintain SSSP in partially dynamic graphs in total update time $tilde{O}(n^2)$ which is near-optimal in dense graphs. - the first deterministic data structures for SSR and SCC for graphs undergoing edge deletions, and for SSSP in partially dynamic graphs that improve upon the $O(mn)$ total update time by Even and Shiloach from 1981 that is often considered to be a fundamental barrier.