Internal Shortest Absent Word Queries in Constant Time and Linear Space


Added by Dmitry Kosolobov

Publication date 2021

Fields Informatics Engineering

Language English

Authors Golnaz Badkobeh - Panagiotis Charalampopoulos - Dmitry Kosolobov

Data Structures and Algorithms


Abstract

Given a string $T$ of length $n$ over an alphabet $\Sigma\subset \{1,2,\ldots,n^{O(1)}\}$ of size $\sigma$, we are to preprocess $T$ so that given a range $[i,j]$, we can return a representation of a shortest string over $\Sigma$ that is absent in the fragment $T[i]\cdots T[j]$ of $T$. We present an $O(n)$-space data structure that answers such queries in constant time and can be constructed in $O(n\log_\sigma n)$ time.
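To make the query concrete, here is a naive baseline, not the constant-time data structure from the abstract. It answers a single query by brute force: for each length $k$ it collects the distinct length-$k$ substrings of the fragment and stops at the first $k$ for which fewer than $\sigma^k$ of them occur. The function name `shortest_absent_word` and the choice to pass the alphabet as a Python string are assumptions made for illustration.

```python
from itertools import product


def shortest_absent_word(T, i, j, alphabet):
    """Return a shortest word over `alphabet` that does not occur in T[i..j].

    Positions are 1-based and inclusive, matching the range [i, j] in the text.
    This is a brute-force sketch; its running time is far from constant.
    """
    fragment = T[i - 1:j]
    sigma = len(alphabet)
    k = 1
    while True:
        # All distinct substrings of length k that occur in the fragment.
        present = {fragment[p:p + k] for p in range(len(fragment) - k + 1)}
        if len(present) < sigma ** k:
            # Some word of length k is missing; return the smallest one
            # in lexicographic order.
            for letters in product(sorted(alphabet), repeat=k):
                word = "".join(letters)
                if word not in present:
                    return word
        k += 1


if __name__ == "__main__":
    print(shortest_absent_word("abaab", 1, 5, "ab"))
    print(shortest_absent_word("abaab", 3, 4, "ab"))
```

The loop always terminates, because a fragment of length $m$ has at most $m-k+1$ substrings of length $k$, which eventually drops below $\sigma^k$.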


Time-Dependent Shortest Path Queries Among Growing Discs

156 - Anil Maheshwari , Arash Nouri , Jorg-Rudiger Sack 2018

The determination of time-dependent collision-free shortest paths has received a fair amount of attention. Here, we study the problem of computing a time-dependent shortest path among growing discs which has been previously studied for the instance where the departure times are fixed. We address a more general setting: For two given points $s$ and $d$, we wish to determine the function $\mathcal{A}(t)$ which is the minimum arrival time at $d$ for any departure time $t$ at $s$. We present a $(1+\epsilon)$-approximation algorithm for computing $\mathcal{A}(t)$. As part of preprocessing, we execute $O({1 \over \epsilon} \log({\mathcal{V}_{r} \over \mathcal{V}_{c}}))$ shortest path computations for fixed departure times, where $\mathcal{V}_{r}$ is the maximum speed of the robot and $\mathcal{V}_{c}$ is the minimum growth rate of the discs. For any query departure time $t \geq 0$ from $s$, we can approximate the minimum arrival time at the destination in $O(\log ({1 \over \epsilon}) + \log\log({\mathcal{V}_{r} \over \mathcal{V}_{c}}))$ time, within a factor of $1+\epsilon$ of optimal. Since we treat the shortest path computations as black-box functions, for different settings of growing discs, we can plug in different shortest path algorithms. Thus, the exact time complexity of our algorithm is determined by the running time of the shortest path computations.

Data Structures and Algorithms

Decremental All-Pairs Shortest Paths in Deterministic Near-Linear Time

77 - Julia Chuzhoy 2021

We study the decremental All-Pairs Shortest Paths (APSP) problem in undirected edge-weighted graphs. The input to the problem is an $n$-vertex $m$-edge graph $G$ with non-negative edge lengths, that undergoes a sequence of edge deletions. The goal is to support approximate shortest-path queries: given a pair $x,y$ of vertices of $G$, return a path $P$ connecting $x$ to $y$, whose length is within factor $\alpha$ of the length of the shortest $x$-$y$ path, in time $\tilde O(|E(P)|)$, where $\alpha$ is the approximation factor of the algorithm. APSP is one of the most basic and extensively studied dynamic graph problems. A long line of work culminated in the algorithm of [Chechik, FOCS 2018] with near optimal guarantees for the oblivious-adversary setting. Unfortunately, the adaptive-adversary setting is still poorly understood. For unweighted graphs, the algorithm of [Henzinger, Krinninger and Nanongkai, FOCS 13, SICOMP 16] achieves a $(1+\epsilon)$-approximation with total update time $\tilde O(mn/\epsilon)$; the best current total update time of $n^{2.5+O(\epsilon)}$ is achieved by the deterministic algorithm of [Chuzhoy, Saranurak, SODA21], with $2^{O(1/\epsilon)}$-multiplicative and $2^{O(\log^{3/4}n/\epsilon)}$-additive approximation. To the best of our knowledge, for arbitrary non-negative edge weights, the fastest current adaptive-update algorithm has total update time $O(n^{3}\log L/\epsilon)$, achieving a $(1+\epsilon)$-approximation. Here, $L$ is the ratio of longest to shortest edge lengths. Our main result is a deterministic algorithm for decremental APSP in undirected edge-weighted graphs, that, for any $\Omega(1/\log\log m)\leq \epsilon< 1$, achieves approximation factor $(\log m)^{2^{O(1/\epsilon)}}$, with total update time $O\left(m^{1+O(\epsilon)}\cdot (\log m)^{O(1/\epsilon^2)}\cdot \log L\right)$.

Data Structures and Algorithms

Internal Pattern Matching Queries in a Text and Applications

355 - Tomasz Kociumaka , Jakub Radoszewski , Wojciech Rytter 2013

We consider several types of internal queries: questions about subwords of a text. As the main tool we develop an optimal data structure for the problem called here internal pattern matching. This data structure provides constant-time answers to queries about occurrences of one subword $x$ in another subword $y$ of a given text, assuming that $|y|=\mathcal{O}(|x|)$, which allows for a constant-space representation of all occurrences. This problem can be viewed as a natural extension of the well-studied pattern matching problem. The data structure has linear size and admits a linear-time construction algorithm. Using the solution to the internal pattern matching problem, we obtain very efficient data structures answering queries about: primitivity of subwords, periods of subwords, general substring compression, and cyclic equivalence of two subwords. All these results improve upon the best previously known counterparts. The linear construction time of our data structure also allows us to improve the algorithm for finding $\delta$-subrepetitions in a text (a more general version of maximal repetitions, also called runs). For any fixed $\delta$ we obtain the first linear-time algorithm, which matches the linear time complexity of the algorithm computing runs. Our data structure has already been used as a part of the efficient solutions for subword suffix rank & selection, as well as substring compression using Burrows-Wheeler transform composed with run-length encoding.
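The phrase "constant-space representation of all occurrences" can be illustrated with a small sketch. It assumes the intended representation is an arithmetic progression of start positions (first position, step, count), which is the usual way overlapping occurrences of a pattern are stored; the abstract itself does not spell this out. The helper names `occurrences` and `as_progression` are hypothetical, and the search is a direct scan, not the constant-time structure described above.

```python
def occurrences(x, y):
    """Start positions (0-based) of x in y, found by direct comparison."""
    return [p for p in range(len(y) - len(x) + 1) if y.startswith(x, p)]


def as_progression(positions):
    """Compress sorted positions into (first, step, count).

    Returns None when the positions do not form an arithmetic progression,
    in which case three integers are not enough to describe them.
    """
    if not positions:
        return (0, 0, 0)
    if len(positions) == 1:
        return (positions[0], 0, 1)
    step = positions[1] - positions[0]
    for a, b in zip(positions, positions[1:]):
        if b - a != step:
            return None
    return (positions[0], step, len(positions))


if __name__ == "__main__":
    # y is shorter than twice x, so occurrences of x must overlap heavily.
    print(as_progression(occurrences("abab", "abababa")))
    # Here y is much longer than x and the occurrences are irregular.
    print(as_progression(occurrences("a", "aaba")))
```

In the second call the pattern is short relative to $y$, and the positions no longer compress into a single progression, which is why a length restriction such as $|y|=\mathcal{O}(|x|)$ matters.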

Data Structures and Algorithms

Constant-factor approximation of near-linear edit distance in near-linear time

126 - Joshua Brakensiek , Aviad Rubinstein 2019

We show that the edit distance between two strings of length $n$ can be computed within a factor of $f(\epsilon)$ in $n^{1+\epsilon}$ time as long as the edit distance is at least $n^{1-\delta}$ for some $\delta(\epsilon) > 0$.
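For reference, the quantity being approximated can be computed exactly with the classical quadratic-time dynamic program. The sketch below is that textbook baseline, not the near-linear approximation algorithm from the abstract; the function name `edit_distance` is chosen here for illustration.

```python
def edit_distance(a, b):
    """Exact edit distance (insertions, deletions, substitutions) in O(|a| * |b|) time."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance between a[:i] and the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

Only two rows of the table are kept at a time, so the memory use is linear in the length of `b` while the running time stays quadratic.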

Data Structures and Algorithms

Linear-time string indexing and analysis in small space

56 - Djamal Belazzougui , Fabio Cunial , Juha Karkkainen 2016

The field of succinct data structures has flourished over the last 16 years. Starting from the compressed suffix array (CSA) by Grossi and Vitter (STOC 2000) and the FM-index by Ferragina and Manzini (FOCS 2000), a number of generalizations and applications of string indexes based on the Burrows-Wheeler transform (BWT) have been developed, all taking an amount of space that is close to the input size in bits. In many large-scale applications, the construction of the index and its usage need to be considered as one unit of computation. Efficient string indexing and analysis in small space lies also at the core of a number of primitives in the data-intensive field of high-throughput DNA sequencing. We report the following advances in string indexing and analysis. We show that the BWT of a string $T\in \{1,\ldots,\sigma\}^n$ can be built in deterministic $O(n)$ time using just $O(n\log{\sigma})$ bits of space, where $\sigma \leq n$. Within the same time and space budget, we can build an index based on the BWT that allows one to enumerate all the internal nodes of the suffix tree of $T$. Many fundamental string analysis problems can be mapped to such enumeration, and can thus be solved in deterministic $O(n)$ time and in $O(n\log{\sigma})$ bits of space from the input string. We also show how to build many of the existing indexes based on the BWT, such as the CSA, the compressed suffix tree (CST), and the bidirectional BWT index, in randomized $O(n)$ time and in $O(n\log{\sigma})$ bits of space. The previously fastest construction algorithms for BWT, CSA and CST, which used $O(n\log{\sigma})$ bits of space, took $O(n\log{\log{\sigma}})$ time for the first two structures, and $O(n\log^{\epsilon}n)$ time for the third, where $\epsilon$ is any positive constant. Contrary to the state of the art, our bidirectional BWT index supports every operation in constant time per element in its output.
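As a reminder of what the BWT is, the following sketch builds it by sorting all rotations of the input. This is the textbook definition and uses far more time and space than the linear-time, small-space construction described in the abstract. The function name `bwt` and the `$` sentinel are assumptions for illustration; the sentinel must not occur in the text and is expected to sort before every other character.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform of `text` via explicit rotation sorting."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    # Sort all cyclic rotations of s and read off the last column.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


if __name__ == "__main__":
    print(bwt("banana"))
```

Materializing every rotation costs quadratic space in the worst case, which is exactly the overhead that suffix-array-based and small-space constructions avoid.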

Data Structures and Algorithms