ترغب بنشر مسار تعليمي؟ اضغط هنا

Internal Shortest Absent Word Queries in Constant Time and Linear Space

135   0   0.0 ( 0 )
 نشر من قبل Dmitry Kosolobov
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Given a string $T$ of length $n$ over an alphabet $Sigmasubset {1,2,ldots,n^{O(1)}}$ of size $sigma$, we are to preprocess $T$ so that given a range $[i,j]$, we can return a representation of a shortest string over $Sigma$ that is absent in the fragment $T[i]cdots T[j]$ of $T$. We present an $O(n)$-space data structure that answers such queries in constant time and can be constructed in $O(nlog_sigma n)$ time.



قيم البحث

اقرأ أيضاً

The determination of time-dependent collision-free shortest paths has received a fair amount of attention. Here, we study the problem of computing a time-dependent shortest path among growing discs which has been previously studied for the instance w here the departure times are fixed. We address a more general setting: For two given points $s$ and $d$, we wish to determine the function $mathcal{A}(t)$ which is the minimum arrival time at $d$ for any departure time $t$ at $s$. We present a $(1+epsilon)$-approximation algorithm for computing $mathcal{A}(t)$. As part of preprocessing, we execute $O({1 over epsilon} log({mathcal{V}_{r} over mathcal{V}_{c}}))$ shortest path computations for fixed departure times, where $mathcal{V}_{r}$ is the maximum speed of the robot and $mathcal{V}_{c}$ is the minimum growth rate of the discs. For any query departure time $t geq 0$ from $s$, we can approximate the minimum arrival time at the destination in $O(log ({1 over epsilon}) + loglog({mathcal{V}_{r} over mathcal{V}_{c}}))$ time, within a factor of $1+epsilon$ of optimal. Since we treat the shortest path computations as black-box functions, for different settings of growing discs, we can plug-in different shortest path algorithms. Thus, the exact time complexity of our algorithm is determined by the running time of the shortest path computations.
77 - Julia Chuzhoy 2021
We study the decremental All-Pairs Shortest Paths (APSP) problem in undirected edge-weighted graphs. The input to the problem is an $n$-vertex $m$-edge graph $G$ with non-negative edge lengths, that undergoes a sequence of edge deletions. The goal is to support approximate shortest-path queries: given a pair $x,y$ of vertices of $G$, return a path $P$ connecting $x$ to $y$, whose length is within factor $alpha$ of the length of the shortest $x$-$y$ path, in time $tilde O(|E(P)|)$, where $alpha$ is the approximation factor of the algorithm. APSP is one of the most basic and extensively studied dynamic graph problems. A long line of work culminated in the algorithm of [Chechik, FOCS 2018] with near optimal guarantees for the oblivious-adversary setting. Unfortunately, adaptive-adversary setting is still poorly understood. For unweighted graphs, the algorithm of [Henzinger, Krinninger and Nanongkai, FOCS 13, SICOMP 16] achieves a $(1+epsilon)$-approximation with total update time $tilde O(mn/epsilon)$; the best current total update time of $n^{2.5+O(epsilon)}$ is achieved by the deterministic algorithm of [Chuzhoy, Saranurak, SODA21], with $2^{O(1/epsilon)}$-multiplicative and $2^{O(log^{3/4}n/epsilon)}$-additive approximation. To the best of our knowledge, for arbitrary non-negative edge weights, the fastest current adaptive-update algorithm has total update time $O(n^{3}log L/epsilon)$, achieving a $(1+epsilon)$-approximation. Here, L is the ratio of longest to shortest edge lengths. Our main result is a deterministic algorithm for decremental APSP in undirected edge-weighted graphs, that, for any $Omega(1/loglog m)leq epsilon< 1$, achieves approximation factor $(log m)^{2^{O(1/epsilon)}}$, with total update time $Oleft (m^{1+O(epsilon)}cdot (log m)^{O(1/epsilon^2)}cdot log Lright )$.
We consider several types of internal queries: questions about subwords of a text. As the main tool we develop an optimal data structure for the problem called here internal pattern matching. This data structure provides constant-time answers to quer ies about occurrences of one subword $x$ in another subword $y$ of a given text, assuming that $|y|=mathcal{O}(|x|)$, which allows for a constant-space representation of all occurrences. This problem can be viewed as a natural extension of the well-studied pattern matching problem. The data structure has linear size and admits a linear-time construction algorithm. Using the solution to the internal pattern matching problem, we obtain very efficient data structures answering queries about: primitivity of subwords, periods of subwords, general substring compression, and cyclic equivalence of two subwords. All these results improve upon the best previously known counterparts. The linear construction time of our data structure also allows to improve the algorithm for finding $delta$-subrepetitions in a text (a more general version of maximal repetitions, also called runs). For any fixed $delta$ we obtain the first linear-time algorithm, which matches the linear time complexity of the algorithm computing runs. Our data structure has already been used as a part of the efficient solutions for subword suffix rank & selection, as well as substring compression using Burrows-Wheeler transform composed with run-length encoding.
We show that the edit distance between two strings of length $n$ can be computed within a factor of $f(epsilon)$ in $n^{1+epsilon}$ time as long as the edit distance is at least $n^{1-delta}$ for some $delta(epsilon) > 0$.
The field of succinct data structures has flourished over the last 16 years. Starting from the compressed suffix array (CSA) by Grossi and Vitter (STOC 2000) and the FM-index by Ferragina and Manzini (FOCS 2000), a number of generalizations and appli cations of string indexes based on the Burrows-Wheeler transform (BWT) have been developed, all taking an amount of space that is close to the input size in bits. In many large-scale applications, the construction of the index and its usage need to be considered as one unit of computation. Efficient string indexing and analysis in small space lies also at the core of a number of primitives in the data-intensive field of high-throughput DNA sequencing. We report the following advances in string indexing and analysis. We show that the BWT of a string $Tin {1,ldots,sigma}^n$ can be built in deterministic $O(n)$ time using just $O(nlog{sigma})$ bits of space, where $sigma leq n$. Within the same time and space budget, we can build an index based on the BWT that allows one to enumerate all the internal nodes of the suffix tree of $T$. Many fundamental string analysis problems can be mapped to such enumeration, and can thus be solved in deterministic $O(n)$ time and in $O(nlog{sigma})$ bits of space from the input string. We also show how to build many of the existing indexes based on the BWT, such as the CSA, the compressed suffix tree (CST), and the bidirectional BWT index, in randomized $O(n)$ time and in $O(nlog{sigma})$ bits of space. The previously fastest construction algorithms for BWT, CSA and CST, which used $O(nlog{sigma})$ bits of space, took $O(nlog{log{sigma}})$ time for the first two structures, and $O(nlog^{epsilon}n)$ time for the third, where $epsilon$ is any positive constant. Contrary to the state of the art, our bidirectional BWT index supports every operation in constant time per element in its output.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا