In recent years much effort has been concentrated towards achieving polynomial time lower bounds on algorithms for solving various well-known problems. A useful technique for showing such lower bounds is to prove them conditionally based on well-studied hardness assumptions such as 3SUM, APSP, SETH, etc. This line of research helps to obtain a better understanding of the complexity inside P. A related question asks to prove conditional space lower bounds on data structures that are constructed to solve certain algorithmic tasks after an initial preprocessing stage. This question received little attention in previous research even though it has potentially strong impact. In this paper we address this question and show that, surprisingly, many of the well-studied hard problems that are known to have conditional polynomial time lower bounds are also hard in terms of space. This hardness is shown as a tradeoff between the space consumed by the data structure and the time needed to answer queries. The tradeoff may be either smooth or admit one or more singularity points. We reveal interesting connections between different space hardness conjectures and present matching upper bounds. We also apply these hardness conjectures to both static and dynamic problems and prove their conditional space hardness. We believe that this novel framework of polynomial space conjectures can play an important role in expressing polynomial space lower bounds for many important algorithmic problems. Moreover, it seems that it can also help in achieving a better understanding of the hardness of these problems in terms of time.
We revisit the $k$-mismatch problem in the streaming model on a pattern of length $m$ and a streaming text of length $n$, both over a size-$\sigma$ alphabet. The current state-of-the-art algorithm for the streaming $k$-mismatch problem, by Clifford et al. [SODA 2019], uses $\tilde O(k)$ space and $\tilde O\big(\sqrt k\big)$ worst-case time per character. The space complexity is known to be (unconditionally) optimal, and the worst-case time per character matches a conditional lower bound. However, there is a gap between the total time cost of the algorithm, which is $\tilde O(n\sqrt k)$, and the fastest known offline algorithm, which costs $\tilde O\big(n + \min\big(\frac{nk}{\sqrt m},\sigma n\big)\big)$ time. Moreover, it is not known whether improvements over the $\tilde O(n\sqrt k)$ total time are possible when using more than $O(k)$ space. We address these gaps by designing a randomized streaming algorithm for the $k$-mismatch problem that, given an integer parameter $k\le s\le m$, uses $\tilde O(s)$ space and costs $\tilde O\big(n+\min\big(\frac{nk^2}{m},\frac{nk}{\sqrt s},\frac{\sigma nm}{s}\big)\big)$ total time. For $s=m$, the total runtime becomes $\tilde O\big(n + \min\big(\frac{nk}{\sqrt m},\sigma n\big)\big)$, which matches the time cost of the fastest offline algorithm. Moreover, the worst-case time cost per character is still $\tilde O\big(\sqrt k\big)$.
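For concreteness, the following is a minimal offline reference for the $k$-mismatch problem itself (a sketch, not the streaming algorithm above): it slides the pattern over the text and reports every alignment with at most $k$ mismatches, in $O(nm)$ time.

```python
def k_mismatch_naive(text, pattern, k):
    """Report every alignment of `pattern` in `text` with at most k mismatches.

    Naive O(n*m) offline reference; the streaming algorithms discussed above
    achieve far better time and space bounds."""
    n, m = len(text), len(pattern)
    matches = []
    for i in range(n - m + 1):
        mismatches = 0
        for j in range(m):
            if text[i + j] != pattern[j]:
                mismatches += 1
                if mismatches > k:
                    break
        if mismatches <= k:
            matches.append(i)
    return matches

# Example: alignments of "abba" in "abcabba" with at most 1 mismatch.
print(k_mismatch_naive("abcabba", "abba", 1))  # [0, 3]
```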
Given two strings $S$ and $P$, the Episode Matching problem is to compute the length of the shortest substring of $S$ that contains $P$ as a subsequence. The best known upper bound for this problem is $\tilde O(nm)$ by Das et al. (1997), where $n,m$ are the lengths of $S$ and $P$, respectively. Although the problem is well studied and has many applications in data mining, this bound has never been improved. In this paper we show why this is the case by proving that an $O((nm)^{1-\epsilon})$ algorithm (even for binary strings) would refute the popular Strong Exponential Time Hypothesis (SETH). The proof is based on a simple reduction from Orthogonal Vectors.
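As a point of reference, a simple quadratic-time dynamic program for Episode Matching (a sketch illustrating the $O(nm)$ regime, not necessarily the algorithm of Das et al.) maintains, for each prefix of $P$, the latest position at which that prefix can start while still being embeddable as a subsequence ending at the current character of $S$.

```python
def episode_matching(S, P):
    """Length of the shortest substring of S containing P as a subsequence,
    or None if no such substring exists.  Simple O(|S|*|P|) dynamic program."""
    m = len(P)
    # start[j] = largest index s such that P[:j] is a subsequence of S[s:i+1]
    start = [None] * (m + 1)
    best = None
    for i, c in enumerate(S):
        start[0] = i  # the empty prefix "starts" at the current position
        for j in range(m, 0, -1):       # descending, so start[j-1] still holds
            if c == P[j - 1]:           # the value from the previous round
                start[j] = start[j - 1]
        if start[m] is not None:
            length = i - start[m] + 1
            if best is None or length < best:
                best = length
    return best

# Example: the shortest substring of "abcacba" containing "aba" as a
# subsequence has length 4 (e.g. "abca").
print(episode_matching("abcacba", "aba"))  # 4
```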
We prove an $\Omega(d \lg n/ (\lg\lg n)^2)$ lower bound on the dynamic cell-probe complexity of statistically $\mathit{oblivious}$ approximate-near-neighbor search ($\mathsf{ANN}$) over the $d$-dimensional Hamming cube. For the natural setting of $d = \Theta(\log n)$, our result implies an $\tilde{\Omega}(\lg^2 n)$ lower bound, which is a quadratic improvement over the highest (non-oblivious) cell-probe lower bound for $\mathsf{ANN}$. This is the first super-logarithmic $\mathit{unconditional}$ lower bound for $\mathsf{ANN}$ against general (non black-box) data structures. We also show that any oblivious $\mathit{static}$ data structure for decomposable search problems (like $\mathsf{ANN}$) can be obliviously dynamized with $O(\log n)$ overhead in update and query time, strengthening a classic result of Bentley and Saxe (Algorithmica, 1980).
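For background, the classic (non-oblivious) Bentley-Saxe logarithmic method referenced above can be sketched as follows. The interface names `build`, `query`, and `combine` are assumptions of this illustration, not an API from the paper; the oblivious dynamization of the paper additionally hides the access pattern, which this sketch does not attempt.

```python
class LogarithmicMethod:
    """Classic Bentley-Saxe dynamization sketch for a decomposable search
    problem: static structures are kept in buckets of sizes 1, 2, 4, ...,
    and a query is answered by combining the answers from all buckets."""

    def __init__(self, build, query, combine):
        self.build, self.query, self.combine = build, query, combine
        self.buckets = []          # buckets[i] holds 2**i items, or None

    def insert(self, item):
        carry = [item]
        i = 0
        # Merge full buckets like binary-counter carries: bucket i is rebuilt
        # from scratch whenever 2**i items accumulate there.
        while i < len(self.buckets) and self.buckets[i] is not None:
            items, _struct = self.buckets[i]
            carry.extend(items)
            self.buckets[i] = None
            i += 1
        if i == len(self.buckets):
            self.buckets.append(None)
        self.buckets[i] = (carry, self.build(carry))

    def answer(self, q):
        result = None
        for bucket in self.buckets:
            if bucket is not None:
                partial = self.query(bucket[1], q)
                result = partial if result is None else self.combine(result, partial)
        return result

# Example: dynamic 1-D nearest neighbour from a "static" sorted-array structure.
# Each per-bucket query returns (distance, value); combining is just min.
nn = LogarithmicMethod(
    build=sorted,
    query=lambda arr, q: min((abs(x - q), x) for x in arr),
    combine=min,
)
for v in [10, 3, 7, 25, 18]:
    nn.insert(v)
print(nn.answer(8))  # (1, 7): the stored value closest to 8 is 7
```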
Caches are a fundamental component of latency-sensitive computer systems. Recent work of [ASWB20] has initiated the study of delayed hits: a phenomenon in caches that occurs when the latency between the cache and backing store is much larger than the time between new requests. We present two results for the delayed hits caching model. (1) Competitive ratio lower bound. We prove that the competitive ratio of the algorithm in [ASWB20], and more generally of any deterministic online algorithm for delayed hits, is at least $\Omega(kZ)$, where $k$ is the cache size and $Z$ is the delay parameter. (2) Antimonotonicity of the delayed hits latency. Antimonotonicity is a naturally desirable property of cache latency: having a cache hit instead of a cache miss should result in lower overall latency. We prove that the latency of the delayed hits model is not antimonotone by exhibiting a scenario where having a cache hit instead of a miss results in an increase in overall latency. We additionally present a modification of the delayed hits model that makes the latency antimonotone.
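To make the latency accounting concrete, here is a minimal sketch of delayed-hits latency under simplifying assumptions (requests arrive one per time step, a hit costs 0, a miss costs Z, and requests arriving while a fetch is in flight wait for the remaining time); it treats the cache content as fixed and ignores insertions, evictions, and the size limit k, so it is only an illustration, not the model or algorithm of [ASWB20] verbatim.

```python
def delayed_hits_latency(requests, cache, Z):
    """Total latency of `requests` (one per time step), a sketch of the
    delayed-hits accounting only.

    Assumptions of this sketch: a hit costs 0; a miss on object o at time t
    starts a fetch finishing at time t + Z and costs Z; any request for o
    that arrives while that fetch is in flight waits for the remaining time.
    The cache content is treated as fixed (no insertions or evictions)."""
    in_flight = {}          # object -> time its pending fetch completes
    total = 0
    for t, obj in enumerate(requests):
        if obj in cache:
            continue                         # hit: no latency
        if obj in in_flight and in_flight[obj] > t:
            total += in_flight[obj] - t      # delayed hit: wait for the fetch
        else:
            in_flight[obj] = t + Z           # true miss: start a new fetch
            total += Z
    return total

# Example with Z = 4: three back-to-back requests to the same (uncached)
# object cost 4 + 3 + 2 = 9, not 3 * 4 = 12.
print(delayed_hits_latency(["a", "a", "a"], cache=set(), Z=4))  # 9
```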
We consider the file maintenance problem (also called the online labeling problem) in which n integer items from the set {1,...,r} are to be stored in an array of size m >= n. The items are presented sequentially in an arbitrary order, and must be stored in the array in sorted order (but not necessarily in consecutive locations in the array). Each new item must be stored in the array before the next item is received. If r<=m then we can simply store item j in location j, but if r>m then we may have to shift the location of stored items to make space for a newly arrived item. The algorithm is charged each time an item is stored in the array, or moved to a new location. The goal is to minimize the total number of such moves done by the algorithm. This problem is non-trivial when n <= m < r. In the case that m=Cn for some C>1, algorithms for this problem with cost O(log(n)^2) per item have been given [IKR81, Wil92, BCD+02]. When m=n, algorithms with cost O(log(n)^3) per item were given [Zha93, BS07]. In this paper we prove lower bounds that show that these algorithms are optimal, up to constant factors. Previously, the only lower bound known for this range of parameters was a lower bound of Omega(log(n)^2) for the restricted class of smooth algorithms [DSZ05a, Zha93]. We also provide an algorithm for the sparse case: If the number of items is polylogarithmic in the array size then the problem can be solved in amortized constant time per item.
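To illustrate the cost model, here is a naive baseline (not one of the cited algorithms): items are kept in a consecutive sorted prefix of the array and every insertion shifts all larger items, so it can be charged Theta(n) moves per item; the polylogarithmic-cost algorithms above avoid this by spreading items out and leaving gaps.

```python
def naive_online_labeling(items, m):
    """Insert `items` one by one into an array of size m, keeping them sorted
    in a consecutive prefix, and count the moves charged to the algorithm
    (one per item stored or relocated).  Worst case: Theta(n) moves per item."""
    array = [None] * m
    stored = 0
    moves = 0
    for x in items:
        # Find the insertion point among the `stored` occupied cells.
        pos = 0
        while pos < stored and array[pos] < x:
            pos += 1
        # Shift everything at or after `pos` one cell to the right.
        for i in range(stored, pos, -1):
            array[i] = array[i - 1]
            moves += 1
        array[pos] = x
        moves += 1
        stored += 1
    return array[:stored], moves

# Inserting items in decreasing order forces about n^2/2 moves in total.
print(naive_online_labeling([5, 4, 3, 2, 1], m=8))  # ([1, 2, 3, 4, 5], 15)
```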