No Arabic abstract
Caches are a fundamental component of latency-sensitive computer systems. Recent work of [ASWB20] has initiated the study of delayed hits: a phenomenon in caches that occurs when the latency between the cache and backing store is much larger than the time between new requests. We present two results for the delayed hits caching model. (1) Competitive ratio lower bound. We prove that the competitive ratio of the algorithm in [ASWB20], and more generally of any deterministic online algorithm for delayed hits, is at least Omega(kZ), where k is the cache size and Z is the delay parameter. (2) Antimonotonicity of the delayed hits latency. Antimonotonicity is a naturally desirable property of cache latency: having a cache hit instead of a cache miss should result in lower overall latency. We prove that the latency of the delayed hits model is not antimonotone by exhibiting a scenario where having a cache hit instead of a miss results in an increase in overall latency. We additionally present a modification of the delayed hits model that makes the latency antimonotone.
In the model of online caching with machine learned advice, introduced by Lykouris and Vassilvitskii, the goal is to solve the caching problem with an online algorithm that has access to next-arrival predictions: when each input element arrives, the algorithm is given a prediction of the next time when the element will reappear. The traditional model for online caching suffers from an $Omega(log k)$ competitive ratio lower bound (on a cache of size $k$). In contrast, the augmented model admits algorithms which beat this lower bound when the predictions have low error, and asymptotically match the lower bound when the predictions have high error, even if the algorithms are oblivious to the prediction error. In particular, Lykouris and Vassilvitskii showed that there is a prediction-augmented caching algorithm with a competitive ratio of $O(1+min(sqrt{eta/OPT}, log k))$ when the overall $ell_1$ prediction error is bounded by $eta$, and $OPT$ is the cost of the optimal offline algorithm. The dependence on $k$ in the competitive ratio is optimal, but the dependence on $eta/OPT$ may be far from optimal. In this work, we make progress towards closing this gap. Our contributions are twofold. First, we provide an improved algorithm with a competitive ratio of $O(1 + min((eta/OPT)/k, 1) log k)$. Second, we provide a lower bound of $Omega(log min((eta/OPT)/(k log k), k))$.
We prove an $Omega(d lg n/ (lglg n)^2)$ lower bound on the dynamic cell-probe complexity of statistically $mathit{oblivious}$ approximate-near-neighbor search ($mathsf{ANN}$) over the $d$-dimensional Hamming cube. For the natural setting of $d = Theta(log n)$, our result implies an $tilde{Omega}(lg^2 n)$ lower bound, which is a quadratic improvement over the highest (non-oblivious) cell-probe lower bound for $mathsf{ANN}$. This is the first super-logarithmic $mathit{unconditional}$ lower bound for $mathsf{ANN}$ against general (non black-box) data structures. We also show that any oblivious $mathit{static}$ data structure for decomposable search problems (like $mathsf{ANN}$) can be obliviously dynamized with $O(log n)$ overhead in update and query time, strengthening a classic result of Bentley and Saxe (Algorithmica, 1980).
We consider the file maintenance problem (also called the online labeling problem) in which n integer items from the set {1,...,r} are to be stored in an array of size m >= n. The items are presented sequentially in an arbitrary order, and must be stored in the array in sorted order (but not necessarily in consecutive locations in the array). Each new item must be stored in the array before the next item is received. If r<=m then we can simply store item j in location j but if r>m then we may have to shift the location of stored items to make space for a newly arrived item. The algorithm is charged each time an item is stored in the array, or moved to a new location. The goal is to minimize the total number of such moves done by the algorithm. This problem is non-trivial when n=<m<r. In the case that m=Cn for some C>1, algorithms for this problem with cost O(log(n)^2) per item have been given [IKR81, Wil92, BCD+02]. When m=n, algorithms with cost O(log(n)^3) per item were given [Zha93, BS07]. In this paper we prove lower bounds that show that these algorithms are optimal, up to constant factors. Previously, the only lower bound known for this range of parameters was a lower bound of Omega(log(n)^2) for the restricted class of smooth algorithms [DSZ05a, Zha93]. We also provide an algorithm for the sparse case: If the number of items is polylogarithmic in the array size then the problem can be solved in amortized constant time per item.
We study the space complexity of sketching cuts and Laplacian quadratic forms of graphs. We show that any data structure which approximately stores the sizes of all cuts in an undirected graph on $n$ vertices up to a $1+epsilon$ error must use $Omega(nlog n/epsilon^2)$ bits of space in the worst case, improving the $Omega(n/epsilon^2)$ bound of Andoni et al. and matching the best known upper bound achieved by spectral sparsifiers. Our proof is based on a rigidity phenomenon for cut (and spectral) approximation which may be of independent interest: any two $d-$regular graphs which approximate each others cuts significantly better than a random graph approximates the complete graph must overlap in a constant fraction of their edges.
In the distributed subgraph-freeness problem, we are given a graph $H$, and asked to determine whether the network graph contains $H$ as a subgraph or not. Subgraph-freeness is an extremely local problem: if the network had no bandwidth constraints, we could detect any subgraph $H$ in $|H|$ rounds, by having each node of the network learn its entire $|H|$-neighborhood. However, when bandwidth is limited, the problem becomes harder. Upper and lower bounds in the presence of congestion have been established for several classes of subgraphs, including cycles, trees, and more complicated subgraphs. All bounds shown so far have been linear or sublinear. We show that the subgraph-freeness problem is not, in general, solvable in linear time: for any $k geq 2$, there exists a subgraph $H_k$ such that $H_k$-freeness requires $Omega( n^{2-1/k} / (Bk) )$ rounds to solve. Here $B$ is the bandwidth of each communication link. The lower bound holds even for diameter-3 subgraphs and diameter-3 network graphs. In particular, taking $k = Theta(log n)$, we obtain a lower bound of $Omega(n^2 / (B log n))$.