No Arabic abstract
The classical analysis of online algorithms, due to its worst-case nature, can be quite pessimistic when the input instance at hand is far from worst-case. Often this is not an issue with machine learning approaches, which shine in exploiting patterns in past inputs in order to predict the future. However, such predictions, although usually accurate, can be arbitrarily poor. Inspired by a recent line of work, we augment three well-known online settings with machine learned predictions about the future, and develop algorithms that take them into account. In particular, we study the following online selection problems: (i) the classical secretary problem, (ii) online bipartite matching and (iii) the graphic matroid secretary problem. Our algorithms still come with a worst-case performance guarantee in the case that predictions are subpar while obtaining an improved competitive ratio (over the best-known classical online algorithm for each problem) when the predictions are sufficiently accurate. For each algorithm, we establish a trade-off between the competitive ratios obtained in the two respective cases.
In the model of online caching with machine learned advice, introduced by Lykouris and Vassilvitskii, the goal is to solve the caching problem with an online algorithm that has access to next-arrival predictions: when each input element arrives, the algorithm is given a prediction of the next time when the element will reappear. The traditional model for online caching suffers from an $Omega(log k)$ competitive ratio lower bound (on a cache of size $k$). In contrast, the augmented model admits algorithms which beat this lower bound when the predictions have low error, and asymptotically match the lower bound when the predictions have high error, even if the algorithms are oblivious to the prediction error. In particular, Lykouris and Vassilvitskii showed that there is a prediction-augmented caching algorithm with a competitive ratio of $O(1+min(sqrt{eta/OPT}, log k))$ when the overall $ell_1$ prediction error is bounded by $eta$, and $OPT$ is the cost of the optimal offline algorithm. The dependence on $k$ in the competitive ratio is optimal, but the dependence on $eta/OPT$ may be far from optimal. In this work, we make progress towards closing this gap. Our contributions are twofold. First, we provide an improved algorithm with a competitive ratio of $O(1 + min((eta/OPT)/k, 1) log k)$. Second, we provide a lower bound of $Omega(log min((eta/OPT)/(k log k), k))$.
We provide online algorithms for secretary matching in general weighted graphs, under the well-studied models of vertex and edge arrivals. In both models, edges are associated with arbitrary weights that are unknown from the outset, and are revealed online. Under vertex arrival, vertices arrive online in a uniformly random order; upon the arrival of a vertex $v$, the weights of edges from $v$ to all previously arriving vertices are revealed, and the algorithm decides which of these edges, if any, to include in the matching. Under edge arrival, edges arrive online in a uniformly random order; upon the arrival of an edge $e$, its weight is revealed, and the algorithm decides whether to include it in the matching or not. We provide a $5/12$-competitive algorithm for vertex arrival, and show it is tight. For edge arrival, we provide a $1/4$-competitive algorithm. Both results improve upon state of the art bounds for the corresponding settings. Interestingly, for vertex arrival, secretary matching in general graphs outperforms secretary matching in bipartite graphs with 1-sided arrival, where $1/e$ is the best possible guarantee.
The bin covering problem asks for covering a maximum number of bins with an online sequence of $n$ items of different sizes in the range $(0,1]$; a bin is said to be covered if it receives items of total size at least 1. We study this problem in the advice setting and provide tight bounds for the size of advice required to achieve optimal solutions. Moreover, we show that any algorithm with advice of size $o(log log n)$ has a competitive ratio of at most 0.5. In other words, advice of size $o(log log n)$ is useless for improving the competitive ratio of 0.5, attainable by an online algorithm without advice. This result highlights a difference between the bin covering and the bin packing problems in the advice model: for the bin packing problem, there are several algorithms with advice of constant size that outperform online algorithms without advice. Furthermore, we show that advice of size $O(log log n)$ is sufficient to achieve a competitive ratio that is arbitrarily close to $0.53bar{3}$ and hence strictly better than the best ratio $0.5$ attainable by purely online algorithms. The technicalities involved in introducing and analyzing this algorithm are quite different from the existing results for the bin packing problem and confirm the different nature of these two problems. Finally, we show that a linear number of bits of advice is necessary to achieve any competitive ratio better than 15/16 for the online bin covering problem.
We consider online algorithms for the {em page migration problem} that use predictions, potentially imperfect, to improve their performance. The best known online algorithms for this problem, due to Westbrook94 and Bienkowski et al17, have competitive ratios strictly bounded away from 1. In contrast, we show that if the algorithm is given a prediction of the input sequence, then it can achieve a competitive ratio that tends to $1$ as the prediction error rate tends to $0$. Specifically, the competitive ratio is equal to $1+O(q)$, where $q$ is the prediction error rate. We also design a ``fallback option that ensures that the competitive ratio of the algorithm for {em any} input sequence is at most $O(1/q)$. Our result adds to the recent body of work that uses machine learning to improve the performance of ``classic algorithms.
We give cell-probe bounds for the computation of edit distance, Hamming distance, convolution and longest common subsequence in a stream. In this model, a fixed string of $n$ symbols is given and one $delta$-bit symbol arrives at a time in a stream. After each symbol arrives, the distance between the fixed string and a suffix of most recent symbols of the stream is reported. The cell-probe model is perhaps the strongest model of computation for showing data structure lower bounds, subsuming in particular the popular word-RAM model. * We first give an $Omega((delta log n)/(w+loglog n))$ lower bound for the time to give each output for both online Hamming distance and convolution, where $w$ is the word size. This bound relies on a new encoding scheme and for the first time holds even when $w$ is as small as a single bit. * We then consider the online edit distance and longest common subsequence problems in the bit-probe model ($w=1$) with a constant sized input alphabet. We give a lower bound of $Omega(sqrt{log n}/(loglog n)^{3/2})$ which applies for both problems. This second set of results relies both on our new encoding scheme as well as a carefully constructed hard distribution. * Finally, for the online edit distance problem we show that there is an $O((log n)^2/w)$ upper bound in the cell-probe model. This bound gives a contrast to our new lower bound and also establishes an exponential gap between the known cell-probe and RAM model complexities.