No Arabic abstract
Given a stream $S = (s_1, s_2, ..., s_N)$, a $phi$-heavy hitter is an item $s_i$ that occurs at least $phi N$ times in $S$. The problem of finding heavy-hitters has been extensively studied in the database literature. In this paper, we study a related problem. We say that there is a $phi$-event at time $t$ if $s_t$ occurs exactly $phi N$ times in $(s_1, s_2, ..., s_t)$. Thus, for each $phi$-heavy hitter there is a single $phi$-event which occurs when its count reaches the reporting threshold $phi N$. We define the online event-detection problem (OEDP) as: given $phi$ and a stream $S$, report all $phi$-events as soon as they occur. Many real-world monitoring systems demand event detection where all events must be reported (no false negatives), in a timely manner, with no non-events reported (no false positives), and a low reporting threshold. As a result, the OEDP requires a large amount of space (Omega(N) words) and is not solvable in the streaming model or via standard sampling-based approaches. Since OEDP requires large space, we focus on cache-efficient algorithms in the external-memory model. We provide algorithms for the OEDP that are within a log factor of optimal. Our algorithms are tunable: its parameters can be set to allow for a bounded false-positives and a bounded delay in reporting. None of our relaxations allow false negatives since reporting all events is a strict requirement of our applications. Finally, we show improved results when the count of items in the input stream follows a power-law distribution.
We consider the online Min-Sum Set Cover (MSSC), a natural and intriguing generalization of the classical list update problem. In Online MSSC, the algorithm maintains a permutation on $n$ elements based on subsets $S_1, S_2, ldots$ arriving online. The algorithm serves each set $S_t$ upon arrival, using its current permutation $pi_{t}$, incurring an access cost equal to the position of the first element of $S_t$ in $pi_{t}$. Then, the algorithm may update its permutation to $pi_{t+1}$, incurring a moving cost equal to the Kendall tau distance of $pi_{t}$ to $pi_{t+1}$. The objective is to minimize the total access and moving cost for serving the entire sequence. We consider the $r$-uniform version, where each $S_t$ has cardinality $r$. List update is the special case where $r = 1$. We obtain tight bounds on the competitive ratio of deterministic online algorithms for MSSC against a static adversary, that serves the entire sequence by a single permutation. First, we show a lower bound of $(r+1)(1-frac{r}{n+1})$ on the competitive ratio. Then, we consider several natural generalizations of successful list update algorithms and show that they fail to achieve any interesting competitive guarantee. On the positive side, we obtain a $O(r)$-competitive deterministic algorithm using ideas from online learning and the multiplicative weight updates (MWU) algorithm. Furthermore, we consider efficient algorithms. We propose a memoryless online algorithm, called Move-All-Equally, which is inspired by the Double Coverage algorithm for the $k$-server problem. We show that its competitive ratio is $Omega(r^2)$ and $2^{O(sqrt{log n cdot log r})}$, and conjecture that it is $f(r)$-competitive. We also compare Move-All-Equally against the dynamic optimal solution and obtain (almost) tight bounds by showing that it is $Omega(r sqrt{n})$ and $O(r^{3/2} sqrt{n})$-competitive.
We study emph{parallel} online algorithms: For some fixed integer $k$, a collective of $k$ parallel processes that perform online decisions on the same sequence of events forms a $k$-emph{copy algorithm}. For any given time and input sequence, the overall performance is determined by the best of the $k$ individual total results. Problems of this type have been considered for online makespan minimization; they are also related to optimization with emph{advice} on future events, i.e., a number of bits available in advance. We develop textsc{Predictive Harmonic}$_3$ (PH3), a relatively simple family of $k$-copy algorithms for the online Bin Packing Problem, whose joint competitive factor converges to 1.5 for increasing $k$. In particular, we show that $k=6$ suffices to guarantee a factor of $1.5714$ for PH3, which is better than $1.57829$, the performance of the best known 1-copy algorithm textsc{Advanced Harmonic}, while $k=11$ suffices to achieve a factor of $1.5406$, beating the known lower bound of $1.54278$ for a single online algorithm. In the context of online optimization with advice, our approach implies that 4 bits suffice to achieve a factor better than this bound of $1.54278$, which is considerably less than the previous bound of 15 bits.
We study the online maximum coverage problem on a line, in which, given an online sequence of sub-intervals (which may intersect among each other) of a target large interval and an integer $k$, we aim to select at most $k$ of the sub-intervals such that the total covered length of the target interval is maximized. The decision to accept or reject each sub-interval is made immediately and irrevocably (no preemption) right at the release timestamp of the sub-interval. We comprehensively study different settings of this problem regarding both the length of a released sub-interval and the total number of released sub-intervals. We first present lower bounds on the competitive ratio for the settings concerned in this paper, respectively. For the offline problem where the sequence of all the released sub-intervals is known in advance to the decision-maker, we propose a dynamic-programming-based optimal approach as the benchmark. For the online problem, we first propose a single-threshold-based deterministic algorithm SOA by adding a sub-interval if the added length exceeds a certain threshold, achieving competitive ratios close to the lower bounds, respectively. Then, we extend to a double-thresholds-based algorithm DOA, by using the first threshold for exploration and the second threshold (larger than the first one) for exploitation. With the two thresholds solved by our proposed program, we show that DOA improves SOA in the worst-case performance. Moreover, we prove that a deterministic algorithm that accepts sub-intervals by multi non-increasing thresholds cannot outperform even SOA.
We consider the file maintenance problem (also called the online labeling problem) in which n integer items from the set {1,...,r} are to be stored in an array of size m >= n. The items are presented sequentially in an arbitrary order, and must be stored in the array in sorted order (but not necessarily in consecutive locations in the array). Each new item must be stored in the array before the next item is received. If r<=m then we can simply store item j in location j but if r>m then we may have to shift the location of stored items to make space for a newly arrived item. The algorithm is charged each time an item is stored in the array, or moved to a new location. The goal is to minimize the total number of such moves done by the algorithm. This problem is non-trivial when n=<m<r. In the case that m=Cn for some C>1, algorithms for this problem with cost O(log(n)^2) per item have been given [IKR81, Wil92, BCD+02]. When m=n, algorithms with cost O(log(n)^3) per item were given [Zha93, BS07]. In this paper we prove lower bounds that show that these algorithms are optimal, up to constant factors. Previously, the only lower bound known for this range of parameters was a lower bound of Omega(log(n)^2) for the restricted class of smooth algorithms [DSZ05a, Zha93]. We also provide an algorithm for the sparse case: If the number of items is polylogarithmic in the array size then the problem can be solved in amortized constant time per item.
In this paper, we strengthen the competitive analysis results obtained for a fundamental online streaming problem, the Frequent Items Problem. Additionally, we contribute with a more detailed analysis of this problem, using alternative performance measures, supplementing the insight gained from competitive analysis. The results also contribute to the general study of performance measures for online algorithms. It has long been known that competitive analysis suffers from drawbacks in certain situations, and many alternative measures have been proposed. However, more systematic comparative studies of performance measures have been initiated recently, and we continue this work, using competitive analysis, relative interval analysis, and relative worst order analysis on the Frequent Items Problem.