ترغب بنشر مسار تعليمي؟ اضغط هنا

Improved Algorithms for Time Decay Streams

167   0   0.0 ( 0 )
 نشر من قبل Samson Zhou
 تاريخ النشر 2019
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

In the time-decay model for data streams, elements of an underlying data set arrive sequentially with the recently arrived elements being more important. A common approach for handling large data sets is to maintain a emph{coreset}, a succinct summary of the processed data that allows approximate recovery of a predetermined query. We provide a general framework that takes any offline-coreset and gives a time-decay coreset for polynomial time decay functions. We also consider the exponential time decay model for $k$-median clustering, where we provide a constant factor approximation algorithm that utilizes the online facility location algorithm. Our algorithm stores $mathcal{O}(klog(hDelta)+h)$ points where $h$ is the half-life of the decay function and $Delta$ is the aspect ratio of the dataset. Our techniques extend to $k$-means clustering and $M$-estimators as well.

قيم البحث

اقرأ أيضاً

60 - Aaron Bernstein 2020
We study the problem of computing an approximate maximum cardinality matching in the semi-streaming model when edges arrive in a emph{random} order. In the semi-streaming model, the edges of the input graph G = (V,E) are given as a stream e_1, ..., e _m, and the algorithm is allowed to make a single pass over this stream while using $O(n textrm{polylog}(n))$ space ($m = |E|$ and $n = |V|$). If the order of edges is adversarial, a simple single-pass greedy algorithm yields a $1/2$-approximation in $O(n)$ space; achieving a better approximation in adversarial streams remains an elusive open question. A line of recent work shows that one can improve upon the $1/2$-approximation if the edges of the stream arrive in a random order. The state of the art for this model is two-fold: Assadi et al. [SODA 2019] show how to compute a $2/3(sim.66)$-approximate matching, but the space requirement is $O(n^{1.5} textrm{polylog}(n))$. Very recently, Farhadi et al. [SODA 2020] presented an algorithm with the desired space usage of $O(n textrm{polylog}(n))$, but a worse approximation ratio of $6/11(sim.545)$, or $3/5(=.6)$ in bipartite graphs. In this paper, we present an algorithm that computes a $2/3(sim.66)$-approximate matching using only $O(n log(n))$ space, improving upon both results above. We also note that for adversarial streams, a lower bound of Kapralov [SODA 2013] shows that any algorithm that achieves a $1-1/e(sim.63)$-approximation requires $(n^{1+Omega(1/loglog(n))})$ space. Our result for random-order streams is the first to go beyond the adversarial-order lower bound, thus establishing that computing a maximum matching is provably easier in random-order streams.
100 - Thomas Bosman , Neil Olver 2019
We give new approximation algorithms for the submodular joint replenishment problem and the inventory routing problem, using an iterative rounding approach. In both problems, we are given a set of $N$ items and a discrete time horizon of $T$ days in which given demands for the items must be satisfied. Ordering a set of items incurs a cost according to a set function, with properties depending on the problem under consideration. Demand for an item at time $t$ can be satisfied by an order on any day prior to $t$, but a holding cost is charged for storing the items during the intermediate period; the goal is to minimize the sum of the ordering and holding cost. Our approximation factor for both problems is $O(log log min(N,T))$; this improves exponentially on the previous best results.
In the relay placement problem the input is a set of sensors and a number $r ge 1$, the communication range of a relay. In the one-tier version of the problem the objective is to place a minimum number of relays so that between every pair of sensors there is a path through sensors and/or relays such that the consecutive vertices of the path are within distance $r$ if both vertices are relays and within distance 1 otherwise. The two-tier version adds the restrictions that the path must go through relays, and not through sensors. We present a 3.11-approximation algorithm for the one-tier version and a PTAS for the two-tier version. We also show that the one-tier version admits no PTAS, assuming P $ e$ NP.
In the problem of adaptive compressed sensing, one wants to estimate an approximately $k$-sparse vector $xinmathbb{R}^n$ from $m$ linear measurements $A_1 x, A_2 x,ldots, A_m x$, where $A_i$ can be chosen based on the outcomes $A_1 x,ldots, A_{i-1} x $ of previous measurements. The goal is to output a vector $hat{x}$ for which $$|x-hat{x}|_p le C cdot min_{ktext{-sparse } x} |x-x|_q,$$ with probability at least $2/3$, where $C > 0$ is an approximation factor. Indyk, Price and Woodruff (FOCS11) gave an algorithm for $p=q=2$ for $C = 1+epsilon$ with $Oh((k/epsilon) loglog (n/k))$ measurements and $Oh(log^*(k) loglog (n))$ rounds of adaptivity. We first improve their bounds, obtaining a scheme with $Oh(k cdot loglog (n/k) +(k/epsilon) cdot loglog(1/epsilon))$ measurements and $Oh(log^*(k) loglog (n))$ rounds, as well as a scheme with $Oh((k/epsilon) cdot loglog (nlog (n/k)))$ measurements and an optimal $Oh(loglog (n))$ rounds. We then provide novel adaptive compressed sensing schemes with improved bounds for $(p,p)$ for every $0 < p < 2$. We show that the improvement from $O(k log(n/k))$ measurements to $O(k log log (n/k))$ measurements in the adaptive setting can persist with a better $epsilon$-dependence for other values of $p$ and $q$. For example, when $(p,q) = (1,1)$, we obtain $O(frac{k}{sqrt{epsilon}} cdot log log n log^3 (frac{1}{epsilon}))$ measurements.
We consider the $k$-clustering problem with $ell_p$-norm cost, which includes $k$-median, $k$-means and $k$-center cost functions, under an individual notion of fairness proposed by Jung et al. [2020]: given a set of points $P$ of size $n$, a set of $k$ centers induces a fair clustering if for every point $vin P$, $v$ can find a center among its $n/k$ closest neighbors. Recently, Mahabadi and Vakilian [2020] showed how to get a $(p^{O(p)},7)$-bicriteria approximation for the problem of fair $k$-clustering with $ell_p$-norm cost: every point finds a center within distance at most $7$ times its distance to its $(n/k)$-th closest neighbor and the $ell_p$-norm cost of the solution is at most $p^{O(p)}$ times the cost of an optimal fair solution. In this work, for any $varepsilon>0$, we present an improved $(16^p +varepsilon,3)$-bicriteria approximation for the fair $k$-clustering with $ell_p$-norm cost. To achieve our guarantees, we extend the framework of [Charikar et al., 2002, Swamy, 2016] and devise a $16^p$-approximation algorithm for the facility location with $ell_p$-norm cost under matroid constraint which might be of an independent interest. Besides, our approach suggests a reduction from our individually fair clustering to a clustering with a group fairness requirement proposed by Kleindessner et al. [2019], which is essentially the median matroid problem [Krishnaswamy et al., 2011].
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا