
Separating Adaptive Streaming from Oblivious Streaming

Posted by Uri Stemmer
Publication date: 2021
Research field: Informatics Engineering
Paper language: English





We present a streaming problem for which every adversarially robust streaming algorithm must use polynomial space, while there exists a classical (oblivious) streaming algorithm that uses only polylogarithmic space. This is the first separation between oblivious streaming and adversarially robust streaming, and it resolves one of the central open questions in adversarially robust streaming.
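To make the two models concrete, here is a minimal Python sketch of the two games: in the oblivious setting the whole stream is fixed before the run starts, while in the adversarially robust setting each update may depend on the algorithm's previous answers. The MedianOfWindow class and the example adversary are hypothetical placeholders added only so the sketch runs; they are not the construction from the paper.

```python
class MedianOfWindow:
    """Placeholder streaming algorithm (NOT the paper's construction):
    exposes update(x) -> current answer, here a running median."""
    def __init__(self):
        self.values = []
    def update(self, x):
        self.values.append(x)
        return sorted(self.values)[len(self.values) // 2]

def oblivious_game(algo, stream):
    # Oblivious model: the entire stream is fixed in advance.
    return [algo.update(x) for x in stream]

def adaptive_game(algo, next_update, n):
    # Adversarial model: update i may depend on the answers to updates 1..i-1.
    answers = []
    for _ in range(n):
        x = next_update(answers)        # adaptivity enters exactly here
        answers.append(algo.update(x))
    return answers

if __name__ == "__main__":
    print(oblivious_game(MedianOfWindow(), [3, 1, 4, 1, 5]))
    # Example adaptive adversary: push the next update just above the
    # algorithm's last reported answer.
    print(adaptive_game(MedianOfWindow(),
                        lambda ans: (ans[-1] + 1) if ans else 0, 5))
```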


Read also

We study the space complexity of solving the bias-regularized SVM problem in the streaming model. This is a classic supervised learning problem that has drawn lots of attention, including for developing fast algorithms for solving the problem approximately. One of the most widely used algorithms for approximately optimizing the SVM objective is Stochastic Gradient Descent (SGD), which requires only $O(\frac{1}{\lambda\epsilon})$ random samples, and which immediately yields a streaming algorithm that uses $O(\frac{d}{\lambda\epsilon})$ space. For related problems, better streaming algorithms are only known for smooth functions, unlike the SVM objective that we focus on in this work. We initiate an investigation of the space complexity for both finding an approximate optimum of this objective, and for the related "point estimation" problem of sketching the data set to evaluate the function value $F_\lambda$ on any query $(\theta, b)$. We show that, for both problems, for dimensions $d=1,2$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$, which is the complexity of SGD for strongly convex functions like the bias-regularized SVM, and which is known to be tight in general, even for $d=1$. We also prove polynomial lower bounds for both point estimation and optimization. In particular, for point estimation we obtain a tight bound of $\Theta(1/\sqrt{\epsilon})$ for $d=1$ and a nearly tight lower bound of $\widetilde{\Omega}(d/\epsilon^2)$ for $d = \Omega(\log(1/\epsilon))$. Finally, for optimization, we prove an $\Omega(1/\sqrt{\epsilon})$ lower bound for $d = \Omega(\log(1/\epsilon))$, and show similar bounds when $d$ is constant.
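For reference, here is a minimal Pegasos-style sketch of the SGD baseline discussed above, applied to the bias-regularized hinge objective (the bias term $b$ is included in the regularizer, which makes the objective strongly convex in $(\theta, b)$). The $1/(\lambda t)$ step-size schedule is the standard choice for strongly convex objectives and is an assumption of this sketch, not a detail taken from the paper.

```python
import numpy as np

def svm_sgd_stream(samples, lam):
    """One-pass SGD on the bias-regularized SVM objective
        F_lam(theta, b) = (lam/2)(||theta||^2 + b^2)
                          + mean_i max(0, 1 - y_i (<theta, x_i> + b)),
    processing one (x, y) pair at a time; only (theta, b) is stored,
    so the working memory is O(d) words."""
    theta, b = None, 0.0
    for t, (x, y) in enumerate(samples, start=1):
        x = np.asarray(x, dtype=float)
        if theta is None:
            theta = np.zeros_like(x)
        eta = 1.0 / (lam * t)                         # strongly convex schedule
        margin_violated = y * (theta @ x + b) < 1.0   # test at current iterate
        theta *= (1.0 - eta * lam)                    # gradient of the L2 terms
        b *= (1.0 - eta * lam)
        if margin_violated:                           # subgradient of the hinge
            theta += eta * y * x
            b += eta * y
    return theta, b
```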
We develop a data-driven approach to perform clustering and end-to-end feature learning simultaneously for streaming data that can adaptively detect novel clusters in emerging data. Our approach, Adaptive Nonparametric Variational Autoencoder (AdapVAE), learns the cluster membership through a Bayesian Nonparametric (BNP) modeling framework with Deep Neural Networks (DNNs) for feature learning. We develop a joint online variational inference algorithm to learn feature representations and clustering assignments simultaneously via iteratively optimizing the Evidence Lower Bound (ELBO). We resolve the catastrophic forgetting (Kirkpatrick et al., 2017) challenges with streaming data by adopting generative samples from the trained AdapVAE using previous data, which avoids the need to store and reuse past data. We demonstrate the advantages of our model, including adaptive novel cluster detection without discarding useful information learned from past data, high-quality sample generation, and clustering performance comparable to end-to-end batch-mode clustering methods on both image and text corpora benchmark datasets.
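The mechanism AdapVAE uses against catastrophic forgetting, generative replay, can be sketched independently of the full model. In the toy below, ToyGaussianGenerator is a hypothetical stand-in for the trained decoder and its fit method stands in for the joint ELBO-based variational update; only the replay pattern itself reflects the abstract.

```python
import numpy as np

class ToyGaussianGenerator:
    """Toy stand-in for a trained generative model: fits a diagonal
    Gaussian to its training data and can sample from it."""
    def __init__(self, dim):
        self.mean = np.zeros(dim)
        self.var = np.ones(dim)
    def fit(self, data):
        self.mean = data.mean(axis=0)
        self.var = data.var(axis=0) + 1e-6
    def sample(self, n, rng):
        return rng.normal(self.mean, np.sqrt(self.var),
                          size=(n, len(self.mean)))

def process_batch(model, new_batch, replay_size, rng):
    # Generative replay: mix samples drawn from the model trained on past
    # data into the incoming batch, so past data never has to be stored.
    replay = model.sample(replay_size, rng)
    combined = np.vstack([new_batch, replay])
    model.fit(combined)   # in AdapVAE this would be an ELBO gradient step
    return combined

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    model = ToyGaussianGenerator(dim=2)
    for batch in (rng.normal(0, 1, (32, 2)), rng.normal(5, 1, (32, 2))):
        process_batch(model, batch, replay_size=32, rng=rng)
    print(model.mean)  # retains influence of the first batch via replay
```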
We give tight cell-probe bounds for the time to compute convolution, multiplication and Hamming distance in a stream. The cell-probe model is a particularly strong computational model and subsumes, for example, the popular word-RAM model. We first consider online convolution, where the task is to output the inner product between a fixed $n$-dimensional vector and a vector of the $n$ most recent values from a stream. One symbol of the stream arrives at a time, and each output must be computed before the next symbol arrives. Next we show bounds for online multiplication, where the stream consists of pairs of digits, one from each of two $n$-digit numbers that are to be multiplied. One pair arrives at a time, and the task is to output a single new digit of the product before the next pair of digits arrives. Finally, we look at the online Hamming distance problem, where the Hamming distance is output instead of the inner product. For each of these three problems, we give a lower bound of $\Omega(\frac{\delta}{w}\log n)$ time on average per output, where $\delta$ is the number of bits needed to represent an input symbol and $w$ is the cell or word size. We argue that these bounds are in fact tight within the cell-probe model.
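For concreteness, a naive upper bound for the first of these problems looks as follows; it spends $O(n)$ word operations per output, which is what makes the $\Omega(\frac{\delta}{w}\log n)$ amortized cell-probe lower bound per output meaningful. The alignment convention (fixed[0] against the newest symbol) is an illustrative assumption.

```python
from collections import deque

def online_convolution(fixed, stream):
    """Naive online convolution: after each arriving symbol, output the
    inner product of the fixed n-dimensional vector with the n most
    recent stream values (zero-padded at the start)."""
    n = len(fixed)
    window = deque([0] * n, maxlen=n)   # newest value at the right end
    for symbol in stream:
        window.append(symbol)           # evicts the oldest value
        # O(n) work per output; the paper proves this cannot be brought
        # below Omega((delta/w) log n) amortized in the cell-probe model.
        yield sum(f * v for f, v in zip(fixed, reversed(window)))
```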
We consider the streaming complexity of a fundamental task in approximate pattern matching: the $k$-mismatch problem. It asks to compute Hamming distances between a pattern of length $n$ and all length-$n$ substrings of a text for which the Hamming distance does not exceed a given threshold $k$. In our problem formulation, we report not only the Hamming distance but also, on demand, the full mismatch information, that is, the list of mismatched pairs of symbols and their indices. The twin challenges of streaming pattern matching derive from the need both to achieve small working space and to guarantee that every arriving input symbol is processed quickly. We present a streaming algorithm for the $k$-mismatch problem which uses $O(k\log n\log\frac{n}{k})$ bits of space and spends \ourcomplexity{} time on each symbol of the input stream, which consists of the pattern followed by the text. The running time almost matches the classic offline solution, and the space usage is within a logarithmic factor of optimal. Our new algorithm therefore effectively resolves, and also extends, an open problem first posed in FOCS 2009. En route to this solution, we also give a deterministic $O(k(\log\frac{n}{k} + \log|\Sigma|))$-bit encoding of all the alignments with Hamming distance at most $k$ of a length-$n$ pattern within a text of length $O(n)$. This secondary result provides an optimal solution to a natural communication complexity problem which may be of independent interest.
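A space-inefficient reference solution makes the problem statement, including the on-demand mismatch information, concrete. The windowing and output conventions below are illustrative assumptions; the point is the $O(n)$ space and $O(n)$ time per symbol that the paper's $O(k\log n\log\frac{n}{k})$-bit algorithm avoids.

```python
from collections import deque

def k_mismatch_naive(pattern, text_stream, k):
    """Reference solution to the k-mismatch problem with mismatch info:
    for each alignment, report the list of (position, pattern_symbol,
    text_symbol) mismatches whenever the Hamming distance is at most k."""
    n = len(pattern)
    window = deque(maxlen=n)            # the n most recent text symbols
    for pos, symbol in enumerate(text_stream):
        window.append(symbol)
        if len(window) == n:
            mismatches = [(i, p, t)
                          for i, (p, t) in enumerate(zip(pattern, window))
                          if p != t]
            if len(mismatches) <= k:
                # alignment start index + full mismatch information
                yield pos - n + 1, mismatches
```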
We study space-pass tradeoffs in graph streaming algorithms for parameter estimation and property testing problems, such as estimating the size of maximum matchings and maximum cuts, the weight of minimum spanning trees, or testing if a graph is connected or cycle-free versus being far from these properties. We develop a new lower bound technique that proves that for many problems of interest, including all of the above, obtaining a $(1+\epsilon)$-approximation requires either $n^{\Omega(1)}$ space or $\Omega(1/\epsilon)$ passes, even on highly restricted families of graphs such as bounded-degree planar graphs. For several of these problems, this bound matches those of existing algorithms and is thus (asymptotically) optimal. Our results considerably strengthen prior lower bounds even for arbitrary graphs: starting from the influential work of [Verbin, Yu; SODA 2011], there has been a plethora of lower bounds for single-pass algorithms for these problems; however, the only multi-pass lower bounds, proven very recently in [Assadi, Kol, Saxena, Yu; FOCS 2020], rule out sublinear-space algorithms with an exponentially smaller $o(\log(1/\epsilon))$ number of passes for these problems. One key ingredient of our proofs is a simple streaming XOR Lemma, a generic hardness amplification result, that we prove: informally speaking, if a $p$-pass $s$-space streaming algorithm can only solve a decision problem with advantage $\delta > 0$ over random guessing, then it cannot solve the XOR of $\ell$ independent copies of the problem with advantage much better than $\delta^{\ell}$. This result can be of independent interest and useful for other streaming lower bounds as well.
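The XOR-of-copies construction underlying the streaming XOR Lemma is easy to state in code. The sketch below builds the combined hard input from $\ell$ independent instances; make_instance is a hypothetical sampler, and the trivial parity problem exists only to make the example runnable. This illustrates the setup of the lemma, not its proof.

```python
import random

def xor_of_copies(make_instance, ell, seed=0):
    """Draw ell independent instances of a 0/1 decision problem,
    concatenate their streams, and define the target answer as the
    XOR of the ell individual answers."""
    rng = random.Random(seed)
    combined_stream, target = [], 0
    for _ in range(ell):
        stream, answer = make_instance(rng)   # hypothetical sampler
        combined_stream.extend(stream)        # instances presented in sequence
        target ^= answer
    return combined_stream, target

if __name__ == "__main__":
    # Toy decision problem: "is the sum of the stream even?"
    def parity_instance(rng):
        s = [rng.randrange(10) for _ in range(4)]
        return s, sum(s) % 2
    print(xor_of_copies(parity_instance, ell=3))
```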