Optimal lower bounds for universal relation, samplers, and finding duplicates


Abstract in English

In the communication problem $mathbf{UR}$ (universal relation) [KRW95], Alice and Bob respectively receive $x$ and $y$ in ${0,1}^n$ with the promise that $x eq y$. The last player to receive a message must output an index $i$ such that $x_i eq y_i$. We prove that the randomized one-way communication complexity of this problem in the public coin model is exactly $Theta(min{n, log(1/delta)log^2(frac{n}{log(1/delta)})})$ bits for failure probability $delta$. Our lower bound holds even if promised $mathop{support}(y)subset mathop{support}(x)$. As a corollary, we obtain optimal lower bounds for $ell_p$-sampling in strict turnstile streams for $0le p < 2$, as well as for the problem of finding duplicates in a stream. Our lower bounds do not need to use large weights, and hold even if it is promised that $xin{0,1}^n$ at all points in the stream. Our lower bound demonstrates that any algorithm $mathcal{A}$ solving sampling problems in turnstile streams in low memory can be used to encode subsets of $[n]$ of certain sizes into a number of bits below the information theoretic minimum. Our encoder makes adaptive queries to $mathcal{A}$ throughout its execution, but done carefully so as to not violate correctness. This is accomplished by injecting random noise into the encoders interactions with $mathcal{A}$, which is loosely motivated by techniques in differential privacy. Our correctness analysis involves understanding the ability of $mathcal{A}$ to correctly answer adaptive queries which have positive but bounded mutual information with $mathcal{A}$s internal randomness, and may be of independent interest in the newly emerging area of adaptive data analysis with a theoretical computer science lens.

Download