A streaming algorithm is said to be adversarially robust if its accuracy guarantees are maintained even when the data stream is chosen maliciously, by an adaptive adversary. We establish a connection between adversarial robustness of streaming algorithms and the notion of differential privacy. This connection allows us to design new adversarially robust streaming algorithms that outperform the current state-of-the-art constructions for many interesting regimes of parameters.
A streaming algorithm is adversarially robust if it is guaranteed to perform correctly even in the presence of an adaptive adversary. Recently, several sophisticated frameworks for robustification of classical streaming algorithms have been developed. One of the main open questions in this area is whether efficient adversarially robust algorithms exist for moment estimation problems under the turnstile streaming model, where both insertions and deletions are allowed. So far, the best known space complexity for streams of length $m$, achieved using differential privacy (DP) based techniques, is of order $\tilde{O}(m^{1/2})$ for computing a constant-factor approximation with high constant probability. In this work, we propose a new simple approach to tracking moments by alternating between two different regimes: a sparse regime, in which we can explicitly maintain the current frequency vector and use standard sparse recovery techniques, and a dense regime, in which we make use of existing DP-based robustification frameworks. The results obtained using our technique break the previous $m^{1/2}$ barrier for any fixed $p$. More specifically, our space complexity for $F_2$-estimation is $\tilde{O}(m^{2/5})$ and for $F_0$-estimation, i.e., counting the number of distinct elements, it is $\tilde{O}(m^{1/3})$. All existing robustness frameworks have their space complexity depend multiplicatively on a parameter $\lambda$ called the \emph{flip number} of the streaming problem, where $\lambda = m$ in turnstile moment estimation. The best known dependence in these frameworks (for constant-factor approximation) is of order $\tilde{O}(\lambda^{1/2})$, and it is known to be tight for certain problems. Again, our approach breaks this barrier, achieving a dependence of order $\tilde{O}(\lambda^{1/2 - c(p)})$ for $F_p$-estimation, where $c(p) > 0$ depends only on $p$.
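To make the quantities in the abstract above concrete, the following is a minimal sketch of the exact (linear-space) baseline that the sublinear-space algorithms approximate: it applies turnstile updates to an explicitly stored frequency vector and evaluates $F_p = \sum_i |f_i|^p$, with $F_0$ taken as the number of nonzero coordinates. The function names `frequency_vector` and `moment` are hypothetical, and this is not the paper's algorithm; it only illustrates what "explicitly maintaining the current frequency vector" could look like while the vector is sparse.

```python
from collections import defaultdict


def frequency_vector(updates):
    """Apply turnstile updates (item, delta) and return the nonzero frequencies."""
    freq = defaultdict(int)
    for item, delta in updates:
        freq[item] += delta
        if freq[item] == 0:
            # Store only the support, so memory scales with the sparsity.
            del freq[item]
    return dict(freq)


def moment(freq, p):
    """Exact F_p = sum_i |f_i|^p; F_0 counts the nonzero coordinates."""
    if p == 0:
        return len(freq)
    return sum(abs(f) ** p for f in freq.values())


# Insertions and deletions are both allowed in the turnstile model.
updates = [("a", 1), ("b", 1), ("a", 1), ("b", -1)]
freq = frequency_vector(updates)
f0 = moment(freq, 0)
f2 = moment(freq, 2)
```

The storage used here grows with the number of nonzero coordinates, which is why such explicit maintenance is only plausible in a sparse regime.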
Streaming algorithms are algorithms for processing large data streams, using only a limited amount of memory. Classical streaming algorithms operate under the assumption that the input stream is fixed in advance. Recently, there has been growing interest in studying streaming algorithms that provide provable guarantees even when the input stream is chosen by an adaptive adversary. Such streaming algorithms are said to be {\em adversarially-robust}. We propose a novel framework for adversarial streaming that hybridizes two recently suggested frameworks by Hassidim et al. (2020) and by Woodruff and Zhou (2021). These recently suggested frameworks rely on very different ideas, each with its own strengths and weaknesses. We combine these two frameworks (in a non-trivial way) into a single hybrid framework that gains from both approaches to obtain superior performance for turnstile streams.
We give the first single-pass streaming algorithm for Column Subset Selection with respect to the entrywise $\ell_p$-norm with $1 \leq p < 2$. We study the $\ell_p$ norm loss since it is often considered more robust to noise than the standard Frobenius norm. Given an input matrix $A \in \mathbb{R}^{d \times n}$ ($n \gg d$), our algorithm achieves a multiplicative $k^{\frac{1}{p} - \frac{1}{2}}\text{poly}(\log nd)$-approximation to the error with respect to the best possible column subset of size $k$. Furthermore, the space complexity of the streaming algorithm is optimal up to a logarithmic factor. Our streaming algorithm also extends naturally to a 1-round distributed protocol with nearly optimal communication cost. A key ingredient in our algorithms is a reduction to column subset selection in the $\ell_{p,2}$-norm, which corresponds to the $p$-norm of the vector of Euclidean norms of each of the columns of $A$. This enables us to leverage strong coreset constructions for the Euclidean norm, which previously had not been applied in this context. We also give the first provable guarantees for greedy column subset selection in the $\ell_{1,2}$ norm, which can be used as an alternative, practical subroutine in our algorithms. Finally, we show that our algorithms give significant practical advantages on real-world data analysis tasks.
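As an illustration of the $\ell_{p,2}$-norm defined in the abstract above (the $p$-norm of the vector of column Euclidean norms), here is a small sketch that computes it next to the entrywise $\ell_p$-norm. The helper names `lp2_norm` and `entrywise_lp_norm` are hypothetical. The comparison at the end relies on the standard inequality $\|x\|_2 \leq \|x\|_p \leq d^{1/p - 1/2}\|x\|_2$ for $x \in \mathbb{R}^d$ and $1 \leq p < 2$; whether the paper's reduction uses exactly this bound is an assumption.

```python
import math


def lp2_norm(A, p):
    """(sum_j ||A[:, j]||_2^p)^(1/p) for a matrix given as a list of rows."""
    n_cols = len(A[0])
    col_norms = [math.sqrt(sum(row[j] ** 2 for row in A)) for j in range(n_cols)]
    return sum(c ** p for c in col_norms) ** (1.0 / p)


def entrywise_lp_norm(A, p):
    """(sum_{i,j} |A[i][j]|^p)^(1/p)."""
    return sum(abs(x) ** p for row in A for x in row) ** (1.0 / p)


# A 2 x 2 example: the first column is (3, 4), the second is (0, 0).
A = [[3.0, 0.0], [4.0, 0.0]]
d = len(A)
p = 1.0
mixed = lp2_norm(A, p)               # Euclidean norm per column, then p-norm
entrywise = entrywise_lp_norm(A, p)  # p-norm over all entries
distortion = d ** (1.0 / p - 0.5)    # per-column distortion factor, assumed relevant
```

Under these assumptions the two norms differ by at most a factor of `distortion`, which depends on the number of rows $d$ rather than on the number of columns $n$.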
Motivated by the recent discovery that the interpretation maps of CNNs could easily be manipulated by adversarial attacks against network interpretability, we study the problem of interpretation robustness from a new perspective of Renyi differential privacy (RDP). The advantages of our Renyi-Robust-Smooth (an RDP-based interpretation method) are threefold. First, it can offer provable and certifiable top-$k$ robustness. That is, the top-$k$ important attributions of the interpretation map are provably robust under any input perturbation with bounded $\ell_d$-norm (for any $d \geq 1$, including $d = \infty$). Second, our proposed method offers $\sim 10\%$ better experimental robustness than existing approaches in terms of the top-$k$ attributions. Remarkably, the accuracy of Renyi-Robust-Smooth also outperforms existing approaches. Third, our method can provide a smooth tradeoff between robustness and computational efficiency. Experimentally, its top-$k$ attributions are {\em twice} as robust as those of existing approaches when the computational resources are highly constrained.
We consider the problem of designing and analyzing differentially private algorithms that can be implemented on {\em discrete} models of computation in {\em strict} polynomial time, motivated by known attacks on floating point implementations of real-arithmetic differentially private algorithms (Mironov, CCS 2012) and the potential for timing attacks on expected polynomial-time algorithms. As a case study, we examine the basic problem of approximating the histogram of a categorical dataset over a possibly large data universe $\mathcal{X}$. The classic Laplace Mechanism (Dwork, McSherry, Nissim, Smith, TCC 2006 and J. Privacy & Confidentiality 2017) does not satisfy our requirements, as it is based on real arithmetic, and natural discrete analogues, such as the Geometric Mechanism (Ghosh, Roughgarden, Sundararajan, STOC 2009 and SICOMP 2012), take time at least linear in $|\mathcal{X}|$, which can be exponential in the bit length of the input. In this paper, we provide strict polynomial-time discrete algorithms for approximate histograms whose simultaneous accuracy (the maximum error over all bins) matches that of the Laplace Mechanism up to constant factors, while retaining the same (pure) differential privacy guarantee. One of our algorithms produces a sparse histogram as output. Its per-bin accuracy (the error on individual bins) is worse than that of the Laplace Mechanism by a factor of $\log|\mathcal{X}|$, but we prove a lower bound showing that this is necessary for any algorithm that produces a sparse histogram. A second algorithm avoids this lower bound, and matches the per-bin accuracy of the Laplace Mechanism, by producing a compact and efficiently computable representation of a dense histogram; it is based on an $(n+1)$-wise independent implementation of an appropriately clamped version of the Discrete Geometric Mechanism.
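For orientation, the baseline that the abstract above contrasts against can be sketched as follows: add two-sided geometric noise, with $\Pr[Z = z] \propto e^{-\varepsilon |z|}$, to every bin and clamp the result to $[0, n]$. This is a toy illustration and not the paper's algorithm: it iterates over the whole universe (so it takes time linear in $|\mathcal{X}|$), and it samples with floating-point arithmetic, which is exactly what the paper sets out to avoid. The function names and the noise scale `eps` are assumptions, and the correct calibration of `eps` depends on the neighboring-dataset definition, which is not stated here.

```python
import math
import random


def two_sided_geometric(eps, rng):
    """Sample an integer Z with Pr[Z = z] proportional to exp(-eps * |z|)."""
    log_alpha = -eps  # log of alpha = exp(-eps)

    def geometric():
        # Inverse-transform sample of G >= 0 with Pr[G >= k] = alpha^k.
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        return int(math.floor(math.log(u) / log_alpha))

    # The difference of two i.i.d. geometric variables is two-sided geometric.
    return geometric() - geometric()


def noisy_histogram(data, universe, eps, rng):
    """Clamped geometric-noise histogram over every element of the universe."""
    n = len(data)
    counts = {x: 0 for x in universe}
    for item in data:
        counts[item] += 1
    # One noise draw per bin: this loop is the linear-in-|X| cost.
    return {
        x: min(n, max(0, c + two_sided_geometric(eps, rng)))
        for x, c in counts.items()
    }


rng = random.Random(0)  # fixed seed only to make the sketch reproducible
hist = noisy_histogram(["a", "a", "b"], ["a", "b", "c"], 1.0, rng)
```

The output is a dense histogram with one integer per universe element, each clamped to lie between 0 and the dataset size.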