Collision-based Testers are Optimal for Uniformity and Closeness

377 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Ilias Diakonikolas

تاريخ النشر 2016

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Ilias Diakonikolas - Themis Gouleakis - John Peebles

بنى وهياكل البيانات والخوارزميات نظرية المعلومات التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We study the fundamental problems of (i) uniformity testing of a discrete distribution, and (ii) closeness testing between two discrete distributions with bounded $ell_2$-norm. These problems have been extensively studied in distribution testing and sample-optimal estimators are known for them~cite{Paninski:08, CDVV14, VV14, DKN:15}. In this work, we show that the original collision-based testers proposed for these problems ~cite{GRdist:00, BFR+:00} are sample-optimal, up to constant factors. Previous analyses showed sample complexity upper bounds for these testers that are optimal as a function of the domain size $n$, but suboptimal by polynomial factors in the error parameter $epsilon$. Our main contribution is a new tight analysis establishing that these collision-based testers are information-theoretically optimal, up to constant factors, both in the dependence on $n$ and in the dependence on $epsilon$.

قيم البحث

92 - Yuxin Chen , Jianqing Fan , Cong Ma 2017

This paper is concerned with the problem of top-$K$ ranking from pairwise comparisons. Given a collection of $n$ items and a few pairwise comparisons across them, one wishes to identify the set of $K$ items that receive the highest ranks. To tackle t his problem, we adopt the logistic parametric model --- the Bradley-Terry-Luce model, where each item is assigned a latent preference score, and where the outcome of each pairwise comparison depends solely on the relative scores of the two items involved. Recent works have made significant progress towards characterizing the performance (e.g. the mean square error for estimating the scores) of several classical methods, including the spectral method and the maximum likelihood estimator (MLE). However, where they stand regarding top-$K$ ranking remains unsettled. We demonstrate that under a natural random sampling model, the spectral method alone, or the regularized MLE alone, is minimax optimal in terms of the sample complexity --- the number of paired comparisons needed to ensure exact top-$K$ identification, for the fixed dynamic range regime. This is accomplished via optimal control of the entrywise error of the score estimates. We complement our theoretical studies by numerical experiments, confirming that both methods yield low entrywise errors for estimating the underlying scores. Our theory is established via a novel leave-one-out trick, which proves effective for analyzing both iterative and non-iterative procedures. Along the way, we derive an elementary eigenvector perturbation bound for probability transition matrices, which parallels the Davis-Kahan $sinTheta$ theorem for symmetric matrices. This also allows us to close the gap between the $ell_2$ error upper bound for the spectral method and the minimax lower limit.

التعلم الالي نظرية المعلومات التعلم الآلي

Optimal Identity Testing with High Probability

65 - Ilias Diakonikolas , Themis Gouleakis , John Peebles 2017

We study the problem of testing identity against a given distribution with a focus on the high confidence regime. More precisely, given samples from an unknown distribution $p$ over $n$ elements, an explicitly given distribution $q$, and parameters $ 0< epsilon, delta < 1$, we wish to distinguish, {em with probability at least $1-delta$}, whether the distributions are identical versus $varepsilon$-far in total variation distance. Most prior work focused on the case that $delta = Omega(1)$, for which the sample complexity of identity testing is known to be $Theta(sqrt{n}/epsilon^2)$. Given such an algorithm, one can achieve arbitrarily small values of $delta$ via black-box amplification, which multiplies the required number of samples by $Theta(log(1/delta))$. We show that black-box amplification is suboptimal for any $delta = o(1)$, and give a new identity tester that achieves the optimal sample complexity. Our new upper and lower bounds show that the optimal sample complexity of identity testing is [ Thetaleft( frac{1}{epsilon^2}left(sqrt{n log(1/delta)} + log(1/delta) right)right) ] for any $n, varepsilon$, and $delta$. For the special case of uniformity testing, where the given distribution is the uniform distribution $U_n$ over the domain, our new tester is surprisingly simple: to test whether $p = U_n$ versus $d_{mathrm TV}(p, U_n) geq varepsilon$, we simply threshold $d_{mathrm TV}(widehat{p}, U_n)$, where $widehat{p}$ is the empirical probability distribution. The fact that this simple plug-in estimator is sample-optimal is surprising, even in the constant $delta$ case. Indeed, it was believed that such a tester would not attain sublinear sample complexity even for constant values of $varepsilon$ and $delta$.

بنى وهياكل البيانات والخوارزميات نظرية المعلومات التعلم الآلي

Pan-Private Uniformity Testing

68 - Kareem Amin , Matthew Joseph , 2019

A centrally differentially private algorithm maps raw data to differentially private outputs. In contrast, a locally differentially private algorithm may only access data through public interaction with data holders, and this interaction must be a di fferentially private function of the data. We study the intermediate model of pan-privacy. Unlike a locally private algorithm, a pan-private algorithm receives data in the clear. Unlike a centrally private algorithm, the algorithm receives data one element at a time and must maintain a differentially private internal state while processing this stream. First, we show that pure pan-privacy against multiple intrusions on the internal state is equivalent to sequentially interactive local privacy. Next, we contextualize pan-privacy against a single intrusion by analyzing the sample complexity of uniformity testing over domain $[k]$. Focusing on the dependence on $k$, centrally private uniformity testing has sample complexity $Theta(sqrt{k})$, while noninteractive locally private uniformity testing has sample complexity $Theta(k)$. We show that the sample complexity of pure pan-private uniformity testing is $Theta(k^{2/3})$. By a new $Omega(k)$ lower bound for the sequentially interactive setting, we also separate pan-private from sequentially interactive locally private and multi-intrusion pan-private uniformity testing.

بنى وهياكل البيانات والخوارزميات التشفير والأمن التعلم الآلي

Learning-Based UAV Trajectory Optimization with Collision Avoidance and Connectivity Constraints

123 - Xueyuan Wang , M. Cenk Gursoy 2021

Unmanned aerial vehicles (UAVs) are expected to be an integral part of wireless networks, and determining collision-free trajectories for multiple UAVs while satisfying requirements of connectivity with ground base stations (GBSs) is a challenging ta sk. In this paper, we first reformulate the multi-UAV trajectory optimization problem with collision avoidance and wireless connectivity constraints as a sequential decision making problem in the discrete time domain. We, then, propose a decentralized deep reinforcement learning approach to solve the problem. More specifically, a value network is developed to encode the expected time to destination given the agents joint state (including the agents information, the nearby agents observable information, and the locations of the nearby GBSs). A signal-to-interference-plus-noise ratio (SINR)-prediction neural network is also designed, using accumulated SINR measurements obtained when interacting with the cellular network, to map the GBSs locations into the SINR levels in order to predict the UAVs SINR. Numerical results show that with the value network and SINR-prediction network, real-time navigation for multi-UAVs can be efficiently performed in various environments with high success rate.

علم الروبوتات نظرية المعلومات التعلم الآلي

Instance Based Approximations to Profile Maximum Likelihood

98 - Nima Anari , Moses Charikar , Kirankumar Shiragur 2020

In this paper we provide a new efficient algorithm for approximately computing the profile maximum likelihood (PML) distribution, a prominent quantity in symmetric property estimation. We provide an algorithm which matches the previous best known eff icient algorithms for computing approximate PML distributions and improves when the number of distinct observed frequencies in the given instance is small. We achieve this result by exploiting new sparsity structure in approximate PML distributions and providing a new matrix rounding algorithm, of independent interest. Leveraging this result, we obtain the first provable computationally efficient implementation of PseudoPML, a general framework for estimating a broad class of symmetric properties. Additionally, we obtain efficient PML-based estimators for distributions with small profile entropy, a natural instance-based complexity measure. Further, we provide a simpler and more practical PseudoPML implementation that matches the best-known theoretical guarantees of such an estimator and evaluate this method empirically.

بنى وهياكل البيانات والخوارزميات نظرية المعلومات التعلم الآلي