Testing Properties of Multiple Distributions with Few Samples

62 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Maryam Aliakbarpour

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Maryam Aliakbarpour - Sandeep Silwal

بنى وهياكل البيانات والخوارزميات الرياضيات المتقطعة التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We propose a new setting for testing properties of distributions while receiving samples from several distributions, but few samples per distribution. Given samples from $s$ distributions, $p_1, p_2, ldots, p_s$, we design testers for the following problems: (1) Uniformity Testing: Testing whether all the $p_i$s are uniform or $epsilon$-far from being uniform in $ell_1$-distance (2) Identity Testing: Testing whether all the $p_i$s are equal to an explicitly given distribution $q$ or $epsilon$-far from $q$ in $ell_1$-distance, and (3) Closeness Testing: Testing whether all the $p_i$s are equal to a distribution $q$ which we have sample access to, or $epsilon$-far from $q$ in $ell_1$-distance. By assuming an additional natural condition about the source distributions, we provide sample optimal testers for all of these problems.

قيم البحث

109 - Maryam Aliakbarpour , Themis Gouleakis , John Peebles 2019

In this work, we consider the sample complexity required for testing the monotonicity of distributions over partial orders. A distribution $p$ over a poset is monotone if, for any pair of domain elements $x$ and $y$ such that $x preceq y$, $p(x) leq p(y)$. To understand the sample complexity of this problem, we introduce a new property called bigness over a finite domain, where the distribution is $T$-big if the minimum probability for any domain element is at least $T$. We establish a lower bound of $Omega(n/log n)$ for testing bigness of distributions on domains of size $n$. We then build on these lower bounds to give $Omega(n/log{n})$ lower bounds for testing monotonicity over a matching poset of size $n$ and significantly improved lower bounds over the hypercube poset. We give sublinear sample complexity bounds for testing bigness and for testing monotonicity over the matching poset. We then give a number of tools for analyzing upper bounds on the sample complexity of the monotonicity testing problem.

بنى وهياكل البيانات والخوارزميات نظرية الإحصاء التعلم الالي

Testing properties of signed graphs

68 - Florian Adriaens , Simon Apers 2021

In graph property testing the task is to distinguish whether a graph satisfies a given property or is far from having that property, preferably with a sublinear query and time complexity. In this work we initiate the study of property testing in sign ed graphs, where every edge has either a positive or a negative sign. We show that there exist sublinear algorithms for testing three key properties of signed graphs: balance (or 2-clusterability), clusterability and signed triangle freeness. We consider both the dense graph model, where we can query the (signed) adjacency matrix of a signed graph, and the bounded-degree model, where we can query for the neighbors of a node and the sign of the connecting edge. Our algorithms use a variety of tools from graph property testing, as well as reductions from one setting to the other. Our main technical contribution is a sublinear algorithm for testing clusterability in the bounded-degree model. This contrasts with the property of k-clusterability which is not testable with a sublinear number of queries. The tester builds on the seminal work of Goldreich and Ron for testing bipartiteness.

بنى وهياكل البيانات والخوارزميات

Testing isomorphism of chordal graphs of bounded leafage is fixed-parameter tractable

209 - Vikraman Arvind , Roman Nedela , Ilia Ponomarenko 2021

It is known that testing isomorphism of chordal graphs is as hard as the general graph isomorphism problem. Every chordal graph can be represented as the intersection graph of some subtrees of a tree. The leafage of a chordal graph, is defined to be the minimum number of leaves in the representing tree. We construct a fixed-parameter tractable algorithm testing isomorphism of chordal graphs with bounded leafage. The key point is a fixed-parameter tractable algorithm finding the automorphism group of a colored order-3 hypergraph with bounded sizes of color classes of vertices.

بنى وهياكل البيانات والخوارزميات الرياضيات المتقطعة التوافقية

Optimal Identity Testing with High Probability

65 - Ilias Diakonikolas , Themis Gouleakis , John Peebles 2017

We study the problem of testing identity against a given distribution with a focus on the high confidence regime. More precisely, given samples from an unknown distribution $p$ over $n$ elements, an explicitly given distribution $q$, and parameters $ 0< epsilon, delta < 1$, we wish to distinguish, {em with probability at least $1-delta$}, whether the distributions are identical versus $varepsilon$-far in total variation distance. Most prior work focused on the case that $delta = Omega(1)$, for which the sample complexity of identity testing is known to be $Theta(sqrt{n}/epsilon^2)$. Given such an algorithm, one can achieve arbitrarily small values of $delta$ via black-box amplification, which multiplies the required number of samples by $Theta(log(1/delta))$. We show that black-box amplification is suboptimal for any $delta = o(1)$, and give a new identity tester that achieves the optimal sample complexity. Our new upper and lower bounds show that the optimal sample complexity of identity testing is [ Thetaleft( frac{1}{epsilon^2}left(sqrt{n log(1/delta)} + log(1/delta) right)right) ] for any $n, varepsilon$, and $delta$. For the special case of uniformity testing, where the given distribution is the uniform distribution $U_n$ over the domain, our new tester is surprisingly simple: to test whether $p = U_n$ versus $d_{mathrm TV}(p, U_n) geq varepsilon$, we simply threshold $d_{mathrm TV}(widehat{p}, U_n)$, where $widehat{p}$ is the empirical probability distribution. The fact that this simple plug-in estimator is sample-optimal is surprising, even in the constant $delta$ case. Indeed, it was believed that such a tester would not attain sublinear sample complexity even for constant values of $varepsilon$ and $delta$.

بنى وهياكل البيانات والخوارزميات نظرية المعلومات التعلم الآلي

Lower Bounds for Embedding into Distributions over Excluded Minor Graph Families

402 - Douglas E. Carroll , Ashish Goel 2008

It was shown recently by Fakcharoenphol et al that arbitrary finite metrics can be embedded into distributions over tree metrics with distortion O(log n). It is also known that this bound is tight since there are expander graphs which cannot be embed ded into distributions over trees with better than Omega(log n) distortion. We show that this same lower bound holds for embeddings into distributions over any minor excluded family. Given a family of graphs F which excludes minor M where |M|=k, we explicitly construct a family of graphs with treewidth-(k+1) which cannot be embedded into a distribution over F with better than Omega(log n) distortion. Thus, while these minor excluded families of graphs are more expressive than trees, they do not provide asymptotically better approximations in general. An important corollary of this is that graphs of treewidth-k cannot be embedded into distributions over graphs of treewidth-(k-3) with distortion less than Omega(log n). We also extend a result of Alon et al by showing that for any k, planar graphs cannot be embedded into distributions over treewidth-k graphs with better than Omega(log n) distortion.

بنى وهياكل البيانات والخوارزميات الرياضيات المتقطعة