ﻻ يوجد ملخص باللغة العربية
We investigate the problem of identity testing for multidimensional histogram distributions. A distribution $p: D rightarrow mathbb{R}_+$, where $D subseteq mathbb{R}^d$, is called a $k$-histogram if there exists a partition of the domain into $k$ axis-aligned rectangles such that $p$ is constant within each such rectangle. Histograms are one of the most fundamental nonparametric families of distributions and have been extensively studied in computer science and statistics. We give the first identity tester for this problem with {em sub-learning} sample complexity in any fixed dimension and a nearly-matching sample complexity lower bound. In more detail, let $q$ be an unknown $d$-dimensional $k$-histogram distribution in fixed dimension $d$, and $p$ be an explicitly given $d$-dimensional $k$-histogram. We want to correctly distinguish, with probability at least $2/3$, between the case that $p = q$ versus $|p-q|_1 geq epsilon$. We design an algorithm for this hypothesis testing problem with sample complexity $O((sqrt{k}/epsilon^2) 2^{d/2} log^{2.5 d}(k/epsilon))$ that runs in sample-polynomial time. Our algorithm is robust to model misspecification, i.e., succeeds even if $q$ is only promised to be {em close} to a $k$-histogram. Moreover, for $k = 2^{Omega(d)}$, we show a sample complexity lower bound of $(sqrt{k}/epsilon^2) cdot Omega(log(k)/d)^{d-1}$ when $dgeq 2$. That is, for any fixed dimension $d$, our upper and lower bounds are nearly matching. Prior to our work, the sample complexity of the $d=1$ case was well-understood, but no algorithm with sub-learning sample complexity was known, even for $d=2$. Our new upper and lower bounds have interesting conceptual implications regarding the relation between learning and testing in this setting.
We study the problem of testing identity against a given distribution with a focus on the high confidence regime. More precisely, given samples from an unknown distribution $p$ over $n$ elements, an explicitly given distribution $q$, and parameters $
Several approaches to testing the hypothesis that two histograms are drawn from the same distribution are investigated. We note that single-sample continuous distribution tests may be adapted to this two-sample grouped data situation. The difficulty
We study the problem of a seller dynamically pricing $d$ distinct types of indivisible goods, when faced with the online arrival of unit-demand buyers drawn independently from an unknown distribution. The goods are not in limited supply, but can only
A centrally differentially private algorithm maps raw data to differentially private outputs. In contrast, a locally differentially private algorithm may only access data through public interaction with data holders, and this interaction must be a di
Let $G$ be a DAG with $n$ vertices and $m$ edges. Two vertices $u,v$ are incomparable if $u$ doesnt reach $v$ and vice versa. We denote by emph{width} of a DAG $G$, $w_G$, the maximum size of a set of incomparable vertices of $G$. In this paper we pr