ترغب بنشر مسار تعليمي؟ اضغط هنا

Stronger Calibration Lower Bounds via Sidestepping

341   0   0.0 ( 0 )
 نشر من قبل Mingda Qiao
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

We consider an online binary prediction setting where a forecaster observes a sequence of $T$ bits one by one. Before each bit is revealed, the forecaster predicts the probability that the bit is $1$. The forecaster is called well-calibrated if for each $p in [0, 1]$, among the $n_p$ bits for which the forecaster predicts probability $p$, the actual number of ones, $m_p$, is indeed equal to $p cdot n_p$. The calibration error, defined as $sum_p |m_p - p n_p|$, quantifies the extent to which the forecaster deviates from being well-calibrated. It has long been known that an $O(T^{2/3})$ calibration error is achievable even when the bits are chosen adversarially, and possibly based on the previous predictions. However, little is known on the lower bound side, except an $Omega(sqrt{T})$ bound that follows from the trivial example of independent fair coin flips. In this paper, we prove an $Omega(T^{0.528})$ bound on the calibration error, which is the first super-$sqrt{T}$ lower bound for this setting to the best of our knowledge. The technical contributions of our work include two lower bound techniques, early stopping and sidestepping, which circumvent the obstacles that have previously hindered strong calibration lower bounds. We also propose an abstraction of the prediction setting, termed the Sign-Preservation game, which may be of independent interest. This game has a much smaller state space than the full prediction setting and allows simpler analyses. The $Omega(T^{0.528})$ lower bound follows from a general reduction theorem that translates lower bounds on the game value of Sign-Preservation into lower bounds on the calibration error.



قيم البحث

اقرأ أيضاً

We prove a emph{query complexity} lower bound for approximating the top $r$ dimensional eigenspace of a matrix. We consider an oracle model where, given a symmetric matrix $mathbf{M} in mathbb{R}^{d times d}$, an algorithm $mathsf{Alg}$ is allowed to make $mathsf{T}$ exact queries of the form $mathsf{w}^{(i)} = mathbf{M} mathsf{v}^{(i)}$ for $i$ in ${1,...,mathsf{T}}$, where $mathsf{v}^{(i)}$ is drawn from a distribution which depends arbitrarily on the past queries and measurements ${mathsf{v}^{(j)},mathsf{w}^{(i)}}_{1 le j le i-1}$. We show that for every $mathtt{gap} in (0,1/2]$, there exists a distribution over matrices $mathbf{M}$ for which 1) $mathrm{gap}_r(mathbf{M}) = Omega(mathtt{gap})$ (where $mathrm{gap}_r(mathbf{M})$ is the normalized gap between the $r$ and $r+1$-st largest-magnitude eigenvector of $mathbf{M}$), and 2) any algorithm $mathsf{Alg}$ which takes fewer than $mathrm{const} times frac{r log d}{sqrt{mathtt{gap}}}$ queries fails (with overwhelming probability) to identity a matrix $widehat{mathsf{V}} in mathbb{R}^{d times r}$ with orthonormal columns for which $langle widehat{mathsf{V}}, mathbf{M} widehat{mathsf{V}}rangle ge (1 - mathrm{const} times mathtt{gap})sum_{i=1}^r lambda_i(mathbf{M})$. Our bound requires only that $d$ is a small polynomial in $1/mathtt{gap}$ and $r$, and matches the upper bounds of Musco and Musco 15. Moreover, it establishes a strict separation between convex optimization and emph{randomized}, strict-saddle non-convex optimization of which PCA is a canonical example: in the former, first-order methods can have dimension-free iteration complexity, whereas in PCA, the iteration complexity of gradient-based methods must necessarily grow with the dimension.
This paper proves strong lower bounds for distributed computing in the CONGEST model, by presenting the bit-gadget: a new technique for constructing graphs with small cuts. The contribution of bit-gadgets is twofold. First, developing careful spars e graph constructions with small cuts extends known techniques to show a near-linear lower bound for computing the diameter, a result previously known only for dense graphs. Moreover, the sparseness of the construction plays a crucial role in applying it to approximations of various distance computation problems, drastically improving over what can be obtained when using dense graphs. Second, small cuts are essential for proving super-linear lower bounds, none of which were known prior to this work. In fact, they allow us to show near-quadratic lower bounds for several problems, such as exact minimum vertex cover or maximum independent set, as well as for coloring a graph with its chromatic number. Such strong lower bounds are not limited to NP-hard problems, as given by two simple graph problems in P which are shown to require a quadratic and near-quadratic number of rounds. All of the above are optimal up to logarithmic factors. In addition, in this context, the complexity of the all-pairs-shortest-paths problem is discussed. Finally, it is shown that graph constructions for CONGEST lower bounds translate to lower bounds for the semi-streaming model, despite being very different in its nature.
We study the problem of PAC learning one-hidden-layer ReLU networks with $k$ hidden units on $mathbb{R}^d$ under Gaussian marginals in the presence of additive label noise. For the case of positive coefficients, we give the first polynomial-time algo rithm for this learning problem for $k$ up to $tilde{O}(sqrt{log d})$. Previously, no polynomial time algorithm was known, even for $k=3$. This answers an open question posed by~cite{Kliv17}. Importantly, our algorithm does not require any assumptions about the rank of the weight matrix and its complexity is independent of its condition number. On the negative side, for the more general task of PAC learning one-hidden-layer ReLU networks with arbitrary real coefficients, we prove a Statistical Query lower bound of $d^{Omega(k)}$. Thus, we provide a separation between the two classes in terms of efficient learnability. Our upper and lower bounds are general, extending to broader families of activation functions.
We study the fundamental problems of agnostically learning halfspaces and ReLUs under Gaussian marginals. In the former problem, given labeled examples $(mathbf{x}, y)$ from an unknown distribution on $mathbb{R}^d times { pm 1}$, whose marginal distr ibution on $mathbf{x}$ is the standard Gaussian and the labels $y$ can be arbitrary, the goal is to output a hypothesis with 0-1 loss $mathrm{OPT}+epsilon$, where $mathrm{OPT}$ is the 0-1 loss of the best-fitting halfspace. In the latter problem, given labeled examples $(mathbf{x}, y)$ from an unknown distribution on $mathbb{R}^d times mathbb{R}$, whose marginal distribution on $mathbf{x}$ is the standard Gaussian and the labels $y$ can be arbitrary, the goal is to output a hypothesis with square loss $mathrm{OPT}+epsilon$, where $mathrm{OPT}$ is the square loss of the best-fitting ReLU. We prove Statistical Query (SQ) lower bounds of $d^{mathrm{poly}(1/epsilon)}$ for both of these problems. Our SQ lower bounds provide strong evidence that current upper bounds for these tasks are essentially best possible.
124 - Calvin Newport 2014
Theoreticians have studied distributed algorithms in the radio network model for close to three decades. A significant fraction of this work focuses on lower bounds for basic communication problems such as wake-up (symmetry breaking among an unknown set of nodes) and broadcast (message dissemination through an unknown network topology). In this paper, we introduce a new technique for proving this type of bound, based on reduction from a probabilistic hitting game, that simplifies and strengthens much of this existing work. In more detail, in this single paper we prove new expected time and high probability lower bounds for wake-up and global broadcast in single and multichann

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا