ترغب بنشر مسار تعليمي؟ اضغط هنا

PAC Mode Estimation using PPR Martingale Confidence Sequences

203   0   0.0 ( 0 )
 نشر من قبل Shubham Anand Jain
 تاريخ النشر 2021
  مجال البحث الاحصاء الرياضي
والبحث باللغة English




اسأل ChatGPT حول البحث

We consider the problem of correctly identifying the mode of a discrete distribution $mathcal{P}$ with sufficiently high probability by observing a sequence of i.i.d. samples drawn according to $mathcal{P}$. This problem reduces to the estimation of a single parameter when $mathcal{P}$ has a support set of size $K = 2$. Noting the efficiency of prior-posterior-ratio (PPR) martingale confidence sequences for handling this special case, we propose a generalisation to mode estimation, in which $mathcal{P}$ may take $K geq 2$ values. We observe that the one-versus-one principle yields a more efficient generalisation than the one-versus-rest alternative. Our resulting stopping rule, denoted PPR-ME, is optimal in its sample complexity up to a logarithmic factor. Moreover, PPR-ME empirically outperforms several other competing approaches for mode estimation. We demonstrate the gains offered by PPR-ME in two practical applications: (1) sample-based forecasting of the winner in indirect election systems, and (2) efficient verification of smart contracts in permissionless blockchains.



قيم البحث

اقرأ أيضاً

The prior distribution on parameters of a likelihood is the usual starting point for Bayesian uncertainty quantification. In this paper, we present a different perspective. Given a finite data sample $Y_{1:n}$ of size $n$ from an infinite population, we focus on the missing $Y_{n+1:infty}$ as the source of statistical uncertainty, with the parameter of interest being known precisely given $Y_{1:infty}$. We argue that the foundation of Bayesian inference is to assign a predictive distribution on $Y_{n+1:infty}$ conditional on $Y_{1:n}$, which then induces a distribution on the parameter of interest. Demonstrating an application of martingales, Doob shows that choosing the Bayesian predictive distribution returns the conventional posterior as the distribution of the parameter. Taking this as our cue, we relax the predictive machine, avoiding the need for the predictive to be derived solely from the usual prior to posterior to predictive density formula. We introduce the martingale posterior distribution, which returns Bayesian uncertainty directly on any statistic of interest without the need for the likelihood and prior, and this distribution can be sampled through a computational scheme we name predictive resampling. To that end, we introduce new predictive methodologies for multivariate density estimation, regression and classification that build upon recent work on bivariate copulas.
We develop confidence bounds that hold uniformly over time for off-policy evaluation in the contextual bandit setting. These confidence sequences are based on recent ideas from martingale analysis and are non-asymptotic, non-parametric, and valid at arbitrary stopping times. We provide algorithms for computing these confidence sequences that strike a good balance between computational and statistical efficiency. We empirically demonstrate the tightness of our approach in terms of failure probability and width and apply it to the gated deployment problem of safely upgrading a production contextual bandit system.
Consider a two-by-two factorial experiment with more than 1 replicate. Suppose that we have uncertain prior information that the two-factor interaction is zero. We describe new simultaneous frequentist confidence intervals for the 4 population cell m eans, with simultaneous confidence coefficient 1-alpha, that utilize this prior information in the following sense. These simultaneous confidence intervals define a cube with expected volume that (a) is relatively small when the two-factor interaction is zero and (b) has maximum value that is not too large. Also, these intervals coincide with the standard simultaneous confidence intervals obtained by Tukeys method, with simultaneous confidence coefficient 1-alpha, when the data strongly contradict the prior information that the two-factor interaction is zero. We illustrate the application of these new simultaneous confidence intervals to a real data set.
273 - Aiyou Chen , Jin Cao , 2007
The statistical problem for network tomography is to infer the distribution of $mathbf{X}$, with mutually independent components, from a measurement model $mathbf{Y}=Amathbf{X}$, where $A$ is a given binary matrix representing the routing topology of a network under consideration. The challenge is that the dimension of $mathbf{X}$ is much larger than that of $mathbf{Y}$ and thus the problem is often called ill-posed. This paper studies some statistical aspects of network tomography. We first address the identifiability issue and prove that the $mathbf{X}$ distribution is identifiable up to a shift parameter under mild conditions. We then use a mixture model of characteristic functions to derive a fast algorithm for estimating the distribution of $mathbf{X}$ based on the General method of Moments. Through extensive model simulation and real Internet trace driven simulation, the proposed approach is shown to be favorable comparing to previous methods using simple discretization for inferring link delays in a heterogeneous network.
87 - Ben Boukai , Yue Zhang 2018
We consider a resampling scheme for parameters estimates in nonlinear regression models. We provide an estimation procedure which recycles, via random weighting, the relevant parameters estimates to construct consistent estimates of the sampling dist ribution of the various estimates. We establish the asymptotic normality of the resampled estimates and demonstrate the applicability of the recycling approach in a small simulation study and via example.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا