Statistical Inference for Bayesian Risk Minimization via Exponentially Tilted Empirical Likelihood

Published by: Rong Tang
Publication date: 2021
Research field: Mathematical Statistics
Language: English





The celebrated Bernstein-von Mises theorem ensures that credible regions from a Bayesian posterior are well calibrated when the model is correctly specified, in the frequentist sense that their coverage probabilities tend to the nominal values as data accrue. However, this conventional Bayesian framework is known to lack robustness when the model is misspecified or only partly specified, as in quantile regression, risk-minimization-based supervised/unsupervised learning, and robust estimation. To overcome this difficulty, we propose a new Bayesian inferential approach that substitutes the (misspecified or partly specified) likelihoods with proper exponentially tilted empirical likelihoods plus a regularization term. Our surrogate empirical likelihood is carefully constructed by using the first-order optimality condition of the empirical risk minimization as the moment condition. We show that the Bayesian posterior obtained by combining this surrogate empirical likelihood and the prior is asymptotically close to a normal distribution centered at the empirical risk minimizer, with a covariance matrix taking an appropriate sandwich form. Consequently, the resulting Bayesian credible regions are automatically calibrated to deliver valid uncertainty quantification. Computationally, the proposed method can be easily implemented by Markov chain Monte Carlo sampling algorithms. Our numerical results show that the proposed method tends to be more accurate than existing state-of-the-art competitors.
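To make the construction concrete, the following is a minimal sketch of a standard exponentially tilted empirical likelihood evaluated at a fixed parameter value, using the gradient of a loss as the moment function. It is not the authors' implementation: the regularization term mentioned in the abstract is omitted because its form is not given here, the squared loss is chosen only for illustration, and the names `etel_log_likelihood` and `squared_loss_moment` are hypothetical. The tilting parameter is found with SciPy's BFGS optimizer, which is an assumption of convenience.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp


def etel_log_likelihood(g):
    """Log ETEL for moment values g of shape (n, d) at one fixed theta.

    Weights are p_i proportional to exp(lam^T g_i), where lam minimizes
    the convex function log(mean_i exp(lam^T g_i)). At the minimizer the
    weighted moment sum_i p_i g_i is zero.
    """
    n, d = g.shape

    def objective(lam):
        # log of the empirical mean of exp(lam^T g_i)
        return logsumexp(g @ lam) - np.log(n)

    def gradient(lam):
        scores = g @ lam
        w = np.exp(scores - logsumexp(scores))
        return w @ g  # weighted moment; zero at the optimum

    res = minimize(objective, np.zeros(d), jac=gradient, method="BFGS")
    scores = g @ res.x
    log_weights = scores - logsumexp(scores)
    return log_weights.sum(), np.exp(log_weights)


def squared_loss_moment(x, theta):
    # Gradient in theta of the loss 0.5 * (x - theta)^2 (illustrative choice).
    return (theta - x).reshape(-1, 1)


rng = np.random.default_rng(0)
x = rng.normal(size=50)
theta_hat = x.mean()  # empirical risk minimizer for the squared loss

ll_hat, w_hat = etel_log_likelihood(squared_loss_moment(x, theta_hat))
ll_off, w_off = etel_log_likelihood(squared_loss_moment(x, theta_hat + 0.3))
print(ll_hat, ll_off)
```

At the empirical risk minimizer the moment condition already holds under uniform weights, so the log likelihood attains its maximum value of `-n * log(n)`; moving the parameter away forces non-uniform weights and lowers it. A posterior sampler would add the log prior to this quantity and evaluate it at each proposed parameter value.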




Read also

85 - Shujie Ma, Mingming Liu 2021
We consider a sparse deep ReLU network (SDRN) estimator obtained from empirical risk minimization with a Lipschitz loss function in the presence of a large number of features. Our framework can be applied to a variety of regression and classification problems. The unknown target function to estimate is assumed to be in a Korobov space. Functions in this space only need to satisfy a smoothness condition rather than having a compositional structure. We develop non-asymptotic excess risk bounds for our SDRN estimator. We further derive that the SDRN estimator can achieve the same minimax rate of estimation (up to logarithmic factors) as one-dimensional nonparametric regression when the dimension of the features is fixed, and that the estimator has a suboptimal rate when the dimension grows with the sample size. We show that the depth and the total number of nodes and weights of the ReLU network need to grow as the sample size increases to ensure good performance, and also investigate how fast they should increase with the sample size. These results provide important theoretical guidance and a basis for empirical studies with deep neural networks.
High-dimensional statistical inference with general estimating equations is challenging and remains less explored. In this paper, we study two problems in the area: confidence set estimation for multiple components of the model parameters, and model specification testing. For the first one, we propose to construct a new set of estimating equations such that the impact from estimating the high-dimensional nuisance parameters becomes asymptotically negligible. The new construction enables us to estimate a valid confidence region by empirical likelihood ratio. For the second one, we propose a test statistic as the maximum of the marginal empirical likelihood ratios to quantify data evidence against the model specification. Our theory establishes the validity of the proposed empirical likelihood approaches, accommodating over-identification and exponentially growing data dimensionality. The numerical studies demonstrate promising performance and potential practical benefits of the new methods.
229 - K. L. Mengersen 2012
Approximate Bayesian computation (ABC) has become an essential tool for the analysis of complex stochastic models when the likelihood function is numerically unavailable. However, the well-established statistical method of empirical likelihood provides another route to such settings that bypasses simulations from the model and the choices of the ABC parameters (summary statistics, distance, tolerance), while being convergent in the number of observations. Furthermore, bypassing model simulations may lead to significant time savings in complex models, for instance those found in population genetics. The BCel algorithm we develop in this paper also provides an evaluation of its own performance through an associated effective sample size. The method is illustrated using several examples, including estimation of standard distributions, time series, and population genetics models.
Privacy-preserving machine learning algorithms are crucial for the increasingly common setting in which personal data, such as medical or financial records, are analyzed. We provide general techniques to produce privacy-preserving approximations of classifiers learned via (regularized) empirical risk minimization (ERM). These algorithms are private under the $\epsilon$-differential privacy definition due to Dwork et al. (2006). First we apply the output perturbation ideas of Dwork et al. (2006), to ERM classification. Then we propose a new method, objective perturbation, for privacy-preserving machine learning algorithm design. This method entails perturbing the objective function before optimizing over classifiers. If the loss and regularizer satisfy certain convexity and differentiability criteria, we prove theoretical results showing that our algorithms preserve privacy, and provide generalization bounds for linear and nonlinear kernels. We further present a privacy-preserving technique for tuning the parameters in general machine learning algorithms, thereby providing end-to-end privacy guarantees for the training process. We apply these results to produce privacy-preserving analogues of regularized logistic regression and support vector machines. We obtain encouraging results from evaluating their performance on real demographic and benchmark data sets. Our results show that both theoretically and empirically, objective perturbation is superior to the previous state-of-the-art, output perturbation, in managing the inherent tradeoff between privacy and learning performance.
Large deep neural networks are powerful, but exhibit undesirable behaviors such as memorization and sensitivity to adversarial examples. In this work, we propose mixup, a simple learning principle to alleviate these issues. In essence, mixup trains a neural network on convex combinations of pairs of examples and their labels. By doing so, mixup regularizes the neural network to favor simple linear behavior in-between training examples. Our experiments on the ImageNet-2012, CIFAR-10, CIFAR-100, Google commands and UCI datasets show that mixup improves the generalization of state-of-the-art neural network architectures. We also find that mixup reduces the memorization of corrupt labels, increases the robustness to adversarial examples, and stabilizes the training of generative adversarial networks.
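The convex-combination step described in the mixup abstract can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' code: the function name `mixup_batch` is hypothetical, labels are assumed to be one-hot encoded, and drawing the mixing coefficient from a Beta(alpha, alpha) distribution with `alpha=0.2` is an assumption not stated in the abstract above.

```python
import numpy as np


def mixup_batch(x, y_onehot, alpha=0.2, rng=None):
    """Mix each example with a randomly chosen partner from the same batch.

    Returns convex combinations of the inputs and of their one-hot labels,
    using one shared mixing coefficient lam for the whole batch.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)      # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))    # partner index for each example
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix, lam


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))                # 8 examples, 4 features
y = np.eye(3)[rng.integers(0, 3, size=8)]  # one-hot labels over 3 classes
x_mix, y_mix, lam = mixup_batch(x, y, alpha=0.2, rng=rng)
```

Because both the inputs and the labels are mixed with the same coefficient, each mixed label remains a valid probability vector, which is what lets a standard cross-entropy loss be applied to the mixed batch.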