
Minimizing the expected value of the asymmetric loss and an inequality of the variance of the loss

 Added by Naoya Yamaguchi
 Publication date 2019
Research language: English





Some estimation and prediction problems are solved by minimizing an asymmetric loss function. The usual approach is to estimate the regression coefficients under that loss. In this paper, we do not perform such an estimation; instead, we give a solution that corrects any given predictions so that the prediction error follows a general normal distribution. Our method not only minimizes the expected value of the asymmetric loss but also lowers the variance of the loss.
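The idea of correcting a prediction rather than re-estimating coefficients can be illustrated with a minimal sketch. Everything specific here is an assumption, not taken from the abstract: the loss is taken to be piecewise linear (cost `k_under` per unit of under-prediction, `k_over` per unit of over-prediction), the uncorrected error `z = y - y_hat` is taken to follow an ordinary normal distribution N(mu, sigma^2) rather than the paper's general normal distribution, and the function names are hypothetical. Under those assumptions, the additive correction `c` that minimizes the expected loss is the `k_under / (k_under + k_over)` quantile of `z`.

```python
from statistics import NormalDist


def optimal_correction(mu, sigma, k_under, k_over):
    """Additive correction c minimizing E[loss(z - c)] for z ~ N(mu, sigma^2).

    Assumed loss (illustrative, not from the abstract):
        loss(e) = k_under * e     if e >= 0  (prediction too low)
        loss(e) = k_over * (-e)   if e <  0  (prediction too high)
    Setting the derivative of the expected loss to zero gives
        P(z < c) = k_under / (k_under + k_over).
    """
    if sigma <= 0 or k_under <= 0 or k_over <= 0:
        raise ValueError("sigma, k_under and k_over must be positive")
    tau = k_under / (k_under + k_over)
    return NormalDist(mu, sigma).inv_cdf(tau)


def expected_asymmetric_loss(c, mu, sigma, k_under, k_over):
    """Closed-form expected loss of the corrected error z - c."""
    std = NormalDist()
    d = (mu - c) / sigma
    # E[max(z - c, 0)] for a normal z
    under = sigma * std.pdf(d) + (mu - c) * std.cdf(d)
    # E[max(c - z, 0)] = E[max(z - c, 0)] - E[z - c]
    over = under - (mu - c)
    return k_under * under + k_over * over


if __name__ == "__main__":
    # Hypothetical numbers: under-prediction costs three times as much.
    mu, sigma, k_under, k_over = 2.0, 1.5, 3.0, 1.0
    c = optimal_correction(mu, sigma, k_under, k_over)
    print("correction:", c)
    print("expected loss without correction:",
          expected_asymmetric_loss(0.0, mu, sigma, k_under, k_over))
    print("expected loss with correction:",
          expected_asymmetric_loss(c, mu, sigma, k_under, k_over))
```

With a symmetric loss (`k_under == k_over`) the correction reduces to the mean error `mu`, so the sketch simply removes bias; asymmetry shifts the correction toward the cheaper kind of error. The sketch does not address the variance inequality mentioned in the abstract.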




Read More

Data augmentation is an effective technique to improve the generalization of deep neural networks. However, previous data augmentation methods usually treat the augmented samples equally without considering their individual impacts on the model. To address this, for the augmented samples from the same training example, we propose to assign different weights to them. We construct the maximal expected loss which is the supremum over any reweighted loss on augmented samples. Inspired by adversarial training, we minimize this maximal expected loss (MMEL) and obtain a simple and interpretable closed-form solution: more attention should be paid to augmented samples with large loss values (i.e., harder examples). Minimizing this maximal expected loss enables the model to perform well under any reweighting strategy. The proposed method can generally be applied on top of any data augmentation methods. Experiments are conducted on both natural language understanding tasks with token-level data augmentation, and image classification tasks with commonly-used image augmentation techniques like random crop and horizontal flip. Empirical results show that the proposed method improves the generalization performance of the model.
We investigate predictive density estimation under the $L^2$ Wasserstein loss for location families and location-scale families. We show that plug-in densities form a complete class and that the Bayesian predictive density is given by the plug-in density with the posterior mean of the location and scale parameters. We provide Bayesian predictive densities that dominate the best equivariant one in normal models.
We give a short proof of a recently established Hardy-type inequality due to Keller, Pinchover, and Pogorzelski together with its optimality. Moreover, we identify the remainder term which makes it into an identity.
Recently, the Wasserstein loss function has been proven to be effective when applied to deterministic full-waveform inversion (FWI) problems. We consider the application of this loss function in Bayesian FWI so that the uncertainty can be captured in the solution. Other loss functions that are commonly used in practice are also considered for comparison. Existence and stability of the resulting Gibbs posteriors are shown on function space under weak assumptions on the prior and model. In particular, the distribution arising from the Wasserstein loss is shown to be quite stable with respect to high-frequency noise in the data. We then illustrate the difference between the resulting distributions numerically, using Laplace approximations to estimate the unknown velocity field and uncertainty associated with the estimates.
We study learning named entity recognizers in the presence of missing entity annotations. We approach this setting as tagging with latent variables and propose a novel loss, the Expected Entity Ratio, to learn models in the presence of systematically missing tags. We show that our approach is both theoretically sound and empirically useful. Experimentally, we find that it meets or exceeds performance of strong and state-of-the-art baselines across a variety of languages, annotation scenarios, and amounts of labeled data. In particular, we find that it significantly outperforms the previous state-of-the-art methods from Mayhew et al. (2019) and Li et al. (2021) by +12.7 and +2.3 F1 score in a challenging setting with only 1,000 biased annotations, averaged across 7 datasets. We also show that, when combined with our approach, a novel sparse annotation scheme outperforms exhaustive annotation for modest annotation budgets.