ترغب بنشر مسار تعليمي؟ اضغط هنا

Midpoint Regularization: from High Uncertainty Training to Conservative Classification

103   0   0.0 ( 0 )
 نشر من قبل Hongyu Guo
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English
 تأليف Hongyu Guo




اسأل ChatGPT حول البحث

Label Smoothing (LS) improves model generalization through penalizing models from generating overconfident output distributions. For each training sample the LS strategy smooths the one-hot encoded training signal by distributing its distribution mass over the non-ground truth classes. We extend this technique by considering example pairs, coined PLS. PLS first creates midpoint samples by averaging random sample pairs and then learns a smoothing distribution during training for each of these midpoint samples, resulting in midpoints with high uncertainty labels for training. We empirically show that PLS significantly outperforms LS, achieving up to 30% of relative classification error reduction. We also visualize that PLS produces very low winning softmax scores for both in and out of distribution samples.



قيم البحث

اقرأ أيضاً

Traditional deep neural nets (NNs) have shown the state-of-the-art performance in the task of classification in various applications. However, NNs have not considered any types of uncertainty associated with the class probabilities to minimize risk d ue to misclassification under uncertainty in real life. Unlike Bayesian neural nets indirectly infering uncertainty through weight uncertainties, evidential neural networks (ENNs) have been recently proposed to support explicit modeling of the uncertainty of class probabilities. It treats predictions of an NN as subjective opinions and learns the function by collecting the evidence leading to these opinions by a deterministic NN from data. However, an ENN is trained as a black box without explicitly considering different types of inherent data uncertainty, such as vacuity (uncertainty due to a lack of evidence) or dissonance (uncertainty due to conflicting evidence). This paper presents a new approach, called a {em regularized ENN}, that learns an ENN based on regularizations related to different characteristics of inherent data uncertainty. Via the experiments with both synthetic and real-world datasets, we demonstrate that the proposed regularized ENN can better learn of an ENN modeling different types of uncertainty in the class probabilities for classification tasks.
Lifelong learning capabilities are crucial for sentiment classifiers to process continuous streams of opinioned information on the Web. However, performing lifelong learning is non-trivial for deep neural networks as continually training of increment ally available information inevitably results in catastrophic forgetting or interference. In this paper, we propose a novel iterative network pruning with uncertainty regularization method for lifelong sentiment classification (IPRLS), which leverages the principles of network pruning and weight regularization. By performing network pruning with uncertainty regularization in an iterative manner, IPRLS can adapta single BERT model to work with continuously arriving data from multiple domains while avoiding catastrophic forgetting and interference. Specifically, we leverage an iterative pruning method to remove redundant parameters in large deep networks so that the freed-up space can then be employed to learn new tasks, tackling the catastrophic forgetting problem. Instead of keeping the old-tasks fixed when learning new tasks, we also use an uncertainty regularization based on the Bayesian online learning framework to constrain the update of old tasks weights in BERT, which enables positive backward transfer, i.e. learning new tasks improves performance on past tasks while protecting old knowledge from being lost. In addition, we propose a task-specific low-dimensional residual function in parallel to each layer of BERT, which makes IPRLS less prone to losing the knowledge saved in the base BERT network when learning a new task. Extensive experiments on 16 popular review corpora demonstrate that the proposed IPRLS method sig-nificantly outperforms the strong baselines for lifelong sentiment classification. For reproducibility, we submit the code and data at:https://github.com/siat-nlp/IPRLS.
Humans are experts at high-fidelity imitation -- closely mimicking a demonstration, often in one attempt. Humans use this ability to quickly solve a task instance, and to bootstrap learning of new tasks. Achieving these abilities in autonomous agents is an open problem. In this paper, we introduce an off-policy RL algorithm (MetaMimic) to narrow this gap. MetaMimic can learn both (i) policies for high-fidelity one-shot imitation of diverse novel skills, and (ii) policies that enable the agent to solve tasks more efficiently than the demonstrators. MetaMimic relies on the principle of storing all experiences in a memory and replaying these to learn massive deep neural network policies by off-policy RL. This paper introduces, to the best of our knowledge, the largest existing neural networks for deep RL and shows that larger networks with normalization are needed to achieve one-shot high-fidelity imitation on a challenging manipulation task. The results also show that both types of policy can be learned from vision, in spite of the task rewards being sparse, and without access to demonstrator actions.
106 - Han-Jia Ye , Wei-Lun Chao 2021
Model-agnostic meta-learning (MAML) is arguably the most popular meta-learning algorithm nowadays, given its flexibility to incorporate various model architectures and to be applied to different problems. Nevertheless, its performance on few-shot cla ssification is far behind many recent algorithms dedicated to the problem. In this paper, we point out several key facets of how to train MAML to excel in few-shot classification. First, we find that a large number of gradient steps are needed for the inner loop update, which contradicts the common usage of MAML for few-shot classification. Second, we find that MAML is sensitive to the permutation of class assignments in meta-testing: for a few-shot task of $N$ classes, there are exponentially many ways to assign the learned initialization of the $N$-way classifier to the $N$ classes, leading to an unavoidably huge variance. Third, we investigate several ways for permutation invariance and find that learning a shared classifier initialization for all the classes performs the best. On benchmark datasets such as MiniImageNet and TieredImageNet, our approach, which we name UNICORN-MAML, performs on a par with or even outperforms state-of-the-art algorithms, while keeping the simplicity of MAML without adding any extra sub-networks.
We present a novel variant of Domain Adversarial Networks with impactful improvements to the loss functions, training paradigm, and hyperparameter optimization. New loss functions are defined for both forks of the DANN network, the label predictor an d domain classifier, in order to facilitate more rapid gradient descent, provide more seamless integration into modern neural networking frameworks, and allow previously unavailable inferences into network behavior. Using these loss functions, it is possible to extend the concept of domain to include arbitrary user defined labels applicable to subsets of the training data, the test data, or both. As such, the network can be operated in either On the Fly mode where features provided by the feature extractor indicative of differences between domain labels in the training data are removed or in Test Collection Informed mode where features indicative of difference between domain labels in the combined training and test data are removed (without needing to know or provide test activity labels to the network). This work also draws heavily from previous works on Robust Training which draws training examples from a L_inf ball around the training data in order to remove fragile features induced by random fluctuations in the data. On these networks we explore the process of hyperparameter optimization for both the domain adversarial and robust hyperparameters. Finally, this network is applied to the construction of a binary classifier used to identify the presence of EM signal emitted by a turbopump. For this example, the effect of the robust and domain adversarial training is to remove features indicative of the difference in background between instances of operation of the device - providing highly discriminative features on which to construct the classifier.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا