
Noise Stability Regularization for Improving BERT Fine-tuning


Publication date: 2021
Language: English





Fine-tuning pre-trained language models such as BERT has become a common practice dominating leaderboards across various NLP tasks. Despite its recent success and wide adoption, this process is unstable when there are only a small number of training samples available. The brittleness of this process is often reflected by the sensitivity to random seeds. In this paper, we propose to tackle this problem based on the noise stability property of deep nets, which is investigated in recent literature (Arora et al., 2018; Sanyal et al., 2020). Specifically, we introduce a novel and effective regularization method to improve fine-tuning on NLP tasks, referred to as Layer-wise Noise Stability Regularization (LNSR). We extend the theories about adding noise to the input and prove that our method gives a stabler regularization effect. We provide supportive evidence by experimentally confirming that well-performing models show a low sensitivity to noise and fine-tuning with LNSR exhibits clearly better generalizability and stability. Furthermore, our method also demonstrates advantages over other state-of-the-art algorithms including L2-SP (Li et al., 2018), Mixout (Lee et al., 2020) and SMART (Jiang et al., 2020).
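
The abstract describes LNSR only at a high level. The sketch below is a minimal illustration of the general idea, not the authors' implementation: a toy encoder stands in for BERT's layer stack, Gaussian noise is injected into the hidden states entering one encoder layer, and the squared distance between the clean and perturbed outputs of every layer above it is added to the task loss. The toy encoder, the noise scale `sigma`, and the weight `lambda_reg` are illustrative assumptions.

```python
# Minimal sketch of layer-wise noise stability regularization (LNSR).
# Illustration only: noise is injected into the hidden states entering
# layer `noise_layer`, and the squared distance between the clean and
# perturbed outputs of every layer above it is penalized.
import torch
import torch.nn as nn


class TinyEncoder(nn.Module):
    """Stand-in for a pre-trained Transformer encoder (e.g. BERT)."""

    def __init__(self, d_model=64, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            for _ in range(n_layers)
        )

    def forward_from(self, hidden, start=0):
        """Run layers `start`..end and return every layer's output."""
        outputs = []
        for layer in self.layers[start:]:
            hidden = layer(hidden)
            outputs.append(hidden)
        return outputs


def lnsr_loss(encoder, embeddings, noise_layer=1, sigma=0.01):
    """Layer-wise squared distance between clean and noise-perturbed outputs.

    `noise_layer` must be >= 1 in this sketch.
    """
    clean = encoder.forward_from(embeddings)         # outputs of layers 0..L-1
    base = clean[noise_layer - 1]                    # input to the noised layer
    noisy = encoder.forward_from(base + sigma * torch.randn_like(base),
                                 start=noise_layer)
    return sum((c - n).pow(2).mean() for c, n in zip(clean[noise_layer:], noisy))


# Usage: add the regularizer (weighted by lambda_reg) to the usual task loss.
encoder = TinyEncoder()
embeddings = torch.randn(8, 16, 64)        # (batch, seq_len, d_model)
task_loss = torch.tensor(0.0)              # placeholder for e.g. cross-entropy
lambda_reg = 1.0                           # regularization weight (illustrative)
loss = task_loss + lambda_reg * lnsr_loss(encoder, embeddings)
loss.backward()
```

In practice the clean forward pass would be shared with the task loss rather than recomputed; the sketch only shows where the noise enters and which differences get penalized.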



References used
https://aclanthology.org/
Related research


In this paper, we describe our system used for SemEval 2021 Task 7: HaHackathon: Detecting and Rating Humor and Offense. We used a simple fine-tuning approach using different Pre-trained Language Models (PLMs) to evaluate their performance for humor and offense detection. For regression tasks, we averaged the scores of different models leading to better performance than the original models. We participated in all SubTasks. Our best performing system was ranked 4 in SubTask 1-b, 8 in SubTask 1-c, 12 in SubTask 2, and performed well in SubTask 1-a. We further show comprehensive results using different pre-trained language models which will help as baselines for future work.
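
The system above averages per-example regression scores from several fine-tuned PLMs. A minimal sketch of that averaging step, assuming each model has already written one score per line to a prediction file (the file names are illustrative):

```python
# Average per-example regression scores from several fine-tuned models.
# File names are illustrative; each file holds one score per line, with
# the same example ordering in every file.
import numpy as np

prediction_files = ["bert_scores.txt", "roberta_scores.txt", "ernie_scores.txt"]
scores = np.stack([np.loadtxt(path) for path in prediction_files])
ensemble = scores.mean(axis=0)             # element-wise average across models
np.savetxt("ensemble_scores.txt", ensemble)
```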
We address the problem of enhancing model robustness through regularization. Specifically, we focus on methods that regularize the model posterior difference between clean and noisy inputs. Theoretically, we provide a connection between two recent methods, Jacobian Regularization and Virtual Adversarial Training, under this framework. Additionally, we generalize the posterior differential regularization to the family of f-divergences and characterize the overall framework in terms of the Jacobian matrix. Empirically, we compare those regularizations and standard BERT training on a diverse set of tasks to provide a comprehensive profile of their effect on model generalization. For both fully supervised and semi-supervised settings, we show that regularizing the posterior difference with f-divergence can result in well-improved model robustness. In particular, with a proper f-divergence, a BERT-base model can achieve comparable generalization as its BERT-large counterpart for in-domain, adversarial and domain shift scenarios, indicating the great potential of the proposed framework for enhancing NLP model robustness.
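
The regularizer described above penalizes the divergence between the model's posteriors on clean and perturbed inputs. The sketch below uses KL divergence, one member of the f-divergence family the paper studies, with simple Gaussian noise on the input features; the toy classifier, `sigma`, and the weight `alpha` are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch of posterior-differential regularization: penalize the
# divergence between the class posteriors on clean and noisy inputs.
# KL divergence is used here as one member of the f-divergence family.
import torch
import torch.nn.functional as F


def posterior_diff_loss(model, features, sigma=0.01):
    """KL(p_clean || p_noisy) between posteriors on clean and noisy inputs."""
    clean_logits = model(features)
    noisy_logits = model(features + sigma * torch.randn_like(features))
    return F.kl_div(
        F.log_softmax(noisy_logits, dim=-1),   # log q (noisy posterior)
        F.softmax(clean_logits, dim=-1),       # p (clean posterior)
        reduction="batchmean",
    )


# Usage with a toy classification head standing in for BERT plus task head.
model = torch.nn.Sequential(
    torch.nn.Linear(64, 64), torch.nn.ReLU(), torch.nn.Linear(64, 3)
)
features = torch.randn(8, 64)
labels = torch.randint(0, 3, (8,))
alpha = 1.0                                    # regularization weight (illustrative)
loss = F.cross_entropy(model(features), labels) \
       + alpha * posterior_diff_loss(model, features)
loss.backward()
```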
Distantly supervised models are very popular for relation extraction since we can obtain a large amount of training data using the distant supervision method without human annotation. In distant supervision, a sentence is considered as a source of a tuple if the sentence contains both entities of the tuple. However, this condition is too permissive and does not guarantee the presence of relevant relation-specific information in the sentence. As such, distantly supervised training data contains much noise which adversely affects the performance of the models. In this paper, we propose a self-ensemble filtering mechanism to filter out the noisy samples during the training process. We evaluate our proposed framework on the New York Times dataset which is obtained via distant supervision. Our experiments with multiple state-of-the-art neural relation extraction models show that our proposed filtering mechanism improves the robustness of the models and increases their F1 scores.
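
The filtering idea above can be sketched as follows, under the assumption (an illustrative one, not necessarily the paper's exact procedure) that the self-ensemble is an exponential moving average of the model's per-sample class probabilities across epochs, and that a sample is dropped from the next epoch when the ensembled prediction disagrees with its distant-supervision label.

```python
# Sketch of a self-ensemble noise filter for distantly supervised data.
# Assumption (illustrative): the ensemble is an exponential moving average
# (EMA) of per-sample class probabilities across epochs, and samples whose
# ensembled prediction disagrees with the distant label are filtered out.
import numpy as np


def update_ensemble(ema_probs, epoch_probs, momentum=0.9):
    """Blend this epoch's predictions into the running ensemble."""
    return momentum * ema_probs + (1.0 - momentum) * epoch_probs


def keep_mask(ema_probs, distant_labels):
    """Keep samples whose ensembled prediction matches the distant label."""
    return ema_probs.argmax(axis=1) == distant_labels


# Usage after each training epoch: fold in the new predictions, then train
# the next epoch only on the surviving sample indices.
n_samples, n_classes = 1000, 5
ema_probs = np.full((n_samples, n_classes), 1.0 / n_classes)
epoch_probs = np.random.dirichlet(np.ones(n_classes), size=n_samples)  # stand-in
distant_labels = np.random.randint(0, n_classes, size=n_samples)       # stand-in
ema_probs = update_ensemble(ema_probs, epoch_probs)
train_indices = np.flatnonzero(keep_mask(ema_probs, distant_labels))
```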
This paper presents the DuluthNLP submission to Task 7 of the SemEval 2021 competition on Detecting and Rating Humor and Offense. In it, we explain the approach used to train the model together with the process of fine-tuning our model in getting the results. We focus on humor detection, rating, and offense rating, representing three out of the four subtasks that were provided. We show that optimizing hyper-parameters for learning rate, batch size and number of epochs can increase the accuracy and F1 score for humor detection.
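
The hyper-parameter tuning described above amounts to a grid search over learning rate, batch size, and epoch count. A minimal sketch follows; `fine_tune_and_evaluate` is a hypothetical placeholder for fine-tuning with the given settings and returning a validation F1 score, and the candidate values are illustrative.

```python
# Grid search over learning rate, batch size, and number of epochs.
# `fine_tune_and_evaluate` is a hypothetical placeholder: in practice it
# would fine-tune the model with the given settings and return validation F1.
import random
from itertools import product


def fine_tune_and_evaluate(lr, batch_size, num_epochs):
    return random.random()  # placeholder score; replace with real fine-tuning


best_score, best_config = -1.0, None
for lr, bs, epochs in product([1e-5, 2e-5, 3e-5], [16, 32], [2, 3, 4]):
    score = fine_tune_and_evaluate(lr, bs, epochs)
    if score > best_score:
        best_score, best_config = score, (lr, bs, epochs)

print("best (lr, batch size, epochs):", best_config, "F1:", round(best_score, 3))
```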
Neural topic models can augment or replace bag-of-words inputs with the learned representations of deep pre-trained transformer-based word prediction models. One added benefit when using representations from multilingual models is that they facilitate zero-shot polylingual topic modeling. However, while it has been widely observed that pre-trained embeddings should be fine-tuned to a given task, it is not immediately clear what supervision should look like for an unsupervised task such as topic modeling. Thus, we propose several methods for fine-tuning encoders to improve both monolingual and zero-shot polylingual neural topic modeling. We consider fine-tuning on auxiliary tasks, constructing a new topic classification task, integrating the topic classification objective directly into topic model training, and continued pre-training. We find that fine-tuning encoder representations on topic classification and integrating the topic classification task directly into topic modeling improves topic quality, and that fine-tuning encoder representations on any task is the most important factor for facilitating cross-lingual transfer.
