بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

Effective Unsupervised Domain Adaptation with Adversarially Trained Language Models

124 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Thuy-Trang Vu

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Thuy-Trang Vu - Dinh Phung - Gholamreza Haffari

الحساب واللغة

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Recent work has shown the importance of adaptation of broad-coverage contextualised embedding models on the domain of the target task of interest. Current self-supervised adaptation methods are simplistic, as the training signal comes from a small percentage of emph{randomly} masked-out tokens. In this paper, we show that careful masking strategies can bridge the knowledge gap of masked language models (MLMs) about the domains more effectively by allocating self-supervision where it is needed. Furthermore, we propose an effective training strategy by adversarially masking out those tokens which are harder to reconstruct by the underlying MLM. The adversarial objective leads to a challenging combinatorial optimisation problem over emph{subsets} of tokens, which we tackle efficiently through relaxation to a variational lowerbound and dynamic programming. On six unsupervised domain adaptation tasks involving named entity recognition, our method strongly outperforms the random masking strategy and achieves up to +1.64 F1 score improvements.

قيم البحث

168 - Kosuke Nishida , Kyosuke Nishida , Itsumi Saito 2019

This study tackles unsupervised domain adaptation of reading comprehension (UDARC). Reading comprehension (RC) is a task to learn the capability for question answering with textual sources. State-of-the-art models on RC still do not have general ling uistic intelligence; i.e., their accuracy worsens for out-domain datasets that are not used in the training. We hypothesize that this discrepancy is caused by a lack of the language modeling (LM) capability for the out-domain. The UDARC task allows models to use supervised RC training data in the source domain and only unlabeled passages in the target domain. To solve the UDARC problem, we provide two domain adaptation models. The first one learns the out-domain LM and in-domain RC task sequentially. The second one is the proposed model that uses a multi-task learning approach of LM and RC. The models can retain both the RC capability acquired from the supervised data in the source domain and the LM capability from the unlabeled data in the target domain. We evaluated the models on UDARC with five datasets in different domains. The models outperformed the model without domain adaptation. In particular, the proposed model yielded an improvement of 4.3/4.2 points in EM/F1 in an unseen biomedical domain.

الحساب واللغة

UDALM: Unsupervised Domain Adaptation through Language Modeling

175 - Constantinos Karouzos , Georgios Paraskevopoulos , Alexandrosn Potamianos 2021

In this work we explore Unsupervised Domain Adaptation (UDA) of pretrained language models for downstream tasks. We introduce UDALM, a fine-tuning procedure, using a mixed classification and Masked Language Model loss, that can adapt to the target do main distribution in a robust and sample efficient manner. Our experiments show that performance of models trained with the mixed loss scales with the amount of available target data and the mixed loss can be effectively used as a stopping criterion during UDA training. Furthermore, we discuss the relationship between A-distance and the target error and explore some limitations of the Domain Adversarial Training approach. Our method is evaluated on twelve domain pairs of the Amazon Reviews Sentiment dataset, yielding $91.74%$ accuracy, which is an $1.11%$ absolute improvement over the state-of-the-art.

الحساب واللغة

Adversarially Trained Models with Test-Time Covariate Shift Adaptation

102 - Jay Nandy , Sudipan Saha , Wynne Hsu 2021

We empirically demonstrate that test-time adaptive batch normalization, which re-estimates the batch-normalization statistics during inference, can provide $ell_2$-certification as well as improve the commonly occurring corruption robustness of adver sarially trained models while maintaining their state-of-the-art empirical robustness against adversarial attacks. Furthermore, we obtain similar $ell_2$-certification as the current state-of-the-art certification models for CIFAR-10 by learning our adversarially trained model using larger $ell_2$-bounded adversaries. Therefore our work is a step towards bridging the gap between the state-of-the-art certification and empirical robustness. Our results also indicate that improving the empirical adversarial robustness may be sufficient as we achieve certification and corruption robustness as a by-product using test-time adaptive batch normalization.

التعلم الآلي الذكاء الاصطناعي

Unsupervised Paraphrase Generation using Pre-trained Language Models

161 - Chaitra Hegde , Shrikumar Patil 2020

Large scale Pre-trained Language Models have proven to be very powerful approach in various Natural language tasks. OpenAIs GPT-2 cite{radford2019language} is notable for its capability to generate fluent, well formulated, grammatically consistent te xt and for phrase completions. In this paper we leverage this generation capability of GPT-2 to generate paraphrases without any supervision from labelled data. We examine how the results compare with other supervised and unsupervised approaches and the effect of using paraphrases for data augmentation on downstream tasks such as classification. Our experiments show that paraphrases generated with our model are of good quality, are diverse and improves the downstream task performance when used for data augmentation.

الحساب واللغة التعلم الآلي

Unsupervised Domain Clusters in Pretrained Language Models

179 - Roee Aharoni , Yoav Goldberg 2020

The notion of in-domain data in NLP is often over-simplistic and vague, as textual data varies in many nuanced linguistic aspects such as topic, style or level of formality. In addition, domain labels are many times unavailable, making it challenging to build domain-specific systems. We show that massive pre-trained language models implicitly learn sentence representations that cluster by domains without supervision -- suggesting a simple data-driven definition of domains in textual data. We harness this property and propose domain data selection methods based on such models, which require only a small set of in-domain monolingual data. We evaluate our data selection methods for neural machine translation across five diverse domains, where they outperform an established approach as measured by both BLEU and by precision and recall of sentence selection with respect to an oracle.

الحساب واللغة

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة قرطبة الخاصة

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Effective Unsupervised Domain Adaptation with Adversarially Trained Language Models

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً