Preserving Cross-Linguality of Pre-trained Models via Continual Learning

Added by: Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Keywords: Siamese-based BERT, preserving cross-linguality, continual learning

Abstract

Recently, fine-tuning pre-trained language models (e.g., multilingual BERT) on downstream cross-lingual tasks has shown promising results. However, fine-tuning inevitably changes the parameters of the pre-trained model and weakens its cross-lingual ability, which leads to sub-optimal performance. To alleviate this problem, we leverage continual learning to preserve the original cross-lingual ability of the pre-trained model when fine-tuning it on downstream tasks. Experimental results show that our fine-tuning methods better preserve the cross-lingual ability of the pre-trained model on a sentence retrieval task. Our methods also outperform other fine-tuning baselines on zero-shot cross-lingual part-of-speech tagging and named entity recognition.
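The abstract does not say which continual-learning technique is used. As a generic illustration only, the sketch below swaps in a simple L2 penalty that pulls fine-tuned weights back toward their pre-trained values (in the spirit of L2-SP or a uniform-importance EWC); it is not the authors' method. The toy quadratic task loss, the weight values, and the function names `anchored_grad`, `fine_tune`, and `drift` are all hypothetical.

```python
"""Toy sketch: anchoring fine-tuned weights to their pre-trained values.

Assumption: a generic L2-to-initialization penalty, NOT the paper's method.
"""
from typing import List


def anchored_grad(theta: List[float], theta0: List[float],
                  target: List[float], lam: float) -> List[float]:
    """Gradient of a toy task loss sum((t - y)^2) plus lam * sum((t - t0)^2)."""
    return [2.0 * (t - y) + 2.0 * lam * (t - t0)
            for t, t0, y in zip(theta, theta0, target)]


def fine_tune(theta0: List[float], target: List[float], lam: float,
              lr: float = 0.1, steps: int = 500) -> List[float]:
    """Plain gradient descent that starts from the pre-trained weights theta0."""
    theta = list(theta0)
    for _ in range(steps):
        g = anchored_grad(theta, theta0, target, lam)
        theta = [t - lr * gi for t, gi in zip(theta, g)]
    return theta


def drift(theta: List[float], theta0: List[float]) -> float:
    """Euclidean distance between fine-tuned and pre-trained weights."""
    return sum((t - t0) ** 2 for t, t0 in zip(theta, theta0)) ** 0.5


if __name__ == "__main__":
    pretrained = [1.0, -2.0, 0.5]  # hypothetical pre-trained weights
    task_opt = [3.0, 0.0, -1.5]    # hypothetical optimum of the downstream task
    for lam in (0.0, 1.0, 4.0):
        tuned = fine_tune(pretrained, task_opt, lam)
        print(lam, [round(t, 3) for t in tuned], round(drift(tuned, pretrained), 3))
```

Raising `lam` trades task fit for staying close to the starting point: with `lam = 0` the weights move all the way to the task optimum, while larger values keep them nearer the pre-trained weights, which is the intuition behind regularization-based continual learning.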

Related research

Multilingual Translation via Grafting Pre-trained Language Models

Association for Computational Linguistics, 2021 (article)

Can a pre-trained BERT for one language and a GPT for another be glued together to translate text? Self-supervised training on monolingual data alone has led to the success of pre-trained (masked) language models in many NLP tasks. However, directly connecting BERT as an encoder and GPT as a decoder can be challenging in machine translation, because GPT-like models lack the cross-attention component that seq2seq decoders need. In this paper, we propose Graformer to graft separately pre-trained (masked) language models for machine translation. By using monolingual data for pre-training and parallel data for grafting training, we make maximal use of both types of data. Experiments on 60 translation directions show that our method achieves average improvements of 5.8 BLEU in x2en and 2.9 BLEU in en2x directions compared with a multilingual Transformer of the same size.

Keywords: augmented code generation, grafting pre-trained language
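The Graformer abstract notes that GPT-like decoders lack the cross-attention that seq2seq decoders need. The sketch below shows that missing piece in its simplest form: each decoder state acts as a query over the encoder outputs. It assumes single-head scaled dot-product attention with no learned query/key/value projections; real Transformer layers add learned projections and multiple heads, and how Graformer wires these in is not described here. The function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(dec_states: Matrix, enc_states: Matrix) -> Matrix:
    """Queries come from the decoder; keys and values come from the encoder.

    Simplification (assumption): keys and values are the raw encoder states,
    with no learned projections and a single head.
    """
    d = len(dec_states[0])
    out: Matrix = []
    for q in dec_states:
        # Scaled dot-product score between this decoder state and every encoder state.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in enc_states]
        weights = softmax(scores)
        # Weighted sum of encoder states = context vector for this decoder position.
        out.append([sum(w * v[i] for w, v in zip(weights, enc_states))
                    for i in range(len(enc_states[0]))])
    return out


if __name__ == "__main__":
    encoder_out = [[1.0, 0.0], [0.0, 1.0]]  # toy source-side representations
    decoder_in = [[2.0, 0.0], [0.0, 0.0]]   # toy target-side states
    print(cross_attention(decoder_in, encoder_out))
```

A zero query attends uniformly to every encoder position, while a query aligned with one encoder state pulls the context vector toward that state.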

Text Detoxification using Large Pre-trained Neural Models

Association for Computational Linguistics, 2021 (article)

We present two novel unsupervised methods for eliminating toxicity in text. Our first method combines two recent ideas: (1) guiding the generation process with small style-conditional language models and (2) using paraphrasing models to perform style transfer. We use a well-performing paraphraser guided by style-trained language models to preserve the text content and remove toxicity. Our second method uses BERT to replace toxic words with non-offensive synonyms. We make the method more flexible by enabling BERT to replace mask tokens with a variable number of words. Finally, we present the first large-scale comparative study of style transfer models on the task of toxicity removal, comparing our models with a number of style transfer methods. The models are evaluated in a reference-free way using a combination of unsupervised style transfer metrics. Both proposed methods achieve new state-of-the-art results.

Keywords: large pre-trained neural, pre-trained neural models, detoxification using large
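The second detoxification method masks toxic words before BERT fills in replacements. The sketch below covers only that masking step. It assumes a hypothetical word-level toxicity lexicon as a stand-in for however the authors actually identify toxic words, and it does not show the BERT infilling. The hypothetical `n_masks` parameter mimics the idea of letting one toxic word be replaced by a variable number of words.

```python
import re
from typing import List, Set, Tuple

MASK = "[MASK]"


def mask_toxic(text: str, lexicon: Set[str], n_masks: int = 1) -> Tuple[str, List[int]]:
    """Replace every token found in `lexicon` with `n_masks` mask tokens.

    Returns the masked string and the indices of the masked tokens in the
    original tokenization. The lexicon is a hypothetical stand-in for a
    learned toxicity scorer.
    """
    # Split into words and standalone punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    out: List[str] = []
    positions: List[int] = []
    for i, tok in enumerate(tokens):
        if tok.lower() in lexicon:
            positions.append(i)
            out.extend([MASK] * n_masks)  # a masked language model would fill these in
        else:
            out.append(tok)
    return " ".join(out), positions


if __name__ == "__main__":
    toxic_words = {"idiot", "stupid"}  # hypothetical toy lexicon
    print(mask_toxic("You are an idiot!", toxic_words))
    print(mask_toxic("What a stupid idea.", toxic_words, n_masks=2))
```

Keeping the original positions makes it straightforward to align the model's proposed replacements with the input afterwards.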

Exploring Strategies for Generalizable Commonsense Reasoning with Pre-trained Models

Association for Computational Linguistics, 2021 (article)

Commonsense reasoning benchmarks have been largely solved by fine-tuning language models. The downside is that fine-tuning may cause models to overfit to task-specific data and thereby forget the knowledge gained during pre-training. Recent works propose only lightweight model updates, since models may already possess useful knowledge from past experience, but a challenge remains in understanding which parts of a model should be refined for a given task, and to what extent. In this paper, we investigate what models learn from commonsense reasoning datasets. We measure the impact of three different adaptation methods on the generalization and accuracy of models. Our experiments with two models show that fine-tuning performs best, learning both the content and the structure of the task, but suffers from overfitting and limited generalization to novel answers. We observe that alternative adaptation methods such as prefix-tuning have comparable accuracy but generalize better to unseen answers and are more robust to adversarial splits.

Keywords: generalizable commonsense reasoning, strategies for generalizable, exploring strategies
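Prefix-tuning, one of the alternative adaptation methods mentioned above, keeps the pre-trained model frozen and learns a small set of prefix vectors that are prepended to the keys and values inside attention. A single-head, projection-free sketch of that idea follows. The function names and toy vectors are hypothetical; in practice prefixes are learned for every layer, and only they receive gradient updates.

```python
import math
from typing import List

Matrix = List[List[float]]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Single-head scaled dot-product attention (no learned projections)."""
    d = len(queries[0])
    out: Matrix = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, values)) / z
                    for j in range(len(values[0]))])
    return out


def prefix_attention(queries: Matrix, keys: Matrix, values: Matrix,
                     prefix_keys: Matrix, prefix_values: Matrix) -> Matrix:
    """Prepend (hypothetically trainable) prefix keys/values to the frozen
    model's keys/values; in prefix-tuning only these prefixes are updated."""
    return attention(queries, prefix_keys + keys, prefix_values + values)


if __name__ == "__main__":
    q = [[1.0, 0.0]]                     # query from the frozen model
    k, v = [[0.0, 1.0]], [[2.0, 2.0]]    # frozen keys/values
    pk, pv = [[1.0, 0.0]], [[-1.0, 5.0]]  # toy prefix vectors
    print(attention(q, k, v))
    print(prefix_attention(q, k, v, pk, pv))
```

With an empty prefix the layer behaves exactly like the frozen model, so the prefixes act as a small, task-specific steering signal rather than a change to the pre-trained weights.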

Is BERT a Cross-Disciplinary Knowledge Learner? A Surprising Finding of Pre-trained Models' Transferability

Association for Computational Linguistics, 2021 (article)

This paper investigates whether the power of models pre-trained on text data, such as BERT, can be transferred to general token sequence classification applications. To verify the transferability of pre-trained models, we test them on text classification tasks in which token meanings are mismatched, and on real-world non-text token sequence classification data, including amino acid, DNA, and music sequences. We find that even on non-text data, models pre-trained on text converge faster and perform better than randomly initialized models, and only slightly worse than models that use task-specific knowledge. We also find that the representations of text and non-text pre-trained models share non-trivial similarities.

Keywords: cross-disciplinary knowledge learner, pre-trained models' transferability, knowledge learner

Continual Learning in Multilingual NMT via Language-Specific Embeddings

Association for Computational Linguistics, 2021 (article)

This paper proposes a technique for adding a new source or target language to an existing multilingual NMT model without re-training it on the initial set of languages. It consists of replacing the shared vocabulary with a small language-specific vocabulary and fine-tuning the new embeddings on the new language's parallel data. Some additional language-specific components (e.g., Transformer layers or adapter modules) may be trained to improve performance. Because the parameters of the original model are not modified, its performance on the initial languages does not degrade. We show in two sets of experiments (small-scale on TED Talks, and large-scale on ParaCrawl) that this approach performs as well as or better than the more costly alternatives, and that it has excellent zero-shot performance: training on English-centric data is enough to translate between the new language and any of the initial languages.

Keywords: existing multilingual NMT, multilingual NMT model
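In the language-specific-embeddings approach, the original parameters stay untouched, so the new language's embeddings (and any optional language-specific layers or adapters) are the only parameters updated. A minimal sketch of that freezing pattern follows, assuming parameters stored as named lists of floats and a plain SGD step; the function name `sgd_step` and all parameter names are hypothetical.

```python
from typing import Dict, List, Set

# Parameters stored as name -> flat list of floats (a toy stand-in for tensors).
Params = Dict[str, List[float]]


def sgd_step(params: Params, grads: Params, trainable: Set[str], lr: float) -> Params:
    """Return updated parameters; anything not in `trainable` is copied unchanged."""
    updated: Params = {}
    for name, values in params.items():
        if name in trainable:
            updated[name] = [v - lr * g for v, g in zip(values, grads[name])]
        else:
            # Frozen: the original model's weights keep their values.
            updated[name] = list(values)
    return updated


if __name__ == "__main__":
    # Hypothetical parameter names for an existing model plus a new language.
    model = {
        "encoder.layer0": [0.3, -0.1],
        "embed.original_vocab": [0.5, 0.2],
        "embed.new_lang": [0.0, 0.0],
    }
    grads = {name: [1.0, -1.0] for name in model}
    trainable = {"embed.new_lang"}  # could also list new adapter modules
    model = sgd_step(model, grads, trainable, lr=0.1)
    print(model)
```

Because the frozen entries are never modified, outputs for the initial languages cannot change, which is the property the abstract relies on to avoid degradation.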
