Dialect Identification through Adversarial Learning and Knowledge Distillation on Romanian BERT

Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Abstract

Dialect identification is a task with applications in a wide range of domains, from automatic speech recognition to opinion mining. This work presents the architectures we used for the VarDial 2021 Romanian Dialect Identification subtask. We introduce a series of solutions based on Romanian or multilingual Transformers, as well as adversarial training techniques. We also experimented with a knowledge distillation tool to check whether a smaller model can maintain the performance of our best approach. Our best solution obtained a weighted F1-score of 0.7324, which placed us 2nd on the leaderboard.
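For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a weighted F1-score is computed: the F1 of each class is averaged with weights proportional to the number of true instances of that class. The function name `weighted_f1`, the labels `RO` and `MD`, and the toy predictions are assumptions made for illustration only; they are not taken from the paper or the shared task data.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Average per-class F1, weighting each class by its support in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp  # true instances of this class that were missed
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score

# Hypothetical toy labels, not data from the shared task.
y_true = ["RO", "RO", "RO", "MD"]
y_pred = ["RO", "RO", "MD", "MD"]
print(round(weighted_f1(y_true, y_pred), 4))  # prints 0.7667
```

In this toy case the majority class has F1 = 0.8 and the minority class has F1 ≈ 0.667, so the weighted result (0.75 × 0.8 + 0.25 × 0.667) sits closer to the majority-class score.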

References used

https://aclanthology.org/

DIALKI: Knowledge Identification in Conversational Systems through Dialogue-Document Contextualization

734 - Association for Computational Linguistics 2021 (article)

Identifying relevant knowledge to be used in conversational systems that are grounded in long documents is critical to effective response generation. We introduce a knowledge identification model that leverages the document structure to provide dialogue-contextualized passage encodings and better locate knowledge relevant to the conversation. An auxiliary loss captures the history of dialogue-document connections. We demonstrate the effectiveness of our model on two document-grounded conversational datasets and provide analyses showing generalization to unseen documents and long dialogue contexts.

Keywords: conversational systems, dialogue-document contextualization, knowledge identification

jurBERT: A Romanian BERT Model for Legal Judgement Prediction

772 - Association for Computational Linguistics 2021 (article)

Transformer-based models have become the de facto standard in the field of Natural Language Processing (NLP). By leveraging large unlabeled text corpora, they enable efficient transfer learning leading to state-of-the-art results on numerous NLP tasks. Nevertheless, for low resource languages and highly specialized tasks, transformer models tend to lag behind more classical approaches (e.g. SVM, LSTM) due to the lack of aforementioned corpora. In this paper we focus on the legal domain and we introduce a Romanian BERT model pre-trained on a large specialized corpus. Our model outperforms several strong baselines for legal judgement prediction on two different corpora consisting of cases from trials involving banks in Romania.

Keywords: Romanian BERT model

Naive Bayes-based Experiments in Romanian Dialect Identification

609 - Association for Computational Linguistics 2021 (article)

This article describes the experiments and systems developed by the SUKI team for the second edition of the Romanian Dialect Identification (RDI) shared task which was organized as part of the 2021 VarDial Evaluation Campaign. We submitted two runs to the shared task and our second submission was the overall best submission by a noticeable margin. Our best submission used a character n-gram based naive Bayes classifier with adaptive language models. We describe our experiments on the development set leading to both submissions.

Keywords: Romanian dialect identification, dialect identification, Romanian dialect
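The entry above mentions a character n-gram based naive Bayes classifier. As a rough illustration of that general idea, here is a minimal sketch of a multinomial naive Bayes classifier over character trigrams with add-one smoothing. It is not the SUKI team's system: the adaptive language models from the abstract are not implemented, and the class name `CharNgramNB`, the helper `char_ngrams`, the smoothing choice, and the toy strings are all assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

def char_ngrams(text, n=3):
    """Return overlapping character n-grams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

class CharNgramNB:
    """Multinomial naive Bayes over character n-grams (illustrative sketch)."""

    def __init__(self, n=3, alpha=1.0):
        self.n, self.alpha = n, alpha
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.docs = Counter()               # label -> number of training texts
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts[label].update(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.docs.values())
        vocab_size = len(self.vocab)
        best_label, best_logp = None, -math.inf
        for label in self.docs:
            total = sum(self.counts[label].values())
            logp = math.log(self.docs[label] / total_docs)  # class prior
            for gram in char_ngrams(text, self.n):
                # Additive smoothing keeps unseen n-grams from zeroing the score.
                num = self.counts[label][gram] + self.alpha
                logp += math.log(num / (total + self.alpha * vocab_size))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label

# Hypothetical toy data; real dialect samples would replace these strings.
model = CharNgramNB().fit(["aaa aaa", "bbb bbb"], ["A", "B"])
print(model.predict("aaa"))  # prints A
```

Character n-grams are a common choice for dialect identification because they capture spelling and morphological cues without requiring tokenization or a fixed word vocabulary.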

Towards Domain-Generalizable Paraphrase Identification by Avoiding the Shortcut Learning

950 - Association for Computational Linguistics 2021 (article)

In this paper, we investigate the Domain Generalization (DG) problem for supervised Paraphrase Identification (PI). We observe that the performance of existing PI models deteriorates dramatically when tested in an out-of-distribution (OOD) domain. We conjecture that it is caused by shortcut learning, i.e., these models tend to utilize the cue words that are unique for a particular dataset or domain. To alleviate this issue and enhance the DG ability, we propose a PI framework based on Optimal Transport (OT). Our method forces the network to learn the necessary features for all the words in the input, which alleviates the shortcut learning problem. Experimental results show that our method improves the DG ability for the PI models.

Keywords: domain-generalizable paraphrase identification, supervised paraphrase identification, paraphrase identification

Combining Curriculum Learning and Knowledge Distillation for Dialogue Generation

833 - Association for Computational Linguistics 2021 (article)

Curriculum learning, a machine training strategy that feeds training instances to the model from easy to hard, has been shown to facilitate the dialogue generation task. Meanwhile, knowledge distillation, a knowledge transfer methodology between teacher and student networks, can yield a significant performance boost for student models. Hence, in this paper, we introduce a combination of curriculum learning and knowledge distillation for efficient dialogue generation models, where curriculum learning can help knowledge distillation from the data and model aspects. To start with, from the data aspect, we cluster the training cases according to their complexity, which is calculated from various types of features such as sentence length and coherence between dialog pairs. Furthermore, we employ an adversarial training strategy to identify the complexity of cases at the model level. The intuition is that if a discriminator can tell whether the generated response is from the teacher or the student, then the case is a difficult one that the student model has not yet adapted to. Finally, we use self-paced learning, an extension of curriculum learning, to assign weights for distillation. In conclusion, we arrange a hierarchical curriculum based on the above two aspects for the student model under the guidance of the teacher model. Experimental results demonstrate that our methods achieve improvements compared with competitive baselines.

Keywords: combining curriculum learning
