Dialect identification is a task with applications across a wide range of domains, from automatic speech recognition to opinion mining. This work presents the architectures we used for the VarDial 2021 Romanian Dialect Identification subtask. We introduce a series of solutions based on Romanian or multilingual Transformers, as well as adversarial training techniques. We also experimented with a knowledge distillation tool to check whether a smaller model can maintain the performance of our best approach. Our best solution obtained a weighted F1-score of 0.7324, placing second on the leaderboard.
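For readers unfamiliar with the reported metric, the following is a minimal sketch of how a weighted F1-score is commonly computed: the per-class F1 is averaged with weights proportional to each class's share of the gold labels. The function name `weighted_f1` and the labels `"RO"` and `"MD"` are illustrative assumptions, not taken from the shared task's evaluation script.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores.

    Hypothetical helper for illustration; not the official scorer.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    support = Counter(y_true)  # number of gold examples per class
    total = len(y_true)
    score = 0.0

    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # Each class contributes in proportion to its gold support.
        score += (n / total) * f1

    return score


# Toy example with assumed dialect labels.
gold = ["RO", "RO", "MD", "MD"]
pred = ["RO", "MD", "MD", "MD"]
print(round(weighted_f1(gold, pred), 4))
```

Because the weights follow class frequency, a model that does well on the majority class can score higher under this metric than under an unweighted (macro) average.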
References used
https://aclanthology.org/