ثبت أن أداء أنظمة NMT يعتمد على جودة بيانات التدريب.في هذه الورقة، نستكشف أدوات مختلفة مفتوحة المصدر التي يمكن استخدامها لتسجيل جودة أزواج الترجمة، بهدف الحصول على كورسا نظيفة لتدريب نماذج NMT.نقيس أداء هذه الأدوات من خلال ربط درجاتهم بالدرجات البشرية، وكذلك نماذج الرتبة المدربة على مجموعات البيانات التي تمت تصفيتها الناتجة من حيث أدائها في مجموعات اختبار مختلفة ومقاييس أداء MT.
Performance of NMT systems has been proven to depend on the quality of the training data. In this paper we explore different open-source tools that can be used to score the quality of translation pairs, with the goal of obtaining clean corpora for training NMT models. We measure the performance of these tools by correlating their scores with human scores, as well as rank models trained on the resulting filtered datasets in terms of their performance on different test sets and MT performance metrics.
References used
https://aclanthology.org/
Data filtering for machine translation (MT) describes the task of selecting a subset of a given, possibly noisy corpus with the aim to maximize the performance of an MT system trained on this selected data. Over the years, many different filtering ap
The explosion of user-generated content (UGC)---e.g. social media posts and comments and and reviews---has motivated the development of NLP applications tailored to these types of informal texts. Prevalent among these applications have been sentiment
The concept of frequency reuse has been successfully implemented in modern cellular communications systems in order to increase the system capacity. Further improvement of capacity can be achieved by employing adaptive arrays at the base station. In
We propose a shared task on training instance selection for few-shot neural text generation. Large-scale pretrained language models have led to dramatic improvements in few-shot text generation. Nonetheless, almost all previous work simply applies ra
There are two approaches for pairwise sentence scoring: Cross-encoders, which perform full-attention over the input pair, and Bi-encoders, which map each input independently to a dense vector space. While cross-encoders often achieve higher performan