Research papers, master and doctoral theses about multilingual lexical normalization

Sesame Street to Mount Sinai: BERT-constrained character-level Moses models for multilingual lexical normalization

686 - Association for Computational Linguistics 2021 Article

This paper describes the HEL-LJU submissions to the MultiLexNorm shared task on multilingual lexical normalization. Our system is based on a BERT token classification preprocessing step, where for each token the type of the necessary transformation is predicted (none, uppercase, lowercase, capitalize, modify), followed by a character-level SMT step where the text is translated from original to normalized form given the BERT-predicted transformation constraints. For some languages, depending on the results on development data, the training data was extended by back-translating OpenSubtitles data. In the final ranking of the ten participating teams, the HEL-LJU team took second place, scoring better than the previous state of the art.
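The five transformation types named in the abstract can be illustrated with a small sketch. This is not the authors' code: the function names, the order of the checks, and the `translate` callable (a stand-in for the character-level SMT step) are all assumptions made for illustration. It shows how a label could be derived from a (raw, normalized) token pair to build training data for a token classifier, and how a predicted label could restrict what is done to a token.

```python
from typing import Callable

# The five transformation types listed in the abstract.
LABELS = ("none", "uppercase", "lowercase", "capitalize", "modify")


def transformation_label(raw: str, norm: str) -> str:
    """Derive a transformation type from a (raw, normalized) token pair.

    Assumption: pure casing changes are checked first, and anything
    that casing alone cannot explain falls through to "modify".
    """
    if raw == norm:
        return "none"
    if raw.upper() == norm:
        return "uppercase"
    if raw.lower() == norm:
        return "lowercase"
    if raw.capitalize() == norm:
        return "capitalize"
    return "modify"


def apply_constraint(raw: str, label: str, translate: Callable[[str], str]) -> str:
    """Apply a predicted label to a token.

    Only tokens labelled "modify" are handed to `translate`, a
    hypothetical stand-in for the character-level SMT model.
    """
    if label == "none":
        return raw
    if label == "uppercase":
        return raw.upper()
    if label == "lowercase":
        return raw.lower()
    if label == "capitalize":
        return raw.capitalize()
    if label == "modify":
        return translate(raw)
    raise ValueError(f"unknown label: {label}")


# Toy lookup table used in place of a real translation model.
toy_table = {"u": "you", "2morrow": "tomorrow"}


def toy_translate(token: str) -> str:
    return toy_table.get(token, token)
```

For example, `transformation_label("york", "York")` yields `"capitalize"`, while `transformation_label("u", "you")` yields `"modify"`, and `apply_constraint("u", "modify", toy_translate)` returns `"you"`. How the actual system passes the constraints to Moses is not specified in the abstract, so the sketch only mirrors the division of labour between the two steps.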

BERT-constrained, character-level Moses, multilingual lexical normalization, character-level Moses models
