ﻻ يوجد ملخص باللغة العربية
All over the world and especially in Africa, researchers are putting efforts into building Neural Machine Translation (NMT) systems to help tackle the language barriers in Africa, a continent of over 2000 different languages. However, the low-resourceness, diacritical, and tonal complexities of African languages are major issues being faced. The FFR project is a major step towards creating a robust translation model from Fon, a very low-resource and tonal language, to French, for research and public use. In this paper, we introduce FFR Dataset, a corpus of Fon-to-French translations, describe the diacritical encoding process, and introduce our FFR v1.1 model, trained on the dataset. The dataset and model are made publicly available at https://github.com/ bonaventuredossou/ffr-v1, to promote collaboration and reproducibility.
Africa has the highest linguistic diversity in the world. On account of the importance of language to communication, and the importance of reliable, powerful and accurate machine translation models in modern inter-cultural communication, there have b
The encoder-decoder based neural machine translation usually generates a target sequence token by token from left to right. Due to error propagation, the tokens in the right side of the generated sequence are usually of poorer quality than those in t
We present neural machine translation (NMT) models inspired by echo state network (ESN), named Echo State NMT (ESNMT), in which the encoder and decoder layer weights are randomly generated then fixed throughout training. We show that even with this e
Building effective neural machine translation (NMT) models for very low-resourced and morphologically rich African indigenous languages is an open challenge. Besides the issue of finding available resources for them, a lot of work is put into preproc
We investigate two specific manifestations of compositionality in Neural Machine Translation (NMT) : (1) Productivity - the ability of the model to extend its predictions beyond the observed length in training data and (2) Systematicity - the ability