Computer-aided translation (CAT), the use of software to assist a human translator during the translation process, has been shown to enhance the productivity of human translators. Autocompletion, which suggests translation results according to the text pieces provided by the human translator, is a core function of CAT. Previous research in this line has two limitations. First, most work focuses on sentence-level autocompletion (i.e., generating the whole translation as a sentence based on human input), while word-level autocompletion remains under-explored. Second, almost no public benchmarks are available for the autocompletion task of CAT. This may be among the reasons why research progress in CAT is much slower than in automatic MT. In this paper, we propose the task of general word-level autocompletion (GWLAN), derived from a real-world CAT scenario, and construct the first public benchmark to facilitate research on this topic. In addition, we propose an effective method for GWLAN and compare it with several strong baselines. Experiments demonstrate that our proposed method gives significantly more accurate predictions than the baseline methods on our benchmark datasets.
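To make the idea of word-level autocompletion concrete, here is a minimal Python sketch. It is not the method proposed in the paper: the function name `complete_word` and the `candidate_scores` dictionary are hypothetical, and the scores stand in for whatever a context-aware model would assign to each candidate target word. The sketch only shows the basic contract, which is that the suggestion must be consistent with the characters the translator has typed so far.

```python
from typing import Dict, Optional


def complete_word(typed_prefix: str,
                  candidate_scores: Dict[str, float]) -> Optional[str]:
    """Return the best-scoring candidate word that starts with the typed prefix.

    candidate_scores is a hypothetical stand-in for model scores computed
    from the source sentence and the partial translation.
    """
    # Keep only candidates consistent with what the translator typed.
    matches = {word: score
               for word, score in candidate_scores.items()
               if word.startswith(typed_prefix)}
    if not matches:
        return None  # nothing to suggest for this prefix
    # Suggest the candidate with the highest score.
    return max(matches, key=matches.get)


if __name__ == "__main__":
    # Illustrative scores only; they are not taken from any real system.
    scores = {"translation": 0.5, "translator": 0.3, "train": 0.2}
    print(complete_word("transl", scores))
```

A real system would compute the scores from the source sentence and the surrounding translation context rather than from a fixed dictionary.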
Automatic machine translation produces translations very efficiently, yet their quality is not guaranteed. This technical report introduces TranSmart, a practical human-machine interactive translation system that is able to trade off translation quality and efficiency. Compared to existing publicly available interactive translation systems, TranSmart supports three key features: word-level autocompletion, sentence-level autocompletion, and translation memory. With word-level and sentence-level autocompletion, TranSmart allows users to translate interactively in whatever order they prefer rather than strictly from left to right. In addition, TranSmart has the potential to avoid repeating similar translation mistakes by using previously translated sentences as its memory. This report presents the major functions of TranSmart, the algorithms for achieving these functions, how to use the TranSmart APIs, and evaluation results for some key functions. TranSmart is publicly available at its homepage (https://transmart.qq.com).
Many efforts have been devoted to extracting constituency trees from pre-trained language models, often proceeding in two stages: feature definition and parsing. However, methods of this kind may suffer from the branching bias issue, which inflates performance on languages whose branching direction matches the bias. In this work, we propose quantitatively measuring the branching bias by comparing the performance gap between a language and its reversed language, a measure that is agnostic to both language models and extraction methods. Furthermore, we analyze the impact of three factors on the branching bias, namely parsing algorithms, feature definitions, and language models. Experiments show that several existing works exhibit branching biases, and that some implementations of these three factors can introduce the branching bias.
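The comparison between a language and its reversed language can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: trees are represented as nested tuples of word strings, the helper names `mirror`, `leaves`, and `branching_bias_gap` are hypothetical, and the sign convention of the gap (original minus reversed) is an assumption. The parsing scores passed to `branching_bias_gap` would come from evaluating an extraction method on the original and on the reversed data.

```python
from typing import List, Tuple, Union

# A tree is either a word (str) or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def mirror(tree: Tree) -> Tree:
    """Reverse the order of children at every level of the tree.

    The mirrored tree is the gold tree for the reversed sentence.
    """
    if isinstance(tree, str):
        return tree
    return tuple(mirror(child) for child in reversed(tree))


def leaves(tree: Tree) -> List[str]:
    """Return the words of the tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    words: List[str] = []
    for child in tree:
        words.extend(leaves(child))
    return words


def branching_bias_gap(score_original: float, score_reversed: float) -> float:
    """Performance gap between a language and its reversed language.

    Assumed convention: a gap far from zero suggests the extraction
    method favors one branching direction.
    """
    return score_original - score_reversed


if __name__ == "__main__":
    gold = (("the", "cat"), "sleeps")
    print(mirror(gold))          # mirrored tree for the reversed sentence
    print(leaves(mirror(gold)))  # words of the reversed sentence
```

Because the reversed language has the same words and structures but the opposite branching direction, an unbiased extraction method should score about the same on both versions.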