Do you want to publish a course? Click here

A Comparison of the Word Similarity Measurement in English-Arabic Translation Memory Segment Retrieval Including an Inflectional Affix Intervention

مقارنة من كلمة قياس التشابه في استرجاع قطاع ذاكرة الترجمة الإنجليزية - العربية

405   0   0   0.0 ( 0 )
 Publication date 2021
and research's language is English
 Created by Shamra Editor




Ask ChatGPT about the research

The aim of this paper is to investigate the similarity measurement approach of translation memory (TM) in five representative computer-aided translation (CAT) tools when retrieving inflectional verb-variation sentences in Arabic to English translation. In English, inflectional affixes in verbs include suffixes only; unlike English, verbs in Arabic derive voice, mood, tense, number and person through various inflectional affixes e.g. pre or post a verb root. The research question focuses on establishing whether the TM similarity algorithm measures a combination of the inflectional affixes as a word or as a character intervention when retrieving a segment. If it is dealt with as a character intervention, are the types of intervention penalized equally or differently? This paper experimentally examines, through a black box testing methodology and a test suite instrument, the penalties that TM systems' current algorithms impose when input segments and retrieved TM sources are exactly the same, except for a difference in an inflectional affix. It would be expected that, if TM systems had some linguistic knowledge, the penalty would be very light, which would be useful to translators, since a high-scoring match would be presented near the top of the list of proposals. However, analysis of TM systems' output shows that inflectional affixes are penalized more heavily than expected, and in different ways. They may be treated as an intervention on the whole word, or as a single character change.



References used
https://aclanthology.org/
rate research

Read More

Translation Memory (TM) system, a major component of computer-assisted translation (CAT), is widely used to improve human translators' productivity by making effective use of previously translated resource. We propose a method to achieve high-speed r etrieval from a large translation memory by means of similarity evaluation based on vector model, and present the experimental result. Through our experiment using Lucene, an open source information retrieval search engine, we conclude that it is possible to achieve real-time retrieval speed of about tens of microseconds even for a large translation memory with 5 million segment pairs.
Translation memory systems (TMS) are the main component of computer-assisted translation (CAT) tools. They store translations allowing to save time by presenting translations on the database through matching of several types such as fuzzy matches, wh ich are calculated by algorithms like the edit distance. However, studies have demonstrated the linguistic deficiencies of these systems and the difficulties in data retrieval or obtaining a high percentage of matching, especially after the application of syntactic and semantic transformations as the active/passive voice change, change of word order, substitution by a synonym or a personal pronoun, for instance. This paper presents the results of a pilot study where we analyze the qualitative and quantitative data of questionnaires conducted with professional translators of Spanish, French and Arabic in order to improve the effectiveness of TMS and explore all possibilities to integrate further linguistic processing from ten transformation types. The results are encouraging, and they allowed us to find out about the translation process itself; from which we propose a pre-editing processing tool to improve the matching and retrieving processes.
Despite the enormous popularity of Translation Memory systems and the active research in the field, their language processing features still suffer from certain limitations. While many recent papers focus on semantic matching capabilities of TMs, thi s planned study will address how these tools perform when dealing with longer segments and whether this could be a cause of lower match scores. An experiment will be carried out on corpora from two different (repetitive) domains. Following the results, recommendations for future developments of new TMs will be made.
Document alignment techniques based on multilingual sentence representations have recently shown state of the art results. However, these techniques rely on unsupervised distance measurement techniques, which cannot be fined-tuned to the task at hand . In this paper, instead of these unsupervised distance measurement techniques, we employ Metric Learning to derive task-specific distance measurements. These measurements are supervised, meaning that the distance measurement metric is trained using a parallel dataset. Using a dataset belonging to English, Sinhala, and Tamil, which belong to three different language families, we show that these task-specific supervised distance learning metrics outperform their unsupervised counterparts, for document alignment.
This research studies the notion of non-equivalence in translation between English and Arabic. It displays the main issues translators face when translating, like cultural restraints and linguistic barriers. It also suggests a number of strategies that help in dealing with non-equivalence, including paraphrasing, omission, and cultural substitution.

suggested questions

comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا