This paper describes the construction of a new large-scale English-Japanese simultaneous interpretation (SI) corpus and presents the results of its analysis. A portion of the corpus contains SI data from three interpreters with different amounts of experience. Some of the SI data were manually aligned with the source speeches at the sentence level. The latency, quality, and word order of the SI data were compared within the SI data and against offline translations. The results showed that (1) interpreters with more experience controlled latency and quality better, and (2) large latency hurt SI quality.
References used
https://aclanthology.org/