The National Virtual Translation Center (NVTC) seeks to acquire human language technology (HLT) tools that will facilitate its mission to provide verbatim English translations of foreign-language audio and video files. In the text domain, NVTC has been using translation memory (TM) for some time and has reported on the incorporation of machine translation (MT) into that workflow (Miller et al., 2020). While we have explored the use of speech-to-text (STT) and speech translation (ST) in the past (Tzoukermann and Miller, 2018), we have now invested in the creation of a substantial human-made corpus to thoroughly evaluate alternatives. Results from our analysis of this corpus and of the performance of HLT tools point the way to the most promising ones to deploy in our workflow.
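The abstract does not say which metrics the evaluation uses. As one illustration only, assuming word error rate (WER), a standard measure for STT output, a human-made reference corpus could be used to rank candidate STT systems roughly as sketched below. The function names, the lowercase-only text normalization, and the system labels are hypothetical; this is not NVTC's actual evaluation procedure.

```python
from typing import Dict, List, Sequence, Tuple


def word_edit_distance(reference: Sequence[str], hypothesis: Sequence[str]) -> int:
    """Minimum substitutions + deletions + insertions turning reference into hypothesis."""
    # prev[j] = edit distance between the first i-1 reference words and hypothesis[:j]
    prev = list(range(len(hypothesis) + 1))
    for i in range(1, len(reference) + 1):
        curr = [i] + [0] * len(hypothesis)
        for j in range(1, len(hypothesis) + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1]


def corpus_wer(references: List[str], hypotheses: List[str]) -> float:
    """Corpus-level WER: total word edits divided by total reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("need exactly one hypothesis per reference segment")
    edits = 0
    words = 0
    for ref, hyp in zip(references, hypotheses):
        # Hypothetical normalization: lowercase and whitespace tokenization only.
        ref_tokens = ref.lower().split()
        hyp_tokens = hyp.lower().split()
        edits += word_edit_distance(ref_tokens, hyp_tokens)
        words += len(ref_tokens)
    if words == 0:
        raise ValueError("reference corpus contains no words")
    return edits / words


def rank_systems(
    references: List[str], system_outputs: Dict[str, List[str]]
) -> List[Tuple[str, float]]:
    """Return (system name, corpus WER) pairs, best (lowest WER) first."""
    scores = [(name, corpus_wer(references, outs)) for name, outs in system_outputs.items()]
    return sorted(scores, key=lambda item: item[1])


if __name__ == "__main__":
    refs = ["the cat sat on the mat"]
    outputs = {
        "system_b": ["a cat sat in the hat"],  # three substitutions
        "system_a": ["the cat sat on mat"],    # one deletion
    }
    for name, wer in rank_systems(refs, outputs):
        print(f"{name}: WER = {wer:.3f}")
```

Corpus-level WER pools edits across all segments rather than averaging per-segment scores, so long segments carry proportionally more weight; a real evaluation would also need an agreed normalization policy for punctuation, numbers, and disfluencies, which this sketch leaves out.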