Large-Scale English-Japanese Simultaneous Interpretation Corpus: Construction and Analyses with Sentence-Aligned Data

Added by: Association for Computational Linguistics (article)

Publication date: 2021

Field: Artificial Intelligence

Language of the research: English

Created by: Shamra Editor

Keywords: English-Japanese simultaneous interpretation; large-scale English-Japanese simultaneous interpretation; simultaneous interpretation corpus

Abstract

This paper describes the construction of a new large-scale English-Japanese Simultaneous Interpretation (SI) corpus and presents the results of its analysis. Part of the corpus contains SI data from three interpreters with different amounts of experience. Some of the SI data were manually aligned with the source speeches at the sentence level. The latency, quality, and word order of these aligned data were compared across the interpreters' outputs as well as against offline translations. The results showed that (1) more experienced interpreters controlled latency and quality better, and (2) large latency degraded SI quality.
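The abstract does not say how latency was measured. As an illustration only, one simple sentence-level measure on time-stamped, sentence-aligned data is the lag between when a source sentence starts (or ends) and when its interpretation starts (or ends). The sketch assumes hypothetical per-sentence timestamps in seconds; the class and field names are invented for this example and are not taken from the corpus.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Tuple


@dataclass
class AlignedSentence:
    # Hypothetical timestamps (seconds from the start of the recording).
    src_start: float  # source speaker starts the sentence
    src_end: float    # source speaker finishes the sentence
    si_start: float   # interpreter starts rendering that sentence
    si_end: float     # interpreter finishes rendering that sentence


def onset_lag(s: AlignedSentence) -> float:
    """Delay between the start of the source sentence and the start of its interpretation."""
    return s.si_start - s.src_start


def offset_lag(s: AlignedSentence) -> float:
    """Delay between the end of the source sentence and the end of its interpretation."""
    return s.si_end - s.src_end


def mean_lags(sentences: List[AlignedSentence]) -> Tuple[float, float]:
    """Average onset and offset lag over a list of aligned sentence pairs."""
    return mean(onset_lag(s) for s in sentences), mean(offset_lag(s) for s in sentences)


if __name__ == "__main__":
    pairs = [AlignedSentence(0.0, 4.0, 2.5, 6.0), AlignedSentence(5.0, 9.0, 7.5, 11.0)]
    print(mean_lags(pairs))  # (2.5, 2.0)
```

Averaging per interpreter would allow the kind of cross-interpreter latency comparison the abstract describes, though finer-grained word-level measures are also possible.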

Related research

Itihasa: A large-scale corpus for Sanskrit to English translation

Association for Computational Linguistics, 2021 (article)

This work introduces Itihasa, a large-scale translation dataset containing 93,000 pairs of Sanskrit shlokas and their English translations. The shlokas are extracted from two Indian epics, viz., the Ramayana and the Mahabharata. We first describe the motivation behind the curation of such a dataset and follow up with an empirical analysis to bring out its nuances. We then benchmark the performance of standard translation models on this corpus and show that even state-of-the-art transformer architectures perform poorly, emphasizing the complexity of the dataset.

Keywords: Itihasa; English translation; large-scale translation dataset

NAIST English-to-Japanese Simultaneous Translation System for IWSLT 2021 Simultaneous Text-to-text Task

Association for Computational Linguistics, 2021 (article)

This paper describes NAIST's system for the English-to-Japanese Simultaneous Text-to-text Translation Task in the IWSLT 2021 Evaluation Campaign. Our primary submission is based on wait-k neural machine translation with sequence-level knowledge distillation to encourage literal translation.
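A wait-k policy first reads k source tokens and then alternates between writing one target token and reading one more source token; once the source is exhausted, it writes the rest of the output. The sketch below shows only the generic READ/WRITE schedule of the standard wait-k policy, not NAIST's model or its knowledge-distillation setup. The target length is passed in purely for illustration; a real decoder stops when it emits an end-of-sentence token.

```python
from typing import List


def wait_k_actions(src_len: int, tgt_len: int, k: int) -> List[str]:
    """Return the READ/WRITE sequence produced by a wait-k policy.

    Before emitting target token t (1-indexed), the policy must have read
    g(t) = min(k + t - 1, src_len) source tokens.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    actions: List[str] = []
    read = 0
    for t in range(1, tgt_len + 1):
        needed = min(k + t - 1, src_len)
        # Read source tokens until the wait-k lag is satisfied.
        while read < needed:
            actions.append("READ")
            read += 1
        actions.append("WRITE")  # emit target token t
    return actions


if __name__ == "__main__":
    print(" ".join(wait_k_actions(src_len=5, tgt_len=5, k=2)))
    # READ READ WRITE READ WRITE READ WRITE READ WRITE WRITE
```

Smaller k lowers latency but gives the model less source context per output token, which is why distillation toward more literal (more monotonic) translations can help a wait-k system.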

Keywords: simultaneous translation system; NAIST system

A Large-Scale English Multi-Label Twitter Dataset for Cyberbullying and Online Abuse Detection

Association for Computational Linguistics, 2021 (article)

In this paper, we introduce a new English Twitter-based dataset for cyberbullying and online abuse detection. Comprising 62,587 tweets, this dataset was sourced from Twitter using specific query terms designed to retrieve tweets with high probabilities of various forms of bullying and offensive content, including insult, trolling, profanity, sarcasm, threat, porn and exclusion. We recruited a pool of 17 annotators to perform fine-grained annotation on the dataset, with each tweet annotated by three annotators. All our annotators are high-school educated and frequent users of social media. Inter-rater agreement for the dataset, as measured by Krippendorff's alpha, is 0.67. Analysis performed on the dataset confirmed common cyberbullying themes reported by other studies and revealed interesting relationships between the classes. The dataset was used to train a number of transformer-based deep learning models, which returned impressive results.
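Krippendorff's alpha compares observed disagreement with the disagreement expected by chance: alpha = 1 - D_o / D_e. The sketch below computes it for nominal labels through the usual coincidence matrix and tolerates items with missing annotations. The abstract does not say how the reported 0.67 was computed for this multi-label data; one assumed option is to binarize each label and compute alpha per label, which this function supports by passing one binary label at a time.

```python
from collections import Counter
from itertools import permutations
from typing import Hashable, Sequence


def krippendorff_alpha_nominal(units: Sequence[Sequence[Hashable]]) -> float:
    """Krippendorff's alpha for nominal labels.

    Each element of `units` holds the labels that the available annotators
    gave one item; missing annotations are simply left out.
    """
    # Coincidence matrix: each ordered pair of labels from different
    # annotators on the same item contributes 1 / (m_u - 1).
    coincidences: Counter = Counter()
    for values in units:
        m = len(values)
        if m < 2:
            continue  # an item with fewer than two labels is unpairable
        for i, j in permutations(range(m), 2):
            coincidences[(values[i], values[j])] += 1.0 / (m - 1)

    # Marginal totals n_c and grand total n of pairable values.
    totals: Counter = Counter()
    for (c, _k), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())

    disagreement = sum(w for (c, k), w in coincidences.items() if c != k)
    expected = sum(totals[c] * totals[k] for c in totals for k in totals if c != k)
    if expected == 0:
        return float("nan")  # only one category was ever used: alpha is undefined
    return 1.0 - (n - 1) * disagreement / expected


if __name__ == "__main__":
    # Three annotators per tweet for one binary label (1 = label present).
    print(krippendorff_alpha_nominal([[1, 1, 1], [0, 0, 0]]))  # 1.0
    print(krippendorff_alpha_nominal([[1, 0], [1, 1], [0, 0]]))  # about 0.444
```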

Keywords: online abuse detection; large-scale English multi-label; English multi-label Twitter

Back-translation for Large-Scale Multilingual Machine Translation

Association for Computational Linguistics, 2021 (article)

This paper describes our approach to the shared task on large-scale multilingual machine translation at the Sixth Conference on Machine Translation (WMT-21). In this work, we aim to build a single multilingual translation system, under the hypothesis that a universal cross-language representation leads to better multilingual translation performance. We extend the exploration of different back-translation methods from bilingual to multilingual translation. The best performance is obtained by the constrained sampling method, which differs from findings for bilingual translation. We also explore the effect of vocabulary size and of the amount of synthetic data. Surprisingly, smaller vocabularies perform better, and the extensive monolingual English data offers only a modest improvement. We submitted systems to both small tasks and achieved second place.
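Back-translation turns target-side monolingual text into synthetic training pairs by translating it into the source language with a reverse (target-to-source) model. The sketch below shows only that data-construction step plus a ratio knob for how much synthetic data to mix in, one of the factors the abstract says was explored. The abstract does not detail its constrained sampling method, so the reverse model is an arbitrary hypothetical callable; greedy, beam, or sampling-based decoding would plug in at that point.

```python
import random
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(mono_target: List[str], reverse_model: Callable[[str], str]) -> List[Pair]:
    """Build synthetic (source, target) pairs: the target side is genuine
    monolingual text, the source side is the reverse model's output."""
    return [(reverse_model(sentence), sentence) for sentence in mono_target]


def mix_training_data(real: List[Pair], synthetic: List[Pair], ratio: float, seed: int = 0) -> List[Pair]:
    """Add at most ratio * len(real) synthetic pairs to the real data, then shuffle."""
    rng = random.Random(seed)
    n_synthetic = min(len(synthetic), int(ratio * len(real)))
    mixed = list(real) + rng.sample(synthetic, n_synthetic)
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    # Hypothetical stand-in for a real target-to-source translation model.
    toy_reverse = lambda s: s[::-1]
    synthetic = back_translate(["abc", "de"], toy_reverse)
    print(synthetic)  # [('cba', 'abc'), ('ed', 'de')]
    real = [("src", "tgt")] * 4
    print(len(mix_training_data(real, synthetic, ratio=0.5)))  # 6
```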

Keywords: multilingual; low-resource

TenTrans Large-Scale Multilingual Machine Translation System for WMT21

Association for Computational Linguistics, 2021 (article)

This paper describes the TenTrans large-scale multilingual machine translation system for WMT 2021. We participate in Small Track 2, which covers thirty translation directions among five South East Asian languages (Javanese, Indonesian, Malay, Tagalog, Tamil) and English. We mainly use forward/back-translation, in-domain data selection, knowledge distillation, and gradual fine-tuning from the pre-trained FLORES-101 model. We find that forward/back-translation significantly improves translation results, that data selection and gradual fine-tuning are particularly effective for domain adaptation, and that knowledge distillation brings a slight improvement. Model averaging is also used to further improve translation performance on top of these systems. Our final system achieves an average BLEU score of 28.89 across the thirty directions on the test set.
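The abstract does not say exactly what was averaged. One common reading of "model averaging" is parameter averaging: taking the element-wise mean of the weights of several checkpoints that share an identical architecture. The sketch below assumes that reading and represents each checkpoint as a hypothetical dict mapping parameter names to flat lists of floats, standing in for real framework tensors.

```python
from typing import Dict, List

Checkpoint = Dict[str, List[float]]  # parameter name -> flattened weights


def average_checkpoints(checkpoints: List[Checkpoint]) -> Checkpoint:
    """Element-wise mean of parameters across same-architecture checkpoints."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    names = checkpoints[0].keys()
    for ckpt in checkpoints[1:]:
        if ckpt.keys() != names:
            raise ValueError("checkpoints have different parameter names")
    n = len(checkpoints)
    averaged: Checkpoint = {}
    for name in names:
        vectors = [ckpt[name] for ckpt in checkpoints]
        if any(len(v) != len(vectors[0]) for v in vectors):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # zip(*vectors) groups the i-th weight of every checkpoint together.
        averaged[name] = [sum(values) / n for values in zip(*vectors)]
    return averaged


if __name__ == "__main__":
    a = {"w": [1.0, 2.0], "b": [0.0]}
    b = {"w": [3.0, 4.0], "b": [1.0]}
    print(average_checkpoints([a, b]))  # {'w': [2.0, 3.0], 'b': [0.5]}
```

Unlike ensembling, which runs several models at decode time, parameter averaging yields a single model with no extra inference cost, but it only makes sense when all checkpoints have the same parameter names and shapes.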

Keywords: TenTrans; large-scale multilingual
