In machine translation, corpus preparation is one of the crucial tasks, particularly for low-resource language pairs. In multilingual countries like India, machine translation plays a vital role in communication among people with different linguistic backgrounds. The online translation systems offered by Google and Microsoft cover many languages but lack support for Khasi, which can hence be considered a low-resource language. This paper gives an overview of the development of EnKhCorp1.0, a corpus for the English-Khasi pair, and of the baseline systems implemented for English-to-Khasi and Khasi-to-English translation based on the neural machine translation approach.
References used
https://aclanthology.org/