
Building a Corpus for Corporate Websites Machine Translation Evaluation. A Step by Step Methodological Approach


Publication date: 2021
Language: English





The aim of this paper is to describe the process carried out to develop a parallel corpus comprising texts extracted from the corporate websites of southern Spanish SMEs in the sanitary sector, which will serve as the basis for MT quality assessment. The stages for compiling the parallel corpus were: (i) selection of websites with content translated into English and Spanish, (ii) downloading of the HTML files of the selected websites, (iii) filtering of the files and pairing of English files with their Spanish equivalents, (iv) compilation of individual corpora (EN and ES) for each of the selected websites, (v) merging of the individual corpora into two general corpora, one in English and the other in Spanish, (vi) selection of a representative sample of segments to be used as originals (ES) and reference translations (EN), and (vii) building of the parallel corpus intended for MT evaluation. The resulting parallel corpus will serve future Machine Translation quality assessments. In addition, the monolingual corpora generated during the process could serve as a basis for research focused on linguistic analysis, whether bilingual or monolingual.
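To make the compilation stages concrete, here is a minimal sketch of steps (iii) to (v): pairing English files with their Spanish equivalents, extracting per-site text, and merging everything into the two general monolingual corpora. The directory layout (one folder per site with mirrored en/ and es/ subfolders) and the stdlib-only text extraction are illustrative assumptions, not the authors' actual tooling.

```python
# Sketch of steps (iii)-(v), assuming each downloaded site lives in its own
# folder with mirrored "en/" and "es/" subfolders of HTML files.
from pathlib import Path
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""
    def __init__(self):
        super().__init__()
        self.chunks, self._skip = [], False
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True
    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False
    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

def extract_text(html_file: Path) -> str:
    parser = TextExtractor()
    parser.feed(html_file.read_text(encoding="utf-8", errors="ignore"))
    return "\n".join(parser.chunks)

def build_corpora(root: Path) -> None:
    general = {"en": [], "es": []}
    for site in sorted(p for p in root.iterdir() if p.is_dir()):
        en = {f.stem: f for f in (site / "en").glob("*.html")}
        es = {f.stem: f for f in (site / "es").glob("*.html")}
        for stem in sorted(en.keys() & es.keys()):        # (iii) pair EN/ES files
            general["en"].append(extract_text(en[stem]))  # (iv) per-site corpora
            general["es"].append(extract_text(es[stem]))
    for lang, texts in general.items():                   # (v) merge into two
        (root / f"general_{lang}.txt").write_text("\n\n".join(texts),
                                                  encoding="utf-8")

build_corpora(Path("websites"))
```

Pairing by identical file stems is the simplest possible heuristic; real multilingual sites often need URL-pattern or content-based alignment instead.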


Read More

In this paper we describe the process of building a corporate corpus that will be used as a reference for modelling and computing threads from conversations generated using communication and collaboration tools. The overall goal of the reconstruction of threads is to be able to provide value to the collaborator in various use cases, such as highlighting the important parts of a running discussion, reviewing the upcoming commitments or deadlines, etc. Since, to our knowledge, there is no available corporate corpus for the French language which could allow us to address this problem of thread constitution, we present here a method for building such corpora, including the different aspects and steps which allowed the creation of a pipeline to pseudo-anonymise data. Such a pipeline is a response to the constraints induced by the General Data Protection Regulation (GDPR) in Europe and to compliance with the secrecy of correspondence.
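As an illustration of the pseudo-anonymisation step mentioned above, the following sketch replaces regex-detectable identifiers (emails, phone numbers) with stable pseudonyms. The patterns and placeholder format are assumptions; a production GDPR pipeline would also need NER for person names.

```python
# Minimal pseudo-anonymisation sketch: each distinct identifier maps to a
# stable pseudonym, so thread structure (who replies to whom) is preserved.
import re
from itertools import count

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d .-]{7,}\d")

def pseudo_anonymise(messages):
    pseudonyms, counter = {}, count(1)
    def repl(match):
        value = match.group(0)
        if value not in pseudonyms:
            pseudonyms[value] = f"<PII_{next(counter)}>"
        return pseudonyms[value]
    return [PHONE.sub(repl, EMAIL.sub(repl, m)) for m in messages]

print(pseudo_anonymise(["Contact alice@corp.fr or +33 1 23 45 67 89",
                        "alice@corp.fr replied yesterday"]))
```

The stable mapping is the key design choice: unlike plain redaction, it keeps the corpus usable for thread modelling because the same (pseudonymised) participant remains identifiable across messages.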
The current random behavior of stakeholders within the Al-Abrash river basin in the Syrian coastal region, both the lake and the river, threatens more than ever to pollute the whole basin. The goal of this paper is to address the state of shared management of water resources among local players through the application of game theory, based on two self-interest strategies for each player, to reach a balance point, taking into consideration the government's intervention as the organizer of the game. Therefore, non-cooperative game theory (NCGT) was adopted as the analytical approach for modeling planning conflicts, and ArcGIS software was used to delineate different areas according to their risk/land-use types. The results show that the "non-cooperate-non-cooperate" equilibrium strategy between the players could shift towards a "cooperate-cooperate" strategy under the influence of the provincial government, through the adoption of innovative competitive planning policies. That would lead to an interactive economic-environmental balance in the river basin and help to reach rational decisions. This paper can therefore be classified among the studies seeking to apply the participatory planning approach toward sustainable development. Index Terms: Al-Abrash river basin, environmental protection strategy, game theory, participatory approach.
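The shift from the "non-cooperate-non-cooperate" equilibrium to "cooperate-cooperate" under government intervention can be illustrated with a small payoff model. The numbers below are hypothetical, chosen only to reproduce the qualitative result described in the abstract, not taken from the study.

```python
# Illustrative 2x2 non-cooperative game between two basin stakeholders.
# A government penalty on non-cooperation moves the pure-strategy Nash
# equilibrium from (not, not) to (cooperate, cooperate).
from itertools import product

def pure_nash(payoffs, strategies=("cooperate", "not")):
    """Return profiles where neither player gains by unilateral deviation."""
    equilibria = []
    for s1, s2 in product(strategies, repeat=2):
        u1, u2 = payoffs[(s1, s2)]
        best1 = all(u1 >= payoffs[(d, s2)][0] for d in strategies)
        best2 = all(u2 >= payoffs[(s1, d)][1] for d in strategies)
        if best1 and best2:
            equilibria.append((s1, s2))
    return equilibria

# Prisoner's-dilemma-like payoffs: polluting is individually cheaper.
base = {("cooperate", "cooperate"): (3, 3), ("cooperate", "not"): (0, 5),
        ("not", "cooperate"): (5, 0), ("not", "not"): (1, 1)}
# Government intervention: a penalty of 3 on any non-cooperating player.
regulated = {p: (u1 - 3 * (p[0] == "not"), u2 - 3 * (p[1] == "not"))
             for p, (u1, u2) in base.items()}

print(pure_nash(base))       # [('not', 'not')]
print(pure_nash(regulated))  # [('cooperate', 'cooperate')]
```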
Conversations are often held in laboratories and companies. A summary is vital for grasping the content of a discussion for people who did not attend it. If the summary is illustrated as an argument structure, it helps to grasp the discussion's essentials immediately. Our purpose in this paper is to predict a link structure between nodes that consist of utterances in a conversation: classification of each node pair as "linked" or "not-linked". One approach to predicting the structure is to utilize machine learning models. However, the result tends to over-generate links between nodes. To solve this problem, we introduce a two-step method for the structure prediction task. We utilize a machine learning-based approach as the first step: a link prediction task. Then, we apply a score-based approach as the second step: a link selection task. Our two-step method dramatically improved accuracy compared with one-step methods based on SVM and BERT.
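The two-step idea can be sketched as follows. Here `link_probability` is a hypothetical stand-in for the learned SVM/BERT pairwise classifier, and the score-based selection shown (keep only the single best-scoring parent per utterance) is one simple instance of a link selection step, not necessarily the paper's exact scoring scheme.

```python
# Step 1 scores every candidate (parent, child) utterance pair; step 2
# selects links by score instead of keeping every pair above a threshold,
# which curbs the over-generation of links and yields a tree.

def link_probability(parent: int, child: int) -> float:
    """Stand-in for the learned pairwise classifier (step 1)."""
    # Toy heuristic for illustration: nearer utterances score higher.
    return 1.0 / (child - parent)

def predict_structure(num_utterances: int):
    """Step 2: each utterance keeps only its best-scoring parent."""
    links = []
    for child in range(1, num_utterances):
        scores = {p: link_probability(p, child) for p in range(child)}
        best_parent = max(scores, key=scores.get)
        links.append((best_parent, child))
    return links

print(predict_structure(5))  # [(0, 1), (1, 2), (2, 3), (3, 4)]
```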
Recently, a number of commercial Machine Translation (MT) providers have started to offer glossary features allowing users to enforce terminology in the output of a generic model. However, to the best of our knowledge, it is not clear how such features impact terminology accuracy and the overall quality of the output. The present contribution aims at providing a first insight into the performance of the glossary-enhanced generic models offered by four providers. Our tests involve two different domains and language pairs, i.e. Sportswear En-Fr and Industrial Equipment De-En. The output of each generic model and of the glossary-enhanced one is evaluated relying on Translation Error Rate (TER) to take into account the overall output quality, and on accuracy to assess compliance with the glossary. This is followed by a manual evaluation. The present contribution mainly focuses on understanding how these glossary features can be fruitfully exploited by language service providers (LSPs), especially in a scenario in which a customer glossary is already available and is added to the generic model as is.
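The terminology-accuracy half of such an evaluation can be sketched as below. The glossary entries and segments are invented for illustration, and TER itself would come from an existing implementation (e.g. sacrebleu) rather than this snippet.

```python
# For every segment whose source contains a glossary source term, check
# whether the prescribed target term appears in the MT output.

def glossary_accuracy(segments, glossary):
    """segments: list of (source, mt_output); glossary: {src_term: tgt_term}."""
    expected = observed = 0
    for source, output in segments:
        for src_term, tgt_term in glossary.items():
            if src_term.lower() in source.lower():
                expected += 1
                observed += tgt_term.lower() in output.lower()
    return observed / expected if expected else 1.0

# Hypothetical De-En example in the spirit of the Industrial Equipment pair.
glossary = {"Laufschuh": "running shoe"}
segments = [("Der Laufschuh ist leicht.", "The running shoe is light."),
            ("Der Laufschuh ist neu.", "The sneaker is new.")]
print(glossary_accuracy(segments, glossary))  # 0.5
```

Substring matching like this over-credits inflected forms and compounds; a more careful version would lemmatize or tokenize before matching.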
This work introduces Itihasa, a large-scale translation dataset containing 93,000 pairs of Sanskrit shlokas and their English translations. The shlokas are extracted from two Indian epics viz., The Ramayana and The Mahabharata. We first describe the motivation behind the curation of such a dataset and follow up with empirical analysis to bring out its nuances. We then benchmark the performance of standard translation models on this corpus and show that even state-of-the-art transformer architectures perform poorly, emphasizing the complexity of the dataset.
