Shared Task in Evaluating Accuracy: Leveraging Pre-Annotations in the Validation Process

Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Abstract

We present our submission to the Shared Task in Evaluating Accuracy at the INLG 2021 Conference. Our evaluation protocol relies on three main components: rules and text classifiers that pre-annotate the dataset, a human annotator who validates the pre-annotations, and a web interface that facilitates this validation. Our submission in fact consists of two submissions: we first analyze the performance of the rules and classifiers alone (pre-annotations), and then the human evaluation aided by those pre-annotations through the web interface (hybrid). The code for the web interface and the classifiers is publicly available.
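The two-stage protocol described in the abstract (automatic pre-annotation, then human validation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `PreAnnotation` class, the single number-matching rule, the `NUMBER` label, and the `decisions` mapping are hypothetical and are not taken from the authors' released code.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class PreAnnotation:
    """A candidate error span proposed automatically (hypothetical structure)."""
    start: int
    end: int
    text: str
    label: str
    validated: bool = False


def pre_annotate_numbers(sentence: str, allowed_numbers: Set[int]) -> List[PreAnnotation]:
    """Hypothetical rule: flag each integer in the sentence that is absent from the reference data."""
    flagged = []
    for match in re.finditer(r"\d+", sentence):
        if int(match.group()) not in allowed_numbers:
            flagged.append(
                PreAnnotation(match.start(), match.end(), match.group(), "NUMBER")
            )
    return flagged


def validate(candidates: List[PreAnnotation], decisions: Dict[int, bool]) -> List[PreAnnotation]:
    """Keep only the candidates a human accepted; `decisions` maps candidate index to accept/reject."""
    kept = []
    for index, candidate in enumerate(candidates):
        if decisions.get(index, False):
            candidate.validated = True
            kept.append(candidate)
    return kept


# Illustrative data only: the reference values are made up for this example.
sentence = "The Hawks scored 102 points and grabbed 45 rebounds."
candidates = pre_annotate_numbers(sentence, allowed_numbers={102, 44})
confirmed = validate(candidates, decisions={0: True})
```

In this sketch the "pre-annotations" output corresponds to `candidates` and the "hybrid" output to `confirmed`; in the described system the human decisions would come from the web interface rather than from a hard-coded dictionary.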

References used

https://aclanthology.org/

Generation Challenges: Results of the Accuracy Evaluation Shared Task

946 - Association for Computational Linguistics 2021 (article)

The Shared Task on Evaluating Accuracy focused on techniques (both manual and automatic) for evaluating the factual accuracy of texts produced by neural NLG systems in a sports-reporting domain. Four teams submitted evaluation techniques for this task, using very different approaches. The best-performing submissions did encouragingly well at this difficult task. However, all automatic submissions struggled to detect factual errors that are semantically or pragmatically complex (for example, errors based on incorrect computation or inference).

Expected Validation Performance and Estimation of a Random Variable's Maximum

961 - Association for Computational Linguistics 2021 (article)

Research in NLP is often supported by experimental results, and improved reporting of such results can lead to better understanding and more reproducible science. In this paper we analyze three statistical estimators for expected validation performance, a tool used for reporting performance (e.g., accuracy) as a function of computational budget (e.g., number of hyperparameter tuning experiments). Where previous work analyzing such estimators focused on the bias, we also examine the variance and mean squared error (MSE). In both synthetic and realistic scenarios, we evaluate three estimators and find that the unbiased estimator has the highest variance, and the estimator with the smallest variance has the largest bias; the estimator with the smallest MSE strikes a balance between bias and variance, displaying a classic bias-variance tradeoff. We use expected validation performance to compare different models, and analyze how frequently each estimator leads to incorrect conclusions about which of two models performs best. We find that the two biased estimators lead to the fewest incorrect conclusions, which hints at the importance of minimizing variance and MSE.
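As an illustration of what "expected validation performance" computes, the sketch below implements one well-known plug-in estimator: the expected maximum of n draws, with replacement, from the empirical distribution of observed validation scores. The abstract does not name its three estimators, so treating this as one of them is an assumption; the function name and the sample scores are likewise hypothetical.

```python
from typing import Sequence


def expected_max_performance(scores: Sequence[float], n: int) -> float:
    """Plug-in estimate of E[max of n i.i.d. draws] from the empirical distribution of `scores`."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if not scores:
        raise ValueError("scores must not be empty")

    ordered = sorted(scores)
    total = len(ordered)
    expectation = 0.0
    for rank, value in enumerate(ordered, start=1):
        # Probability that the maximum of n draws (with replacement) is the
        # rank-th smallest observed score: (rank/N)^n - ((rank-1)/N)^n.
        probability = (rank / total) ** n - ((rank - 1) / total) ** n
        expectation += value * probability
    return expectation


# Made-up validation accuracies from three hyperparameter trials.
observed = [0.5, 0.7, 0.9]
curve = [expected_max_performance(observed, budget) for budget in (1, 2, 3)]
```

With a budget of one trial the estimate equals the sample mean, and it approaches the best observed score as the budget grows, which is why the resulting curve is read as performance as a function of computational budget.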

HW-TSC's Participation in the WMT 2021 Triangular MT Shared Task

755 - Association for Computational Linguistics 2021 (article)

This paper presents the submission of Huawei Translation Service Center (HW-TSC) to the WMT 2021 Triangular MT Shared Task. We participate in the Russian-to-Chinese task under the constrained condition. We use the Transformer architecture and obtain the best performance with a variant that has more parameters. We perform detailed pre-processing and filtering on the provided large-scale bilingual data. Several strategies are used to train our models, such as Multilingual Translation, Back Translation, Forward Translation, Data Denoising, Average Checkpoint, Ensemble, and Fine-tuning. Our system obtains 32.5 BLEU on the dev set and 27.7 BLEU on the test set, the highest score among all submissions.

Bering Lab's Submissions on WAT 2021 Shared Task

817 - Association for Computational Linguistics 2021 (article)

This paper presents Bering Lab's submission to the shared tasks of the 8th Workshop on Asian Translation (WAT 2021) on JPC2 and NICT-SAP. We participated in all tasks on JPC2 and in the IT domain tasks on NICT-SAP. Our approach for all tasks mainly focused on building NMT systems on domain-specific corpora. We crawled patent document pairs for English-Japanese, Chinese-Japanese, and Korean-Japanese. After cleaning noisy data, we built a parallel corpus by aligning sentences using sentence-level similarity scores. For the SAP test data, we also collected the OPUS dataset, including three IT domain corpora. We then trained a Transformer on the collected dataset. Our submission ranked 1st in eight out of fourteen tasks, achieving BLEU improvements of up to 2.87 for JPC2 and 8.79 for NICT-SAP.

iCompass at Shared Task on Sarcasm and Sentiment Detection in Arabic

895 - Association for Computational Linguistics 2021 (article)

We describe our system submitted to the 2021 Shared Task on Sarcasm and Sentiment Detection in Arabic (Abu Farha et al., 2021). We tackled both subtasks, namely Sarcasm Detection (Subtask 1) and Sentiment Analysis (Subtask 2). We used state-of-the-art pretrained contextualized text representation models and fine-tuned them for the downstream task at hand. As a first approach, we used Google's multilingual BERT, and then Arabic variants: AraBERT, ARBERT and MARBERT. The results show that MARBERT outperforms all of the other models on both Subtask 1 and Subtask 2.
