New community

Subscribe to the gold package and get unlimited access to Shamra Academy

ESTIME: Estimation of Summary-to-Text Inconsistency by Mismatched Embeddings

ESTIME: تقدير عدم تناسق الملخص إلى النص عن طريق المغايات المدمجة

224 0 0 0.0 ( 0 )

Download Cite

Added by Association for Computation Linguistics مقالة

Publication date 2021

fields Artificial Intelligence

and research's language is English

Created by Shamra Editor

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We propose a new reference-free summary quality evaluation measure, with emphasis on the faithfulness. The measure is based on finding and counting all probable potential inconsistencies of the summary with respect to the source document. The proposed ESTIME, Estimator of Summary-to-Text Inconsistency by Mismatched Embeddings, correlates with expert scores in summary-level SummEval dataset stronger than other common evaluation measures not only in Consistency but also in Fluency. We also introduce a method of generating subtle factual errors in human summaries. We show that ESTIME is more sensitive to subtle errors than other common evaluation measures.

References used

https://aclanthology.org/

rate research

Data-to-text Generation by Splicing Together Nearest Neighbors

367 - Association for Computation Linguistics 2021 مقالة

We propose to tackle data-to-text generation tasks by directly splicing together retrieved segments of text from neighbor'' source-target pairs. Unlike recent work that conditions on retrieved neighbors but generates text token-by-token, left-to-righ t, we learn a policy that directly manipulates segments of neighbor text, by inserting or replacing them in partially constructed generations. Standard techniques for training such a policy require an oracle derivation for each generation, and we prove that finding the shortest such derivation can be reduced to parsing under a particular weighted context-free grammar. We find that policies learned in this way perform on par with strong baselines in terms of automatic and human evaluation, but allow for more interpretable and controllable generation.

splicing together nearest nearest neighbors nearest الربط معا الأقرب أقرب الجيران أقرب صناعة حمض الفوسفور المزيد..

Towards the Application of Calibrated Transformers to the Unsupervised Estimation of Question Difficulty from Text

264 - Association for Computation Linguistics 2021 مقالة

Being able to accurately perform Question Difficulty Estimation (QDE) can improve the accuracy of students' assessment and better their learning experience. Traditional approaches to QDE are either subjective or introduce a long delay before new ques tions can be used to assess students. Thus, recent work proposed machine learning-based approaches to overcome these limitations. They use questions of known difficulty to train models capable of inferring the difficulty of questions from their text. Once trained, they can be used to perform QDE of newly created questions. Existing approaches employ supervised models which are domain-dependent and require a large dataset of questions of known difficulty for training. Therefore, they cannot be used if such a dataset is not available ( for new courses on an e-learning platform). In this work, we experiment with the possibility of performing QDE from text in an unsupervised manner. Specifically, we use the uncertainty of calibrated question answering models as a proxy of human-perceived difficulty. Our experiments show promising results, suggesting that model uncertainty could be successfully leveraged to perform QDE from text, reducing both costs and elapsed time.

question difficulty estimation application of calibrated calibrated transformers تقييم صعوبة تقدير تطبيق معايرة محولات معايرة صناعة حمض الفوسفور المزيد..

BTS: Back TranScription for Speech-to-Text Post-Processor using Text-to-Speech-to-Text

349 - Association for Computation Linguistics 2021 مقالة

With the growing popularity of smart speakers, such as Amazon Alexa, speech is becoming one of the most important modes of human-computer interaction. Automatic speech recognition (ASR) is arguably the most critical component of such systems, as erro rs in speech recognition propagate to the downstream components and drastically degrade the user experience. A simple and effective way to improve the speech recognition accuracy is to apply automatic post-processor to the recognition result. However, training a post-processor requires parallel corpora created by human annotators, which are expensive and not scalable. To alleviate this problem, we propose Back TranScription (BTS), a denoising-based method that can create such corpora without human labor. Using a raw corpus, BTS corrupts the text using Text-to-Speech (TTS) and Speech-to-Text (STT) systems. Then, a post-processing model can be trained to reconstruct the original text given the corrupted input. Quantitative and qualitative evaluations show that a post-processor trained using our approach is highly effective in fixing non-trivial speech recognition errors such as mishandling foreign words. We present the generated parallel corpus and post-processing platform to make our results publicly available.

النماذج المدربة مسبقا amazon alexa back transcription الأمازون اليكسا النسخ الخلفي صناعة حمض الفوسفور

Interpreting Text Classifiers by Learning Context-sensitive Influence of Words

165 - Association for Computation Linguistics 2021 مقالة

Many existing approaches for interpreting text classification models focus on providing importance scores for parts of the input text, such as words, but without a way to test or improve the interpretation method itself. This has the effect of compou nding the problem of understanding or building trust in the model, with the interpretation method itself adding to the opacity of the model. Further, importance scores on individual examples are usually not enough to provide a sufficient picture of model behavior. To address these concerns, we propose MOXIE (MOdeling conteXt-sensitive InfluencE of words) with an aim to enable a richer interface for a user to interact with the model being interpreted and to produce testable predictions. In particular, we aim to make predictions for importance scores, counterfactuals and learned biases with MOXIE. In addition, with a global learning objective, MOXIE provides a clear path for testing and improving itself. We evaluate the reliability and efficiency of MOXIE on the task of sentiment analysis.

interpreting text classifiers text classifiers interpreting text تفسير نصوص النص نصوص النص تفسير النص صناعة حمض الفوسفور المزيد..

GeSERA: General-domain Summary Evaluation by Relevance Analysis

464 - Association for Computation Linguistics 2021 مقالة

We present GeSERA, an open-source improved version of SERA for evaluating automatic extractive and abstractive summaries from the general domain. SERA is based on a search engine that compares candidate and reference summaries (called queries) agains t an information retrieval document base (called index). SERA was originally designed for the biomedical domain only, where it showed a better correlation with manual methods than the widely used lexical-based ROUGE method. In this paper, we take out SERA from the biomedical domain to the general one by adapting its content-based method to successfully evaluate summaries from the general domain. First, we improve the query reformulation strategy with POS Tags analysis of general-domain corpora. Second, we replace the biomedical index used in SERA with two article collections from AQUAINT-2 and Wikipedia. We conduct experiments with TAC2008, TAC2009, and CNNDM datasets. Results show that, in most cases, GeSERA achieves higher correlations with manual evaluation methods than SERA, while it reduces its gap with ROUGE for general-domain summary evaluation. GeSERA even surpasses ROUGE in two cases of TAC2009. Finally, we conduct extensive experiments and provide a comprehensive study of the impact of human annotators and the index size on summary evaluation with SERA and GeSERA.

relevance analysis sera general-domain summary evaluation تحليل الصلة سيرا تقييم ملخص المجال العام صناعة حمض الفوسفور المزيد..

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

ESTIME: Estimation of Summary-to-Text Inconsistency by Mismatched Embeddings

ESTIME: تقدير عدم تناسق الملخص إلى النص عن طريق المغايات المدمجة

Ask ChatGPT about the research

Read More

suggested questions