Beyond the Tip of the Iceberg: Assessing Coherence of Text Classifiers

Added by Association for Computational Linguistics (article)

Publication date: 2021

Field: Artificial Intelligence

Language: English

Created by Shamra Editor

Keywords: assessing coherence, iceberg

As large-scale, pre-trained language models achieve human-level and superhuman accuracy on existing language understanding tasks, statistical bias in benchmark data and probing studies have recently called into question their true capabilities. For a more informative evaluation than accuracy on text classification tasks can offer, we propose evaluating systems through a novel measure of prediction coherence. We apply our framework to two existing language understanding benchmarks with different properties to demonstrate its versatility. Our experimental results show that this evaluation framework, although simple in ideas and implementation, is a quick, effective, and versatile measure to provide insight into the coherence of machines' predictions.

References used

https://aclanthology.org/

Interpreting Text Classifiers by Learning Context-sensitive Influence of Words

Association for Computational Linguistics, 2021, article

Many existing approaches for interpreting text classification models focus on providing importance scores for parts of the input text, such as words, but without a way to test or improve the interpretation method itself. This has the effect of compounding the problem of understanding or building trust in the model, with the interpretation method itself adding to the opacity of the model. Further, importance scores on individual examples are usually not enough to provide a sufficient picture of model behavior. To address these concerns, we propose MOXIE (MOdeling conteXt-sensitive InfluencE of words) with an aim to enable a richer interface for a user to interact with the model being interpreted and to produce testable predictions. In particular, we aim to make predictions for importance scores, counterfactuals and learned biases with MOXIE. In addition, with a global learning objective, MOXIE provides a clear path for testing and improving itself. We evaluate the reliability and efficiency of MOXIE on the task of sentiment analysis.

Keywords: interpreting text classifiers, text classifiers, interpreting text

The Degree to Which Teachers Know and Employ Metacognitive Thinking Strategies in Educating Mentally Gifted Students

Al-Baath University, 2017, research paper

The research aims to identify the degree to which teachers of mentally gifted students in Damascus know and employ metacognitive thinking strategies, and to determine the significance of differences in their knowledge and employment of these strategies according to two variables (training courses, educational qualification). The sample consisted of 53 teachers, selected purposively from among teachers of high achievers. An instrument prepared by the researcher to measure teachers' knowledge and employment of metacognitive thinking strategies was administered to them after its validity and reliability had been verified.

Keywords: metacognitive thinking strategies, teachers of outstanding students

Do We Know What We Don't Know? Studying Unanswerable Questions beyond SQuAD 2.0

Association for Computational Linguistics, 2021, article

Understanding when a text snippet does not provide sought-after information is an essential part of natural language understanding. Recent work (SQuAD 2.0; Rajpurkar et al., 2018) has attempted to make some progress in this direction by enriching the SQuAD dataset for the Extractive QA task with unanswerable questions. However, as we show, the performance of a top system trained on SQuAD 2.0 drops considerably in out-of-domain scenarios, limiting its use in practical situations. To study this, we build an out-of-domain corpus, focusing on simple event-based questions, and distinguish between two types of IDK questions: competitive questions, where the context includes an entity of the same type as the expected answer, and simpler, non-competitive questions, where there is no entity of the same type in the context. We find that SQuAD 2.0-based models fail even in the case of the simpler questions. We then analyze the similarities and differences between the IDK phenomenon in Extractive QA and the Recognizing Textual Entailment task (RTE; Dagan et al., 2013) and investigate the extent to which the latter can be used to improve the performance.
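
The distinction between competitive and non-competitive IDK questions can be illustrated with a minimal sketch. It assumes the context has already been tagged with entity types by some upstream tool; the `Entity` class, the type labels, and the function name `idk_question_kind` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Entity:
    """A tagged span from the context (hypothetical representation)."""
    text: str
    type: str  # e.g. "PERSON", "DATE"; the label set is an assumption


def idk_question_kind(expected_answer_type: str,
                      context_entities: Sequence[Entity]) -> str:
    """Classify an unanswerable question by the definition in the abstract.

    "competitive": the context contains an entity of the expected answer type.
    "non-competitive": the context contains no entity of that type.
    """
    has_same_type = any(e.type == expected_answer_type for e in context_entities)
    return "competitive" if has_same_type else "non-competitive"
```

For example, a "who" question expecting a PERSON would count as competitive if the context mentions any person, even though none of them is the answer.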

Keywords: studying unanswerable questions, unanswerable questions

Beyond Glass-Box Features: Uncertainty Quantification Enhanced Quality Estimation for Neural Machine Translation

Association for Computational Linguistics, 2021, article

Quality Estimation (QE) plays an essential role in applications of Machine Translation (MT). Traditionally, a QE system accepts the original source text and the translation from a black-box MT system as input. Recently, a few studies have indicated that QE benefits from information about the model and training data of the MT system that produced the translations, obtained as a by-product of translation; this is called "glass-box QE". In this paper, we extend the definition of "glass-box QE" generally to uncertainty quantification with both "black-box" and "glass-box" approaches and design several features deduced from them to blaze a new trail in improving QE's performance. We propose a framework to fuse the feature engineering of uncertainty quantification into a pre-trained cross-lingual language model to predict the translation quality. Experimental results show that our method achieves state-of-the-art performance on the datasets of the WMT 2020 QE shared task.

Keywords: enhanced quality estimation, quantification enhanced quality

Improving Unsupervised Dialogue Topic Segmentation with Utterance-Pair Coherence Scoring

Association for Computational Linguistics, 2021, article

Dialogue topic segmentation is critical in several dialogue modeling problems. However, popular unsupervised approaches only exploit surface features in assessing topical coherence among utterances. In this work, we address this limitation by leveraging supervisory signals from the utterance-pair coherence scoring task. First, we present a simple yet effective strategy to generate a training corpus for utterance-pair coherence scoring. Then, we train a BERT-based neural utterance-pair coherence model with the obtained training corpus. Finally, this model is used to measure the topical relevance between utterances, acting as the basis of the segmentation inference. Experiments on three public datasets in English and Chinese demonstrate that our proposal outperforms the state-of-the-art baselines.
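
As a rough illustration of how utterance-pair coherence scores could drive segmentation, the sketch below places a topic boundary wherever the score between adjacent utterances falls below a threshold. This simple threshold rule is an assumption chosen for brevity, not the paper's inference procedure, and the function name, the default threshold, and the example scores are all hypothetical.

```python
from typing import List, Sequence


def segment_by_coherence(scores: Sequence[float],
                         threshold: float = 0.5) -> List[int]:
    """Return boundary positions from adjacent-utterance coherence scores.

    scores[i] is the (assumed precomputed) coherence between utterance i
    and utterance i + 1. A boundary is placed after utterance i whenever
    scores[i] is below the threshold.
    """
    return [i for i, score in enumerate(scores) if score < threshold]


# Five utterances give four adjacent-pair scores (made-up values).
boundaries = segment_by_coherence([0.9, 0.2, 0.8, 0.1])
```

Here `boundaries` holds the indices of utterances after which a new topic segment is assumed to start; in practice the scores would come from a trained coherence model rather than being supplied by hand.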

Keywords: dialogue topic segmentation, unsupervised dialogue topic, improving unsupervised dialogue
