
Error-Sensitive Evaluation for Ordinal Target Variables


 Publication date 2021
Research language: English
 Created by Shamra Editor





Product reviews and satisfaction surveys seek customer feedback in the form of ranked scales. In these settings, widely used evaluation metrics, including F1 and accuracy, ignore the rank in the responses (e.g., 'very likely' is closer to 'likely' than to 'not at all'). In this paper, we hypothesize that the order of class values is important for evaluating classifiers on ordinal target variables and should not be disregarded. To test this hypothesis, we compared Multi-class Classification (MC) and Ordinal Regression (OR) by applying both to benchmark tasks involving ordinal target variables, using the same underlying model architecture. Experimental results show that while MC outperformed OR on some datasets in accuracy and F1, OR is significantly better than MC at minimizing the error between prediction and target on all benchmarks, as revealed by error-sensitive metrics such as mean-squared error (MSE) and Spearman correlation. Our findings motivate the need to establish consistent, error-sensitive metrics for evaluating benchmarks with ordinal target variables, and we hope that they stimulate interest in exploring alternative losses for ordinal problems.
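The gap between accuracy and an error-sensitive metric can be illustrated with a small sketch. The 5-point scale, the labels, and both prediction lists below are made up for illustration and are not taken from the paper's benchmarks; the helper names `accuracy` and `mean_squared_error` are likewise hypothetical.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of exact matches; ignores how far off a miss is."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Average squared distance between ranks; penalizes far misses more."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Assumed encoding of a hypothetical scale: 0 = "not at all" ... 4 = "very likely".
y_true = [4, 4, 0, 2, 3, 1]

# Classifier A: four exact hits, but its two misses are three ranks away.
pred_a = [4, 4, 0, 2, 0, 4]

# Classifier B: only three exact hits, but every miss is one rank away.
pred_b = [3, 4, 1, 2, 2, 1]

for name, pred in (("A", pred_a), ("B", pred_b)):
    print(name, round(accuracy(y_true, pred), 3), mean_squared_error(y_true, pred))
```

On this toy data, classifier A wins on accuracy while classifier B has the lower MSE, which is the kind of disagreement between metrics that the abstract describes.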



References used
https://aclanthology.org/

Read More

Non-autoregressive Transformer is a promising text generation model. However, current non-autoregressive models still fall behind their autoregressive counterparts in translation quality. We attribute this accuracy gap to the lack of dependency modeling among decoder inputs. In this paper, we propose CNAT, which implicitly learns categorical codes as latent variables in non-autoregressive decoding. The interaction among these categorical codes remedies the missing dependencies and improves the model capacity. Experimental results show that our model achieves comparable or better performance in machine translation tasks than several strong baselines.
The modern era is witnessing tangible development in all fields of science. As a result of this development, there is a growing need for statistical methods to solve the problems facing workers in these fields.
The stance detection task aims at detecting the stance of a tweet or a text toward a target. These targets can be named entities or free-form sentences (claims). Though the task involves reasoning about the tweet with respect to a target, we find that it is possible to achieve high accuracy on several publicly available Twitter stance detection datasets without looking at the target sentence. Specifically, a simple tweet classification model achieved human-level performance on the WT--WT dataset and more than two-thirds accuracy on various other datasets. We investigate the existence of biases in such datasets to find potential spurious correlations between sentiment and stance, and between lexical choice and stance category. Furthermore, we propose a new large dataset free of such biases and demonstrate its aptness on existing stance detection systems. Our empirical findings show much scope for research on the stance detection task and suggest several considerations for creating future stance detection datasets.
This research studies the evolution of total investment and agricultural investment and the nature of the changes that took place during the period 2000-2011. It evaluates the performance of the economy and its ability to attract investment, and analyzes the factors affecting total and agricultural investment in Syria, using a descriptive and analytical approach together with quantitative analysis. The most important findings are as follows. The annual growth rate of the net balance of payments is negative, at -18.35%, so its value deteriorates from year to year. The elasticity coefficients of the total investment function show that a 1% increase in each of total exports (X1), foreign reserves (X3), and the state budget deficit (X8) can increase total investment by 3.5%. The elasticity coefficients of the agricultural investment function show that a 1% increase in each of total exports to GNP (X2), foreign reserves (X3), and the net balance of payments (X6) can increase total investment by 22.3%. These variables are the most important determinants of total and agricultural investment in Syria. Accordingly, to increase its investments, the Syrian economy needs to create a suitable economic and political investment climate and to pursue economic development.
This paper discusses a classification-based approach to machine translation evaluation, as opposed to a common regression-based approach in the WMT Metrics task. Recent machine translation usually works well but sometimes makes critical errors due to just a few wrong word choices. Our classification-based approach focuses on such errors using several error type labels, for practical machine translation evaluation in an age of neural machine translation. We made additional annotations on the WMT 2015-2017 Metrics datasets with fluency and adequacy labels to distinguish different types of translation errors from syntactic and semantic viewpoints. We present our human evaluation criteria for the corpus development and automatic evaluation experiments using the corpus. The human evaluation corpus will be publicly available upon publication.
