No Arabic abstract
When humans judge the affective content of texts, they also implicitly assess the correctness of such judgment, that is, their confidence. We hypothesize that peoples (in)confidence that they performed well in an annotation task leads to (dis)agreements among each other. If this is true, confidence may serve as a diagnostic tool for systematic differences in annotations. To probe our assumption, we conduct a study on a subset of the Corpus of Contemporary American English, in which we ask raters to distinguish neutral sentences from emotion-bearing ones, while scoring the confidence of their answers. Confidence turns out to approximate inter-annotator disagreements. Further, we find that confidence is correlated to emotion intensity: perceiving stronger affect in text prompts annotators to more certain classification performances. This insight is relevant for modelling studies of intensity, as it opens the question wether automatic regressors or classifiers actually predict intensity, or rather humans self-perceived confidence.
Emotion intensity prediction determines the degree or intensity of an emotion that the author expresses in a text, extending previous categorical approaches to emotion detection. While most previous work on this topic has concentrated on English texts, other languages would also benefit from fine-grained emotion classification, preferably without having to recreate the amount of annotated data available in English in each new language. Consequently, we explore cross-lingual transfer approaches for fine-grained emotion detection in Spanish and Catalan tweets. To this end we annotate a test set of Spanish and Catalan tweets using Best-Worst scaling. We compare six cross-lingual approaches, e.g., machine translation and cross-lingual embeddings, which have varying requirements for parallel data -- from millions of parallel sentences to completely unsupervised. The results show that on this data, methods with low parallel-data requirements perform surprisingly better than methods that use more parallel data, which we explain through an in-depth error analysis. We make the dataset and the code available at url{https://github.com/jerbarnes/fine-grained_cross-lingual_emotion}
The number of user reviews of tourist attractions, restaurants, mobile apps, etc. is increasing for all languages; yet, research is lacking on how reviews in multiple languages should be aggregated and displayed. Speakers of different languages may have consistently different experiences, e.g., different information available in different languages at tourist attractions or different user experiences with software due to internationalization/localization choices. This paper assesses the similarity in the ratings given by speakers of different languages to London tourist attractions on TripAdvisor. The correlations between different languages are generally high, but some language pairs are more correlated than others. The results question the common practice of computing average ratings from reviews in many languages.
From a practical perspective it is advantageous to develop experimental methods that verify entanglement in quantum states with as few measurements as possible. In this paper we investigate the minimal number of measurements needed to detect bound entanglement in bipartite $(dtimes d)$-dimensional states, i.e. entangled states that are positive under partial transposition. In particular, we show that a class of entanglement witnesses composed of mutually unbiased bases (MUBs) can detect bound entanglement if the number of measurements is greater than $d/2+1$. This is a substantial improvement over other detection methods, requiring significantly fewer resources than either full quantum state tomography or measuring a complete set of $d+1$ MUBs. Our approach is based on a partial characterisation of the (non-)decomposability of entanglement witnesses. We show that non-decomposability is a universal property of MUBs, which holds regardless of the choice of complementary observables, and we find that both the number of measurements and the structure of the witness play an important role in the detection of bound entanglement.
While non-autoregressive (NAR) models are showing great promise for machine translation, their use is limited by their dependence on knowledge distillation from autoregressive models. To address this issue, we seek to understand why distillation is so effective. Prior work suggests that distilled training data is less complex than manual translations. Based on experiments with the Levenshtein Transformer and the Mask-Predict NAR models on the WMT14 German-English task, this paper shows that different types of complexity have different impacts: while reducing lexical diversity and decreasing reordering complexity both help NAR learn better alignment between source and target, and thus improve translation quality, lexical diversity is the main reason why distillation increases model confidence, which affects the calibration of different NAR models differently.
Most current NLP systems have little knowledge about quantitative attributes of objects and events. We propose an unsupervised method for collecting quantitative information from large amounts of web data, and use it to create a new, very large resource consisting of distributions over physical quantities associated with objects, adjectives, and verbs which we call Distributions over Quantitative (DoQ). This contrasts with recent work in this area which has focused on making only relative comparisons such as Is a lion bigger than a wolf?. Our evaluation shows that DoQ compares favorably with state of the art results on existing datasets for relative comparisons of nouns and adjectives, and on a new dataset we introduce.