Do you want to publish a course? Click here

Me, myself, and ire: Effects of automatic transcription quality on emotion, sarcasm, and personality detection

لي، أنا، وأنا IRE: آثار جودة النسخ التلقائي على العاطفة والسخرية والكشف عن الشخصية

274   0   0   0.0 ( 0 )
 Publication date 2021
and research's language is English
 Created by Shamra Editor




Ask ChatGPT about the research

In deployment, systems that use speech as input must make use of automated transcriptions. Yet, typically when these systems are evaluated, gold transcriptions are assumed. We explicitly examine the impact of transcription errors on the downstream performance of a multi-modal system on three related tasks from three datasets: emotion, sarcasm, and personality detection. We include three separate transcription tools and show that while all automated transcriptions propagate errors that substantially impact downstream performance, the open-source tools fair worse than the paid tool, though not always straightforwardly, and word error rates do not correlate well with downstream performance. We further find that the inclusion of audio features partially mitigates transcription errors, but that a naive usage of a multi-task setup does not.



References used
https://aclanthology.org/
rate research

Read More

We present three methods developed for the Shared Task on Sarcasm and Sentiment Detection in Arabic. We present a baseline that uses character n-gram features. We also propose two more sophisticated methods: a recurrent neural network with a word lev el representation and an ensemble classifier relying on word and character-level features. We chose to present results from an ensemble classifier but it was not very successful as compared to the best systems : 22th/37 on sarcasm detection and 15th/22 on sentiment detection. It finally appeared that our baseline could have been improved and beat those results.
We present a model to predict fine-grained emotions along the continuous dimensions of valence, arousal, and dominance (VAD) with a corpus with categorical emotion annotations. Our model is trained by minimizing the EMD (Earth Mover's Distance) loss between the predicted VAD score distribution and the categorical emotion distributions sorted along VAD, and it can simultaneously classify the emotion categories and predict the VAD scores for a given sentence. We use pre-trained RoBERTa-Large and fine-tune on three different corpora with categorical labels and evaluate on EmoBank corpus with VAD scores. We show that our approach reaches comparable performance to that of the state-of-the-art classifiers in categorical emotion classification and shows significant positive correlations with the ground truth VAD scores. Also, further training with supervision of VAD labels leads to improved performance especially when dataset is small. We also present examples of predictions of appropriate emotion words that are not part of the original annotations.
Within the last few years, the number of Arabic internet users and Arabic online content is in exponential growth. Dealing with Arabic datasets and the usage of non-explicit sentences to express an opinion are considered to be the major challenges in the field of natural language processing. Hence, sarcasm and sentiment analysis has gained a major interest from the research community, especially in this language. Automatic sarcasm detection and sentiment analysis can be applied using three approaches, namely supervised, unsupervised and hybrid approach. In this paper, a model based on a supervised machine learning algorithm called Support Vector Machine (SVM) has been used for this process. The proposed model has been evaluated using ArSarcasm-v2 dataset. The performance of the proposed model has been compared with other models submitted to sentiment analysis and sarcasm detection shared task.
Appraisal theories explain how the cognitive evaluation of an event leads to a particular emotion. In contrast to theories of basic emotions or affect (valence/arousal), this theory has not received a lot of attention in natural language processing. Yet, in psychology it has been proven powerful: Smith and Ellsworth (1985) showed that the appraisal dimensions attention, certainty, anticipated effort, pleasantness, responsibility/control and situational control discriminate between (at least) 15 emotion classes. We study different annotation strategies for these dimensions, based on the event-focused enISEAR corpus (Troiano et al., 2019). We analyze two manual annotation settings: (1) showing the text to annotate while masking the experienced emotion label; (2) revealing the emotion associated with the text. Setting 2 enables the annotators to develop a more realistic intuition of the described event, while Setting 1 is a more standard annotation procedure, purely relying on text. We evaluate these strategies in two ways: by measuring inter-annotator agreement and by fine- tuning RoBERTa to predict appraisal variables. Our results show that knowledge of the emotion increases annotators' reliability. Further, we evaluate a purely automatic rule-based labeling strategy (inferring appraisal from annotated emotion classes). Training on automatically assigned labels leads to a competitive performance of our classifier, even when tested on manual annotations. This is an indicator that it might be possible to automatically create appraisal corpora for every domain for which emotion corpora already exist.
The introduction of transformer-based language models has been a revolutionary step for natural language processing (NLP) research. These models, such as BERT, GPT and ELECTRA, led to state-of-the-art performance in many NLP tasks. Most of these mode ls were initially developed for English and other languages followed later. Recently, several Arabic-specific models started emerging. However, there are limited direct comparisons between these models. In this paper, we evaluate the performance of 24 of these models on Arabic sentiment and sarcasm detection. Our results show that the models achieving the best performance are those that are trained on only Arabic data, including dialectal Arabic, and use a larger number of parameters, such as the recently released MARBERT. However, we noticed that AraELECTRA is one of the top performing models while being much more efficient in its computational cost. Finally, the experiments on AraGPT2 variants showed low performance compared to BERT models, which indicates that it might not be suitable for classification tasks.

suggested questions

comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا