
Data-driven Identification of Idioms in Song Lyrics



Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Keywords: data-driven identification, identification of idioms


Abstract

The automatic recognition of idioms poses a challenging problem for NLP applications. Whereas native speakers can intuitively handle multiword expressions whose compositional meanings are hard to trace back to individual word semantics, there is still ample scope for improvement regarding computational approaches. We assume that idiomatic constructions can be characterized by gradual intensities of semantic non-compositionality, formal fixedness, and unusual usage context, and introduce a number of measures for these characteristics, comprising count-based and predictive collocation measures together with measures of context (un)similarity. We evaluate our approach on a manually labelled gold standard, derived from a corpus of German pop lyrics. To this end, we apply a Random Forest classifier to analyze the individual contribution of features for automatically detecting idioms, and study the trade-off between recall and precision. Finally, we evaluate the classifier on an independent dataset of idioms extracted from a list of Wikipedia idioms, achieving state-of-the-art accuracy.
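As a rough illustration of what a count-based collocation measure can look like, the sketch below computes pointwise mutual information (PMI) for adjacent word pairs. This is an assumption for illustration only: the abstract does not say which measures the authors use, and the function name `pmi_scores` and its `min_count` parameter are hypothetical.

```python
import math
from collections import Counter


def pmi_scores(tokens, min_count=1):
    """Return PMI (log base 2) for every adjacent word pair in `tokens`.

    Illustrative only: PMI is one common count-based collocation
    measure, not necessarily one of those used in the paper.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # number of adjacent pairs

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # skip rare pairs, whose PMI is unreliable
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


if __name__ == "__main__":
    lyrics = "a b a b".split()
    for pair, score in sorted(pmi_scores(lyrics).items()):
        print(pair, round(score, 3))
```

Pairs that co-occur more often than their individual frequencies would predict receive a higher score. In a setup like the one the abstract describes, such a score would be one feature among several passed to a classifier.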

References used

https://aclanthology.org/


Multi-Emotion Classification for Song Lyrics

Association for Computational Linguistics, 2021 (article)

Song lyrics convey a multitude of emotions to the listener and powerfully portray the emotional state of the writer or singer. This paper examines a variety of modeling approaches to the multi-emotion classification problem for songs. We introduce the Edmonds Dance dataset, a novel emotion-annotated lyrics dataset from the reader's perspective, and annotate the dataset of Mihalcea and Strapparava (2012) at the song level. We find that models trained on relatively small song datasets achieve marginally better performance than BERT (Devlin et al., 2018) fine-tuned on large social media or dialog datasets.

Keywords: multi-emotion classification, song lyrics, song lyrics convey

Seed Words Based Data Selection for Language Model Adaptation

Association for Computational Linguistics, 2021 (article)

We address the problem of language model customization in applications where the ASR component needs to manage domain-specific terminology; although current state-of-the-art speech recognition technology provides excellent results for generic domains, the adaptation to specialized dictionaries or glossaries is still an open issue. In this work we present an approach for automatically selecting sentences from a text corpus that match, both semantically and morphologically, a glossary of terms (words or composite words) furnished by the user. The final goal is to rapidly adapt the language model of a hybrid ASR system with a limited amount of in-domain text data in order to successfully cope with the linguistic domain at hand; the vocabulary of the baseline model is expanded and tailored, reducing the resulting OOV rate. Data selection strategies based on shallow morphological seeds and semantic similarity via word2vec are introduced and discussed; the experimental setting consists of a simultaneous interpreting scenario, where ASRs in three languages are designed to recognize the domain-specific terms (i.e., dentistry). Results using different metrics (OOV rate, WER, precision and recall) show the effectiveness of the proposed techniques.

Keywords: language model adaptation, language model customization

An Overview of Fairness in Data -- Illuminating the Bias in Data Pipeline

Association for Computational Linguistics, 2021 (article)

Data in general encodes human biases by default; being aware of this is a good start, and research on how to handle it is ongoing. The term 'bias' is used extensively in various contexts in NLP systems. In our research, the focus is on biases related to gender, race, religion, and demographics, and on other intersectional views of the biases that prevail in text processing systems and systematically discriminate against specific populations, which is not ethical in NLP. These biases exacerbate the lack of equality, diversity and inclusion of specific populations in NLP applications. The tools and technology at the intermediate level utilize biased data, and transfer or amplify this bias to the downstream applications. However, it is not enough to be colourblind or gender-neutral alone when designing an unbiased technology; instead, we should make a conscious effort by designing a unified framework to measure and benchmark the bias. In this paper, we recommend six measures and one augment measure based on the observations of the bias in data, annotations, text representations and debiasing techniques.

Keywords: overview of fairness, data pipeline, illuminating the bias

Supervised Identification of Participant Slots in Contracts

Association for Computational Linguistics, 2021 (article)

This paper presents a technique for the identification of participant slots in English language contracts. Taking inspiration from unsupervised slot extraction techniques, the system presented here uses a supervised approach to identify terms used to refer to a genre-specific slot in novel contracts. We evaluate the system in multiple feature configurations to demonstrate that the best performing system in both genres of contracts omits the exact mention form from consideration---even though such mention forms are often the name of the slot under consideration---and is instead based solely on the dependency label and parent; in other words, a more reliable quantification of a party's role in a contract is found in what they do rather than what they are named.

Keywords: identification of participant, participant slots, English language contracts

De-identification of Privacy-related Entities in Job Postings

Association for Computational Linguistics, 2021 (article)

De-identification is the task of detecting privacy-related entities in text, such as person names, emails and contact data. It has been well-studied within the medical domain. The need for de-identification technology is increasing, as privacy-preserving data handling is in high demand in many domains. In this paper, we focus on job postings. We present JobStack, a new corpus for de-identification of personal data in job vacancies on Stackoverflow. We introduce baselines, comparing Long Short-Term Memory (LSTM) and Transformer models. To improve these baselines, we experiment with BERT representations, and distantly related auxiliary data via multi-task learning. Our results show that auxiliary data helps to improve de-identification performance. While BERT representations improve performance, surprisingly, "vanilla" BERT turned out to be more effective than BERT trained on Stackoverflow-related data.

Keywords: privacy-related entities, detecting privacy-related entities, job postings
