Building NLP systems that serve everyone requires accounting for dialect differences. But dialects are not monolithic entities: rather, distinctions between and within dialects are captured by the presence, absence, and frequency of dozens of dialect features in speech and text, such as the deletion of the copula in "He ∅ running". In this paper, we introduce the task of dialect feature detection, and present two multitask learning approaches, both based on pretrained transformers. For most dialects, large-scale annotated corpora for these features are unavailable, making it difficult to train recognizers. We train our models on a small number of minimal pairs, building on how linguists typically define dialect features. Evaluation on a test set of 22 dialect features of Indian English demonstrates that these models learn to recognize many features with high accuracy, and that a few minimal pairs can be as effective for training as thousands of labeled examples. We also demonstrate the downstream applicability of dialect feature detection both as a measure of dialect density and as a dialect classifier.
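To make the setup concrete, the sketch below shows one plausible way to frame dialect feature detection as multitask binary classification on top of a pretrained transformer, trained from minimal pairs. This is not the authors' released code: the encoder name, the feature list, and the toy minimal pair are illustrative assumptions.

```python
# Minimal sketch, assuming a HuggingFace-style encoder and a hypothetical
# two-feature inventory; each dialect feature gets its own binary head,
# and all heads are trained jointly (multitask) from minimal pairs.
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

FEATURES = ["copula_deletion", "focus_only"]  # hypothetical subset of features

class DialectFeatureDetector(nn.Module):
    def __init__(self, encoder_name="bert-base-uncased", num_features=len(FEATURES)):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # One binary logit per dialect feature, predicted from a shared encoder.
        self.heads = nn.Linear(hidden, num_features)

    def forward(self, **inputs):
        # Use the [CLS] token representation as a sentence-level summary.
        cls = self.encoder(**inputs).last_hidden_state[:, 0]
        return self.heads(cls)  # logits, one per feature

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = DialectFeatureDetector()

# Toy minimal pair: a sentence exhibiting copula deletion and its matched
# counterpart without it; labels are per-feature 0/1 vectors.
texts = ["He running", "He is running"]
labels = torch.tensor([[1.0, 0.0], [0.0, 0.0]])

batch = tokenizer(texts, padding=True, return_tensors="pt")
logits = model(**batch)
loss = nn.BCEWithLogitsLoss()(logits, labels)
loss.backward()  # one gradient step over the minimal pair
```

In this framing, summing (or averaging) the per-feature predictions over a speaker's utterances would yield a rough dialect-density score, and the resulting feature profile could feed a downstream dialect classifier, in the spirit of the applications mentioned above.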