Investigating the Impact of Gender Representation in ASR Training Data: a Case Study on Librispeech

Added by Association for Computational Linguistics (article)

Publication date 2021

fields Artificial Intelligence

Research language: English

Created by Shamra Editor

asr training data, gender representation

Abstract

In this paper we question the impact of gender representation in training data on the performance of an end-to-end ASR system. We create an experiment based on the Librispeech corpus and build 3 different training corpora varying only the proportion of data produced by each gender category. We observe that although our system is overall robust to gender balance or imbalance in the training data, it nonetheless depends on the match between the individuals present in the training and testing sets.
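The corpus construction described in the abstract can be illustrated with a minimal sketch. It assumes a pool of speakers tagged "F" or "M" and selects by speaker count rather than by amount of audio, which is a simplification. The function name `build_corpus`, the toy speaker pool, and the 30/50/70 percent shares are illustrative assumptions, not details taken from the paper.

```python
import random


def build_corpus(speakers, female_share, n_speakers, seed=0):
    """Pick n_speakers speaker IDs so that a fraction female_share is tagged 'F'.

    speakers: list of (speaker_id, gender) pairs, gender being 'F' or 'M'.
    Hypothetical helper: selection is by speaker count, not audio duration.
    """
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    females = sorted(s for s, g in speakers if g == "F")
    males = sorted(s for s, g in speakers if g == "M")
    n_f = round(n_speakers * female_share)
    n_m = n_speakers - n_f
    if n_f > len(females) or n_m > len(males):
        raise ValueError("not enough speakers for the requested split")
    return rng.sample(females, n_f) + rng.sample(males, n_m)


# Toy pool of 10 female and 10 male speakers (made-up IDs).
pool = [(f"f{i}", "F") for i in range(10)] + [(f"m{i}", "M") for i in range(10)]

# Three corpora of equal size that differ only in the gender proportion.
corpora = {
    share: build_corpus(pool, share, n_speakers=10)
    for share in (0.3, 0.5, 0.7)  # assumed shares, for illustration only
}
```

Keeping the total size fixed across the three selections means that any difference in recognition performance can be attributed to the gender proportion rather than to the amount of training data.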

References used

https://aclanthology.org/

Investigating Post-pretraining Representation Alignment for Cross-Lingual Question Answering

302 - Association for Computational Linguistics 2021 (article)

Human knowledge is collectively encoded in the roughly 6500 languages spoken around the world, but it is not distributed equally across languages. Hence, for information-seeking question answering (QA) systems to adequately serve speakers of all languages, they need to operate cross-lingually. In this work we investigate the capabilities of multilingually pretrained language models on cross-lingual QA. We find that explicitly aligning the representations across languages with a post-hoc fine-tuning step generally leads to improved performance. We additionally investigate the effect of data size as well as the language choice in this fine-tuning step, also releasing a dataset for evaluating cross-lingual QA systems.

post-pretraining representation alignment, investigating post-pretraining representation, investigating post-pretraining

Investigating Annotator Bias in Abusive Language Datasets

311 - Association for Computational Linguistics 2021 (article)

Nowadays, social media platforms use classification models to cope with hate speech and abusive language. The problem with these models is their vulnerability to bias. A prevalent form of bias in hate speech and abusive language datasets is annotator bias caused by the annotator's subjective perception and the complexity of the annotation task. In our paper, we develop a set of methods to measure annotator bias in abusive language datasets and to identify different perspectives on abusive language. We apply these methods to four different abusive language datasets. Our proposed approach supports annotation processes of such datasets and future research addressing different perspectives on the perception of abusive language.

abusive language datasets, language datasets

Hyperparameter Power Impact in Transformer Language Model Training

211 - Association for Computational Linguistics 2021 (article)

Training large language models can consume a large amount of energy. We hypothesize that the language model's configuration impacts its energy consumption, and that there is room for power consumption optimisation in modern large language models. To investigate these claims, we introduce a power consumption factor to the objective function, and explore the range of models and hyperparameter configurations that affect power. We identify multiple configuration factors that can reduce power consumption during language model training while retaining model quality.

transformer language model, language model training, transformer language

Harms of Gender Exclusivity and Challenges in Non-Binary Representation in Language Technologies

211 - Association for Computational Linguistics 2021 (article)

Gender is widely discussed in the context of language tasks and when examining the stereotypes propagated by language models. However, current discussions primarily treat gender as binary, which can perpetuate harms such as the cyclical erasure of non-binary gender identities. These harms are driven by model and dataset biases, which are consequences of the non-recognition and lack of understanding of non-binary genders in society. In this paper, we explain the complexity of gender and language around it, and survey non-binary persons to understand harms associated with the treatment of gender as binary in English language technologies. We also detail how current language representations (e.g., GloVe, BERT) capture and perpetuate these harms and related challenges that need to be acknowledged and addressed for representations to equitably encode gender information.

gender exclusivity, english language technologies

Examining Covert Gender Bias: A Case Study in Turkish and English Machine Translation Models

273 - Association for Computational Linguistics 2021 (article)

As Machine Translation (MT) has become increasingly powerful, accessible, and widespread, the potential for the perpetuation of bias has grown alongside its advances. While overt indicators of bias have been studied in machine translation, we argue that covert biases expose a problem that is further entrenched. Through the use of the gender-neutral language Turkish and the gendered language English, we examine cases of both overt and covert gender bias in MT models. Specifically, we introduce a method to investigate asymmetrical gender markings. We also assess bias in the attribution of personhood and examine occupational and personality stereotypes through overt bias indicators in MT models. Our work explores a deeper layer of bias in MT models and demonstrates the continued need for language-specific, interdisciplinary methodology in MT model development.

english machine translation
