In this paper we question the impact of gender representation in training data on the performance of an end-to-end ASR system. We design an experiment based on the Librispeech corpus and build three training corpora that differ only in the proportion of data produced by each gender category. We observe that while our system is overall robust to gender balance or imbalance in the training data, it nonetheless depends on the adequacy between the individuals present in the training and test sets.
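The abstract does not include the corpus-construction code. Below is a minimal sketch of how gender-ratio-controlled training subsets could be drawn from the Librispeech speaker metadata file (SPEAKERS.TXT, whose fields are ID | SEX | SUBSET | MINUTES | NAME). The subset name, duration budget, target ratios, and the greedy random sampling scheme are all illustrative assumptions, not the authors' actual procedure.

```python
import random

def load_speakers(path="SPEAKERS.TXT"):
    """Parse the Librispeech speaker metadata file.

    Each non-comment line has the form:
    ID | SEX | SUBSET | MINUTES | NAME
    """
    speakers = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.startswith(";"):
                continue  # skip header comments
            fields = [field.strip() for field in line.split("|")]
            if len(fields) < 4:
                continue  # skip malformed lines
            speakers.append({
                "id": fields[0],
                "sex": fields[1],          # "F" or "M"
                "subset": fields[2],
                "minutes": float(fields[3]),
            })
    return speakers

def build_corpus(speakers, female_ratio, total_minutes,
                 subset="train-clean-100", seed=0):
    """Select speakers so that roughly `female_ratio` of the total
    duration comes from female speakers (hypothetical scheme)."""
    rng = random.Random(seed)
    pool = {
        "F": [s for s in speakers if s["subset"] == subset and s["sex"] == "F"],
        "M": [s for s in speakers if s["subset"] == subset and s["sex"] == "M"],
    }
    budgets = {
        "F": female_ratio * total_minutes,
        "M": (1 - female_ratio) * total_minutes,
    }
    selected = []
    for sex, budget in budgets.items():
        rng.shuffle(pool[sex])
        used = 0.0
        # Greedily add whole speakers until the per-gender budget is met.
        for spk in pool[sex]:
            if used >= budget:
                break
            selected.append(spk)
            used += spk["minutes"]
    return selected

# Example: three corpora that differ only in gender proportion,
# e.g. 30% / 50% / 70% female speech (ratios chosen for illustration).
# speakers = load_speakers("SPEAKERS.TXT")
# corpora = {r: build_corpus(speakers, r, total_minutes=6000)
#            for r in (0.3, 0.5, 0.7)}
```

Sampling whole speakers (rather than individual utterances) keeps each speaker's data in exactly one corpus, which matters when later measuring how much performance depends on speaker overlap between training and test sets.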