Gender is widely discussed in the context of language tasks and when examining the stereotypes propagated by language models. However, current discussions primarily treat gender as binary, which can perpetuate harms such as the cyclical erasure of non-binary gender identities. These harms are driven by model and dataset biases, which are consequences of the non-recognition and lack of understanding of non-binary genders in society. In this paper, we explain the complexity of gender and language around it, and survey non-binary persons to understand harms associated with the treatment of gender as binary in English language technologies. We also detail how current language representations (e.g., GloVe, BERT) capture and perpetuate these harms and related challenges that need to be acknowledged and addressed for representations to equitably encode gender information.