Language representations are known to carry stereotypical biases and, as a result, lead to biased predictions in downstream tasks. While existing methods are effective at mitigating biases via linear projection, such methods are too aggressive: they not only remove bias, but also erase valuable information from word embeddings. We develop new measures for evaluating specific information retention that demonstrate the tradeoff between bias removal and information retention. To address this challenge, we propose OSCaR (Orthogonal Subspace Correction and Rectification), a bias-mitigation method that focuses on disentangling biased associations between concepts instead of removing concepts wholesale. Our experiments on gender bias show that OSCaR is a well-balanced approach that retains semantic information in the embeddings while still effectively mitigating bias.
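To make the contrast concrete, the following is a minimal sketch (not the paper's actual algorithm) of the two ideas on toy vectors: linear-projection debiasing deletes the entire component along an assumed bias direction, whereas a correction-style approach keeps both concept directions and merely makes them orthogonal to each other. The direction names and vectors here are illustrative assumptions.

```python
import numpy as np

# Toy 3-d vectors; in practice these would be, e.g., 300-d GloVe embeddings.
v_gender = np.array([1.0, 0.0, 0.0])   # assumed (hypothetical) bias direction
doctor   = np.array([0.4, 0.8, 0.3])   # hypothetical word vector

def hard_debias(x, v):
    """Linear-projection debiasing: zero out the component of x along v.
    Aggressive: it removes bias, but also erases any legitimate
    information a word carries along that direction."""
    v = v / np.linalg.norm(v)
    return x - (x @ v) * v

def correct_pair(v_bias, v_concept):
    """Sketch of the correction idea: instead of deleting v_bias,
    re-derive an orthogonal pair spanning the same plane (Gram-Schmidt),
    so the two concepts are disentangled while both directions survive."""
    u1 = v_bias / np.linalg.norm(v_bias)
    u2 = v_concept - (v_concept @ u1) * u1
    return u1, u2 / np.linalg.norm(u2)

debiased = hard_debias(doctor, v_gender)
print(debiased @ v_gender)  # component along the bias direction is now 0
```

Note that `hard_debias` collapses every word's gender component, including words (e.g. "mother", "king") whose gender content is semantic rather than stereotypical; the correction view avoids this by operating on the relationship between directions rather than deleting one of them outright.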