Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Measuring Biases of Word Embeddings: What Similarity Measures and Descriptive Statistics to Use?

قياس تحيزات Word Embeddings: ما تدابير التشابه والإحصاءات الوصفية لاستخدامها؟

744 0 0 0.0 ( 0 )

Download Cite

Added by Association for Computation Linguistics مقالة

Publication date 2021

fields Artificial Intelligence

and research's language is English

Created by Shamra Editor

متطلبات المراجعة كلمة embeddings. كلمة صناعة حمض الفوسفور

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Word embeddings are widely used in Natural Language Processing (NLP) for a vast range of applications. However, it has been consistently proven that these embeddings reflect the same human biases that exist in the data used to train them. Most of the introduced bias indicators to reveal word embeddings' bias are average-based indicators based on the cosine similarity measure. In this study, we examine the impacts of different similarity measures as well as other descriptive techniques than averaging in measuring the biases of contextual and non-contextual word embeddings. We show that the extent of revealed biases in word embeddings depends on the descriptive statistics and similarity measures used to measure the bias. We found that over the ten categories of word embedding association tests, Mahalanobis distance reveals the smallest bias, and Euclidean distance reveals the largest bias in word embeddings. In addition, the contextual models reveal less severe biases than the non-contextual word embedding models.

References used

https://aclanthology.org/

rate research

OSCaR: Orthogonal Subspace Correction and Rectification of Biases in Word Embeddings

643 - Association for Computation Linguistics 2021 مقالة

Language representations are known to carry stereotypical biases and, as a result, lead to biased predictions in downstream tasks. While existing methods are effective at mitigating biases by linear projection, such methods are too aggressive: they n ot only remove bias, but also erase valuable information from word embeddings. We develop new measures for evaluating specific information retention that demonstrate the tradeoff between bias removal and information retention. To address this challenge, we propose OSCaR (Orthogonal Subspace Correction and Rectification), a bias-mitigating method that focuses on disentangling biased associations between concepts instead of removing concepts wholesale. Our experiments on gender biases show that OSCaR is a well-balanced approach that ensures that semantic information is retained in the embeddings and bias is also effectively mitigated.

orthogonal subspace correction orthogonal subspace subspace correction تصحيح الفضاء الفرعي المتعامد الفضاء الفرعي المتعامد تصحيح الفرعية صناعة حمض الفوسفور المزيد..

Word Embeddings, Cosine Similarity and Deep Learning for Identification of Professions \& Occupations in Health-related Social Media

806 - Association for Computation Linguistics 2021 مقالة

ProfNER-ST focuses on the recognition of professions and occupations from Twitter using Spanish data. Our participation is based on a combination of word-level embeddings, including pre-trained Spanish BERT, as well as cosine similarity computed over a subset of entities that serve as input for an encoder-decoder architecture with attention mechanism. Finally, our best score achieved an F1-measure of 0.823 in the official test set.

health-related social media وسائل الإعلام الاجتماعية ذات الصلة بالصحة صناعة حمض الفوسفور

867 - Association for Computation Linguistics 2021 مقالة

For many NLP applications of online reviews, comparison of two opinion-bearing sentences is key. We argue that, while general purpose text similarity metrics have been applied for this purpose, there has been limited exploration of their applicabilit y to opinion texts. We address this gap in the literature, studying: (1) how humans judge the similarity of pairs of opinion-bearing sentences; and, (2) the degree to which existing text similarity metrics, particularly embedding-based ones, correspond to human judgments. We crowdsourced annotations for opinion sentence pairs and our main findings are: (1) annotators tend to agree on whether or not opinion sentences are similar or different; and (2) embedding-based metrics capture human judgments of opinion similarity'' but not opinion difference''. Based on our analysis, we identify areas where the current metrics should be improved. We further propose to learn a similarity metric for opinion similarity via fine-tuning the Sentence-BERT sentence-embedding network based on review text and weak supervision by review ratings. Experiments show that our learned metric outperforms existing text similarity metrics and especially show significantly higher correlations with human annotations for differing opinions.

opinion-bearing sentences opinion-bearing الجمل تحمل الرأي تشابه الرأي المحامل صناعة حمض الفوسفور

Benchmarking Meta-embeddings: What Works and What Does Not

769 - Association for Computation Linguistics 2021 مقالة

In the last few years, several methods have been proposed to build meta-embeddings. The general aim was to obtain new representations integrating complementary knowledge from different source pre-trained embeddings thereby improving their overall qua lity. However, previous meta-embeddings have been evaluated using a variety of methods and datasets, which makes it difficult to draw meaningful conclusions regarding the merits of each approach. In this paper we propose a unified common framework, including both intrinsic and extrinsic tasks, for a fair and objective meta-embeddings evaluation. Furthermore, we present a new method to generate meta-embeddings, outperforming previous work on a large number of intrinsic evaluation benchmarks. Our evaluation framework also allows us to conclude that previous extrinsic evaluations of meta-embeddings have been overestimated.

meta-embeddings benchmarking meta-embeddings Meta-Embeddings. معايير تايتا - Embeddings صناعة حمض الفوسفور

NPVec1: Word Embeddings for Nepali - Construction and Evaluation

632 - Association for Computation Linguistics 2021 مقالة

Word Embedding maps words to vectors of real numbers. It is derived from a large corpus and is known to capture semantic knowledge from the corpus. Word Embedding is a critical component of many state-of-the-art Deep Learning techniques. However, gen erating good Word Embeddings is a special challenge for low-resource languages such as Nepali due to the unavailability of large text corpus. In this paper, we present NPVec1 which consists of 25 state-of-art Word Embeddings for Nepali that we have derived from a large corpus using Glove, Word2Vec, FastText, and BERT. We further provide intrinsic and extrinsic evaluations of these Embeddings using well established metrics and methods. These models are trained using 279 million word tokens and are the largest Embeddings ever trained for Nepali language. Furthermore, we have made these Embeddings publicly available to accelerate the development of Natural Language Processing (NLP) applications in Nepali.

ينطبق على تقطير المعرفة embeddings embeddings. صناعة حمض الفوسفور

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Measuring Biases of Word Embeddings: What Similarity Measures and Descriptive Statistics to Use?

قياس تحيزات Word Embeddings: ما تدابير التشابه والإحصاءات الوصفية لاستخدامها؟

Ask ChatGPT about the research

Read More

suggested questions