Research papers, master and doctoral theses about ويكيبيديا

Detection of Puffery on the English Wikipedia

228 - Association for Computation Linguistics 2021 مقالة

On Wikipedia, an online crowdsourced encyclopedia, volunteers enforce the encyclopedia's editorial policies. Wikipedia's policy on maintaining a neutral point of view has inspired recent research on bias detection, including weasel words'' and hedges ''. Yet to date, little work has been done on identifying puffery,'' phrases that are overly positive without a verifiable source. We demonstrate that collecting training data for this task requires some care, and construct a dataset by combining Wikipedia editorial annotations and information retrieval techniques. We compare several approaches to predicting puffery, and achieve 0.963 f1 score by incorporating citation features into a RoBERTa model. Finally, we demonstrate how to integrate our model with Wikipedia's public infrastructure to give back to the Wikipedia editor community.

english wikipedia wikipedia الإنجليزية ويكيبيديا ويكيبيديا إنجليزي صناعة حمض الفوسفور

Investigating the Impact of Gender Representation in ASR Training Data: a Case Study on Librispeech

134 - Association for Computation Linguistics 2021 مقالة

In this paper we question the impact of gender representation in training data on the performance of an end-to-end ASR system. We create an experiment based on the Librispeech corpus and build 3 different training corpora varying only the proportion of data produced by each gender category. We observe that if our system is overall robust to the gender balance or imbalance in training data, it is nonetheless dependant of the adequacy between the individuals present in the training and testing sets.

التحيز في ويكيبيديا asr training data gender representation بيانات التدريب العصر تمثيل الجنس صناعة حمض الفوسفور

LU-BZU at SemEval-2021 Task 2: Word2Vec and Lemma2Vec performance in Arabic Word-in-Context disambiguation

94 - Association for Computation Linguistics 2021 مقالة

This paper presents a set of experiments to evaluate and compare between the performance of using CBOW Word2Vec and Lemma2Vec models for Arabic Word-in-Context (WiC) disambiguation without using sense inventories or sense embeddings. As part of the S emEval-2021 Shared Task 2 on WiC disambiguation, we used the dev.ar-ar dataset (2k sentence pairs) to decide whether two words in a given sentence pair carry the same meaning. We used two Word2Vec models: Wiki-CBOW, a pre-trained model on Arabic Wikipedia, and another model we trained on large Arabic corpora of about 3 billion tokens. Two Lemma2Vec models was also constructed based on the two Word2Vec models. Each of the four models was then used in the WiC disambiguation task, and then evaluated on the SemEval-2021 test.ar-ar dataset. At the end, we reported the performance of different models and compared between using lemma-based and word-based models.

متعدد اللغات واللغة wic disambiguation task arabic wikipedia مهمة Disambiguation WIC. ويكيبيديا العربية صناعة حمض الفوسفور

Assessing Gender Bias in Wikipedia: Inequalities in Article Titles

174 - Association for Computation Linguistics 2021 مقالة

Potential gender biases existing in Wikipedia's content can contribute to biased behaviors in a variety of downstream NLP systems. Yet, efforts in understanding what inequalities in portraying women and men occur in Wikipedia focused so far only on * biographies*, leaving open the question of how often such harmful patterns occur in other topics. In this paper, we investigate gender-related asymmetries in Wikipedia titles from *all domains*. We assess that for only half of gender-related articles, i.e., articles with words such as *women* or *male* in their titles, symmetrical counterparts describing the same concept for the other gender (and clearly stating it in their titles) exist. Among the remaining imbalanced cases, the vast majority of articles concern sports- and social-related issues. We provide insights on how such asymmetries can influence other Wikipedia components and propose steps towards reducing the frequency of observed patterns.

assessing gender bias bias in wikipedia التحيز في ويكيبيديا صناعة حمض الفوسفور

Fool Me Twice: Entailment from Wikipedia Gamification

120 - Association for Computation Linguistics 2021 مقالة

We release FoolMeTwice (FM2 for short), a large dataset of challenging entailment pairs collected through a fun multi-player game. Gamification encourages adversarial examples, drastically lowering the number of examples that can be solved using shor tcuts'' compared to other popular entailment datasets. Players are presented with two tasks. The first task asks the player to write a plausible claim based on the evidence from a Wikipedia page. The second one shows two plausible claims written by other players, one of which is false, and the goal is to identify it before the time runs out. Players pay'' to see clues retrieved from the evidence pool: the more evidence the player needs, the harder the claim. Game-play between motivated players leads to diverse strategies for crafting claims, such as temporal inference and diverting to unrelated evidence, and results in higher quality data for the entailment and evidence retrieval tasks. We open source the dataset and the game code.

wikipedia gamification entailment challenging entailment pairs جامع ويكيبيديا استلامي أزواج الاستلام الصعبة صناعة حمض الفوسفور المزيد..

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد