On Wikipedia, an online crowdsourced encyclopedia, volunteers enforce the encyclopedia's editorial policies. Wikipedia's policy on maintaining a neutral point of view has inspired recent research on bias detection, including "weasel words" and "hedges". Yet to date, little work has been done on identifying "puffery": phrases that are overly positive without a verifiable source. We demonstrate that collecting training data for this task requires some care, and we construct a dataset by combining Wikipedia editorial annotations with information retrieval techniques. We compare several approaches to predicting puffery and achieve an F1 score of 0.963 by incorporating citation features into a RoBERTa model. Finally, we demonstrate how to integrate our model with Wikipedia's public infrastructure to give back to the Wikipedia editor community.
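The abstract does not say which citation features are used, so the following is only an illustrative sketch of what such features might look like. It assumes sentences are available as raw wikitext and counts inline `<ref>` tags and `{{citation needed}}` templates. The function name `citation_features` and the feature names are hypothetical, not taken from the paper.

```python
import re

# Assumption: citations appear as opening <ref ...> tags in raw wikitext.
# Closing </ref> tags and <references/> are deliberately not matched.
REF_TAG = re.compile(r"<ref[\s>/]", re.IGNORECASE)

# Assumption: unsourced claims are flagged with {{citation needed}} or {{cn}}.
CITATION_NEEDED = re.compile(
    r"\{\{\s*(citation needed|cn)\s*(\|[^}]*)?\}\}", re.IGNORECASE
)


def citation_features(wikitext_sentence: str) -> dict:
    """Return simple, hypothetical citation features for one wikitext sentence."""
    num_refs = len(REF_TAG.findall(wikitext_sentence))
    num_cn = len(CITATION_NEEDED.findall(wikitext_sentence))
    return {
        "num_refs": num_refs,
        "has_ref": num_refs > 0,
        "num_citation_needed": num_cn,
    }


if __name__ == "__main__":
    example = (
        'She is a legendary singer.<ref name="a">Source</ref> '
        "Widely acclaimed.{{citation needed|date=May 2020}}"
    )
    print(citation_features(example))
```

Features like these could be appended to a text classifier's input or concatenated with its sentence representation; how the paper actually combines them with RoBERTa is not stated in the abstract.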