Transformer-based models such as BERT, XLNet, and XLM-R have achieved state-of-the-art performance across various NLP tasks, including the identification of offensive language and hate speech, an important problem in social media. In this paper, we present fBERT, a BERT model retrained on SOLID, the largest English offensive language identification corpus available, with over 1.4 million offensive instances. We evaluate fBERT's performance at identifying offensive content on multiple English datasets, and we test several thresholds for selecting instances from SOLID. The fBERT model will be made freely available to the community.
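The threshold-based selection of instances mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how instances are scored or which thresholds are tested, so everything below is an assumption for illustration: the `Instance` fields (`avg_score`, `std`), the helper names, the toy corpus, and the threshold values are all hypothetical and are not taken from the paper or from the SOLID release.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Instance:
    """One corpus entry. Field names are hypothetical, not SOLID's schema."""
    text: str
    avg_score: float  # assumed: aggregated offensiveness confidence in [0, 1]
    std: float        # assumed: disagreement between the scoring models


def select_instances(
    instances: Iterable[Instance],
    min_score: float,
    max_std: Optional[float] = None,
) -> List[Instance]:
    """Keep instances scoring at least min_score, optionally capping std."""
    selected = []
    for inst in instances:
        if inst.avg_score < min_score:
            continue  # not confidently offensive at this threshold
        if max_std is not None and inst.std > max_std:
            continue  # scoring models disagree too much
        selected.append(inst)
    return selected


def sweep_thresholds(
    instances: List[Instance], thresholds: Iterable[float]
) -> Dict[float, int]:
    """Count how many instances survive each candidate threshold."""
    return {t: len(select_instances(instances, t)) for t in thresholds}


# Toy data with placeholder texts; the scores are invented for illustration.
TOY_CORPUS = [
    Instance("example a", avg_score=0.20, std=0.05),
    Instance("example b", avg_score=0.55, std=0.30),
    Instance("example c", avg_score=0.80, std=0.10),
    Instance("example d", avg_score=0.95, std=0.02),
]

if __name__ == "__main__":
    # Hypothetical thresholds; the paper's actual values are not given here.
    print(sweep_thresholds(TOY_CORPUS, [0.5, 0.7, 0.9]))
```

Raising the threshold trades corpus size for label confidence: fewer instances survive, but those that do are more likely to be offensive. Each selected subset would then serve as retraining data for a separate model run, which is one plausible reading of "testing several thresholds".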
References used
https://aclanthology.org/