ترغب بنشر مسار تعليمي؟ اضغط هنا

Tackling Racial Bias in Automated Online Hate Detection: Towards Fair and Accurate Classification of Hateful Online Users Using Geometric Deep Learning

77   0   0.0 ( 0 )
 نشر من قبل Scott A. Hale
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Online hate is a growing concern on many social media platforms and other sites. To combat it, technology companies are increasingly identifying and sanctioning `hateful users rather than simply moderating hateful content. Yet, most research in online hate detection to date has focused on hateful content. This paper examines how fairer and more accurate hateful user detection systems can be developed by incorporating social network information through geometric deep learning. Geometric deep learning dynamically learns information-rich network representations and can generalise to unseen nodes. This is essential for moving beyond manually engineered network features, which lack scalability and produce information-sparse network representations. This paper compares the accuracy of geometric deep learning with other techniques which either exclude network information or incorporate it through manual feature engineering (e.g., node2vec). It also evaluates the fairness of these techniques using the `predictive equality criteria, comparing the false positive rates on a subset of 136 African-American users with 4836 other users. Geometric deep learning produces the most accurate and fairest classifier, with an AUC score of 90.8% on the entire dataset and a false positive rate of zero among the African-American subset for the best performing model. This highlights the benefits of more effectively incorporating social network features in automated hateful user detection. Such an approach is also easily operationalized for real-world content moderation as it has an efficient and scalable design.



قيم البحث

اقرأ أيضاً

Online debates are often characterised by extreme polarisation and heated discussions among users. The presence of hate speech online is becoming increasingly problematic, making necessary the development of appropriate countermeasures. In this work, we perform hate speech detection on a corpus of more than one million comments on YouTube videos through a machine learning model fine-tuned on a large set of hand-annotated data. Our analysis shows that there is no evidence of the presence of serial haters, intended as active users posting exclusively hateful comments. Moreover, coherently with the echo chamber hypothesis, we find that users skewed towards one of the two categories of video channels (questionable, reliable) are more prone to use inappropriate, violent, or hateful language within their opponents community. Interestingly, users loyal to reliable sources use on average a more toxic language than their counterpart. Finally, we find that the overall toxicity of the discussion increases with its length, measured both in terms of number of comments and time. Our results show that, coherently with Godwins law, online debates tend to degenerate towards increasingly toxic exchanges of views.
In current hate speech datasets, there exists a high correlation between annotators perceptions of toxicity and signals of African American English (AAE). This bias in annotated training data and the tendency of machine learning models to amplify it cause AAE text to often be mislabeled as abusive/offensive/hate speech with a high false positive rate by current hate speech classifiers. In this paper, we use adversarial training to mitigate this bias, introducing a hate speech classifier that learns to detect toxic sentences while demoting confounds corresponding to AAE texts. Experimental results on a hate speech dataset and an AAE dataset suggest that our method is able to substantially reduce the false positive rate for AAE text while only minimally affecting the performance of hate speech classification.
Hateful speech in Online Social Networks (OSNs) is a key challenge for companies and governments, as it impacts users and advertisers, and as several countries have strict legislation against the practice. This has motivated work on detecting and cha racterizing the phenomenon in tweets, social media posts and comments. However, these approaches face several shortcomings due to the noisiness of OSN data, the sparsity of the phenomenon, and the subjectivity of the definition of hate speech. This works presents a user-centric view of hate speech, paving the way for better detection methods and understanding. We collect a Twitter dataset of $100,386$ users along with up to $200$ tweets from their timelines with a random-walk-based crawler on the retweet graph, and select a subsample of $4,972$ to be manually annotated as hateful or not through crowdsourcing. We examine the difference between user activity patterns, the content disseminated between hateful and normal users, and network centrality measurements in the sampled graph. Our results show that hateful users have more recent account creation dates, and more statuses, and followees per day. Additionally, they favorite more tweets, tweet in shorter intervals and are more central in the retweet network, contradicting the lone wolf stereotype often associated with such behavior. Hateful users are more negative, more profane, and use less words associated with topics such as hate, terrorism, violence and anger. We also identify similarities between hateful/normal users and their 1-neighborhood, suggesting strong homophily.
During natural or man-made disasters, humanitarian response organizations look for useful information to support their decision-making processes. Social media platforms such as Twitter have been considered as a vital source of useful information for disaster response and management. Despite advances in natural language processing techniques, processing short and informal Twitter messages is a challenging task. In this paper, we propose to use Deep Neural Network (DNN) to address two types of information needs of response organizations: 1) identifying informative tweets and 2) classifying them into topical classes. DNNs use distributed representation of words and learn the representation as well as higher level features automatically for the classification task. We propose a new online algorithm based on stochastic gradient descent to train DNNs in an online fashion during disaster situations. We test our models using a crisis-related real-world Twitter dataset.
Citizen-generated counter speech is a promising way to fight hate speech and promote peaceful, non-polarized discourse. However, there is a lack of large-scale longitudinal studies of its effectiveness for reducing hate speech. To this end, we perfor m an exploratory analysis of the effectiveness of counter speech using several different macro- and micro-level measures to analyze 180,000 political conversations that took place on German Twitter over four years. We report on the dynamic interactions of hate and counter speech over time and provide insights into whether, as in `classic bullying situations, organized efforts are more effective than independent individuals in steering online discourse. Taken together, our results build a multifaceted picture of the dynamics of hate and counter speech online. While we make no causal claims due to the complexity of discourse dynamics, our findings suggest that organized hate speech is associated with changes in public discourse and that counter speech -- especially when organized -- may help curb hateful rhetoric in online discourse.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا