Research papers and master's and doctoral theses published by Caleb Ziems

To Protect and To Serve? Analyzing Entity-Centric Framing of Police Violence

85 - Caleb Ziems , Diyi Yang 2021

Framing has significant but subtle effects on public opinion and policy. We propose an NLP framework to measure entity-centric frames. We use it to understand media coverage on police violence in the United States in a new Police Violence Frames Corpus of 82k news articles spanning 7k police killings. Our work uncovers more than a dozen framing devices and reveals significant differences in the way liberal and conservative news sources frame both the issue of police violence and the entities involved. Conservative sources emphasize when the victim is armed or attacking an officer and are more likely to mention the victim's criminal record. Liberal sources focus more on the underlying systemic injustice, highlighting the victim's race and that they were unarmed. We discover temporary spikes in these injustice frames near high-profile shooting events, and finally, we show protest volume correlates with and precedes media framing decisions.

Computation and Language

Latent Hatred: A Benchmark for Understanding Implicit Hate Speech

104 - Mai ElSherief , Caleb Ziems , David Muchlinski 2021

Hate speech has grown significantly on social media, causing serious consequences for victims of all demographics. Despite much attention being paid to characterize and detect discriminatory speech, most work has focused on explicit or overt hate speech, failing to address a more pervasive form based on coded or indirect language. To fill this gap, this work introduces a theoretically-justified taxonomy of implicit hate speech and a benchmark corpus with fine-grained labels for each message and its implication. We present systematic analyses of our dataset using contemporary baselines to detect and explain implicit hate speech, and we discuss key features that challenge existing models. This dataset will continue to serve as a useful benchmark for understanding this multifaceted issue.

Computation and Language, Social and Information Networks

Quantifying the Impact of Human Capital, Job History, and Language Factors on Job Seniority with a Large-scale Analysis of Resumes

148 - Austin P Wright , Caleb Ziems , Haekyu Park 2021

As job markets worldwide have become more competitive and applicant selection criteria have become more opaque, and different (and sometimes contradictory) information and advice is available for job seekers wishing to progress in their careers, it has never been more difficult to determine which factors in a resume most effectively help career progression. In this work we present a novel, large scale dataset of over half a million resumes with preliminary analysis to begin to answer empirically which factors help or hurt people wishing to transition to more senior roles as they progress in their career. We find that previous experience forms the most important factor, outweighing other aspects of human capital, and find which language factors in a resume have significant effects. This lays the groundwork for future inquiry in career trajectories using large scale data analysis and natural language processing techniques.

General Economics, Information Retrieval, Economics

Racism is a Virus: Anti-Asian Hate and Counterhate in Social Media during the COVID-19 Crisis

206 - Caleb Ziems , Bing He , Sandeep Soni 2020

The spread of COVID-19 has sparked racism, hate, and xenophobia in social media targeted at Chinese and broader Asian communities. However, little is known about how racial hate spreads during a pandemic and the role of counterhate speech in mitigating the spread. Here we study the evolution and spread of anti-Asian hate speech through the lens of Twitter. We create COVID-HATE, the largest dataset of anti-Asian hate and counterhate spanning three months, containing over 30 million tweets, and a social network with over 87 million nodes. By creating a novel hand-labeled dataset of 2,400 tweets, we train a text classifier to identify hate and counterhate tweets that achieves an average AUROC of 0.852. We identify 891,204 hate and 200,198 counterhate tweets in COVID-HATE. Using this data to conduct longitudinal analysis, we find that while hateful users are less engaged in the COVID-19 discussions prior to their first anti-Asian tweet, they become more vocal and engaged afterwards compared to counterhate users. We find that bots comprise 10.4% of hateful users and are more vocal and hateful compared to non-bot users. Comparing bot accounts, we show that hateful bots are more successful in attracting followers compared to counterhate bots. Analysis of the social network reveals that hateful and counterhate users interact and engage extensively with one another, instead of living in isolated polarized communities. Furthermore, we find that hate is contagious and nodes are highly likely to become hateful after being exposed to hateful content. Importantly, our analysis reveals that counterhate messages can discourage users from turning hateful in the first place. Overall, this work presents a comprehensive overview of anti-Asian hate and counterhate content during a pandemic. The COVID-HATE dataset is available at http://claws.cc.gatech.edu/covid.

Social and Information Networks, Computation and Language, Computers and Society

Aggressive, Repetitive, Intentional, Visible, and Imbalanced: Refining Representations for Cyberbullying Classification

56 - Caleb Ziems , Ymir Vigfusson , Fred Morstatter 2020

Cyberbullying is a pervasive problem in online communities. To identify cyberbullying cases in large-scale social networks, content moderators depend on machine learning classifiers for automatic cyberbullying detection. However, existing models remain unfit for real-world applications, largely due to a shortage of publicly available training data and a lack of standard criteria for assigning ground truth labels. In this study, we address the need for reliable data using an original annotation framework. Inspired by social sciences research into bullying behavior, we characterize the nuanced problem of cyberbullying using five explicit factors to represent its social and linguistic aspects. We model this behavior using social network and language-based features, which improve classifier performance. These results demonstrate the importance of representing and modeling cyberbullying as a social phenomenon.

Social and Information Networks, Computation and Language
