No Arabic abstract
Social media has been on the vanguard of political information diffusion in the 21st century. Most studies that look into disinformation, political influence and fake-news focus on mainstream social media platforms. This has inevitably made English an important factor in our current understanding of political activity on social media. As a result, there has only been a limited number of studies into a large portion of the world, including the largest, multilingual and multi-cultural democracy: India. In this paper we present our characterisation of a multilingual social network in India called ShareChat. We collect an exhaustive dataset across 72 weeks before and during the Indian general elections of 2019, across 14 languages. We investigate the cross lingual dynamics by clustering visually similar images together, and exploring how they move across language barriers. We find that Telugu, Malayalam, Tamil and Kannada languages tend to be dominant in soliciting political images (often referred to as memes), and posts from Hindi have the largest cross-lingual diffusion across ShareChat (as well as images containing text in English). In the case of images containing text that cross language barriers, we see that language translation is used to widen the accessibility. That said, we find cases where the same image is associated with very different text (and therefore meanings). This initial characterisation paves the way for more advanced pipelines to understand the dynamics of fake and political content in a multi-lingual and non-textual setting.
To reach a broader audience and optimize traffic toward news articles, media outlets commonly run social media accounts and share their content with a short text summary. Despite its importance of writing a compelling message in sharing articles, the research community does not own a sufficient understanding of what kinds of editing strategies effectively promote audience engagement. In this study, we aim to fill the gap by analyzing media outlets current practices using a data-driven approach. We first build a parallel corpus of original news articles and their corresponding tweets that eight media outlets shared. Then, we explore how those media edited tweets against original headlines and the effects of such changes. To estimate the effects of editing news headlines for social media sharing in audience engagement, we present a systematic analysis that incorporates a causal inference technique with deep learning; using propensity score matching, it allows for estimating potential (dis-)advantages of an editing style compared to counterfactual cases where a similar news article is shared with a different style. According to the analyses of various editing styles, we report common and differing effects of the styles across the outlets. To understand the effects of various editing styles, media outlets could apply our easy-to-use tool by themselves.
The large-scale online management systems (e.g. Moodle), online web forums (e.g. Piazza), and online homework systems (e.g. WebAssign) have been widely used in the blended courses recently. Instructors can use these systems to deliver class content and materials. Students can communicate with the classmates, share the course materials, and discuss the course questions via the online forums. With the increased use of the online systems, a large amount of students interaction data has been collected. This data can be used to analyze students learning behaviors and predict students learning outcomes. In this work, we collected students interaction data in three different blended courses. We represented the data as directed graphs and investigated the correlation between the social graph properties and students final grades. Our results showed that in all these classes, students who asked more answers and received more feedbacks on the forum tend to obtain higher grades. The significance of this work is that we can use the results to encourage students to participate more in forums to learn the class materials better; we can also build a predictive model based on the social metrics to show us low performing students early in the semester.
We address the problem of maximizing user engagement with content (in the form of like, reply, retweet, and retweet with comments)on the Twitter platform. We formulate the engagement forecasting task as a multi-label classification problem that captures choice behavior on an unsupervised clustering of tweet-topics. We propose a neural network architecture that incorporates user engagement history and predicts choice conditional on this context. We study the impact of recommend-ing tweets on engagement outcomes by solving an appropriately defined sweet optimization problem based on the proposed model using a large dataset obtained from Twitter.
Between February 14, 2019 and March 4, 2019, a terrorist attack in Pulwama, Kashmir followed by retaliatory airstrikes led to rising tensions between India and Pakistan, two nuclear-armed countries. In this work, we examine polarizing messaging on Twitter during these events, particularly focusing on the positions of Indian and Pakistani politicians. We use a label propagation technique focused on hashtag co-occurrences to find polarizing tweets and users. Our analysis reveals that politicians in the ruling political party in India (BJP) used polarized hashtags and called for escalation of conflict more so than politicians from other parties. Our work offers the first analysis of how escalating tensions between India and Pakistan manifest on Twitter and provides a framework for studying polarizing messages.
Matching problems are ubiquitous. They occur in economic markets, labor markets, internet advertising, and elsewhere. In this paper we focus on an application of matching for social media. Our goal is to distribute content from information suppliers to information consumers. We seek to maximize the overall relevance of the matched content from suppliers to consumers while regulating the overall activity, e.g., ensuring that no consumer is overwhelmed with data and that all suppliers have chances to deliver their content. We propose two matching algorithms, GreedyMR and StackMR, geared for the MapReduce paradigm. Both algorithms have provable approximation guarantees, and in practice they produce high-quality solutions. While both algorithms scale extremely well, we can show that StackMR requires only a poly-logarithmic number of MapReduce steps, making it an attractive option for applications with very large datasets. We experimentally show the trade-offs between quality and efficiency of our solutions on two large datasets coming from real-world social-media web sites.