Pseudo-labelling Enhanced Media Bias Detection

73 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Qin Ruan

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Qin Ruan - Brian Mac Namee - Ruihai Dong

الحساب واللغة استرجاع المعلومات التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Leveraging unlabelled data through weak or distant supervision is a compelling approach to developing more effective text classification models. This paper proposes a simple but effective data augmentation method, which leverages the idea of pseudo-labelling to select samples from noisy distant supervision annotation datasets. The result shows that the proposed method improves the accuracy of biased news detection models.

قيم البحث

119 - Ramy Baly , Georgi Karadzhov , Jisun An 2020

Predicting the political bias and the factuality of reporting of entire news outlets are critical elements of media profiling, which is an understudied but an increasingly important research direction. The present level of proliferation of fake, bias ed, and propagandistic content online, has made it impossible to fact-check every single suspicious claim, either manually or automatically. Alternatively, we can profile entire news outlets and look for those that are likely to publish fake or biased content. This approach makes it possible to detect likely fake news the moment they are published, by simply checking the reliability of their source. From a practical perspective, political bias and factuality of reporting have a linguistic aspect but also a social context. Here, we study the impact of both, namely (i) what was written (i.e., what was published by the target medium, and how it describes itself on Twitter) vs. (ii) who read it (i.e., analyzing the readers of the target medium on Facebook, Twitter, and YouTube). We further study (iii) what was written about the target medium on Wikipedia. The evaluation results show that what was written matters most, and that putting all information sources together yields huge improvements over the current state-of-the-art.

الحساب واللغة استرجاع المعلومات التعلم الآلي

Knowledge Graph Enhanced Event Extraction in Financial Documents

153 - Kaihao Guo , Tianpei Jiang , Haipeng Zhang 2021

Event extraction is a classic task in natural language processing with wide use in handling large amount of yet rapidly growing financial, legal, medical, and government documents which often contain multiple events with their elements scattered and mixed across the documents, making the problem much more difficult. Though the underlying relations between event elements to be extracted provide helpful contextual information, they are somehow overlooked in prior studies. We showcase the enhancement to this task brought by utilizing the knowledge graph that captures entity relations and their attributes. We propose a first event extraction framework that embeds a knowledge graph through a Graph Neural Network and integrates the embedding with regular features, all at document-level. Specifically, for extracting events from Chinese financial announcements, our method outperforms the state-of-the-art method by 5.3% in F1-score.

الحساب واللغة استرجاع المعلومات التعلم الآلي

Predicting the Factuality of Reporting of News Media Using Observations About User Attention in Their YouTube Channels

108 - Krasimira Bozhanova , Yoan Dinkov , Ivan Koychev 2021

We propose a novel framework for predicting the factuality of reporting of news media outlets by studying the user attention cycles in their YouTube channels. In particular, we design a rich set of features derived from the temporal evolution of the number of views, likes, dislikes, and comments for a video, which we then aggregate to the channel level. We develop and release a dataset for the task, containing observations of user attention on YouTube channels for 489 news media. Our experiments demonstrate both complementarity and sizable improvements over state-of-the-art textual representations.

الحساب واللغة استرجاع المعلومات التعلم الآلي

LOGAN: Local Group Bias Detection by Clustering

67 - Jieyu Zhao , Kai-Wei Chang 2020

Machine learning techniques have been widely used in natural language processing (NLP). However, as revealed by many recent studies, machine learning models often inherit and amplify the societal biases in data. Various metrics have been proposed to quantify biases in model predictions. In particular, several of them evaluate disparity in model performance between protected groups and advantaged groups in the test corpus. However, we argue that evaluating bias at the corpus level is not enough for understanding how biases are embedded in a model. In fact, a model with similar aggregated performance between different groups on the entire data may behave differently on instances in a local region. To analyze and detect such local bias, we propose LOGAN, a new bias detection technique based on clustering. Experiments on toxicity classification and object classification tasks show that LOGAN identifies bias in a local region and allows us to better analyze the biases in model predictions.

الحساب واللغة الذكاء الاصطناعي التعلم الآلي

Detecting Media Bias in News Articles using Gaussian Bias Distributions

190 - Wei-Fan Chen , Khalid Al-Khatib , Benno Stein 2020

Media plays an important role in shaping public opinion. Biased media can influence people in undesirable directions and hence should be unmasked as such. We observe that featurebased and neural text classification approaches which rely only on the d istribution of low-level lexical information fail to detect media bias. This weakness becomes most noticeable for articles on new events, where words appear in new contexts and hence their bias predictiveness is unclear. In this paper, we therefore study how second-order information about biased statements in an article helps to improve detection effectiveness. In particular, we utilize the probability distributions of the frequency, positions, and sequential order of lexical and informational sentence-level bias in a Gaussian Mixture Model. On an existing media bias dataset, we find that the frequency and positions of biased statements strongly impact article-level bias, whereas their exact sequential order is secondary. Using a standard model for sentence-level bias detection, we provide empirical evidence that article-level bias detectors that use second-order information clearly outperform those without.

الحساب واللغة