
Improving Generalizability of Fake News Detection Methods using Propensity Score Matching

Published by Bo Ni
Publication date: 2020
Research field: Informatics Engineering
Research language: English





Recently, due to the booming influence of online social networks, detecting fake news has drawn significant attention from both academic communities and the general public. In this paper, we consider the existence of confounding variables in the features of fake news and use Propensity Score Matching (PSM) to select generalizable features in order to reduce the effects of the confounding variables. Experimental results show that the generalizability of fake news detection methods is significantly better when features are selected using PSM than when they are selected by raw frequency. We investigate multiple types of fake news detection methods (classifiers), such as logistic regression, random forests, and support vector machines, and observe consistent performance improvements.
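The abstract does not spell out the matching procedure, so the following is only a minimal sketch of how propensity score matching could be used to score one candidate feature, not the authors' implementation. It assumes that the "treatment" is the presence of a candidate word in a document, that the covariates are the presence of other (hypothetical) confounder words, and that the outcome is the fake/real label. The function names, the gradient-descent logistic regression, the 1:1 nearest-neighbour matching, and the caliper value are all illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def propensity_scores(X, t, lr=0.5, epochs=500):
    """Estimate P(treatment = 1 | covariates) with a small logistic regression
    fitted by batch gradient descent (an illustrative choice of estimator)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, ti in zip(X, t):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - ti
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in X]


def matched_effect(scores, t, y, caliper=0.1):
    """1:1 nearest-neighbour matching without replacement.

    Returns (mean outcome difference over matched pairs, number of pairs).
    Treated units with no control within the caliper are left unmatched."""
    treated = [i for i, ti in enumerate(t) if ti == 1]
    controls = [i for i, ti in enumerate(t) if ti == 0]
    used, diffs = set(), []
    for i in treated:
        best, best_dist = None, caliper
        for j in controls:
            dist = abs(scores[i] - scores[j])
            if j not in used and dist <= best_dist:
                best, best_dist = j, dist
        if best is not None:
            used.add(best)
            diffs.append(y[i] - y[best])
    effect = sum(diffs) / len(diffs) if diffs else 0.0
    return effect, len(diffs)


def psm_effect(docs, labels, word, confounders, caliper=0.1):
    """Score one candidate word: treatment is 'word occurs in the document',
    covariates are binary indicators for the assumed confounder words."""
    t = [1 if word in doc else 0 for doc in docs]
    X = [[1.0 if c in doc else 0.0 for c in confounders] for doc in docs]
    return matched_effect(propensity_scores(X, t), t, labels, caliper)


if __name__ == "__main__":
    # Toy corpus: each document is a set of words; label 1 = fake, 0 = real.
    docs = [{"shocking", "election"}, {"shocking"}, {"election"}, {"weather"}]
    labels = [1, 1, 0, 0]
    print(psm_effect(docs, labels, "shocking", ["election"]))
```

Under these assumptions, candidate words could be ranked by the absolute matched effect rather than by raw frequency, which is the contrast the abstract draws. A word whose association with the label disappears after matching on the confounders would receive a score near zero.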


Read also

Today, social media has become the primary source of news. Via social media platforms, fake news travels at unprecedented speed, reaches global audiences, and puts users and communities at great risk. Therefore, it is extremely important to detect fake news as early as possible. Recently, deep learning based approaches have shown improved performance in fake news detection. However, training such models requires a large amount of labeled data, and manual annotation is time-consuming and expensive. Moreover, due to the dynamic nature of news, annotated samples may become outdated quickly and cannot represent news articles on newly emerged events. Therefore, obtaining fresh and high-quality labeled samples is the major challenge in employing deep learning models for fake news detection. To tackle this challenge, we propose a reinforced weakly-supervised fake news detection framework, WeFEND, which can leverage user reports as weak supervision to enlarge the amount of training data for fake news detection. The proposed framework consists of three main components: the annotator, the reinforced selector, and the fake news detector. The annotator automatically assigns weak labels to unlabeled news based on user reports. The reinforced selector uses reinforcement learning techniques to choose high-quality samples from the weakly labeled data and filter out low-quality ones that may degrade the detector's prediction performance. The fake news detector aims to identify fake news based on the news content. We tested the proposed framework on a large collection of news articles published via WeChat official accounts and the associated user reports. Extensive experiments on this dataset show that the proposed WeFEND model achieves the best performance compared with state-of-the-art methods.
93 - Yi Han, Amila Silva, Ling Luo, 2021
Recent years have witnessed the significant damage caused by various types of fake news. Although considerable effort has been applied to address this issue and much progress has been made on detecting fake news, most existing approaches mainly rely on the textual content and/or social context, while knowledge-level information (entities extracted from the news content and the relations between them) is much less explored. Within the limited work on knowledge-based fake news detection, an external knowledge graph is often required, which may introduce additional problems: it is quite common for entities and relations, especially with respect to new concepts, to be missing in existing knowledge graphs, and both entity prediction and link prediction are open research questions themselves. Therefore, in this work, we investigate knowledge-based fake news detection that does not require any external knowledge graph. Specifically, our contributions include: (1) transforming the problem of detecting fake news into a subgraph classification task: entities and relations are extracted from each news item to form a single knowledge graph, where a news item is represented by a subgraph, and a graph neural network (GNN) model is then trained to classify each subgraph/news item; (2) further improving the performance of this model through a simple but effective multi-modal technique that combines extracted knowledge, textual content and social context. Experiments on multiple datasets with thousands of labelled news items demonstrate that our knowledge-based algorithm outperforms existing counterpart methods, and its performance can be further boosted by the multi-modal approach.
With the rapid evolution of social media, fake news has become a significant social problem, which cannot be addressed in a timely manner using manual investigation. This has motivated numerous studies on automating fake news detection. Most studies explore supervised training models with different modalities (e.g., text, images, and propagation networks) of news records to identify fake news. However, the performance of such techniques generally drops if news records are coming from different domains (e.g., politics, entertainment), especially for domains that are unseen or rarely-seen during training. As motivation, we empirically show that news records from different domains have significantly different word usage and propagation patterns. Furthermore, due to the sheer volume of unlabelled news records, it is challenging to select news records for manual labelling so that the domain-coverage of the labelled dataset is maximized. Hence, this work: (1) proposes a novel framework that jointly preserves domain-specific and cross-domain knowledge in news records to detect fake news from different domains; and (2) introduces an unsupervised technique to select a set of unlabelled informative news records for manual labelling, which can be ultimately used to train a fake news detection model that performs well for many domains while minimizing the labelling cost. Our experiments show that the integration of the proposed fake news model and the selective annotation approach achieves state-of-the-art performance for cross-domain news datasets, while yielding notable improvements for rarely-appearing domains in news datasets.
94 - Hao Liao, Qixin Liu, Kai Shu, 2020
Disinformation has long been regarded as a severe social problem, of which fake news is one of the most representative forms. What is worse, today's highly developed social media lets fake news spread widely at incredible speed, bringing substantial harm to various aspects of human life. Yet the popularity of social media also provides opportunities to better detect fake news. Unlike conventional approaches, which focus on either content or user comments alone, effective collaboration of heterogeneous social media information, including the content and context factors of news, user comments, and the engagement of social media with users, will hopefully give rise to better detection of fake news. Motivated by the above observations, a novel detection framework, namely the graph comment-user advanced learning framework (GCAL), is proposed in this paper. User-comment information is crucial but not well studied in fake news detection. Thus, we model user-comment context through network representation learning based on a heterogeneous graph neural network. We conduct experiments on two real-world datasets, which demonstrate that the proposed joint model outperforms 8 state-of-the-art baseline methods for fake news detection (by at least 4% in Accuracy, 7% in Recall and 5% in F1). Moreover, the proposed method is also explainable.
In early January 2020, after China reported the first cases of the new coronavirus (SARS-CoV-2) in the city of Wuhan, unreliable and not fully accurate information started spreading faster than the virus itself. Alongside this pandemic, people have experienced a parallel infodemic, i.e., an overabundance of information, some of which is misleading or even harmful, that has spread widely around the globe. Although social media are increasingly being used as an information source, Web search engines, like Google or Yahoo!, still represent a powerful and trustworthy resource for finding information on the Web. This is due to their capability to capture the largest amount of information, helping users quickly identify the most relevant and useful, although not always the most reliable, results for their search queries. This study aims to detect potentially misleading and fake content by capturing and analysing the textual information that flows through search engines. Using a real-world dataset associated with the recent CoViD-19 pandemic, we first apply re-sampling techniques for class imbalance, then we use existing Machine Learning algorithms to classify unreliable news. By extracting lexical and host-based features of the Uniform Resource Locators (URLs) associated with news articles, we show that the proposed methods, which are common in phishing and malicious URL detection, can improve the efficiency and performance of classifiers. Based on these findings, we suggest that the use of both textual and URL features can improve the effectiveness of fake news detection methods.
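To make the idea of lexical URL features in the last abstract concrete, here is a small sketch using only the Python standard library. The abstract does not list the features the study uses, so the feature set and the function name `url_lexical_features` are assumptions chosen for illustration. Host-based features, which typically require external lookups, are left out.

```python
from urllib.parse import urlparse


def url_lexical_features(url):
    """Compute a few simple lexical features from a URL string.

    The feature set is an illustrative assumption, not the study's list."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    path_segments = [seg for seg in parsed.path.split("/") if seg]
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "path_depth": len(path_segments),
        "uses_https": parsed.scheme == "https",
    }


if __name__ == "__main__":
    # Hypothetical URL used only to show the call.
    print(url_lexical_features("https://example.com/news/2020/covid-19.html"))
```

A feature dictionary like this could be combined with textual features of the article before training a classifier, which is the combination the abstract suggests.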
