No Arabic abstract
As computer-generated content and deepfakes make steady improvements, semantic approaches to multimedia forensics will become more important. In this paper, we introduce a novel classification architecture for identifying semantic inconsistencies between video appearance and text caption in social media news posts. We develop a multi-modal fusion framework to identify mismatches between videos and captions in social media posts by leveraging an ensemble method based on textual analysis of the caption, automatic audio transcription, semantic video analysis, object detection, named entity consistency, and facial verification. To train and test our approach, we curate a new video-based dataset of 4,000 real-world Facebook news posts for analysis. Our multi-modal approach achieves 60.5% classification accuracy on random mismatches between caption and appearance, compared to accuracy below 50% for uni-modal models. Further ablation studies confirm the necessity of fusion across modalities for correctly identifying semantic inconsistencies.
Recent years have witnessed the significant damage caused by various types of fake news. Although considerable effort has been applied to address this issue and much progress has been made on detecting fake news, most existing approaches mainly rely on the textual content and/or social context, while knowledge-level information---entities extracted from the news content and the relations between them---is much less explored. Within the limited work on knowledge-based fake news detection, an external knowledge graph is often required, which may introduce additional problems: it is quite common for entities and relations, especially with respect to new concepts, to be missing in existing knowledge graphs, and both entity prediction and link prediction are open research questions themselves. Therefore, in this work, we investigate textbf{knowledge-based fake news detection that does not require any external knowledge graph.} Specifically, our contributions include: (1) transforming the problem of detecting fake news into a subgraph classification task---entities and relations are extracted from each news item to form a single knowledge graph, where a news item is represented by a subgraph. Then a graph neural network (GNN) model is trained to classify each subgraph/news item. (2) Further improving the performance of this model through a simple but effective multi-modal technique that combines extracted knowledge, textual content and social context. Experiments on multiple datasets with thousands of labelled news items demonstrate that our knowledge-based algorithm outperforms existing counterpart methods, and its performance can be further boosted by the multi-modal approach.
Incorporating multi-modal contexts in conversation is an important step for developing more engaging dialogue systems. In this work, we explore this direction by introducing MMChat: a large scale multi-modal dialogue corpus (32.4M raw dialogues and 120.84K filtered dialogues). Unlike previous corpora that are crowd-sourced or collected from fictitious movies, MMChat contains image-grounded dialogues collected from real conversations on social media, in which the sparsity issue is observed. Specifically, image-initiated dialogues in common communications may deviate to some non-image-grounded topics as the conversation proceeds. We develop a benchmark model to address this issue in dialogue generation tasks by adapting the attention routing mechanism on image features. Experiments demonstrate the usefulness of incorporating image features and the effectiveness in handling the sparsity of image features.
Effective detection of fake news has recently attracted significant attention. Current studies have made significant contributions to predicting fake news with less focus on exploiting the relationship (similarity) between the textual and visual information in news articles. Attaching importance to such similarity helps identify fake news stories that, for example, attempt to use irrelevant images to attract readers attention. In this work, we propose a $mathsf{S}$imilarity-$mathsf{A}$ware $mathsf{F}$ak$mathsf{E}$ news detection method ($mathsf{SAFE}$) which investigates multi-modal (textual and visual) information of news articles. First, neural networks are adopted to separately extract textual and visual features for news representation. We further investigate the relationship between the extracted features across modalities. Such representations of news textual and visual information along with their relationship are jointly learned and used to predict fake news. The proposed method facilitates recognizing the falsity of news articles based on their text, images, or their mismatches. We conduct extensive experiments on large-scale real-world data, which demonstrate the effectiveness of the proposed method.
Large-scale dissemination of disinformation online intended to mislead or deceive the general population is a major societal problem. Rapid progression in image, video, and natural language generative models has only exacerbated this situation and intensified our need for an effective defense mechanism. While existing approaches have been proposed to defend against neural fake news, they are generally constrained to the very limited setting where articles only have text and metadata such as the title and authors. In this paper, we introduce the more realistic and challenging task of defending against machine-generated news that also includes images and captions. To identify the possible weaknesses that adversaries can exploit, we create a NeuralNews dataset composed of 4 different types of generated articles as well as conduct a series of human user study experiments based on this dataset. In addition to the valuable insights gleaned from our user study experiments, we provide a relatively effective approach based on detecting visual-semantic inconsistencies, which will serve as an effective first line of defense and a useful reference for future work in defending against machine-generated disinformation.
Social media is currently one of the most important means of news communication. Since people are consuming a large fraction of their daily news through social media, most of the traditional news channels are using social media to catch the attention of users. Each news channel has its own strategies to attract more users. In this paper, we analyze how the news channels use sentiment to garner users attention in social media. We compare the sentiment of social media news posts of television, radio and print media, to show the differences in the ways these channels cover the news. We also analyze users reactions and opinion sentiment on news posts with different sentiments. We perform our experiments on a dataset extracted from Facebook Pages of five popular news channels. Our dataset contains 0.15 million news posts and 1.13 billion users reactions. The results of our experiments show that the sentiment of user opinion has a strong correlation with the sentiment of the news post and the type of information source. Our study also illustrates the differences among the social media news channels of different types of news sources.