Political Depolarization of News Articles Using Attribute-aware Word Embeddings

163 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Soroush Vosoughi Dr

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Ruibo Liu - Lili Wang - Chenyan Jia

الحساب واللغة الذكاء الاصطناعي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Political polarization in the US is on the rise. This polarization negatively affects the public sphere by contributing to the creation of ideological echo chambers. In this paper, we focus on addressing one of the factors that contributes to this polarity, polarized media. We introduce a framework for depolarizing news articles. Given an article on a certain topic with a particular ideological slant (eg., liberal or conservative), the framework first detects polar language in the article and then generates a new article with the polar language replaced with neutral expressions. To detect polar words, we train a multi-attribute-aware word embedding model that is aware of ideology and topics on 360k full-length media articles. Then, for text generation, we propose a new algorithm called Text Annealing Depolarization Algorithm (TADA). TADA retrieves neutral expressions from the word embedding model that not only decrease ideological polarity but also preserve the original argument of the text, while maintaining grammatical correctness. We evaluate our framework by comparing the depolarized output of our model in two modes, fully-automatic and semi-automatic, on 99 stories spanning 11 topics. Based on feedback from 161 human testers, our framework successfully depolarized 90.1% of paragraphs in semi-automatic mode and 78.3% of paragraphs in fully-automatic mode. Furthermore, 81.2% of the testers agree that the non-polar content information is well-preserved and 79% agree that depolarization does not harm semantic correctness when they compare the original text and the depolarized text. Our work shows that data-driven methods can help to locate political polarity and aid in the depolarization of articles.

قيم البحث

393 - Kailash Karthik Saravanakumar , Miguel Ballesteros , Muthu Kumarn Chandrasekaran 2021

We propose a method for online news stream clustering that is a variant of the non-parametric streaming K-means algorithm. Our model uses a combination of sparse and dense document representations, aggregates document-cluster similarity along these m ultiple representations and makes the clustering decision using a neural classifier. The weighted document-cluster similarity model is learned using a novel adaptation of the triplet loss into a linear classification objective. We show that the use of a suitable fine-tuning objective and external knowledge in pre-trained transformer models yields significant improvements in the effectiveness of contextual embeddings for clustering. Our model achieves a new state-of-the-art on a standard stream clustering dataset of English documents.

الحساب واللغة الذكاء الاصطناعي استرجاع المعلومات

Analyzing Political Bias and Unfairness in News Articles at Different Levels of Granularity

78 - Wei-Fan Chen , Khalid Al-Khatib , Henning Wachsmuth 2020

Media organizations bear great reponsibility because of their considerable influence on shaping beliefs and positions of our society. Any form of media can contain overly biased content, e.g., by reporting on political events in a selective or incomp lete manner. A relevant question hence is whether and how such form of imbalanced news coverage can be exposed. The research presented in this paper addresses not only the automatic detection of bias but goes one step further in that it explores how political bias and unfairness are manifested linguistically. In this regard we utilize a new corpus of 6964 news articles with labels derived from adfontesmedia.com and develop a neural model for bias assessment. By analyzing this model on article excerpts, we find insightful bias patterns at different levels of text granularity, from single words to the whole article discourse.

الحساب واللغة

Interpretable Propaganda Detection in News Articles

226 - Seunghak Yu , Giovanni Da San Martino , Mitra Mohtarami 2021

Online users today are exposed to misleading and propagandistic news articles and media posts on a daily basis. To counter thus, a number of approaches have been designed aiming to achieve a healthier and safer online news and media consumption. Auto matic systems are able to support humans in detecting such content; yet, a major impediment to their broad adoption is that besides being accurate, the decisions of such systems need also to be interpretable in order to be trusted and widely adopted by users. Since misleading and propagandistic content influences readers through the use of a number of deception techniques, we propose to detect and to show the use of such techniques as a way to offer interpretability. In particular, we define qualitatively descriptive features and we analyze their suitability for detecting deception techniques. We further show that our interpretable features can be easily combined with pre-trained language models, yielding state-of-the-art results.

الحساب واللغة الذكاء الاصطناعي التعلم الآلي

Graph-based Topic Extraction from Vector Embeddings of Text Documents: Application to a Corpus of News Articles

78 - M. Tarik Altuncu , Sophia N. Yaliraki , Mauricio Barahona 2020

Production of news content is growing at an astonishing rate. To help manage and monitor the sheer amount of text, there is an increasing need to develop efficient methods that can provide insights into emerging content areas, and stratify unstructur ed corpora of text into `topics that stem intrinsically from content similarity. Here we present an unsupervised framework that brings together powerful vector embeddings from natural language processing with tools from multiscale graph partitioning that can reveal natural partitions at different resolutions without making a priori assumptions about the number of clusters in the corpus. We show the advantages of graph-based clustering through end-to-end comparisons with other popular clustering and topic modelling methods, and also evaluate different text vector embeddings, from classic Bag-of-Words to Doc2Vec to the recent transformers based model Bert. This comparative work is showcased through an analysis of a corpus of US news coverage during the presidential election year of 2016.

الحساب واللغة الذكاء الاصطناعي التعلم الآلي

Learning Multilingual Word Embeddings Using Image-Text Data

87 - Karan Singhal , Karthik Raman , Balder ten Cate 2019

There has been significant interest recently in learning multilingual word embeddings -- in which semantically similar words across languages have similar embeddings. State-of-the-art approaches have relied on expensive labeled data, which is unavail able for low-resource languages, or have involved post-hoc unification of monolingual embeddings. In the present paper, we investigate the efficacy of multilingual embeddings learned from weakly-supervised image-text data. In particular, we propose methods for learning multilingual embeddings using image-text data, by enforcing similarity between the representations of the image and that of the text. Our experiments reveal that even without using any expensive labeled data, a bag-of-words-based embedding model trained on image-text data achieves performance comparable to the state-of-the-art on crosslingual semantic similarity tasks.

الحساب واللغة الذكاء الاصطناعي الرؤية الحاسوبية وتمييز الأنماط