Selecting Data to Clean for Fact Checking: Minimizing Uncertainty vs. Maximizing Surprise

50 0 0.0 ( 0 )

Download Cite

Added by Stavros Sintos

Publication date 2019

fields Informatics Engineering

and research's language is English

Authors Stavros Sintos - Pankaj K. Agarwal - Jun Yang

Databases

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We study the optimization problem of selecting numerical quantities to clean in order to fact-check claims based on such data. Oftentimes, such claims are technically correct, but they can still mislead for two reasons. First, data may contain uncertainty and errors. Second, data can be fished to advance particular positions. In practice, fact-checkers cannot afford to clean all data and must choose to clean what matters the most to checking a claim. We explore alternative definitions of what matters the most: one is to ascertain claim qualities (by minimizing uncertainty in these measures), while an alternative is just to counter the claim (by maximizing the probability of finding a counterargument). We show whether the two objectives align with each other, with important implications on when fact-checkers should exercise care in selective data cleaning, to avoid potential bias introduced by their desire to counter claims. We develop efficient algorithms for solving the various variants of the optimization problem, showing significant improvements over naive solutions. The problem is particularly challenging because the objectives in the fact-checking context are complex, non-linear functions over data. We obtain results that generalize to a large class of functions, with potential applications beyond fact-checking.

rate research

BONUS! Maximizing Surprise

119 - Zhihuan Huang , Yuqing Kong , Tracy Xiao Liu 2021

Multi-round competitions often double or triple the points awarded in the final round, calling it a bonus, to maximize spectators excitement. In a two-player competition with $n$ rounds, we aim to derive the optimal bonus size to maximize the audiences overall expected surprise (as defined in [7]). We model the audiences prior belief over the two players ability levels as a beta distribution. Using a novel analysis that clarifies and simplifies the computation, we find that the optimal bonus depends greatly upon the prior belief and obtain solutions of various forms for both the case of a finite number of rounds and the asymptotic case. In an interesting special case, we show that the optimal bonus approximately and asymptotically equals to the expected lead, the number of points the weaker player will need to come back in expectation. Moreover, we observe that priors with a higher skewness lead to a higher optimal bonus size, and in the symmetric case, priors with a higher uncertainty also lead to a higher optimal bonus size. This matches our intuition since a highly asymmetric prior leads to a high expected lead, and a highly uncertain symmetric prior often leads to a lopsided game, which again benefits from a larger bonus.

Computer Science and Game Theory

Generating Fact Checking Summaries for Web Claims

93 - Rahul Mishra , Dhruv Gupta , Markus Leippold 2020

We present SUMO, a neural attention-based approach that learns to establish the correctness of textual claims based on evidence in the form of text documents (e.g., news articles or Web documents). SUMO further generates an extractive summary by presenting a diversified set of sentences from the documents that explain its decision on the correctness of the textual claim. Prior approaches to address the problem of fact checking and evidence extraction have relied on simple concatenation of claim and document word embeddings as an input to claim driven attention weight computation. This is done so as to extract salient words and sentences from the documents that help establish the correctness of the claim. However, this design of claim-driven attention does not capture the contextual information in documents properly. We improve on the prior art by using improved claim and title guided hierarchical attention to model effective contextual cues. We show the efficacy of our approach on datasets concerning political, healthcare, and environmental issues.

Computation and Language

Claim Matching Beyond English to Scale Global Fact-Checking

70 - Ashkan Kazemi , Kiran Garimella , Devin Gaffney 2021

Manual fact-checking does not scale well to serve the needs of the internet. This issue is further compounded in non-English contexts. In this paper, we discuss claim matching as a possible solution to scale fact-checking. We define claim matching as the task of identifying pairs of textual messages containing claims that can be served with one fact-check. We construct a novel dataset of WhatsApp tipline and public group messages alongside fact-checked claims that are first annotated for containing claim-like statements and then matched with potentially similar items and annotated for claim matching. Our dataset contains content in high-resource (English, Hindi) and lower-resource (Bengali, Malayalam, Tamil) languages. We train our own embedding model using knowledge distillation and a high-quality teacher model in order to address the imbalance in embedding quality between the low- and high-resource languages in our dataset. We provide evaluations on the performance of our solution and compare with baselines and existing state-of-the-art multilingual embedding models, namely LASER and LaBSE. We demonstrate that our performance exceeds LASER and LaBSE in all settings. We release our annotated datasets, codebooks, and trained embedding model to allow for further research.

Computation and Language

Computational fact checking from knowledge networks

223 - Giovanni Luca Ciampaglia , Prashant Shiralkar , Luis M. Rocha 2015

Traditional fact checking by expert journalists cannot keep up with the enormous volume of information that is now generated online. Computational fact checking may significantly enhance our ability to evaluate the veracity of dubious information. Here we show that the complexities of human fact checking can be approximated quite well by finding the shortest path between concept nodes under properly defined semantic proximity metrics on knowledge graphs. Framed as a network problem this approach is feasible with efficient computational techniques. We evaluate this approach by examining tens of thousands of claims related to history, entertainment, geography, and biographical information using a public knowledge graph extracted from Wikipedia. Statements independently known to be true consistently receive higher support via our method than do false ones. These findings represent a significant step toward scalable computational fact-checking methods that may one day mitigate the spread of harmful misinformation.

Computers and Society Social and Information Networks Physics and Society

WhatTheWikiFact: Fact-Checking Claims Against Wikipedia

127 - Anton Chernyavskiy , Dmitry Ilvovsky , Preslav Nakov 2021

The rise of Internet has made it a major source of information. Unfortunately, not all information online is true, and thus a number of fact-checking initiatives have been launched, both manual and automatic. Here, we present our contribution in this regard: WhatTheWikiFact, a system for automatic claim verification using Wikipedia. The system predicts the veracity of an input claim, and it further shows the evidence it has retrieved as part of the verification process. It shows confidence scores and a list of relevant Wikipedia articles, together with detailed information about each article, including the phrase used to retrieve it, the most relevant sentences it contains, and their stances with respect to the input claim, with associated probabilities.

Computation and Language Information Retrieval