ﻻ يوجد ملخص باللغة العربية
We consider the problem of using observational data to estimate the causal effects of linguistic properties. For example, does writing a complaint politely lead to a faster response time? How much will a positive product review increase sales? This paper addresses two technical challenges related to the problem before developing a practical method. First, we formalize the causal quantity of interest as the effect of a writers intent, and establish the assumptions necessary to identify this from observational data. Second, in practice, we only have access to noisy proxies for the linguistic properties of interest -- e.g., predictions from classifiers and lexicons. We propose an estimator for this setting and prove that its bias is bounded when we perform an adjustment for the text. Based on these results, we introduce TextCause, an algorithm for estimating causal effects of linguistic properties. The method leverages (1) distant supervision to improve the quality of noisy proxies, and (2) a pre-trained language model (BERT) to adjust for the text. We show that the proposed method outperforms related approaches when estimating the effect of Amazon review sentiment on semi-simulated sales figures. Finally, we present an applied case study investigating the effects of complaint politeness on bureaucratic response times.
Word embeddings have become a staple of several natural language processing tasks, yet much remains to be understood about their properties. In this work, we analyze word embeddings in terms of their principal components and arrive at a number of nov
This paper concerns the intersection of natural language and the physical space around us in which we live, that we observe and/or imagine things within. Many important features of language have spatial connotations, for example, many prepositions (l
In this work, we propose a computational framework in which agents equipped with communication capabilities simultaneously play a series of referential games, where agents are trained using deep reinforcement learning. We demonstrate that the framewo
Statistical methods applied to social media posts shed light on the dynamics of online dialogue. For example, users wording choices predict their persuasiveness and users adopt the language patterns of other dialogue participants. In this paper, we e
In this paper, we present CogNet, a knowledge base (KB) dedicated to integrating three types of knowledge: (1) linguistic knowledge from FrameNet, which schematically describes situations, objects and events. (2) world knowledge from YAGO, Freebase,