In this work, we define the task of teaser generation and provide an evaluation benchmark and baseline systems for generating teasers. A teaser is a short reading suggestion for an article that is illustrative and includes curiosity-arousing elements to entice potential readers to read particular news items. Teasers are one of the main vehicles for transmitting news to social media users. We compile a novel dataset of teasers by systematically accumulating tweets and selecting those that conform to the teaser definition. We compare a number of neural abstractive architectures on the task of teaser generation; the overall best-performing system is the sequence-to-sequence model with pointer network of See et al. (2017).
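The pointer network mentioned above combines generating words from a fixed vocabulary with copying words directly from the source text, weighted by a generation probability p_gen. A minimal numpy sketch of that final output distribution (the token indices, attention weights, and vocabulary size here are toy values, not from the paper's data):

```python
import numpy as np

def pointer_generator_dist(p_gen, vocab_dist, attention, src_ids, vocab_size):
    # Final distribution mixes generation from the vocabulary with copying
    # source tokens, weighted by the generation probability p_gen.
    final = p_gen * vocab_dist.copy()
    copy = np.zeros(vocab_size)
    np.add.at(copy, src_ids, attention)  # accumulate attention mass per token id
    final += (1.0 - p_gen) * copy
    return final

vocab_dist = np.array([0.5, 0.3, 0.2, 0.0])  # toy decoder vocabulary distribution
attention = np.array([0.6, 0.4])             # attention over two source tokens
dist = pointer_generator_dist(0.7, vocab_dist, attention, np.array([3, 1]), 4)
print(dist)  # still a valid distribution; token 3 gains mass via copying
```

Copying lets the model produce rare or out-of-vocabulary words (names, numbers) that appear in the source article, which is why this architecture suits teaser generation.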
A popular multimedia news format nowadays provides users with a lively video and a corresponding news article; it is employed by influential news media including CNN and BBC, and by social media platforms including Twitter and Weibo. In such a setting, automatically choosing a proper cover frame for the video and generating an appropriate textual summary of the article can help editors save time and help readers make decisions more effectively. Hence, in this paper, we propose the task of Video-based Multimodal Summarization with Multimodal Output (VMSMO) to tackle this problem. The main challenge in this task is to jointly model the temporal dependency of the video and the semantic meaning of the article. To this end, we propose a Dual-Interaction-based Multimodal Summarizer (DIMS), consisting of a dual interaction module and a multimodal generator. In the dual interaction module, we propose a conditional self-attention mechanism that captures local semantic information within the video and a global-attention mechanism that handles the semantic relationship between the news text and the video at a high level. Extensive experiments conducted on a large-scale real-world VMSMO dataset show that DIMS achieves state-of-the-art performance in terms of both automatic metrics and human evaluations.
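To make the text–video interaction concrete, the sketch below shows generic scaled dot-product cross-attention, where text token representations attend over video frame representations. This is an illustrative building block only, not DIMS's exact conditional self-attention or global-attention formulation; all shapes and inputs are synthetic:

```python
import numpy as np

def cross_attention(text, video):
    # Scaled dot-product attention: each text token attends over all video
    # frames, yielding a video-aware representation per token.
    d = text.shape[-1]
    scores = text @ video.T / np.sqrt(d)              # (tokens, frames)
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over frames
    return weights @ video                            # (tokens, dim)

rng = np.random.default_rng(1)
text = rng.normal(size=(5, 8))    # 5 token embeddings, dim 8
video = rng.normal(size=(10, 8))  # 10 frame embeddings, dim 8
fused = cross_attention(text, video)
print(fused.shape)  # (5, 8)
```

In a full model, such fused representations would feed a generator that emits both the summary text and a score for each candidate cover frame.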
Despite agreement on the importance of detecting out-of-distribution (OOD) examples, there is little consensus on the formal definition of OOD examples and how to best detect them. We categorize these examples by whether they exhibit a background shift or a semantic shift, and find that the two major approaches to OOD detection, model calibration and density estimation (language modeling for text), have distinct behavior on these types of OOD data. Across 14 pairs of in-distribution and OOD English natural language understanding datasets, we find that density estimation methods consistently beat calibration methods in background shift settings, while performing worse in semantic shift settings. In addition, we find that both methods generally fail to detect examples from challenge data, highlighting a weak spot for current methods. Since no single method works well across all settings, our results call for an explicit definition of OOD examples when evaluating different detection methods.
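The calibration-based side of this comparison is commonly instantiated as the maximum softmax probability (MSP) score, with detection quality measured by AUROC between in-distribution and OOD scores. The sketch below illustrates that evaluation pipeline on synthetic logits; it is not the paper's experimental setup, and the density-estimation side (scoring inputs by language-model likelihood) is omitted:

```python
import numpy as np

def msp_score(logits):
    # Maximum softmax probability: higher => judged more in-distribution.
    z = logits - logits.max(axis=-1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return p.max(axis=-1)

def auroc(in_scores, out_scores):
    # Probability that a random in-distribution example outscores a random
    # OOD example (ties count half); 0.5 = chance, 1.0 = perfect detection.
    in_scores = np.asarray(in_scores)[:, None]
    out_scores = np.asarray(out_scores)[None, :]
    return (in_scores > out_scores).mean() + 0.5 * (in_scores == out_scores).mean()

# Synthetic logits: in-distribution inputs yield confident predictions.
rng = np.random.default_rng(0)
in_logits = rng.normal(0.0, 1.0, (100, 3))
in_logits[np.arange(100), 0] += 4.0          # confident class-0 margin
ood_logits = rng.normal(0.0, 1.0, (100, 3))  # no confident class
score = auroc(msp_score(in_logits), msp_score(ood_logits))
print(score)
```

A density-estimation detector would replace msp_score with, e.g., per-token log-likelihood under a language model, and the same AUROC comparison applies.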
Over the past three years it has become evident that fake news is a danger to democracy. However, until now there has been no clear understanding of how to define fake news, much less how to model it. This paper addresses both these issues. A definition of fake news is given, and two approaches for the modelling of fake news and its impact in elections and referendums are introduced. The first approach, based on the idea of a representative voter, is shown to be suitable to obtain a qualitative understanding of phenomena associated with fake news at a macroscopic level. The second approach, based on the idea of an election microstructure, describes the collective behaviour of the electorate by modelling the preferences of individual voters. It is shown through a simulation study that the mere knowledge that pieces of fake news may be in circulation goes a long way towards mitigating the impact of fake news.
Proverbs are an essential component of language and culture, and though much attention has been paid to their history and currency, there has been comparatively little quantitative work on changes in the frequency with which they are used over time. With wider availability of large corpora reflecting many diverse genres of documents, it is now possible to take a broad and dynamic view of the importance of the proverb. Here, we measure temporal changes in the relevance of proverbs within three corpora, differing in kind, scale, and time frame: Millions of books over centuries; hundreds of millions of news articles over twenty years; and billions of tweets over a decade. We find that proverbs present heavy-tailed frequency-of-usage rank distributions in each venue; exhibit trends reflecting the cultural dynamics of the eras covered; and have evolved into contemporary forms on social media.
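A heavy-tailed frequency-of-usage rank distribution of the kind described above can be summarized by the slope of log-frequency against log-rank (a Zipf-like exponent). The sketch below uses synthetic counts, not the corpora from the study:

```python
from collections import Counter
import math

def rank_frequency(counts):
    # Sort usage counts descending: rank 1 = most frequently used proverb.
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))  # [(rank, frequency), ...]

def zipf_exponent(ranked):
    # Least-squares slope of log-frequency vs. log-rank; a heavy tail
    # appears as an approximately straight line on log-log axes.
    xs = [math.log(r) for r, _ in ranked]
    ys = [math.log(f) for _, f in ranked]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return -slope

# Synthetic counts constructed to follow an exponent-1 power law.
counts = Counter({f"proverb_{i}": int(1000 / i) for i in range(1, 101)})
ranked = rank_frequency(counts)
print(zipf_exponent(ranked))
```

Computing this exponent per year within each corpus is one way to track how the prominence of proverbs shifts across eras and platforms.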
The habitable zone (HZ) is the region around a star (or stars) where standing bodies of water could exist on the surface of a rocky planet. The classical HZ definition makes a number of assumptions common to the Earth, including that the most important greenhouse gases for habitable planets are CO2 and H2O, that habitable planets orbit main-sequence stars, and that the carbonate-silicate cycle is a universal process on potentially habitable planets. Here, we discuss these and other predictions for the habitable zone and the observations that are needed to test them. We also, for the first time, argue why A-stars may be interesting HZ prospects. Instead of relying on unverified extrapolations from our Earth, we argue that future habitability studies require first-principles approaches in which temporal, spatial, physical, chemical, and biological systems are dynamically coupled. We also suggest that next-generation missions are only the beginning of a much more data-filled era in the not-too-distant future, when possibly hundreds to thousands of HZ planets will yield the statistical data we need to go beyond just finding habitable zone planets to actually determining which ones are most likely to exhibit life.