No Arabic abstract
The COVID-19 Open Research Dataset (CORD-19) is a growing resource of scientific papers on COVID-19 and related historical coronavirus research. CORD-19 is designed to facilitate the development of text mining and information retrieval systems over its rich collection of metadata and structured full text papers. Since its release, CORD-19 has been downloaded over 200K times and has served as the basis of many COVID-19 text mining and discovery systems. In this article, we describe the mechanics of dataset construction, highlighting challenges and key design decisions, provide an overview of how CORD-19 has been used, and describe several shared tasks built around the dataset. We hope this resource will continue to bring together the computing community, biomedical experts, and policy makers in the search for effective treatments and management policies for COVID-19.
We present Covidex, a search engine that exploits the latest neural ranking models to provide information access to the COVID-19 Open Research Dataset curated by the Allen Institute for AI. Our system has been online and serving users since late March 2020. The Covidex is the user application component of our three-pronged strategy to develop technologies for helping domain experts tackle the ongoing global pandemic. In addition, we provide robust and easy-to-use keyword search infrastructure that exploits mature fusion-based methods as well as standalone neural ranking models that can be incorporated into other applications. These techniques have been evaluated in the ongoing TREC-COVID challenge: Our infrastructure and baselines have been adopted by many participants, including some of the highest-scoring runs in rounds 1, 2, and 3. In round 3, we report the highest-scoring run that takes advantage of previous training data and the second-highest fully automatic run.
We study the disproportionate impact of the lockdown as a result of the COVID-19 outbreak on female and male academics research productivity in social science. The lockdown has caused substantial disruptions to academic activities, requiring people to work from home. How this disruption affects productivity and the related gender equity is an important operations and societal question. We collect data from the largest open-access preprint repository for social science on 41,858 research preprints in 18 disciplines produced by 76,832 authors across 25 countries over a span of two years. We use a difference-in-differences approach leveraging the exogenous pandemic shock. Our results indicate that, in the 10 weeks after the lockdown in the United States, although the total research productivity increased by 35%, female academics productivity dropped by 13.9% relative to that of male academics. We also show that several disciplines drive such gender inequality. Finally, we find that this intensified productivity gap is more pronounced for academics in top-ranked universities, and the effect exists in six other countries. Our work points out the fairness issue in productivity caused by the lockdown, a finding that universities will find helpful when evaluating faculty productivity. It also helps organizations realize the potential unintended consequences that can arise from telecommuting.
The world has seen in 2020 an unprecedented global outbreak of SARS-CoV-2, a new strain of coronavirus, causing the COVID-19 pandemic, and radically changing our lives and work conditions. Many scientists are working tirelessly to find a treatment and a possible vaccine. Furthermore, governments, scientific institutions and companies are acting quickly to make resources available, including funds and the opening of large-volume data repositories, to accelerate innovation and discovery aimed at solving this pandemic. In this paper, we develop a novel automated theme-based visualisation method, combining advanced data modelling of large corpora, information mapping and trend analysis, to provide a top-down and bottom-up browsing and search interface for quick discovery of topics and research resources. We apply this method on two recently released publications datasets (Dimensions COVID-19 dataset and the Allen Institute for AIs CORD-19). The results reveal intriguing information including increased efforts in topics such as social distancing; cross-domain initiatives (e.g. mental health and education); evolving research in medical topics; and the unfolding trajectory of the virus in different territories through publications. The results also demonstrate the need to quickly and automatically enable search and browsing of large corpora. We believe our methodology will improve future large volume visualisation and discovery systems but also hope our visualisation interfaces will currently aid scientists, researchers, and the general public to tackle the numerous issues in the fight against the COVID-19 pandemic.
Coronavirus disease (COVID-19) has been declared as a pandemic by WHO with thousands of cases being reported each day. Numerous scientific articles are being published on the disease raising the need for a service which can organize, and query them in a reliable fashion. To support this cause we present AWS CORD-19 Search (ACS), a public, COVID-19 specific, neural search engine that is powered by several machine learning systems to support natural language based searches. ACS with capabilities such as document ranking, passage ranking, question answering and topic classification provides a scalable solution to COVID-19 researchers and policy makers in their search and discovery for answers to high priority scientific questions. We present a quantitative evaluation and qualitative analysis of the system against other leading COVID-19 search platforms. ACS is top performing across these systems yielding quality results which we detail with relevant examples in this work.
The INTERSPEECH 2021 Computational Paralinguistics Challenge addresses four different problems for the first time in a research competition under well-defined conditions: In the COVID-19 Cough and COVID-19 Speech Sub-Challenges, a binary classification on COVID-19 infection has to be made based on coughing sounds and speech; in the Escalation SubChallenge, a three-way assessment of the level of escalation in a dialogue is featured; and in the Primates Sub-Challenge, four species vs background need to be classified. We describe the Sub-Challenges, baseline feature extraction, and classifiers based on the usual COMPARE and BoAW features as well as deep unsupervised representation learning using the AuDeep toolkit, and deep feature extraction from pre-trained CNNs using the Deep Spectrum toolkit; in addition, we add deep end-to-end sequential modelling, and partially linguistic analysis.