No Arabic abstract
COVID-19, as a global health crisis, has triggered the fear emotion with unprecedented intensity. Besides the fear of getting infected, the outbreak of COVID-19 also created significant disruptions in peoples daily life and thus evoked intensive psychological responses indirect to COVID-19 infections. Here, we construct an expressed fear database using 16 million social media posts generated by 536 thousand users between January 1st, 2019 and August 31st, 2020 in China. We employ deep learning techniques to detect the fear emotion within each post and apply topic models to extract the central fear topics. Based on this database, we find that sleep disorders (nightmare and insomnia) take up the largest share of fear-labeled posts in the pre-pandemic period (January 2019-December 2019), and significantly increase during the COVID-19. We identify health and work-related concerns are the two major sources of fear induced by the COVID-19. We also detect gender differences, with females generating more posts containing the daily-life fear sources during the COVID-19 period. This research adopts a data-driven approach to trace back public emotion, which can be used to complement traditional surveys to achieve real-time emotion monitoring to discern societal concerns and support policy decision-making.
Following the onset of the COVID-19 pandemic and subsequent lockdowns, software engineers daily life was disrupted and abruptly forced into remote working from home. This change deeply impacted typical working routines, affecting both well-being and productivity. Moreover, this pandemic will have long-lasting effects in the software industry, with several tech companies allowing their employees to work from home indefinitely if they wish to do so. Therefore, it is crucial to analyze and understand how a typical working day looks like when working from home and how individual activities affect software developers well-being and productivity. We performed a two-wave longitudinal study involving almost 200 globally carefully selected software professionals, inferring daily activities with perceived well-being, productivity, and other relevant psychological and social variables. Results suggest that the time software engineers spent doing specific activities from home was similar when working in the office. However, we also found some significant mean differences. The amount of time developers spent on each activity was unrelated to their well-being, perceived productivity, and other variables. We conclude that working remotely is not per se a challenge for organizations or developers.
Computational measures of linguistic diversity help us understand the linguistic landscape using digital language data. The contribution of this paper is to calibrate measures of linguistic diversity using restrictions on international travel resulting from the COVID-19 pandemic. Previous work has mapped the distribution of languages using geo-referenced social media and web data. The goal, however, has been to describe these corpora themselves rather than to make inferences about underlying populations. This paper shows that a difference-in-differences method based on the Herfindahl-Hirschman Index can identify the bias in digital corpora that is introduced by non-local populations. These methods tell us where significant changes have taken place and whether this leads to increased or decreased diversity. This is an important step in aligning digital corpora like social media with the real-world populations that have produced them.
This paper elaborates on the notion of uncertainty in the context of annotation in large text corpora, specifically focusing on (but not limited to) historical languages. Such uncertainty might be due to inherent properties of the language, for example, linguistic ambiguity and overlapping categories of linguistic description, but could also be caused by lacking annotation expertise. By examining annotation uncertainty in more detail, we identify the sources and deepen our understanding of the nature and different types of uncertainty encountered in daily annotation practice. Moreover, some practical implications of our theoretical findings are also discussed. Last but not least, this article can be seen as an attempt to reconcile the perspectives of the main scientific disciplines involved in corpus projects, linguistics and computer science, to develop a unified view and to highlight the potential synergies between these disciplines.
The need to forecast COVID-19 related variables continues to be pressing as the epidemic unfolds. Different efforts have been made, with compartmental models in epidemiology and statistical models such as AutoRegressive Integrated Moving Average (ARIMA), Exponential Smoothing (ETS) or computing intelligence models. These efforts have proved useful in some instances by allowing decision makers to distinguish different scenarios during the emergency, but their accuracy has been disappointing, forecasts ignore uncertainties and less attention is given to local areas. In this study, we propose a simple Multiple Linear Regression model, optimised to use call data to forecast the number of daily confirmed cases. Moreover, we produce a probabilistic forecast that allows decision makers to better deal with risk. Our proposed approach outperforms ARIMA, ETS and a regression model without call data, evaluated by three point forecast error metrics, one prediction interval and two probabilistic forecast accuracy measures. The simplicity, interpretability and reliability of the model, obtained in a careful forecasting exercise, is a meaningful contribution to decision makers at local level who acutely need to organise resources in already strained health services. We hope that this model would serve as a building block of other forecasting efforts that on the one hand would help front-line personal and decision makers at local level, and on the other would facilitate the communication with other modelling efforts being made at the national level to improve the way we tackle this pandemic and other similar future challenges.
The INTERSPEECH 2021 Computational Paralinguistics Challenge addresses four different problems for the first time in a research competition under well-defined conditions: In the COVID-19 Cough and COVID-19 Speech Sub-Challenges, a binary classification on COVID-19 infection has to be made based on coughing sounds and speech; in the Escalation SubChallenge, a three-way assessment of the level of escalation in a dialogue is featured; and in the Primates Sub-Challenge, four species vs background need to be classified. We describe the Sub-Challenges, baseline feature extraction, and classifiers based on the usual COMPARE and BoAW features as well as deep unsupervised representation learning using the AuDeep toolkit, and deep feature extraction from pre-trained CNNs using the Deep Spectrum toolkit; in addition, we add deep end-to-end sequential modelling, and partially linguistic analysis.