
Impact of COVID-19 in Natural Language Processing Publications: a Disaggregated Study in Gender, Contribution and Experience


Publication date: 2021
Research language: English
Created by Shamra Editor





This study sheds light on the effects of COVID-19 on the particular field of Computational Linguistics and Natural Language Processing within Artificial Intelligence. We provide an intersectional study on gender, contribution, and experience that considers one school year (from August 2019 to August 2020) as a pandemic year. August is included twice for the purpose of an inter-annual comparison. While the overall trend in publications increased during the crisis, the results show that the ratio of female to male publications decreased. This further reduces the already limited weight of female contributions in computational linguistics (the ratio is now far below its peak of 0.24). The pandemic had a particularly negative effect on the output of female senior researchers publishing as first authors (maximum workload), followed by female junior researchers publishing as last authors (supervision or collaborative work).
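The core measurement behind the study is a gender-disaggregated publication count over time. As a rough illustration of the kind of computation involved, the sketch below derives a monthly female-to-male ratio for a given author position from a hypothetical CSV export of author metadata; the file name, column names, and the upstream gender-inference step are assumptions, not the paper's actual pipeline.

```python
# Minimal sketch: female-to-male publication ratio per month, split by author
# position. Assumes a hypothetical CSV with one row per (paper, author) holding
# columns: month, inferred_gender, author_position.
import csv
from collections import defaultdict

def monthly_gender_ratio(path, position="first"):
    """Return {month: female/male publication ratio} for a given author position."""
    counts = defaultdict(lambda: {"female": 0, "male": 0})
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row["author_position"] != position:
                continue
            gender = row["inferred_gender"]
            if gender in ("female", "male"):
                counts[row["month"]][gender] += 1
    return {
        month: (c["female"] / c["male"]) if c["male"] else float("nan")
        for month, c in sorted(counts.items())
    }

# Example (hypothetical file): compare months of the Aug 2019 - Aug 2020 school year.
# ratios = monthly_gender_ratio("acl_anthology_authors.csv", position="first")
```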

References used
https://aclanthology.org/

Related research

Natural Language Processing (NLP) systems are at the heart of many critical automated decision-making systems making crucial recommendations about our future world. Gender bias in NLP has been well studied in English, but has been less studied in other languages. In this paper, a team including speakers of 9 languages - Chinese, Spanish, English, Arabic, German, French, Farsi, Urdu, and Wolof - reports and analyzes measurements of gender bias in the Wikipedia corpora for these 9 languages. We develop extensions to profession-level and corpus-level gender bias metric calculations originally designed for English and apply them to 8 other languages, including languages with grammatically gendered nouns and distinct feminine, masculine, and neuter profession words. We discuss future work that would benefit immensely from a computational linguistics perspective.
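To make the idea of a corpus-level, profession-level gender bias metric concrete, here is a minimal sketch that scores professions by the log ratio of masculine to feminine context words around them. The word lists, window size, and log-ratio form are illustrative assumptions, not the paper's exact metrics or their multilingual extensions.

```python
# Minimal sketch of a profession-level gender bias score over a raw text corpus.
import math
import re
from collections import Counter

# Illustrative English word lists; the actual metric and lexicons may differ.
MASCULINE = {"he", "him", "his", "man", "men"}
FEMININE = {"she", "her", "hers", "woman", "women"}

def profession_bias(corpus_text, professions, window=10):
    """Return {profession: log(masculine/feminine co-occurrence count)}."""
    tokens = re.findall(r"\w+", corpus_text.lower())
    scores = {}
    for prof in professions:
        masc = fem = 0
        for i, tok in enumerate(tokens):
            if tok == prof:
                context = Counter(tokens[max(0, i - window): i + window + 1])
                masc += sum(context[w] for w in MASCULINE)
                fem += sum(context[w] for w in FEMININE)
        if masc and fem:
            scores[prof] = math.log(masc / fem)  # >0 leans masculine, <0 feminine
    return scores

# Example (hypothetical dump): profession_bias(text, ["doctor", "nurse", "engineer"])
```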
Knowledge Distillation (KD) offers a natural way to reduce the latency and memory/energy usage of massive pretrained models that have come to dominate Natural Language Processing (NLP) in recent years. While numerous sophisticated variants of KD algorithms have been proposed for NLP applications, the key factors underpinning the optimal distillation performance are often confounded and remain unclear. We aim to identify how different components in the KD pipeline affect the resulting performance and how much the optimal KD pipeline varies across different datasets/tasks, such as the data augmentation policy, the loss function, and the intermediate representation for transferring the knowledge between teacher and student. To tease apart their effects, we propose Distiller, a meta KD framework that systematically combines a broad range of techniques across different stages of the KD pipeline, which enables us to quantify each component's contribution. Within Distiller, we unify commonly used objectives for distillation of intermediate representations under a universal mutual information (MI) objective and propose a class of MI-objective functions with better bias/variance trade-off for estimating the MI between the teacher and the student. On a diverse set of NLP datasets, the best Distiller configurations are identified via large-scale hyper-parameter optimization. Our experiments reveal the following: 1) the approach used to distill the intermediate representations is the most important factor in KD performance, 2) among different objectives for intermediate distillation, the proposed MI objectives perform the best, and 3) data augmentation provides a large boost for small training datasets or small student networks. Moreover, we find that different datasets/tasks prefer different KD algorithms, and thus propose a simple AutoDistiller algorithm that can recommend a good KD pipeline for a new dataset.
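As a rough sketch of the kind of objective being compared, the snippet below combines a hard-label term, a temperature-scaled soft-target term on the logits, and a loss on intermediate representations. Plain MSE stands in for the MI-based objectives here; the weights, temperature, and overall form are illustrative assumptions, not Distiller's actual pipeline.

```python
# Minimal sketch of a composite knowledge-distillation loss (PyTorch).
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, student_hidden, teacher_hidden,
            labels, temperature=2.0, alpha=0.5, beta=0.1):
    # Hard-label cross-entropy on the student's own predictions.
    ce = F.cross_entropy(student_logits, labels)
    # Soft-target term: KL between temperature-scaled teacher and student distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Intermediate-representation term; MSE is a placeholder for MI-based objectives
    # (assumes the hidden states already have matching shapes, e.g. via a projection).
    hidden = F.mse_loss(student_hidden, teacher_hidden)
    return (1 - alpha) * ce + alpha * soft + beta * hidden
```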
Recent studies show that many NLP systems are sensitive and vulnerable to a small perturbation of inputs and do not generalize well across different datasets. This lack of robustness derails the use of NLP systems in real-world applications. This tutorial aims at bringing awareness of practical concerns about NLP robustness. It targets NLP researchers and practitioners who are interested in building reliable NLP systems. In particular, we will review recent studies on analyzing the weakness of NLP systems when facing adversarial inputs and data with a distribution shift. We will provide the audience with a holistic view of 1) how to use adversarial examples to examine the weakness of NLP models and facilitate debugging; 2) how to enhance the robustness of existing NLP models and defend against adversarial inputs; and 3) how the consideration of robustness affects the real-world NLP applications used in our daily lives. We will conclude the tutorial by outlining future research directions in this area.
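A minimal sketch of the kind of adversarial probing such analyses rely on: generate single-word synonym swaps and check whether a classifier's prediction survives them. The synonym table and the classify function are illustrative placeholders, not the tutorial's actual attack methods.

```python
# Minimal sketch: probe a text classifier with single-word synonym swaps.
SYNONYMS = {"great": ["fine", "good"], "movie": ["film", "picture"]}  # toy lexicon

def synonym_variants(sentence):
    """Yield copies of the sentence with one word replaced by a listed synonym."""
    words = sentence.split()
    for i, word in enumerate(words):
        for syn in SYNONYMS.get(word.lower(), []):
            yield " ".join(words[:i] + [syn] + words[i + 1:])

def is_robust(classify, sentence):
    """True if the prediction is unchanged under every single-word synonym swap."""
    original = classify(sentence)
    return all(classify(variant) == original for variant in synonym_variants(sentence))

# Example (hypothetical model): is_robust(my_sentiment_model, "a great movie")
```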
In this paper, we propose a definition and taxonomy of various types of non-standard textual content -- generally referred to as "noise" -- in Natural Language Processing (NLP). While data pre-processing is undoubtedly important in NLP, especially when dealing with user-generated content, a broader understanding of different sources of noise and how to deal with them is an aspect that has been largely neglected. We provide a comprehensive list of potential sources of noise, categorise and describe them, and show the impact of a subset of standard pre-processing strategies on different tasks. Our main goal is to raise awareness of non-standard content -- which should not always be considered as "noise" -- and of the need for careful, task-dependent pre-processing. This is an alternative to blanket, all-encompassing solutions generally applied by researchers through "standard" pre-processing pipelines. The intention is for this categorisation to serve as a point of reference to support NLP researchers in devising strategies to clean, normalise or embrace non-standard content.
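To illustrate task-dependent pre-processing as opposed to one blanket pipeline, the sketch below applies different cleaning rules depending on the downstream task. The task names and rules are illustrative assumptions, not the paper's taxonomy.

```python
# Minimal sketch: clean the same text differently depending on the target task.
import re

def preprocess(text, task):
    if task == "sentiment":
        # Emoticons and repeated letters can carry signal; only strip URLs.
        text = re.sub(r"http\S+", "", text)
    elif task == "ner":
        # Casing matters for named entities; normalise whitespace, keep case.
        text = re.sub(r"\s+", " ", text)
    elif task == "topic":
        # Aggressive normalisation is usually safe for topic classification.
        text = re.sub(r"http\S+|[^a-z\s]", " ", text.lower())
        text = re.sub(r"\s+", " ", text)
    return text.strip()

# Example: preprocess("GREAT movie!!! http://t.co/x", "sentiment")
```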
In online forums focused on health and wellbeing, individuals tend to seek and give two kinds of social support: emotional and informational support. Understanding the expressions of these social supports in an online COVID-19 forum is important for: (a) the forum and its members to provide the right type of support to individuals and (b) determining the long term effects of the COVID-19 pandemic on the well-being of the public, thereby informing interventions. In this work, we build four machine learning models to measure the extent of the following social supports expressed in each post in a COVID-19 online forum: (a) emotional support given, (b) emotional support sought, (c) informational support given, and (d) informational support sought. Using these models, we aim to: (i) determine if there is a correlation between the different social supports expressed in posts, e.g. when members of the forum give emotional support in posts, do they also tend to give or seek informational support in the same post? and (ii) determine how the social support sought and given changes over time in published posts. We find that (i) there is a positive correlation between the informational support given in posts and the emotional support given and emotional support sought, respectively, in these posts and (ii) over time, users tended to seek more emotional support and give less emotional support.
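Assuming each post has already been scored by the four support models, the correlation and temporal analyses reduce to standard dataframe operations, as in the sketch below; the file and column names are hypothetical.

```python
# Minimal sketch: correlate the four support scores and track them over time.
import pandas as pd

def support_correlations(posts_csv):
    df = pd.read_csv(posts_csv, parse_dates=["created_at"])  # hypothetical columns
    support_cols = ["emotional_given", "emotional_sought",
                    "informational_given", "informational_sought"]
    # Pairwise (Pearson) correlation between the four support scores across posts.
    pairwise = df[support_cols].corr()
    # Monthly averages to inspect how sought/given support shifts over time.
    monthly = df.set_index("created_at")[support_cols].resample("M").mean()
    return pairwise, monthly
```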
