Do you want to publish a course? Click here

A Balanced and Broadly Targeted Computational Linguistics Curriculum

مناهج اللغويات الحسابية المتوازنة والموجهة على نطاق واسع

139   0   0   0.0 ( 0 )
 Publication date 2021
and research's language is English
 Created by Shamra Editor




Ask ChatGPT about the research

This paper describes the primarily-graduate computational linguistics and NLP curriculum at Georgetown University, a U.S. university that has seen significant growth in these areas in recent years. We reflect on the principles behind our curriculum choices, including recognizing the various academic backgrounds and goals of our students; teaching a variety of skills with an emphasis on working directly with data; encouraging collaboration and interdisciplinary work; and including languages beyond English. We reflect on challenges we have encountered, such as the difficulty of teaching programming skills alongside NLP fundamentals, and discuss areas for future growth.

References used
https://aclanthology.org/
rate research

Read More

The Electromagnetic Interference and EMC are one important phenomenon, since the EMI causes degradation in the performance of electric and electronic instruments. The EMI- problem may decrease effectiveness of sensitive devices and even may lead to a failure of its operation. This paper studies EMI-problem between different systems by using convenient computer programs as CST, which provides modulation and simulation of this problem. This method provide ability to trace and evaluate EMI microscopically in space and real time.
NLP's sphere of influence went much beyond computer science research and the development of software applications in the past decade. We see people using NLP methods in a range of academic disciplines from Asian Studies to Clinical Oncology. We also notice the presence of NLP as a module in most of the data science curricula within and outside of regular university setups. These courses are taken by students from very diverse backgrounds. This paper takes a closer look at some issues related to teaching NLP to these diverse audiences based on my classroom experiences, and identifies some challenges the instructors face, particularly when there is no ecosystem of related courses for the students. In this process, it also identifies a few challenge areas for both NLP researchers and tool developers.
Finding the year of writing for a historical text is of crucial importance to historical research. However, the year of original creation is rarely explicitly stated and must be inferred from the text content, historical records, and codicological cl ues. Given a transcribed text, machine learning has successfully been used to estimate the year of production. In this paper, we present an overview of several estimation approaches for historical text archives spanning from the 12th century until today.
Currently, multilingual machine translation is receiving more and more attention since it brings better performance for low resource languages (LRLs) and saves more space. However, existing multilingual machine translation models face a severe challe nge: imbalance. As a result, the translation performance of different languages in multilingual translation models are quite different. We argue that this imbalance problem stems from the different learning competencies of different languages. Therefore, we focus on balancing the learning competencies of different languages and propose Competence-based Curriculum Learning for Multilingual Machine Translation, named CCL-M. Specifically, we firstly define two competencies to help schedule the high resource languages (HRLs) and the low resource languages: 1) Self-evaluated Competence, evaluating how well the language itself has been learned; and 2) HRLs-evaluated Competence, evaluating whether an LRL is ready to be learned according to HRLs' Self-evaluated Competence. Based on the above competencies, we utilize the proposed CCL-M algorithm to gradually add new languages into the training set in a curriculum learning manner. Furthermore, we propose a novel competence-aware dynamic balancing sampling strategy for better selecting training samples in multilingual training. Experimental results show that our approach has achieved a steady and significant performance gain compared to the previous state-of-the-art approach on the TED talks dataset.
Recent works have found evidence of gender bias in models of machine translation and coreference resolution using mostly synthetic diagnostic datasets. While these quantify bias in a controlled experiment, they often do so on a small scale and consis t mostly of artificial, out-of-distribution sentences. In this work, we find grammatical patterns indicating stereotypical and non-stereotypical gender-role assignments (e.g., female nurses versus male dancers) in corpora from three domains, resulting in a first large-scale gender bias dataset of 108K diverse real-world English sentences. We manually verify the quality of our corpus and use it to evaluate gender bias in various coreference resolution and machine translation models. We find that all tested models tend to over-rely on gender stereotypes when presented with natural inputs, which may be especially harmful when deployed in commercial systems. Finally, we show that our dataset lends itself to finetuning a coreference resolution model, finding it mitigates bias on a held out set. Our dataset and models are publicly available at github.com/SLAB-NLP/BUG. We hope they will spur future research into gender bias evaluation mitigation techniques in realistic settings.

suggested questions

comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا