No Arabic abstract
Ethical aspects of research in language technologies have received much attention recently. It is a standard practice to get a study involving human subjects reviewed and approved by a professional ethics committee/board of the institution. How commonly do we see mention of ethical approvals in NLP research? What types of research or aspects of studies are usually subject to such reviews? With the rising concerns and discourse around the ethics of NLP, do we also observe a rise in formal ethical reviews of NLP studies? And, if so, would this imply that there is a heightened awareness of ethical issues that was previously lacking? We aim to address these questions by conducting a detailed quantitative and qualitative analysis of the ACL Anthology, as well as comparing the trends in our field to those of other related disciplines, such as cognitive science, machine learning, data mining, and systems.
Ideas from forensic linguistics are now being used frequently in Natural Language Processing (NLP), using machine learning techniques. While the role of forensic linguistics was more benign earlier, it is now being used for purposes which are questionable. Certain methods from forensic linguistics are employed, without considering their scientific limitations and ethical concerns. While we take the specific case of forensic linguistics as an example of such trends in NLP and machine learning, the issue is a larger one and present in many other scientific and data-driven domains. We suggest that such trends indicate that some of the applied sciences are exceeding their legal and scientific briefs. We highlight how carelessly implemented practices are serving to short-circuit the due processes of law as well breach ethical codes.
Systematic reviews, which entail the extraction of data from large numbers of scientific documents, are an ideal avenue for the application of machine learning. They are vital to many fields of science and philanthropy, but are very time-consuming and require experts. Yet the three main stages of a systematic review are easily done automatically: searching for documents can be done via APIs and scrapers, selection of relevant documents can be done via binary classification, and extraction of data can be done via sequence-labelling classification. Despite the promise of automation for this field, little research exists that examines the various ways to automate each of these tasks. We construct a pipeline that automates each of these aspects, and experiment with many human-time vs. system quality trade-offs. We test the ability of classifiers to work well on small amounts of data and to generalise to data from countries not represented in the training data. We test different types of data extraction with varying difficulty in annotation, and five different neural architectures to do the extraction. We find that we can get surprising accuracy and generalisability of the whole pipeline system with only 2 weeks of human-expert annotation, which is only 15% of the time it takes to do the whole review manually and can be repeated and extended to new data with no additional effort.
Plumitifs (dockets) were initially a tool for law clerks. Nowadays, they are used as summaries presenting all the steps of a judicial case. Information concerning parties identity, jurisdiction in charge of administering the case, and some information relating to the nature and the course of the preceding are available through plumitifs. They are publicly accessible but barely understandable; they are written using abbreviations and referring to provisions from the Criminal Code of Canada, which makes them hard to reason about. In this paper, we propose a simple yet efficient multi-source language generation architecture that leverages both the plumitif and the Criminal Codes content to generate intelligible plumitifs descriptions. It goes without saying that ethical considerations rise with these sensitive documents made readable and available at scale, legitimate concerns that we address in this paper.
By using publications from Web of Science Core Collection (WoSCC), Fosso Wamba and his colleagues published an interesting and comprehensive paper in Technological Forecasting and Social Change to explore the structure and dynamics of artificial intelligence (AI) scholarship. Data demonstrated in Fosso Wambas study implied that the year 1991 seemed to be a watershed of AI research. This research note tried to uncover the 1991 phenomenon from the perspective of database limitation by probing the limitations of search in abstract/author keywords/keywords plus fields of WoSCC empirically. The low availability rates of abstract/author keywords/keywords plus information in WoSCC found in this study can explain the watershed phenomenon of AI scholarship in 1991 to a large extent. Some other caveats for the use of WoSCC in old literature retrieval and historical bibliometric analysis were also mentioned in the discussion section. This research note complements Fosso Wamba and his colleagues study and also helps avoid improper interpretation in the use of WoSCC in old literature retrieval and historical bibliometric analysis.
We recommend that the planetary science and space exploration community engage in a robust reevaluation concerning the ethics of how future crewed and uncrewed missions to the Moon and Mars will interact with those planetary environments. This should occur through a process of community input, with emphasis on how such missions can resist colonial structures. Such discussions must be rooted in the historical context of the violent colonialism in the Americas and across the globe that has accompanied exploration of Earth. The structures created by settler colonialism are very much alive today, impact the scientific community, and are currently replicated in the space exploration communities plans for human exploration and in-situ resource utilization. These discussions must lead to enforceable planetary protection policies that create a framework for ethical exploration of other worlds. Current policy does not adequately address questions related to in-situ resource utilization and environmental preservation and is without enforcement mechanisms. Further, interactions with potential extraterrestrial life have scientific and moral stakes. Decisions on these topics will be made in the coming decade as the Artemis program enables frequent missions to the Moon and crewed missions to Mars. Those first choices will have irreversible consequences for the future of human space exploration and must be extremely well considered, with input from those beyond the scientific community, including expertise from the humanities and members of the general public. Without planetary protection policy that actively resists colonial practices, they will be replicated in our interactions and exploration of other planetary bodies. The time is now to engage in these difficult conversations and disrupt colonial practices within our field so that they are not carried to other worlds.