New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Developing a Benchmark for Reducing Data Bias in Authorship Attribution

تطوير معيارا للحد من تحيز البيانات في إسناد التأليف

313 0 0 0.0 ( 0 )

Download Cite

Added by Association for Computation Linguistics مقالة

Publication date 2021

fields Artificial Intelligence

and research's language is English

Created by Shamra Editor

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Authorship attribution is the task of assigning an unknown document to an author from a set of candidates. In the past, studies in this field use various evaluation datasets to demonstrate the effectiveness of preprocessing steps, features, and models. However, only a small fraction of works use more than one dataset to prove claims. In this paper, we present a collection of highly diverse authorship attribution datasets, which better generalizes evaluation results from authorship attribution research. Furthermore, we implement a wide variety of previously used machine learning models and show that many approaches show vastly different performances when applied to different datasets. We include pre-trained language models, for the first time testing them in this field in a systematic way. Finally, we propose a set of aggregated scores to evaluate different aspects of the dataset collection.

References used

https://aclanthology.org/

rate research

A Call for Clarity in Contemporary Authorship Attribution Evaluation

565 - Association for Computation Linguistics 2021 مقالة

Recent research has documented that results reported in frequently-cited authorship attribution papers are difficult to reproduce. Inaccessible code and data are often proposed as factors which block successful reproductions. Even when original mater ials are available, problems remain which prevent researchers from comparing the effectiveness of different methods. To solve the remaining problems---the lack of fixed test sets and the use of inappropriately homogeneous corpora---our paper contributes materials for five closed-set authorship identification experiments. The five experiments feature texts from 106 distinct authors. Experiments involve a range of contemporary non-fiction American English prose. These experiments provide the foundation for comparable and reproducible authorship attribution research involving contemporary writing.

call for clarity authorship attribution evaluation attribution evaluation دعوة للوضوح تقييم إسناد التأليف تقييم الإسناد صناعة حمض الفوسفور المزيد..

Small-Scale Cross-Language Authorship Attribution on Social Media Comments

359 - Association for Computation Linguistics 2021 مقالة

Cross-language authorship attribution is the challenging task of classifying documents by bilingual authors where the training documents are written in a different language than the evaluation documents. Traditional solutions rely on either translati on to enable the use of single-language features, or language-independent feature extraction methods. More recently, transformer-based language models like BERT can also be pre-trained on multiple languages, making them intuitive candidates for cross-language classifiers which have not been used for this task yet. We perform extensive experiments to benchmark the performance of three different approaches to a smallscale cross-language authorship attribution experiment: (1) using language-independent features with traditional classification models, (2) using multilingual pre-trained language models, and (3) using machine translation to allow single-language classification. For the language-independent features, we utilize universal syntactic features like part-of-speech tags and dependency graphs, and multilingual BERT as a pre-trained language model. We use a small-scale social media comments dataset, closely reflecting practical scenarios. We show that applying machine translation drastically increases the performance of almost all approaches, and that the syntactic features in combination with the translation step achieve the best overall classification performance. In particular, we demonstrate that pre-trained language models are outperformed by traditional models in small scale authorship attribution problems for every language combination analyzed in this paper.

cross-language authorship attribution authorship attribution social media comments الإسناد المؤلف عبر اللغة إسناد التأليف وسائل التواصل الاجتماعي التعليقات صناعة حمض الفوسفور المزيد..

TIAGE: A Benchmark for Topic-Shift Aware Dialog Modeling

530 - Association for Computation Linguistics 2021 مقالة

Human conversations naturally evolve around different topics and fluently move between them. In research on dialog systems, the ability to actively and smoothly transition to new topics is often ignored. In this paper we introduce TIAGE, a new topic- shift aware dialog benchmark constructed utilizing human annotations on topic shifts. Based on TIAGE, we introduce three tasks to investigate different scenarios of topic-shift modeling in dialog settings: topic-shift detection, topic-shift triggered response generation and topic-aware dialog generation. Experiments on these tasks show that the topic-shift signals in TIAGE are useful for topic-shift response generation. On the other hand, dialog systems still struggle to decide when to change topic. This indicates further research is needed in topic-shift aware dialog modeling.

topic-shift aware dialog aware dialog modeling aware dialog TOPIC-SHIFT علم مربع الحوار نمذجة مربع الحوار إدراك مربع الحوار صناعة حمض الفوسفور المزيد..

Human Rationales as Attribution Priors for Explainable Stance Detection

349 - Association for Computation Linguistics 2021 مقالة

As NLP systems become better at detecting opinions and beliefs from text, it is important to ensure not only that models are accurate but also that they arrive at their predictions in ways that align with human reasoning. In this work, we present a m ethod for imparting human-like rationalization to a stance detection model using crowdsourced annotations on a small fraction of the training data. We show that in a data-scarce setting, our approach can improve the reasoning of a state-of-the-art classifier---particularly for inputs containing challenging phenomena such as sarcasm---at no cost in predictive performance. Furthermore, we demonstrate that attention weights surpass a leading attribution method in providing faithful explanations of our model's predictions, thus serving as a computationally cheap and reliable source of attributions for our model.

explainable stance detection priors for explainable explainable stance اكتشاف موقف قابل للتفسير priors للتفسير موقف قابل للتفسير صناعة حمض الفوسفور المزيد..

Learning Universal Authorship Representations

268 - Association for Computation Linguistics 2021 مقالة

Determining whether two documents were composed by the same author, also known as authorship verification, has traditionally been tackled using statistical methods. Recently, authorship representations learned using neural networks have been found to outperform alternatives, particularly in large-scale settings involving hundreds of thousands of authors. But do such representations learned in a particular domain transfer to other domains? Or are these representations inherently entangled with domain-specific features? To study these questions, we conduct the first large-scale study of cross-domain transfer for authorship verification considering zero-shot transfers involving three disparate domains: Amazon reviews, fanfiction short stories, and Reddit comments. We find that although a surprising degree of transfer is possible between certain domains, it is not so successful between others. We examine properties of these domains that influence generalization and propose simple but effective methods to improve transfer.

learning universal authorship learning universal universal authorship representations تعلم التأليف العالمي تعلم عالمي تمثيلات التأليف الشامل صناعة حمض الفوسفور المزيد..

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Developing a Benchmark for Reducing Data Bias in Authorship Attribution

تطوير معيارا للحد من تحيز البيانات في إسناد التأليف

Ask ChatGPT about the research

Read More

suggested questions