Transformer models have shown impressive performance on a variety of NLP tasks. Off-the-shelf pre-trained models can be fine-tuned for specific NLP classification tasks, reducing the need for large amounts of additional training data. However, little research has addressed how much data is required to fine-tune such pre-trained transformer models accurately, and how much data is needed for accurate prediction. This paper explores the usability of BERT (a Transformer model for word embeddings) for gender prediction on social media. Forensic applications include detecting gender obfuscation, e.g., males posing as females in chat rooms. A Dutch BERT model is fine-tuned on different samples of a Dutch Twitter dataset labeled for gender, varying in the number of tweets used per person. The results show that fine-tuning BERT yields good gender classification performance (80% F1) with only 200 tweets per person. With just 20 tweets per person, the performance of our classifier deteriorates only moderately (to 70% F1). These results show that even with relatively small amounts of data, BERT can be fine-tuned to help predict the gender of Twitter users accurately, and, consequently, that it is possible to determine gender on the basis of just a low volume of tweets. This opens up an operational perspective on the swift detection of gender.
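The experimental setup above varies how many tweets per person are used for fine-tuning. The sketch below illustrates that kind of data preparation and evaluation under stated assumptions; it is not the authors' code. It assumes (1) each tweet inherits its author's gender label, (2) user-level predictions are obtained by majority vote over per-tweet predictions, and (3) F1 is macro-averaged over the gender labels. The function names (`sample_tweets_per_user`, `build_training_pairs`, `aggregate_user_label`, `macro_f1`) are hypothetical, and the BERT fine-tuning step itself is deliberately left out.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def sample_tweets_per_user(
    tweets_by_user: Dict[str, List[str]], k: int, seed: int = 0
) -> Dict[str, List[str]]:
    """Keep at most k tweets per user, sampled without replacement (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so a sample size (e.g. 20 vs 200) is reproducible
    sampled = {}
    for user, tweets in tweets_by_user.items():
        sampled[user] = list(tweets) if len(tweets) <= k else rng.sample(tweets, k)
    return sampled


def build_training_pairs(
    sampled: Dict[str, List[str]], gender_by_user: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Assumption: every tweet is labeled with its author's gender."""
    return [(tweet, gender_by_user[user]) for user, tweets in sampled.items() for tweet in tweets]


def aggregate_user_label(tweet_predictions: Sequence[str]) -> str:
    """Assumed majority vote; on ties, Counter keeps the first-seen label first."""
    return Counter(tweet_predictions).most_common(1)[0][0]


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]) -> float:
    """Unweighted mean of per-label F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    tweets = {"u1": ["t1", "t2", "t3"], "u2": ["t4"]}
    sampled = sample_tweets_per_user(tweets, k=2)
    print(build_training_pairs(sampled, {"u1": "F", "u2": "M"}))
    print(aggregate_user_label(["F", "M", "F"]))  # majority label for one user
    print(macro_f1(["F", "M", "F", "M"], ["F", "M", "M", "M"], ["F", "M"]))
```

Re-running the same pipeline with different values of `k` (for example 20 and 200) mirrors the comparison described in the abstract. Whether the original study classifies individual tweets or concatenated sets of tweets per user is not stated here, so the per-tweet design is only one plausible choice.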
The goal of Author Profiling (AP) is to identify demographic aspects (e.g., age, gender) from a given set of authors by analyzing their written texts. Recently, the AP task has gained interest in many problems related to computer forensics, psycholog
In this study, we proposed a convolutional neural network model for gender prediction using English Twitter text as input. An ensemble of the proposed model achieved an accuracy of 0.8237 on gender prediction and compared favorably with the state-of-the-art
Dialogue systems play an increasingly important role in various aspects of our daily life. It is evident from recent research that dialogue systems trained on human conversation data are biased. In particular, they can produce responses that reflect
User profiling means exploiting machine learning technology to predict attributes of users, such as demographic attributes, hobby attributes, preference attributes, etc. It provides powerful data support for precision marketing. Existing methods main
We present our system for the CLIN29 shared task on cross-genre gender detection for Dutch. We experimented with a multitude of neural models (CNN, RNN, LSTM, etc.), more traditional models (SVM, RF, LogReg, etc.), different feature sets as well as d