Learning effective language representations from crowdsourced labels is crucial for many real-world machine learning tasks. A challenging aspect of this problem is that the quality of crowdsourced labels suffers from high intra- and inter-observer variability. Since high-capacity deep neural networks can easily memorize all disagreements among crowdsourced labels, directly applying existing supervised language representation learning algorithms may yield suboptimal solutions. In this paper, we propose TACMA, a temporal-aware language representation learning heuristic for crowdsourced labels with multiple annotators. The proposed approach (1) explicitly models intra-observer variability with an attention mechanism; and (2) computes and aggregates per-sample confidence scores from multiple workers to address inter-observer disagreements. The proposed heuristic is extremely easy to implement, requiring only around 5 lines of code. We evaluate it on four synthetic and four real-world data sets. The results show that our approach outperforms a wide range of state-of-the-art baselines in terms of prediction accuracy and AUC. To encourage reproducible research, we make our code publicly available at https://github.com/CrowdsourcingMining/TACMA.
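The abstract does not show the implementation itself; the following is only a minimal sketch of what step (2) could look like in PyTorch, under assumptions made here for illustration: a hypothetical function confidence_weighted_loss with tensors logits of shape (batch, num_classes) and worker_labels of shape (batch, num_workers). Per-sample confidence scores are read off the softmax output for each worker's label and used to weight the per-worker cross-entropy terms.

    import torch
    import torch.nn.functional as F

    def confidence_weighted_loss(logits, worker_labels):
        # logits:        (batch, num_classes) model outputs for each example
        # worker_labels: (batch, num_workers) crowdsourced labels (long), one column per worker
        probs = F.softmax(logits, dim=-1)                        # per-class probabilities
        conf = torch.gather(probs, 1, worker_labels)             # confidence assigned to each worker's label
        weights = conf / conf.sum(dim=1, keepdim=True)           # normalize confidences across workers
        per_worker_loss = torch.stack(
            [F.cross_entropy(logits, worker_labels[:, w], reduction="none")
             for w in range(worker_labels.shape[1])], dim=1)     # (batch, num_workers) loss matrix
        return (weights.detach() * per_worker_loss).sum(dim=1).mean()

In this sketch, detaching the weights treats the confidence scores as fixed during backpropagation, so they only re-weight the per-worker losses rather than opening an extra gradient path; the actual TACMA implementation in the linked repository may differ in these details.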