ﻻ يوجد ملخص باللغة العربية
Contextualized embeddings such as BERT can serve as strong input representations to NLP tasks, outperforming their static embeddings counterparts such as skip-gram, CBOW and GloVe. However, such embeddings are dynamic, calculated according to a sentence-level context, which limits their use in lexical semantics tasks. We address this issue by making use of dynamic embeddings as word representations in training static embeddings, thereby leveraging their strong representation power for disambiguating context information. Results show that this method leads to improvements over traditional static embeddings on a range of lexical semantics tasks, obtaining the best reported results on seven datasets.
The huge size of the widely used BERT family models has led to recent efforts about model distillation. The main goal of distillation is to create a task-agnostic pre-trained model that can be fine-tuned on downstream tasks without fine-tuning its fu
We present an approach to combining distributional semantic representations induced from text corpora with manually constructed lexical-semantic networks. While both kinds of semantic resources are available with high lexical coverage, our aligned re
Building robust natural language understanding systems will require a clear characterization of whether and how various linguistic meaning representations complement each other. To perform a systematic comparative analysis, we evaluate the mapping be
BERT has been used for solving commonsense tasks such as CommonsenseQA. While prior research has found that BERT does contain commonsense information to some extent, there has been work showing that pre-trained models can rely on spurious association
Lexical semantics theories differ in advocating that the meaning of words is represented as an inference graph, a feature mapping or a vector space, thus raising the question: is it the case that one of these approaches is superior to the others in r