The prevalence of ambiguous acronyms makes scientific documents harder to understand for humans and machines alike, creating a need for models that can automatically identify acronyms in text and disambiguate their meaning. We introduce new methods for acronym identification and disambiguation: our acronym identification model projects learned token embeddings onto tag predictions, and our acronym disambiguation model finds training examples whose sentence embeddings are similar to those of each test example. Both of our systems achieve significant performance gains over previously proposed methods and perform competitively on the SDU@AAAI-21 shared task leaderboard. Our models were trained in part on new distantly supervised datasets for these tasks, which we call AuxAI and AuxAD. We also identified a duplication conflict in the SciAD dataset and formed a deduplicated version of SciAD that we call SciAD-dedupe. We have publicly released all three datasets and hope that they help the community make further strides in scientific document understanding.
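The abstract describes both models only at a high level. The following pure-Python sketch illustrates the two ideas under stated assumptions and is not the authors' implementation. Token and sentence embeddings are assumed to be given as plain lists of floats; in practice they would come from a learned encoder. The BIO-style tag set `TAGS` is hypothetical. Disambiguation is assumed to use cosine similarity with a k-nearest-neighbour majority vote. The helper names `project_to_tags`, `cosine`, and `disambiguate` are illustrative only.

```python
import math
from collections import Counter

# Hypothetical BIO-style tag set; the abstract does not specify the labels.
TAGS = ["O", "B-short", "I-short", "B-long", "I-long"]


def project_to_tags(token_embeddings, weights, bias):
    """Acronym identification sketch: map each d-dimensional token embedding
    to one logit per tag with a linear layer (weights: len(TAGS) rows of d
    values, bias: len(TAGS) values) and keep the highest-scoring tag."""
    predictions = []
    for emb in token_embeddings:
        logits = [sum(w * x for w, x in zip(row, emb)) + b
                  for row, b in zip(weights, bias)]
        best = max(range(len(logits)), key=logits.__getitem__)  # argmax
        predictions.append(TAGS[best])
    return predictions


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def disambiguate(test_embedding, acronym, train_examples, k=1):
    """Acronym disambiguation sketch (k-nearest neighbours).

    train_examples: list of (acronym, sentence_embedding, expansion) tuples.
    Among training examples of the same acronym, take the k whose sentence
    embeddings are most cosine-similar to test_embedding and return their
    most frequent expansion, or None if the acronym never appears in training.
    """
    scored = [(cosine(test_embedding, emb), expansion)
              for acr, emb, expansion in train_examples if acr == acronym]
    if not scored:
        return None
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Counter keeps insertion order, so a tied vote goes to the closer neighbour.
    votes = Counter(expansion for _, expansion in scored[:k])
    return votes.most_common(1)[0][0]
```

Consider toy two-dimensional embeddings `train = [("CNN", [1.0, 0.0], "convolutional neural network"), ("CNN", [0.0, 1.0], "Cable News Network")]`. Here `disambiguate([0.9, 0.1], "CNN", train)` returns `"convolutional neural network"`, because the test vector is closest to the first training sentence. With `k=1` the method simply copies the expansion of the single nearest training sentence. A larger `k` replaces that with a majority vote.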