ﻻ يوجد ملخص باللغة العربية
Sentence embeddings have become an essential part of todays natural language processing (NLP) systems, especially together advanced deep learning methods. Although pre-trained sentence encoders are available in the general domain, none exists for biomedical texts to date. In this work, we introduce BioSentVec: the first open set of sentence embeddings trained with over 30 million documents from both scholarly articles in PubMed and clinical notes in the MIMIC-III Clinical Database. We evaluate BioSentVec embeddings in two sentence pair similarity tasks in different text genres. Our benchmarking results demonstrate that the BioSentVec embeddings can better capture sentence semantics compared to the other competitive alternatives and achieve state-of-the-art performance in both tasks. We expect BioSentVec to facilitate the research and development in biomedical text mining and to complement the existing resources in biomedical word embeddings. BioSentVec is publicly available at https://github.com/ncbi-nlp/BioSentVec
We present an analysis of the problem of identifying biological context and associating it with biochemical events in biomedical texts. This constitutes a non-trivial, inter-sentential relation extraction task. We focus on biological context as descr
Capturing the semantics of related biological concepts, such as genes and mutations, is of significant importance to many research tasks in computational biology such as protein-protein interaction detection, gene-drug association prediction, and bio
Machine translation is highly sensitive to the size and quality of the training data, which has led to an increasing interest in collecting and filtering large parallel corpora. In this paper, we propose a new method for this task based on multilingu
This paper introduces a sentence to vector encoding framework suitable for advanced natural language processing. Our latent representation is shown to encode sentences with common semantic information with similar vector representations. The vector r
Biomedical Named Entity Recognition (BioNER) is a crucial step for analyzing Biomedical texts, which aims at extracting biomedical named entities from a given text. Different supervised machine learning algorithms have been applied for BioNER by vari