Transformer Query-Target Knowledge Discovery (TEND): Drug Discovery from CORD-19

279 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Leo Tam

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Leo K. Tam - Xiaosong Wang - Daguang Xu

الحساب واللغة استرجاع المعلومات التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Previous work established skip-gram word2vec models could be used to mine knowledge in the materials science literature for the discovery of thermoelectrics. Recent transformer architectures have shown great progress in language modeling and associated fine-tuned tasks, but they have yet to be adapted for drug discovery. We present a RoBERTa transformer-based method that extends the masked language token prediction using query-target conditioning to treat the specificity challenge. The transformer discovery method entails several benefits over the word2vec method including domain-specific (antiviral) analogy performance, negation handling, and flexible query analysis (specific) and is demonstrated on influenza drug discovery. To stimulate COVID-19 research, we release an influenza clinical trials and antiviral analogies dataset used in conjunction with the COVID-19 Open Research Dataset Challenge (CORD-19) literature dataset in the study. We examine k-shot fine-tuning to improve the downstream analogies performance as well as to mine analogies for model explainability. Further, the query-target analysis is verified in a forward chaining analysis against the influenza drug clinical trials dataset, before adapted for COVID-19 drugs (combinations and side-effects) and on-going clinical trials. In consideration of the present topic, we release the model, dataset, and code.

قيم البحث

101 - Rui Zhang , Dimitar Hristovski , Dalton Schutte 2020

Objective: To discover candidate drugs to repurpose for COVID-19 using literature-derived knowledge and knowledge graph completion methods. Methods: We propose a novel, integrative, and neural network-based literature-based discovery (LBD) approach t o identify drug candidates from both PubMed and COVID-19-focused research literature. Our approach relies on semantic triples extracted using SemRep (via SemMedDB). We identified an informative subset of semantic triples using filtering rules and an accuracy classifier developed on a BERT variant, and used this subset to construct a knowledge graph. Five SOTA, neural knowledge graph completion algorithms were used to predict drug repurposing candidates. The models were trained and assessed using a time slicing approach and the predicted drugs were compared with a list of drugs reported in the literature and evaluated in clinical trials. These models were complemented by a discovery pattern-based approach. Results: Accuracy classifier based on PubMedBERT achieved the best performance (F1= 0.854) in classifying semantic predications. Among five knowledge graph completion models, TransE outperformed others (MR = 0.923, Hits@1=0.417). Some known drugs linked to COVID-19 in the literature were identified, as well as some candidate drugs that have not yet been studied. Discovery patterns enabled generation of plausible hypotheses regarding the relationships between the candidate drugs and COVID-19. Among them, five highly ranked and novel drugs (paclitaxel, SB 203580, alpha 2-antiplasmin, pyrrolidine dithiocarbamate, and butylated hydroxytoluene) with their mechanistic explanations were further discussed. Conclusion: We show that an LBD approach can be feasible for discovering drug candidates for COVID-19, and for generating mechanistic explanations. Our approach can be generalized to other diseases as well as to other clinical questions.

الحساب واللغة استرجاع المعلومات

Extracting a Knowledge Base of Mechanisms from COVID-19 Papers

227 - Tom Hope , Aida Amini , David Wadden 2020

The COVID-19 pandemic has spawned a diverse body of scientific literature that is challenging to navigate, stimulating interest in automated tools to help find useful knowledge. We pursue the construction of a knowledge base (KB) of mechanisms -- a f undamental concept across the sciences encompassing activities, functions and causal relations, ranging from cellular processes to economic impacts. We extract this information from the natural language of scientific papers by developing a broad, unified schema that strikes a balance between relevance and breadth. We annotate a dataset of mechanisms with our schema and train a model to extract mechanism relations from papers. Our experiments demonstrate the utility of our KB in supporting interdisciplinary scientific search over COVID-19 literature, outperforming the prominent PubMed search in a study with clinical experts.

الحساب واللغة استرجاع المعلومات التعلم الآلي

Two Stage Transformer Model for COVID-19 Fake News Detection and Fact Checking

151 - Rutvik Vijjali , Prathyush Potluri , Siddharth Kumar 2020

The rapid advancement of technology in online communication via social media platforms has led to a prolific rise in the spread of misinformation and fake news. Fake news is especially rampant in the current COVID-19 pandemic, leading to people belie ving in false and potentially harmful claims and stories. Detecting fake news quickly can alleviate the spread of panic, chaos and potential health hazards. We developed a two stage automated pipeline for COVID-19 fake news detection using state of the art machine learning models for natural language processing. The first model leverages a novel fact checking algorithm that retrieves the most relevant facts concerning user claims about particular COVID-19 claims. The second model verifies the level of truth in the claim by computing the textual entailment between the claim and the true facts retrieved from a manually curated COVID-19 dataset. The dataset is based on a publicly available knowledge source consisting of more than 5000 COVID-19 false claims and verified explanations, a subset of which was internally annotated and cross-validated to train and evaluate our models. We evaluate a series of models based on classical text-based features to more contextual Transformer based models and observe that a model pipeline based on BERT and ALBERT for the two stages respectively yields the best results.

الحساب واللغة استرجاع المعلومات التعلم الآلي

CAiRE-COVID: A Question Answering and Query-focused Multi-Document Summarization System for COVID-19 Scholarly Information Management

130 - Dan Su , Yan Xu , Tiezheng Yu 2020

We present CAiRE-COVID, a real-time question answering (QA) and multi-document summarization system, which won one of the 10 tasks in the Kaggle COVID-19 Open Research Dataset Challenge, judged by medical experts. Our system aims to tackle the recent challenge of mining the numerous scientific articles being published on COVID-19 by answering high priority questions from the community and summarizing salient question-related information. It combines information extraction with state-of-the-art QA and query-focused multi-document summarization techniques, selecting and highlighting evidence snippets from existing literature given a query. We also propose query-focused abstractive and extractive multi-document summarization methods, to provide more relevant information related to the question. We further conduct quantitative experiments that show consistent improvements on various metrics for each module. We have launched our website CAiRE-COVID for broader use by the medical community, and have open-sourced the code for our system, to bootstrap further study by other researches.

الحساب واللغة استرجاع المعلومات التعلم الآلي

Genetic Constrained Graph Variational Autoencoder for COVID-19 Drug Discovery

108 - Tianyue Cheng , Tianchi Fan , Landi Wang 2021

In the past several months, COVID-19 has spread over the globe and caused severe damage to the people and the society. In the context of this severe situation, an effective drug discovery method to generate potential drugs is extremely meaningful. In this paper, we provide a methodology of discovering potential drugs for the treatment of Severe Acute Respiratory Syndrome Corona-Virus 2 (commonly known as SARS-CoV-2). We proposed a new model called Genetic Constrained Graph Variational Autoencoder (GCGVAE) to solve this problem. We trained our model based on the data of various viruses protein structure, including that of the SARS, HIV, Hep3, and MERS, and used it to generate possible drugs for SARS-CoV-2. Several optimization algorithms, including valency masking and genetic algorithm, are deployed to fine tune our model. According to the simulation, our generated molecules have great effectiveness in inhibiting SARS-CoV-2. We quantitatively calculated the scores of our generated molecules and compared it with the scores of existing drugs, and the result shows our generated molecules scores much better than those existing drugs. Moreover, our model can be also applied to generate effective drugs for treating other viruses given their protein structure, which could be used to generate drugs for future viruses.

التعلم الآلي الأساليب الكمية