
Variational Autoencoder for Anti-Cancer Drug Response Prediction

Published by Jiaqing Xie
Publication date: 2020
Research field: Informatics Engineering
Language of the paper: English





Cancer is a primary cause of human death, but discovering drugs and tailoring cancer therapies are expensive and time-consuming. We seek to facilitate the discovery of new drugs and treatment strategies for cancer by using variational autoencoders (VAEs) and multi-layer perceptrons (MLPs) to predict anti-cancer drug responses. Our model takes as input gene expression data of cancer cell lines and anti-cancer drug molecular data. It encodes the gene expression data with our GeneVae model, which is an ordinary VAE, and the drug molecular data with a rectified junction tree variational autoencoder (JTVae) model. A multi-layer perceptron processes these encoded features to produce a final prediction. Our tests show that our system attains a high average coefficient of determination ($R^{2} = 0.83$) in predicting drug responses for breast cancer cell lines and an average $R^{2} = 0.845$ for pan-cancer cell lines. Additionally, we show that our model can generate effective drug compounds not previously used for specific cancer cell lines.
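The pipeline described in the abstract (two encoders, concatenated latent features, an MLP regressor, evaluation by $R^{2}$) can be sketched roughly as follows. This is a minimal NumPy illustration with untrained random weights, not the authors' implementation: the function names, layer sizes, and linear encoders are all assumptions, and in particular the drug encoder here is a plain Gaussian encoder on a feature vector standing in for the junction tree VAE, which actually operates on molecular graphs.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_gaussian(x, w_mu, w_logvar, rng):
    """Toy VAE-style encoder (assumed linear): map x to a mean and
    log-variance, then sample z = mu + sigma * eps (reparameterization)."""
    mu = x @ w_mu
    logvar = x @ w_logvar
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def mlp_predict(z, w1, b1, w2, b2):
    """Two-layer perceptron with ReLU; returns one response per row."""
    hidden = np.maximum(z @ w1 + b1, 0.0)
    return (hidden @ w2 + b2).ravel()

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical sizes, chosen only for illustration.
n_pairs, n_genes, n_drug_feats = 8, 20, 12
gene_latent, drug_latent, n_hidden = 4, 3, 5

gene_expr = rng.standard_normal((n_pairs, n_genes))
drug_feats = rng.standard_normal((n_pairs, n_drug_feats))

# Encode each modality separately (random, untrained weights).
z_gene = encode_gaussian(
    gene_expr,
    0.1 * rng.standard_normal((n_genes, gene_latent)),
    0.1 * rng.standard_normal((n_genes, gene_latent)),
    rng,
)
z_drug = encode_gaussian(
    drug_feats,
    0.1 * rng.standard_normal((n_drug_feats, drug_latent)),
    0.1 * rng.standard_normal((n_drug_feats, drug_latent)),
    rng,
)

# Concatenate the latent codes and feed them to the MLP regressor.
z = np.concatenate([z_gene, z_drug], axis=1)
pred = mlp_predict(
    z,
    rng.standard_normal((gene_latent + drug_latent, n_hidden)),
    np.zeros(n_hidden),
    rng.standard_normal((n_hidden, 1)),
    np.zeros(1),
)
```

With trained weights, `r2_score(measured_response, pred)` would give the kind of $R^{2}$ figure quoted above; with the random weights used here the value is meaningless.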


Read also

Interaction between pharmacological agents can trigger unexpected adverse events. Capturing richer and more comprehensive information about drug-drug interactions (DDI) is one of the key tasks in public health and drug development. Recently, several knowledge graph embedding approaches have received increasing attention in the DDI domain due to their capability of projecting drugs and interactions into a low-dimensional feature space for predicting links and classifying triplets. However, existing methods only apply a uniformly random mode to construct negative samples. As a consequence, these samples are often too simplistic to train an effective model. In this paper, we propose a new knowledge graph embedding framework by introducing adversarial autoencoders (AAE) based on Wasserstein distances and Gumbel-Softmax relaxation for drug-drug interactions tasks. In our framework, the autoencoder is employed to generate high-quality negative samples and the hidden vector of the autoencoder is regarded as a plausible drug candidate. Afterwards, the discriminator learns the embeddings of drugs and interactions based on both positive and negative triplets. Meanwhile, in order to solve vanishing gradient problems on the discrete representation--an inherent flaw in traditional generative models--we utilize the Gumbel-Softmax relaxation and the Wasserstein distance to train the embedding model steadily. We empirically evaluate our method on two tasks, link prediction and DDI classification. The experimental results show that our framework can attain significant improvements and noticeably outperform competitive baselines.
Motivated by the size of cell line drug sensitivity data, researchers have been developing machine learning (ML) models for predicting drug response to advance cancer treatment. As drug sensitivity studies continue generating data, a common question is whether the proposed predictors can further improve the generalization performance with more training data. We utilize empirical learning curves for evaluating and comparing the data scaling properties of two neural networks (NNs) and two gradient boosting decision tree (GBDT) models trained on four drug screening datasets. The learning curves are accurately fitted to a power law model, providing a framework for assessing the data scaling behavior of these predictors. The curves demonstrate that no single model dominates in terms of prediction performance across all datasets and training sizes, suggesting that the shape of these curves depends on the unique model-dataset pair. The multi-input NN (mNN), in which gene expressions and molecular drug descriptors are input into separate subnetworks, outperforms a single-input NN (sNN), where the cell and drug features are concatenated for the input layer. In contrast, a GBDT with hyperparameter tuning exhibits superior performance as compared with both NNs at the lower range of training sizes for two of the datasets, whereas the mNN performs better at the higher range of training sizes. Moreover, the trajectory of the curves suggests that increasing the sample size is expected to further improve prediction scores of both NNs. These observations demonstrate the benefit of using learning curves to evaluate predictors, providing a broader perspective on the overall data scaling characteristics. The fitted power law curves provide a forward-looking performance metric and can serve as a co-design tool to guide experimental biologists and computational scientists in the design of future experiments.
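As a rough illustration of fitting a learning curve to a power law, the sketch below assumes the simplest two-parameter form, error ≈ a · m^b with m the training set size, and fits it by linear regression in log-log space. The exact functional form used in that work is not given in the abstract, so the form, the function name `fit_power_law`, and the synthetic data are assumptions.

```python
import numpy as np

def fit_power_law(train_sizes, errors):
    """Fit errors ~ a * train_sizes**b by least squares on
    log(errors) = log(a) + b * log(train_sizes)."""
    log_m = np.log(np.asarray(train_sizes, dtype=float))
    log_e = np.log(np.asarray(errors, dtype=float))
    slope, intercept = np.polyfit(log_m, log_e, 1)
    return float(np.exp(intercept)), float(slope)

# Synthetic learning curve (illustrative only): error = 2 * m^(-0.5).
sizes = np.array([100.0, 200.0, 400.0, 800.0, 1600.0])
errors = 2.0 * sizes ** -0.5

a, b = fit_power_law(sizes, errors)

# Extrapolate the fitted curve to a larger, not yet collected, sample size.
predicted_error_at_3200 = a * 3200.0 ** b
```

A negative fitted exponent `b` indicates that the error is still falling as data is added, which is the sense in which such a curve can act as a forward-looking performance estimate.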
In the past several months, COVID-19 has spread over the globe and caused severe damage to people and society. In this situation, an effective drug discovery method for generating potential drugs is extremely valuable. In this paper, we provide a methodology for discovering potential drugs for the treatment of Severe Acute Respiratory Syndrome Corona-Virus 2 (commonly known as SARS-CoV-2). We propose a new model called Genetic Constrained Graph Variational Autoencoder (GCGVAE) to solve this problem. We trained our model on protein structure data of various viruses, including SARS, HIV, Hep3, and MERS, and used it to generate possible drugs for SARS-CoV-2. Several optimization algorithms, including valency masking and a genetic algorithm, are deployed to fine-tune our model. According to the simulation, our generated molecules are highly effective in inhibiting SARS-CoV-2. We quantitatively calculated the scores of our generated molecules and compared them with the scores of existing drugs, and the results show that our generated molecules score much better than those existing drugs. Moreover, our model can also be applied to generate effective drugs for treating other viruses given their protein structure, which could be used to generate drugs for future viruses.
To enable personalized cancer treatment, machine learning models have been developed to predict drug response as a function of tumor and drug features. However, most algorithm development efforts have relied on cross validation within a single study to assess model accuracy. While an essential first step, cross validation within a biological data set typically provides an overly optimistic estimate of the prediction performance on independent test sets. To provide a more rigorous assessment of model generalizability between different studies, we use machine learning to analyze five publicly available cell line-based data sets: NCI60, CTRP, GDSC, CCLE and gCSI. Based on observed experimental variability across studies, we explore estimates of prediction upper bounds. We report performance results of a variety of machine learning models, with a multitasking deep neural network achieving the best cross-study generalizability. By multiple measures, models trained on CTRP yield the most accurate predictions on the remaining testing data, and gCSI is the most predictable among the cell line data sets included in this study. With these experiments and further simulations on partial data, two lessons emerge: (1) differences in viability assays can limit model generalizability across studies, and (2) drug diversity, more than tumor diversity, is crucial for raising model generalizability in preclinical screening.
Molecule generation aims to design new molecules with specific chemical properties and to further optimize the desired chemical properties. Following previous work, we encode molecules into continuous vectors in the latent space and then decode the vectors into molecules under the variational autoencoder (VAE) framework. We investigate the posterior collapse problem of current RNN-based VAEs for molecule sequence generation. For the first time, we find that underestimated reconstruction loss leads to posterior collapse, and provide both theoretical and experimental evidence. We propose an effective and efficient solution to fix the problem and avoid posterior collapse. Without bells and whistles, our method achieves SOTA reconstruction accuracy and competitive validity on the ZINC 250K dataset. When generating 10,000 unique valid SMILES from random prior sampling, JT-VAE takes 1450s while our method needs only 9s. Our implementation is at https://github.com/chaoyan1037/Re-balanced-VAE.
