Coronavirus disease 2019 (COVID-19) has affected almost every aspect of human life worldwide, posing a massive threat to human health. There is no specific drug for COVID-19, highlighting the urgent need to develop effective therapeutics. To identify potentially repurposable drugs, we employed a systematic approach to mine candidates from U.S. FDA-approved drugs and preclinical small-molecule compounds by integrating gene expression perturbation data for chemicals from the Library of Integrated Network-Based Cellular Signatures (LINCS) project with a publicly available single-cell RNA sequencing dataset from patients with mild and severe COVID-19. We identified 281 FDA-approved drugs that may be effective against SARS-CoV-2 infection, 16 of which are currently undergoing clinical trials to evaluate their efficacy against COVID-19. We experimentally tested the inhibitory effects of tyrphostin-AG-1478 and brefeldin A on the replication of influenza A virus, a single-stranded RNA (ssRNA) virus. In conclusion, we have identified a list of repurposable anti-SARS-CoV-2 drug candidates using a systems biology approach.
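The abstract does not spell out how perturbation signatures are matched against the patient data. One common idea in signature-based repurposing is to favor compounds whose induced expression changes run opposite to the disease changes. The minimal sketch below assumes each signature is a mapping from gene to log fold change and scores each drug by the Spearman correlation over shared genes, where a more negative score means stronger reversal. The function names, the `min_genes` threshold, and the toy values are hypothetical, and this is not presented as the authors' pipeline.

```python
from typing import Dict, List, Tuple


def average_ranks(values: List[float]) -> List[float]:
    """Rank values (1 = smallest); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation = Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    denom = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return cov / denom if denom else 0.0  # constant input carries no rank information


def rank_by_reversal(
    disease: Dict[str, float],
    drugs: Dict[str, Dict[str, float]],
    min_genes: int = 3,  # hypothetical minimum overlap
) -> List[Tuple[str, float]]:
    """Return (drug, score) pairs, strongest reversal (most negative) first."""
    scored = []
    for name, signature in drugs.items():
        shared = sorted(set(disease) & set(signature))
        if len(shared) < min_genes:
            continue  # too few overlapping genes to score
        score = spearman([disease[g] for g in shared], [signature[g] for g in shared])
        scored.append((name, score))
    return sorted(scored, key=lambda item: item[1])


# Toy log fold changes (hypothetical genes A-D and drugs).
disease = {"A": 2.0, "B": 1.5, "C": -1.0, "D": -2.0}
drugs = {
    "drug_x": {"A": -1.8, "B": -1.0, "C": 0.5, "D": 1.9},
    "drug_y": {"A": 1.0, "B": 0.8, "C": -0.2, "D": -0.9},
}
print(rank_by_reversal(disease, drugs))  # [('drug_x', -1.0), ('drug_y', 1.0)]
```

Rank correlation is used here because it ignores the scale differences between patient fold changes and cell-line perturbation profiles; other connectivity scores would slot into the same loop.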
The presence of doublets is a key confounder in single-cell RNA sequencing (scRNA-seq) data analysis. Several computational methods have been developed to detect doublets in scRNA-seq data. We developed an R package, DoubletCollection, that integrates the installation and execution of eight doublet-detection methods. DoubletCollection also provides a unified interface to perform and visualize downstream analysis after doublet detection. Here, we present a protocol for using DoubletCollection to benchmark doublet-detection methods. This protocol can automatically accommodate new doublet-detection methods in the fast-growing scRNA-seq field.
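The benchmarking idea (run every registered method on the same data and score it against known doublet labels) can be sketched independently of any package. DoubletCollection itself is an R package and none of its functions are reproduced here; the Python below is a hypothetical illustration that assumes each method returns one score per cell (higher meaning more doublet-like) and uses the area under the ROC curve as the metric. The two scoring heuristics are toy stand-ins, not real doublet detectors. A decorator-based registry shows how a new method can join the benchmark without touching the evaluation code.

```python
from typing import Callable, Dict, List, Sequence

# A method maps per-cell count vectors to one doublet score per cell.
Method = Callable[[Sequence[Sequence[float]]], List[float]]
METHODS: Dict[str, Method] = {}  # hypothetical registry


def register(name: str) -> Callable[[Method], Method]:
    """Decorator that adds a scoring function to the benchmark."""
    def wrap(fn: Method) -> Method:
        METHODS[name] = fn
        return fn
    return wrap


def auroc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """AUROC via the Mann-Whitney statistic; ties count as half a win."""
    pos = [s for s, is_dbl in zip(scores, labels) if is_dbl]
    neg = [s for s, is_dbl in zip(scores, labels) if not is_dbl]
    if not pos or not neg:
        raise ValueError("need at least one doublet and one singlet")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def benchmark(cells: Sequence[Sequence[float]], labels: Sequence[bool]) -> Dict[str, float]:
    """Score every registered method on the same cells and labels."""
    return {name: auroc(fn(cells), labels) for name, fn in METHODS.items()}


@register("total_counts")
def total_counts(cells: Sequence[Sequence[float]]) -> List[float]:
    # Toy heuristic: doublets often have more total counts.
    return [float(sum(c)) for c in cells]


@register("n_detected")
def n_detected(cells: Sequence[Sequence[float]]) -> List[float]:
    # Toy heuristic: doublets often express more distinct genes.
    return [float(sum(1 for x in c if x > 0)) for c in cells]


cells = [[1, 0, 2], [0, 3, 3], [4, 5, 6], [5, 4, 0]]
is_doublet = [False, False, True, True]
print(benchmark(cells, is_doublet))  # {'total_counts': 1.0, 'n_detected': 0.75}
```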
According to the National Cancer Institute, there were 9.5 million cancer-related deaths in 2018. A challenge in improving treatment is resistance in genetically unstable cells. The purpose of this study is to evaluate unsupervised machine learning for classifying treatment-resistant phenotypes in heterogeneous tumors by analyzing single-cell RNA sequencing (scRNAseq) data with a pipeline and evaluation metrics. scRNAseq quantifies mRNA in individual cells and characterizes cell phenotypes. One scRNAseq dataset was analyzed, containing tumor and non-tumor cells annotated with molecular subtype and patient ID. The pipeline consisted of data filtering, dimensionality reduction with Principal Component Analysis (PCA), projection with Uniform Manifold Approximation and Projection (UMAP), clustering with nine approaches (Ward, BIRCH, Gaussian Mixture Model, DBSCAN, Spectral, Affinity Propagation, Agglomerative Clustering, Mean Shift, and K-Means), and evaluation. Seven models separated tumor from non-tumor cells and distinguished molecular subtypes, while six models distinguished patient identities (13 patient IDs were present in the dataset). K-Means, Ward, and BIRCH often ranked highest, with ~80% accuracy on the tumor versus non-tumor task and ~60% on molecular subtype and patient ID. An optimized classification pipeline using the K-Means, Ward, and BIRCH models was found to be the most effective for further analysis. In clinical research, where there is currently no standard protocol for scRNAseq analysis, clusters generated from this pipeline can be used to understand cancer cell behavior and malignant growth, which directly affects the success of treatment.
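As a minimal sketch of the clustering and evaluation steps, the code below runs Lloyd's k-means on low-dimensional coordinates (standing in for the PCA/UMAP embedding) and scores the result by mapping each cluster to its majority label. The abstract does not say how its accuracy figures were computed, so this majority-label (purity) score is an assumption, as are the toy points, the labels, and the deterministic farthest-point seeding. It is written in plain Python for self-containment; in practice scikit-learn's `KMeans`, `Birch`, and `AgglomerativeClustering(linkage="ward")` would be the usual choices.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points: Sequence[Point], k: int) -> List[Point]:
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from its nearest already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points: Sequence[Point], k: int, iters: int = 100) -> List[int]:
    """Lloyd's algorithm; returns a cluster index for each point."""
    centroids = farthest_point_init(points, k)
    assign: List[int] = []
    for _ in range(iters):
        new_assign = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_assign == assign:
            break  # converged: assignments unchanged
        assign = new_assign
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


def majority_accuracy(assign: Sequence[int], labels: Sequence[str]) -> float:
    """Map each cluster to its most common true label; return the matching fraction."""
    correct = 0
    for c in set(assign):
        members = [lab for a, lab in zip(assign, labels) if a == c]
        correct += Counter(members).most_common(1)[0][1]
    return correct / len(labels)


points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
labels = ["non-tumor"] * 3 + ["tumor"] * 3
clusters = kmeans(points, 2)
print(clusters, majority_accuracy(clusters, labels))  # [0, 0, 0, 1, 1, 1] 1.0
```

Majority mapping lets clusterings with different numbers of clusters be compared against the same labels, although it rewards over-splitting; a permutation-based or adjusted-Rand score would be an alternative.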
In this paper, we carry out a rational evaluation of the forecasting abilities of various epidemic models and methods, including seven empirical functions, four statistical inference methods, and five dynamical models, based on the Akaike information criterion, root mean square error, and a robustness coefficient. Using outbreak data from COVID-19 epidemics in China, we find that before the inflection point, all models fail to make reliable predictions. The Logistic function consistently underestimates the final epidemic size, while the Gompertz function overestimates it in all cases. Among the statistical inference methods, the sequential Bayesian and time-dependent reproduction number methods are more accurate at the late stage of an epidemic. In addition, the exponential growth method shifts from underestimation to overestimation around the inflection point, and this transition-like behavior might be useful for constructing a more reliable forecast. Compared with the ODE-based SIR, SEIR, and SEIR-AHQ models, the SEIR-QD and SEIR-PO models generally perform better in studying COVID-19 epidemics, and we believe their success can be attributed to a proper trade-off between model complexity and fitting accuracy. Our findings are not only crucial for forecasting COVID-19 epidemics but may also apply to other infectious diseases.
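To make the comparison criteria concrete, here is a minimal sketch of two of the empirical growth curves named above (Logistic and Gompertz, in one common three-parameter form) together with RMSE and a least-squares form of AIC, n ln(RSS/n) + 2k. The parameterization, the AIC variant, and the synthetic data are assumptions; the paper's exact definitions, including its robustness coefficient, are not reproduced. Parameter fitting is omitted: both curves are simply evaluated at the same given parameters.

```python
import math
from typing import Sequence


def logistic(t: float, K: float, r: float, t0: float) -> float:
    """Cumulative cases with final size K; inflection at t0, where C = K/2."""
    return K / (1.0 + math.exp(-r * (t - t0)))


def gompertz(t: float, K: float, r: float, t0: float) -> float:
    """Cumulative cases with final size K; inflection at t0, where C = K/e."""
    return K * math.exp(-math.exp(-r * (t - t0)))


def rss(obs: Sequence[float], pred: Sequence[float]) -> float:
    return sum((o - p) ** 2 for o, p in zip(obs, pred))


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    return math.sqrt(rss(obs, pred) / len(obs))


def aic_least_squares(obs: Sequence[float], pred: Sequence[float], n_params: int) -> float:
    """AIC for Gaussian errors with unknown variance, up to an additive constant."""
    residual = rss(obs, pred)
    if residual == 0:
        raise ValueError("perfect fit: AIC is unbounded below")
    n = len(obs)
    return n * math.log(residual / n) + 2 * n_params


# Synthetic "observed" cumulative counts from a logistic curve, rounded to integers.
days = list(range(0, 41, 2))
observed = [round(logistic(t, 1000, 0.3, 20)) for t in days]
for name, curve in [("logistic", logistic), ("gompertz", gompertz)]:
    pred = [curve(t, 1000, 0.3, 20) for t in days]
    print(name, round(rmse(observed, pred), 2), round(aic_least_squares(observed, pred, 3), 1))
```

For real data, NumPy-vectorized versions of these curves could be fitted with `scipy.optimize.curve_fit` before computing the scores; the 2k term in AIC is what penalizes extra parameters when comparing models of different complexity.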
The coronavirus disease 2019 (COVID-19) pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has infected nearly 5 million people and led to over 0.3 million deaths. Currently, there is no specific anti-SARS-CoV-2 medication. New drug discovery typically takes more than ten years, so drug repositioning has become one of the most feasible approaches for combating COVID-19. This work curates the largest available experimental dataset of SARS-CoV-2 or SARS-CoV main protease inhibitors. Based on this dataset, we develop validated machine learning models with relatively low root mean square error to screen 1,553 FDA-approved drugs as well as 7,012 other investigational or off-market drugs in DrugBank. We found that many existing drugs might be potent against SARS-CoV-2. We also analyze the druggability of many potent SARS-CoV-2 main protease inhibitors. This work offers a foundation for further experimental studies of COVID-19 drug repositioning.
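The abstract does not describe its machine learning models, so the sketch below swaps in a simple, well-known ligand-based technique: k-nearest-neighbor regression on Tanimoto similarity between binary fingerprints, represented here as sets of on-bit indices. It illustrates how leave-one-out RMSE can validate a model and how the validated model can then rank a screening library by predicted potency. The fingerprints, the pIC50-like labels, and the drug names are all hypothetical.

```python
import math
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of "on" bits in a hypothetical fingerprint
Example = Tuple[Fingerprint, float]  # (fingerprint, potency label)


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def knn_predict(query: Fingerprint, train: List[Example], k: int = 3) -> float:
    """Similarity-weighted mean label of the k most similar training compounds."""
    neighbors = sorted(train, key=lambda ex: tanimoto(query, ex[0]), reverse=True)[:k]
    weights = [tanimoto(query, fp) for fp, _ in neighbors]
    total = sum(weights)
    if total == 0:  # no shared bits at all: fall back to an unweighted mean
        return sum(y for _, y in neighbors) / len(neighbors)
    return sum(w * y for w, (_, y) in zip(weights, neighbors)) / total


def loo_rmse(train: List[Example], k: int = 3) -> float:
    """Leave-one-out RMSE: predict each compound from all the others."""
    sq_errors = []
    for i, (fp, y) in enumerate(train):
        rest = train[:i] + train[i + 1:]
        sq_errors.append((knn_predict(fp, rest, k) - y) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))


def screen(library: Dict[str, Fingerprint], train: List[Example], k: int = 3) -> List[Tuple[str, float]]:
    """Rank library compounds by predicted potency, highest first."""
    preds = [(name, knn_predict(fp, train, k)) for name, fp in library.items()]
    return sorted(preds, key=lambda item: item[1], reverse=True)


train = [
    (frozenset({1, 2, 3}), 7.0),
    (frozenset({1, 2, 4}), 6.8),
    (frozenset({7, 8, 9}), 4.0),
    (frozenset({7, 8, 10}), 4.2),
]
library = {"drug_A": frozenset({1, 2, 3, 4}), "drug_B": frozenset({7, 8, 9, 10})}
print(loo_rmse(train, k=1))
print(screen(library, train, k=2))
```

Validating with leave-one-out before screening mirrors the order described above (validate, then screen); a real study would use far larger training sets and proper fingerprints.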
The development of single-cell technologies provides the opportunity to identify new cellular states and reconstruct novel cell-to-cell relationships. Applications range from understanding the transcriptional and epigenetic processes involved in metazoan development to characterizing distinct cell types in heterogeneous populations such as cancers or immune cells. However, analysis of these data is impeded by their unknown intrinsic biological and technical variability together with their sparseness; these factors complicate the identification of true biological signals amid artifacts and noise. Here we show that, across technologies, roughly 95% of the eigenvalues derived from each single-cell data set can be described by universal distributions predicted by Random Matrix Theory. Interestingly, the remaining 5% of the spectrum deviates from these distributions and presents a phenomenon known as eigenvector localization, where information tightly concentrates in groups of cells. Some of the localized eigenvectors reflect underlying biological signal, and some are simply a consequence of the sparsity of single-cell data; roughly 3% is artifactual. Based on the universal distributions and a technique for detecting sparsity-induced localization, we present a strategy to identify the residual 2% of directions that encode biological information and thereby denoise single-cell data. We demonstrate the effectiveness of this approach by comparing it with standard single-cell data analysis techniques in a variety of examples with marked cell populations.
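One universal distribution from Random Matrix Theory is the Marchenko-Pastur law for the eigenvalues of a sample covariance matrix built from pure noise. The sketch below assumes an N x P matrix X with N >= P and i.i.d. entries of variance sigma2, with eigenvalues taken from X^T X / N; it computes the Marchenko-Pastur support edges and density, keeps eigenvalues above the upper edge as candidate signal, and measures localization with the inverse participation ratio (the sum of v_i^4 for a unit eigenvector). Which distributions the authors actually fit is assumed rather than taken from the text, and the abstract's sparsity-induced-localization test is not implemented.

```python
import math
from typing import List, Sequence, Tuple


def mp_edges(gamma: float, sigma2: float = 1.0) -> Tuple[float, float]:
    """Support [lambda_minus, lambda_plus] of the Marchenko-Pastur law, gamma = P/N."""
    if not 0.0 < gamma <= 1.0:
        raise ValueError("this sketch assumes 0 < gamma = P/N <= 1")
    s = math.sqrt(gamma)
    return sigma2 * (1 - s) ** 2, sigma2 * (1 + s) ** 2


def mp_density(lam: float, gamma: float, sigma2: float = 1.0) -> float:
    """Marchenko-Pastur eigenvalue density; zero outside the support."""
    lo, hi = mp_edges(gamma, sigma2)
    if lam <= lo or lam >= hi:
        return 0.0
    return math.sqrt((hi - lam) * (lam - lo)) / (2 * math.pi * sigma2 * gamma * lam)


def above_noise_edge(eigenvalues: Sequence[float], gamma: float, sigma2: float = 1.0) -> List[float]:
    """Eigenvalues beyond the upper edge: candidates for non-random structure."""
    _, hi = mp_edges(gamma, sigma2)
    return [e for e in eigenvalues if e > hi]


def inverse_participation_ratio(v: Sequence[float]) -> float:
    """Sum of v_i^4 after normalizing v: about 1/len(v) if spread out, 1 if on one cell."""
    norm2 = sum(x * x for x in v)
    return sum((x * x / norm2) ** 2 for x in v)


print(mp_edges(0.25))  # (0.25, 2.25)
print(above_noise_edge([0.3, 1.0, 2.0, 3.5, 8.0], 0.25))  # [3.5, 8.0]
```

With real data the eigenvalues would come from a routine such as `numpy.linalg.eigvalsh`; a high inverse participation ratio flags an eigenvector concentrated on a few cells, which is the localization behavior the abstract describes, but deciding whether that localization is biological or sparsity-induced needs the additional test it mentions.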