Intercellular heterogeneity is a major obstacle to successful personalized medicine. Single-cell RNA sequencing (scRNA-seq) technology has enabled in-depth analysis of intercellular heterogeneity in various diseases. However, its full potential for personalized medicine is yet to be reached. Towards this, we propose A Single-cell Guided pipeline to Aid Repurposing of Drugs (ASGARD). ASGARD can repurpose single drugs for each cell cluster and for multiple cell clusters at the individual patient level; it can also predict personalized drug combinations to address the intercellular heterogeneity within each patient. We tested ASGARD on three independent datasets, covering advanced metastatic breast cancer, acute lymphoblastic leukemia, and coronavirus disease 2019 (COVID-19). For single-drug therapy, ASGARD shows significantly better average accuracy (AUC = 0.95) than two other single-cell pipelines (AUC 0.69 and 0.57) and two bulk-cell-based drug repurposing methods (AUC 0.80 and 0.75). The top-ranked drugs, such as fulvestrant and neratinib for breast cancer, tretinoin and vorinostat for leukemia, and chloroquine and enalapril for severe COVID-19, are either approved by the FDA or in clinical trials for the corresponding diseases. In conclusion, ASGARD is a promising pipeline, guided by single-cell RNA-seq data, for repurposing personalized drugs and drug combinations. ASGARD is free for academic use at https://github.com/lanagarmire/ASGARD.
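As a point of reference for the AUC values quoted above, the following is a minimal sketch of how an AUC can be computed for a ranked drug list. It is a generic rank-based (Mann-Whitney) calculation, not ASGARD's own evaluation code; the function name `roc_auc` and the toy scores and labels are hypothetical and chosen only for illustration.

```python
def roc_auc(scores, labels):
    """Rank-based AUC: the probability that a randomly chosen positive
    scores higher than a randomly chosen negative (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: one score per candidate drug (higher = ranked better),
# label 1 if the drug is a known treatment for the disease, 0 otherwise.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)  # 8 of 9 positive/negative pairs are ordered correctly
```

An AUC of 0.5 corresponds to a random ranking and 1.0 to a ranking that places every known treatment above every other drug.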
The ability to quickly detect transient sources in optical images and trigger multi-wavelength follow-up is key to the discovery of fast transients. These include rare and difficult-to-detect events such as kilonovae, supernova shock breakouts, and orphan Gamma-ray Burst afterglows. We present the Mary pipeline, a (mostly) automated tool to discover transients during high-cadence observations with the Dark Energy Camera (DECam) at CTIO. The observations are part of the Deeper Wider Faster program, a multi-facility, multi-wavelength program designed to discover fast transients, including counterparts to Fast Radio Bursts and gravitational waves. Our tests of the Mary pipeline on DECam images return a false positive rate of ~2.2% and a missed fraction of ~3.4%, obtained in less than 2 minutes, which shows the pipeline to be suitable for rapid and high-quality transient searches. The pipeline can be adapted to search for transients in data obtained with imagers other than DECam.
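The two quality figures quoted above can be made concrete with a small sketch. The definitions used here are assumptions, since the abstract does not spell them out: the false positive rate is taken as the share of reported candidates that are not real transients, and the missed fraction as the share of real (or injected) transients that were not reported. The function `detection_rates` and the candidate identifiers are hypothetical and not part of the Mary pipeline.

```python
def detection_rates(reported_ids, true_ids):
    """Return (false_positive_rate, missed_fraction) under the assumed definitions:
    false positives are counted relative to the reported candidates,
    misses relative to the known true transients."""
    reported = set(reported_ids)
    truth = set(true_ids)
    false_positives = reported - truth   # reported but not real
    missed = truth - reported            # real but not reported
    fp_rate = len(false_positives) / len(reported) if reported else 0.0
    missed_fraction = len(missed) / len(truth) if truth else 0.0
    return fp_rate, missed_fraction


# Hypothetical toy example: five reported candidates, five true transients.
reported = ["c1", "c2", "c3", "c4", "c5"]
truth = ["c2", "c3", "c4", "c5", "c6"]
fp_rate, missed_fraction = detection_rates(reported, truth)
```

In this toy case one of five reported candidates is spurious and one of five true transients is missed, so both rates are 0.2.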
The development of single-cell technologies provides the opportunity to identify new cellular states and reconstruct novel cell-to-cell relationships. Applications range from understanding the transcriptional and epigenetic processes involved in metazoan development to characterizing distinct cell types in heterogeneous populations like cancers or immune cells. However, analysis of the data is impeded by its unknown intrinsic biological and technical variability together with its sparseness; these factors complicate the identification of true biological signals amidst artifact and noise. Here we show that, across technologies, roughly 95% of the eigenvalues derived from each single-cell data set can be described by universal distributions predicted by Random Matrix Theory. Interestingly, 5% of the spectrum shows deviations from these distributions and presents a phenomenon known as eigenvector localization, where information tightly concentrates in groups of cells. Some of the localized eigenvectors reflect underlying biological signal, and some are simply a consequence of the sparsity of single-cell data; roughly 3% is artifactual. Based on the universal distributions and a technique for detecting sparsity-induced localization, we present a strategy to identify the residual 2% of directions that encode biological information and thereby denoise single-cell data. We demonstrate the effectiveness of this approach by comparing with standard single-cell data analysis techniques in a variety of examples with marked cell populations.
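The idea of separating a "universal" noise bulk from a few informative eigenvalues can be illustrated with a short sketch. It is not the authors' method: it covers only the comparison against the Marchenko-Pastur bulk edge, and omits the eigenvector-localization and sparsity analysis. The data are simulated Gaussian noise with one planted two-population signal; the function names, matrix sizes, and the 5% tolerance above the edge are all assumptions made for illustration.

```python
import numpy as np


def mp_upper_edge(n_samples, n_features, sigma2=1.0):
    """Upper edge of the Marchenko-Pastur bulk for a sample covariance matrix
    built from i.i.d. entries with variance sigma2."""
    gamma = n_features / n_samples
    return sigma2 * (1.0 + np.sqrt(gamma)) ** 2


def count_outlier_eigenvalues(data, tol=0.05):
    """Count eigenvalues of the standardized sample covariance matrix that lie
    more than a relative tolerance `tol` above the Marchenko-Pastur edge."""
    n, p = data.shape
    z = (data - data.mean(axis=0)) / data.std(axis=0)  # standardize each feature
    cov = z.T @ z / n
    eigenvalues = np.linalg.eigvalsh(cov)
    threshold = mp_upper_edge(n, p) * (1.0 + tol)
    return int(np.sum(eigenvalues > threshold))


rng = np.random.default_rng(0)
n_cells, n_genes = 2000, 200          # hypothetical sizes
noise = rng.standard_normal((n_cells, n_genes))

# Planted signal: half of the cells over-express the first 20 genes.
group = (np.arange(n_cells) < n_cells // 2).astype(float)
loading = np.zeros(n_genes)
loading[:20] = 3.0
signal = np.outer(group, loading)

n_out_noise = count_outlier_eigenvalues(noise)            # pure noise
n_out_signal = count_outlier_eigenvalues(noise + signal)  # noise plus planted signal
```

With pure noise, the spectrum should stay inside the predicted bulk, while the planted population structure should push at least one eigenvalue well above the edge. Real single-cell count data are sparse and non-Gaussian, which is exactly where the additional localization analysis described in the abstract becomes necessary.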
The efficacy of a drug depends on its binding affinity to the therapeutic target and on its pharmacokinetics. Deep learning (DL) has demonstrated remarkable progress in predicting drug efficacy. We develop MolDesigner, a human-in-the-loop web user interface (UI), to help drug developers leverage DL predictions to design more effective drugs. A developer can draw a drug molecule in the interface. In the backend, more than 17 state-of-the-art DL models generate predictions on important indices that are crucial for a drug's efficacy. Based on these predictions, drug developers can edit the drug molecule and iterate until they are satisfied. MolDesigner can make predictions in real time with a latency of less than a second.
Mathematical methods of information theory constitute essential tools to describe how stimuli are encoded in activities of signaling effectors. Exploring the information-theoretic perspective, however, remains conceptually, experimentally and computationally challenging. Specifically, existing computational tools enable efficient analysis only of relatively simple systems, usually with a single input and a single output. Moreover, robust and readily applicable implementations of these tools are missing. Here, we propose a novel algorithm to analyze signaling data within the framework of information theory. Our approach enables robust as well as statistically and computationally efficient analysis of signaling systems with high-dimensional outputs and a large number of input values. Analysis of the NF-κB single-cell signaling responses to TNF-α uniquely reveals that the NF-κB signaling dynamics improves discrimination of high concentrations of TNF-α with a modest impact on discrimination of low concentrations. Our readily applicable R package, SLEMI (statistical learning-based estimation of mutual information), allows the approach to be used by computational biologists with only elementary knowledge of information theory.
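For readers unfamiliar with the quantity being estimated, the sketch below computes the mutual information between a discrete input (for example, a stimulus dose) and a discretized output from a table of joint counts. This is a plain plug-in estimate for the simple one-input, one-output case, not the SLEMI algorithm, which targets high-dimensional outputs; the function name and the count tables are hypothetical.

```python
import math


def mutual_information_bits(joint_counts):
    """Plug-in estimate of I(X; Y) in bits from a table of joint counts,
    where joint_counts[i][j] is the number of cells with input i and output bin j."""
    total = sum(sum(row) for row in joint_counts)
    n_outputs = len(joint_counts[0])
    p_x = [sum(row) / total for row in joint_counts]
    p_y = [sum(row[j] for row in joint_counts) / total for j in range(n_outputs)]
    mi = 0.0
    for i, row in enumerate(joint_counts):
        for j, count in enumerate(row):
            if count > 0:  # zero cells contribute nothing to the sum
                p_xy = count / total
                mi += p_xy * math.log2(p_xy / (p_x[i] * p_y[j]))
    return mi


# Hypothetical toy tables: two doses (rows) and two response bins (columns).
perfect = [[50, 0], [0, 50]]        # response fully determines the dose
uninformative = [[25, 25], [25, 25]]  # response independent of the dose
mi_perfect = mutual_information_bits(perfect)
mi_none = mutual_information_bits(uninformative)
```

Two perfectly distinguishable, equally likely doses carry 1 bit of information, while an output that is independent of the dose carries 0 bits. Binning-based estimates like this one become unreliable as the output dimension grows, which is the regime the abstract's approach is designed for.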
Despite substantial potential to transform bioscience, medicine, and bioengineering, whole-cell models remain elusive. One of the biggest challenges to whole-cell models is assembling the large and diverse array of data needed to model an entire cell. Thanks to rapid advances in experimentation, much of the necessary data is becoming available. Furthermore, investigators are increasingly sharing their data due to increased emphasis on reproducibility. However, the scattered organization of this data continues to hamper modeling. Toward more predictive models, we highlight the challenges to assembling the data needed for whole-cell modeling and outline how we can overcome these challenges by working together to build a central data warehouse.