In medical sciences, a biomarker is a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention. Molecular experiments provide rapid and systematic approaches to the search for biomarkers, but because single-molecule biomarkers have shown a disappointing lack of robustness for clinical diagnosis, researchers have begun searching for distinctive sets of molecules, called biosignatures. However, the most popular statistics are not appropriate for identifying them, and the number of possible biosignatures to be tested is frequently intractable. In the present work, we developed a multivariate filter that uses a genetic algorithm (GA) as a feature (gene) selector to optimize a measure of intra-group cohesion and inter-group dispersion. The method is implemented in Python and R (pyBioSig, available at https://github.com/fredgca/pybiosig under the LGPL) and can be operated through a graphical interface or Python scripts. Using it, we were able to identify putative biosignatures composed of just a few genes and capable of recovering multiple groups simultaneously in a hierarchical clustering, including groups that were not recovered using the whole transcriptome, within a feasible length of time on a personal computer. Our results allow us to conclude that using a GA to optimize our new intra-group cohesion and inter-group dispersion measure is a clear, effective, and computationally feasible strategy for identifying putative omic biosignatures that can support discrimination among multiple groups simultaneously.
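As a concrete illustration of the approach this abstract describes, the following is a minimal sketch of a GA-based gene selector, assuming numpy and a toy fitness function (mean inter-group centroid distance over mean intra-group spread, with a mild parsimony penalty) standing in for pyBioSig's actual cohesion/dispersion measure; all names and parameters are illustrative, not pyBioSig's API.

import numpy as np

rng = np.random.default_rng(0)

def fitness(mask, X, groups):
    """Toy cohesion/dispersion score on the genes selected by `mask`:
    mean inter-group centroid distance over mean intra-group spread,
    minus a mild parsimony penalty favoring small signatures."""
    if mask.sum() == 0:
        return 0.0
    Xs = X[:, mask.astype(bool)]
    labels = np.unique(groups)
    centroids = np.array([Xs[groups == g].mean(axis=0) for g in labels])
    intra = np.mean([np.linalg.norm(Xs[groups == g] - c, axis=1).mean()
                     for g, c in zip(labels, centroids)])
    inter = np.mean([np.linalg.norm(ci - cj)
                     for i, ci in enumerate(centroids)
                     for cj in centroids[i + 1:]])
    return inter / (intra + 1e-9) - 0.01 * mask.sum()

def ga_select(X, groups, n_genes=5, pop_size=50, generations=100):
    """Evolve binary gene masks toward high fitness."""
    n = X.shape[1]
    pop = np.zeros((pop_size, n), dtype=int)
    for row in pop:  # seed each individual with n_genes random genes
        row[rng.choice(n, n_genes, replace=False)] = 1
    for _ in range(generations):
        scores = np.array([fitness(ind, X, groups) for ind in pop])
        parents = pop[np.argsort(scores)][-pop_size // 2:]  # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.choice(len(parents), 2, replace=False)]
            cut = rng.integers(1, n)            # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child[rng.integers(n)] ^= 1         # single-bit mutation
            children.append(child)
        pop = np.vstack([parents, children])
    best = max(pop, key=lambda ind: fitness(ind, X, groups))
    return np.flatnonzero(best)

# Toy data: 30 samples x 200 genes, three groups, three informative genes.
X = rng.normal(size=(30, 200))
groups = np.repeat([0, 1, 2], 10)
X[:, :3] += groups[:, None] * 3.0   # genes 0-2 separate the groups
print(ga_select(X, groups))         # often recovers the informative genes 0-2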
We apply an annealing genetic algorithm to the numerically complex problem of searching for quantum logic gates that simultaneously have the highest fidelity and the highest success probability. We first use the linear optical nonlinear sign (NS) gate as an example to illustrate the efficiency of this method, showing that by appropriately choosing the annealing parameters we can reach the theoretical maximum success probability (1/4 for the NS gate) on each attempt. We then examine the controlled-Z (CZ) gate as the first new problem to be solved, obtaining results that agree with the highest known maximum success probability for a CZ gate (2/27) while maintaining a fidelity of 0.9997. Since the purpose of our algorithm is to optimize a unitary matrix for quantum transformations, it could easily be applied to other areas of interest such as quantum optics and quantum sensors.
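To make the annealing idea concrete, below is a minimal numpy sketch in which a population of candidate unitaries is evolved toward a target transformation while the mutation amplitude shrinks each generation (the annealing schedule). The fitness here is the simple phase-invariant overlap |Tr(U†T)|/n; the actual gate search also scores success probability, which this sketch omits, and the target and all parameters are illustrative.

import numpy as np

rng = np.random.default_rng(1)

def random_unitary(n):
    """Random unitary via QR decomposition with phase correction."""
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

def fidelity(u, target):
    """Phase-invariant overlap |Tr(U^dagger T)| / n."""
    return abs(np.trace(u.conj().T @ target)) / target.shape[0]

def perturb(u, scale):
    """Small unitary kick: exp(i * scale * H) for a random Hermitian H."""
    h = rng.normal(size=u.shape) + 1j * rng.normal(size=u.shape)
    h = (h + h.conj().T) / 2
    vals, vecs = np.linalg.eigh(scale * h)
    return (vecs * np.exp(1j * vals)) @ vecs.conj().T @ u

def anneal_ga(target, pop_size=30, generations=200, t0=0.5, decay=0.98):
    pop = [random_unitary(target.shape[0]) for _ in range(pop_size)]
    scale = t0
    for _ in range(generations):
        pop.sort(key=lambda u: fidelity(u, target), reverse=True)
        elite = pop[:pop_size // 3]
        # refill the population with annealed mutations of the elite
        pop = elite + [perturb(elite[rng.integers(len(elite))], scale)
                       for _ in range(pop_size - len(elite))]
        scale *= decay  # annealing schedule: shrink mutation amplitude
    return max(pop, key=lambda u: fidelity(u, target))

# Example target: a 2x2 Hadamard transformation.
target = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
best = anneal_ga(target)
print(f"fidelity = {fidelity(best, target):.4f}")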
Recent technological advances in next-generation sequencing tools have led to increasing speeds of DNA sample collection, preparation, and sequencing. One instrument can produce over 600 Gb of genetic sequence data in a single run, creating new opportunities to handle the increasing workload efficiently. We propose a new method of fast genetic sequence analysis using the Dynamic Distributed Dimensional Data Model (D4M), an associative array environment for MATLAB developed at MIT Lincoln Laboratory. Based on mathematical and statistical properties, the method leverages big data techniques and an Apache Accumulo database to accelerate computations one hundredfold over other methods. Comparisons of the D4M method with the current gold standard for sequence analysis, BLAST, show that the two are comparable in the alignments they find. This paper presents an overview of the D4M genetic sequence algorithm and statistical comparisons with BLAST.
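The core trick is representing sequences as sparse associative arrays of k-mers, so that matching reduces to a sparse matrix product rather than base-by-base alignment. The following Python/scipy sketch is an analogue of that idea (D4M itself is a MATLAB environment, here backed by plain in-memory sparse matrices rather than Accumulo); the word size K = 10 and all sequences are illustrative.

import numpy as np
from scipy.sparse import csr_matrix

K = 10  # illustrative k-mer word size

def kmer_matrix(seqs):
    """Sparse sequence-by-kmer incidence matrix, the associative-array idea:
    rows are sequences, columns are k-mers, entries mark occurrence."""
    vocab, rows, cols = {}, [], []
    for i, s in enumerate(seqs):
        for j in range(len(s) - K + 1):
            col = vocab.setdefault(s[j:j + K], len(vocab))
            rows.append(i)
            cols.append(col)
    data = np.ones(len(rows), dtype=np.int32)
    return csr_matrix((data, (rows, cols)), shape=(len(seqs), len(vocab)))

def shared_kmers(queries, references):
    """Score every query against every reference by counting shared k-mers
    via one sparse matrix product, with no base-by-base alignment."""
    A = kmer_matrix(queries + references)
    A = (A > 0).astype(np.int32)       # presence/absence
    Q, R = A[:len(queries)], A[len(queries):]
    return (Q @ R.T).toarray()         # entry [i, j] = number of shared k-mers

queries = ["ACGTACGTACGTTT", "TTTTTTTTTTTTTT"]
refs = ["ACGTACGTACGTAA", "GGGGGGGGGGGGGG"]
print(shared_kmers(queries, refs))     # first query shares 3 ten-mers with first ref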
In previous work, a novel supervised framework implementing a binary classifier was presented that obtained excellent results for side effect discovery. Interestingly, different binary classifiers identified distinct side effects when used within the framework, prompting an investigation of multiple classifier systems. In this paper we investigate tuning a side effect multiple classifier system using genetic algorithms. The results of this research show that the framework, when implementing a multiple classifier system trained using genetic algorithms, obtains a higher partial area under the receiver operating characteristic curve than when implementing a single classifier. Furthermore, the framework detects side effects efficiently and obtains a low false positive rate.
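A minimal sketch of what GA-tuned classifier fusion can look like, under the assumption that the ensemble combines base-classifier scores with non-negative weights and the GA fitness is the partial AUC at a low false-positive ceiling; the paper's actual framework, encoding, and operators may differ, and all parameters here are illustrative.

import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)

def pauc(weights, scores, y):
    """Partial AUC (FPR <= 0.1) of the weighted ensemble score."""
    return roc_auc_score(y, scores @ weights, max_fpr=0.1)

def ga_tune(scores, y, pop_size=40, generations=60, sigma=0.2):
    """Evolve non-negative combination weights for the base classifiers."""
    n = scores.shape[1]
    pop = rng.random((pop_size, n))
    for _ in range(generations):
        fit = np.array([pauc(w, scores, y) for w in pop])
        parents = pop[np.argsort(fit)][-pop_size // 2:]   # keep the top half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.choice(len(parents), 2, replace=False)]
            child = np.where(rng.random(n) < 0.5, a, b)   # uniform crossover
            child = np.clip(child + rng.normal(0, sigma, n), 0, None)  # mutation
            children.append(child)
        pop = np.vstack([parents, children])
    best = max(pop, key=lambda w: pauc(w, scores, y))
    return best / best.sum()

# Toy example: three base classifiers' scores for 200 drug-event pairs.
y = rng.integers(0, 2, 200)
scores = np.column_stack([y + rng.normal(0, s, 200) for s in (0.8, 1.2, 2.0)])
print(ga_tune(scores, y))  # weights typically favor the least noisy classifier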
Amid the pandemic of the 2019 novel coronavirus disease (COVID-19) caused by SARS-CoV-2, a vast amount of drug research for prevention and treatment has been conducted quickly, but these efforts have so far been unsuccessful. Our objective is to prioritize repurposable drugs using a drug repurposing pipeline that systematically integrates multiple SARS-CoV-2 and drug interactions, deep graph neural networks, and in-vitro and population-based validations. We first collected all available drugs (n = 3,635) involved in COVID-19 patient treatment through CTDbase. We then built a SARS-CoV-2 knowledge graph based on the interactions among virus baits, host genes, pathways, drugs, and phenotypes. A deep graph neural network approach was used to derive candidate drug representations from these biological interactions. We prioritized the candidate drugs using clinical trial history, and then validated them with their genetic profiles, in-vitro experimental efficacy, and electronic health records. We highlight the top 22 drugs, including Azithromycin, Atorvastatin, Aspirin, Acetaminophen, and Albuterol, and further pinpoint drug combinations that may synergistically target COVID-19. In summary, we demonstrate that integrating extensive interactions, deep neural networks, and rigorous validation can facilitate the rapid identification of candidate drugs for COVID-19 treatment.
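As a toy illustration of the propagation idea behind graph-neural-network embeddings on such a knowledge graph: two rounds of symmetrically normalized neighborhood aggregation (as in graph convolutional networks, but with no learned weights here), followed by ranking drugs by similarity to the virus-bait neighborhood. Every node, edge, and score below is invented for illustration; the actual pipeline trains a deep model on a far richer graph.

import numpy as np

# Tiny stand-in knowledge graph: virus baits, host genes, drugs.
nodes = ["bait1", "bait2", "geneA", "geneB", "geneC", "drugX", "drugY", "drugZ"]
edges = [("bait1", "geneA"), ("bait1", "geneB"), ("bait2", "geneC"),
         ("drugX", "geneA"), ("drugX", "geneB"), ("drugY", "geneC"),
         ("drugZ", "geneB")]

idx = {n: i for i, n in enumerate(nodes)}
A = np.zeros((len(nodes), len(nodes)))
for u, v in edges:
    A[idx[u], idx[v]] = A[idx[v], idx[u]] = 1.0

# Symmetric normalization, as in graph convolutional networks:
# H <- D^{-1/2} (A + I) D^{-1/2} H, repeated for a few layers.
A_hat = A + np.eye(len(nodes))
deg = A_hat.sum(axis=1)
norm = A_hat / np.sqrt(np.outer(deg, deg))

H = np.eye(len(nodes))          # one-hot initial node features
for _ in range(2):              # two propagation layers, no learned weights
    H = norm @ H

def score(drug):
    """Rank drugs by cosine similarity to the mean virus-bait embedding."""
    baits = H[[idx["bait1"], idx["bait2"]]].mean(axis=0)
    v = H[idx[drug]]
    return float(baits @ v / (np.linalg.norm(baits) * np.linalg.norm(v)))

for name in ["drugX", "drugY", "drugZ"]:
    print(name, round(score(name), 3))  # drugX targets both bait1-linked genes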
Automation is becoming ubiquitous in all laboratory activities, leading towards precisely defined and codified laboratory protocols. However, the integration between laboratory protocols and mathematical models is still lacking. Models describe physical processes, while protocols define the steps carried out during an experiment: neither covers the domain of the other, although both attempt to characterize the same phenomena. Ideally, we would start from an integrated description of both the model and the steps carried out to test it, so as to concurrently analyze uncertainties in model parameters, equipment tolerances, and data collection. To this end, we present a language for modeling and optimizing experimental biochemical protocols that facilitates such an integrated description and can be combined with experimental data. We provide a probabilistic semantics for our language, based on a Bayesian interpretation, that formally characterizes the uncertainties in the data collection, the underlying model, and the protocol operations. On a set of case studies, we illustrate how the resulting framework allows for automated analysis and optimization of experimental protocols, including Gibson assembly protocols.
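A minimal Monte Carlo sketch of the central idea, namely that each protocol step denotes a distribution over outcomes, so equipment tolerances propagate through the composed protocol to the collected data. The step names, noise model, and volumes below are assumptions for illustration; the paper's language and its Bayesian semantics are of course far richer.

import numpy as np

rng = np.random.default_rng(3)
N = 10_000  # Monte Carlo samples standing in for the probabilistic semantics

def dispense(volume_ul, tol=0.05):
    """A protocol step as a distribution: requested volume plus
    equipment tolerance (5% relative error, an assumed noise model)."""
    return rng.normal(volume_ul, tol * volume_ul, N)

def mix(*volumes):
    """Mixing composes step outputs sample-by-sample."""
    return sum(volumes)

# A two-step toy protocol: dispense two reagents, then mix them.
a = dispense(10.0)
b = dispense(5.0)
total = mix(a, b)

# Concentration of reagent A after mixing, with propagated uncertainty.
conc_a = a / total
print(f"mean = {conc_a.mean():.4f}, sd = {conc_a.std():.4f}")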