Background: Representing biological networks as graphs is a powerful approach to reveal underlying patterns, signatures, and critical components from high-throughput biomolecular data. However, graphs do not natively capture the multi-way relationships present among genes and proteins in biological systems. Hypergraphs are generalizations of graphs that naturally model multi-way relationships and have shown promise in modeling systems such as protein complexes and metabolic reactions. In this paper we seek to understand how hypergraphs can more faithfully identify, and potentially predict, important genes based on complex relationships inferred from genomic expression data sets. Results: We compiled a novel data set of transcriptional host response to pathogenic viral infections and formulated relationships between genes as a hypergraph where hyperedges represent significantly perturbed genes, and vertices represent individual biological samples with specific experimental conditions. We find that hypergraph betweenness centrality is a superior method for identification of genes important to viral response when compared with graph centrality. Conclusions: Our results demonstrate the utility of using hypergraphs to represent complex biological systems and highlight central responses common to a variety of highly pathogenic viruses.
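The hypergraph construction described in this abstract can be illustrated with a minimal sketch. It assumes one common way of defining betweenness on hyperedges: build the s-line graph, in which two genes are linked when their hyperedges share at least s samples, and then compute ordinary betweenness on that graph. The abstract does not spell out the exact centrality definition, so this choice, the gene names, and the sample labels are all hypothetical.

```python
from collections import deque
from itertools import combinations


def s_line_graph(hyperedges, s=1):
    """Link two hyperedges (genes) when they share at least s vertices (samples)."""
    adj = {name: set() for name in hyperedges}
    for a, b in combinations(hyperedges, 2):
        if len(hyperedges[a] & hyperedges[b]) >= s:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def betweenness(adj):
    """Unnormalised betweenness (Brandes' algorithm, unweighted, undirected)."""
    score = {v: 0.0 for v in adj}
    for source in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}  # number of shortest paths from source
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair of endpoints was counted twice.
    return {v: c / 2 for v, c in score.items()}


# Hypothetical data: each gene is a hyperedge over the samples in which
# it was significantly perturbed.
genes = {
    "GENE_A": {"sample1", "sample2"},
    "GENE_B": {"sample2", "sample3"},
    "GENE_C": {"sample3", "sample4"},
}
centrality = betweenness(s_line_graph(genes, s=1))
```

In this toy example GENE_B bridges the other two genes and is the only one with non-zero betweenness. Raising s requires a larger overlap in samples before two genes are linked, which thins the line graph.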
We consider the use of a running measure of power spectrum disorder to distinguish between the normal sinus rhythm of the heart and two forms of cardiac arrhythmia: atrial fibrillation and atrial flutter. This spectral entropy measure is motivated by characteristic differences in the spectra of beat timings during the three rhythms. We plot patient data derived from ten-beat windows on a disorder map and identify rhythm-defining ranges in the level and variance of spectral entropy values. Employing the spectral entropy within an automatic arrhythmia detection algorithm enables the classification of periods of atrial fibrillation from the time series of patients' beats. When the algorithm is set to identify abnormal rhythms within 6 s it agrees with 85.7% of the annotations of professional rhythm assessors; for a response time of 30 s this becomes 89.5%, and with 60 s it is 90.3%. The algorithm provides a rapid way to detect atrial fibrillation, demonstrating usable response times as low as 6 s. Measures of disorder in the frequency domain have practical significance in a range of biological signals: the techniques described in this paper have potential application for the rapid identification of disorder in other rhythmic signals.
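A running spectral entropy of the kind described here can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: it takes the Shannon entropy of the normalised one-sided power spectrum of each ten-beat window of inter-beat intervals, drops the zero-frequency term, and scales the result to lie between 0 and 1. The function names, the normalisation, and the numerical tolerance are all assumptions.

```python
import cmath
import math


def spectral_entropy(intervals):
    """Normalised Shannon entropy of the power spectrum of one window."""
    n = len(intervals)
    mean = sum(intervals) / n
    centred = [x - mean for x in intervals]  # remove the constant component
    power = []
    for k in range(1, n // 2 + 1):  # one-sided spectrum, zero frequency skipped
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centred)
        )
        power.append(abs(coeff) ** 2)
    total = sum(power)
    if total < 1e-12:  # assumed tolerance: a perfectly regular window
        return 0.0
    probs = [p / total for p in power]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(power))  # scale to the range [0, 1]


def running_entropy(intervals, window=10):
    """Spectral entropy of every consecutive window of beat intervals."""
    return [
        spectral_entropy(intervals[i:i + window])
        for i in range(len(intervals) - window + 1)
    ]
```

A window whose intervals alternate strictly between two values concentrates its power in a single frequency bin and scores close to 0, while a window with a flat spectrum scores close to 1. A detector in the spirit of the abstract would then threshold the level and variance of these running values, with thresholds that would have to come from annotated data.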
Genome-wide epistasis analysis is a powerful tool to infer gene interactions, which can guide drug and vaccine development and lead to a deeper understanding of microbial pathogenesis. We have considered all complete SARS-CoV-2 genomes deposited in the GISAID repository until four different cut-off dates, and used Direct Coupling Analysis together with an assumption of Quasi-Linkage Equilibrium to infer epistatic contributions to fitness from polymorphic loci. We find eight interactions, of which three are between pairs in which one locus lies in gene ORF3a, both loci holding non-synonymous mutations. We also find interactions between two loci in gene nsp13, both holding non-synonymous mutations, and four interactions involving one locus holding a synonymous mutation. Altogether we infer interactions between loci in viral genes ORF3a and nsp2, nsp12 and nsp6, between ORF8 and nsp4, and between loci in genes nsp2, nsp13 and nsp14. The paper opens the prospect of using prominent epistatically linked pairs as a starting point to search for combinatorial weaknesses of recombinant viral pathogens.
Counterfactual inference is a useful tool for comparing outcomes of interventions on complex systems. It requires us to represent the system in the form of a structural causal model, complete with a causal diagram, probabilistic assumptions on exogenous variables, and functional assignments. Specifying such models can be extremely difficult in practice. The process requires substantial domain expertise, and does not scale easily to large systems, multiple systems, or novel system modifications. At the same time, many application domains, such as molecular biology, are rich in structured causal knowledge that is qualitative in nature. This manuscript proposes a general approach for querying a causal biological knowledge graph, and converting the qualitative result into a quantitative structural causal model that can learn from data to answer the question. We demonstrate the feasibility, accuracy and versatility of this approach using two case studies in systems biology. The first demonstrates the appropriateness of the underlying assumptions and the accuracy of the results. The second demonstrates the versatility of the approach by querying a knowledge base for the molecular determinants of a severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)-induced cytokine storm, and performing counterfactual inference to estimate the causal effect of medical countermeasures for severely ill patients.
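The role of the exogenous variables and functional assignments in counterfactual inference can be seen in a toy example. The sketch below assumes a hypothetical two-variable model with a linear assignment, Y = slope * X + N_Y, and follows the standard three steps of abduction, action, and prediction. It is not the model built in the manuscript, and the variable names and the coefficient are illustrative only.

```python
def counterfactual_outcome(x_obs, y_obs, x_new, slope):
    """Counterfactual Y had X been x_new, in the toy model Y = slope * X + N_Y."""
    # Abduction: recover the exogenous noise consistent with the observation.
    noise_y = y_obs - slope * x_obs
    # Action: replace the observed X by the intervened value x_new.
    # Prediction: push x_new through the same assignment with the same noise.
    return slope * x_new + noise_y


# Hypothetical patient: observed X = 2.0 and Y = 5.0, assumed slope 1.5.
# Query: what would Y have been had X been set to 0.0?
y_counterfactual = counterfactual_outcome(x_obs=2.0, y_obs=5.0, x_new=0.0, slope=1.5)
```

The key point is that the noise term is inferred from the individual observation and then held fixed, which is what distinguishes a counterfactual query from a plain interventional one.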
Infection by many viruses begins with fusion of viral and cellular lipid membranes, followed by entry of viral contents into the target cell and ultimately, after many biochemical steps, integration of viral DNA into that of the host cell. The early steps of membrane fusion and viral capsid entry are mediated by adsorption to the cell surface, and receptor and coreceptor binding. HIV-1 specifically targets CD4+ helper T-cells of the human immune system and binds to the receptor CD4 and coreceptor CCR5 before fusion is initiated. Previous experiments have been performed using a cell line (293-Affinofile) in which the expression levels of CD4 and CCR5 were independently controlled. After exposure to HIV-1 of various strains, the resulting infectivity was measured through the fraction of infected cells. To design and evaluate the effectiveness of drug therapies that target the inhibition of the entry processes, an accurate functional relationship between the CD4/CCR5 concentrations and infectivity is desired in order to analyze experimental data more quantitatively. We propose three kinetic models describing the possible mechanistic processes involved in HIV entry and fit their predictions to infectivity measurements, comparing and contrasting the different outcomes. Our approach allows interpretation of the clustering of infectivity of different strains of HIV-1 in the space of mechanistic kinetic parameters. Our model fitting also allows inference of nontrivial stoichiometries of receptor and coreceptor binding and provides a framework through which to quantitatively investigate the effectiveness of fusion inhibitors and neutralizing antibodies.
A popular theory of perceptual processing holds that the brain learns both a generative model of the world and a paired recognition model using variational Bayesian inference. Most hypotheses of how the brain might learn these models assume that neurons in a population are conditionally independent given their common inputs. This simplification is likely not compatible with the type of local recurrence observed in the brain. Seeking an alternative that is compatible with complex inter-dependencies yet consistent with known biology, we argue here that the cortex may learn with an adversarial algorithm. Many observable symptoms of this approach would resemble known neural phenomena, including wake/sleep cycles and oscillations that vary in magnitude with surprise, and we describe how further predictions could be tested. We illustrate the idea on recurrent neural networks trained to model image and video datasets. This framework for learning brings variational inference closer to neuroscience and yields multiple testable hypotheses.