Because proteins with similar structures often have similar functions, analyzing protein structures can help predict protein functions and is therefore important. We consider protein structure classification: the computational assignment of protein structures to predefined groups. We represent protein structures as weighted networks and, more importantly, propose the first graphlet-based measure that applies to weighted networks. We further develop a deep neural network (DNN), composed of both convolutional and recurrent layers, that uses this measure for classification. Taken together, our approach dramatically outperforms existing graphlet-based approaches on 36 real datasets, and even compared with the state-of-the-art approach it nearly halves the classification error. Beyond protein structure networks, our weighted-graphlet measure and DNN classifier could potentially be applied to classifying other weighted networks, both in computational biology and in other domains.
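The abstract does not define its weighted-graphlet measure, so the sketch below only illustrates the general idea under assumptions of our own. Residues become nodes. Pairs of Cα atoms closer than a hypothetical 8 Å cutoff become edges, weighted by inverse distance. Each triangle, the smallest non-trivial graphlet, then contributes the geometric mean of its three edge weights to each of its nodes. The functions `contact_network` and `weighted_triangle_scores` are hypothetical names, and this is not the paper's measure.

```python
import itertools

import numpy as np


def contact_network(coords: np.ndarray, cutoff: float = 8.0) -> np.ndarray:
    """Weighted adjacency matrix from C-alpha coordinates (N x 3, in Å).

    Hypothetical weighting: w_ij = 1 / d_ij when 0 < d_ij < cutoff, else 0.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    weights = np.zeros_like(dist)
    mask = (dist > 0) & (dist < cutoff)  # exclude self-pairs and distant pairs
    weights[mask] = 1.0 / dist[mask]
    return weights


def weighted_triangle_scores(w: np.ndarray) -> np.ndarray:
    """Per-node sum, over the triangles containing that node, of the geometric mean of the edge weights.

    Brute force over all node triples (O(N^3)); illustrative only.
    """
    n = w.shape[0]
    scores = np.zeros(n)
    for i, j, k in itertools.combinations(range(n), 3):
        if w[i, j] > 0 and w[j, k] > 0 and w[i, k] > 0:
            intensity = (w[i, j] * w[j, k] * w[i, k]) ** (1.0 / 3.0)
            for node in (i, j, k):
                scores[node] += intensity
    return scores


# Toy input: an equilateral triangle with 2 Å sides plus one distant residue.
coords = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [1.0, np.sqrt(3.0), 0.0], [100.0, 0.0, 0.0]])
scores = weighted_triangle_scores(contact_network(coords))
```

Counting full graphlet orbits (as graphlet-based methods usually do) would extend this from triangles to all small connected subgraphs. The triangle case is shown only because it is the simplest one where edge weights must be combined.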
Structure determination is key to understanding protein function at a molecular level. Whilst significant advances have been made in predicting structure and function from amino acid sequence, researchers must still rely on expensive, time-consuming analytical methods to visualise detailed protein conformation. In this study, we demonstrate that it is possible to make accurate (≥80%) predictions of protein class and architecture from structures determined at low (>3 Å) resolution, using a deep convolutional neural network trained on high-resolution (≤3 Å) structures represented as 2D matrices. We thus provide proof of concept for high-speed, low-cost protein structure classification at low resolution, and a basis for extension to the prediction of function. We investigate the impact of the input representation on classification performance, showing that side-chain information may not be necessary for fine-grained structure predictions. Finally, we confirm that high-resolution, low-resolution and NMR-determined structures inhabit a common feature space, and thus provide a theoretical foundation for boosting performance with single-image super-resolution.
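The abstract does not say which 2D matrix encodes each structure. A common choice, assumed here, is the pairwise Cα distance matrix. The following is a minimal sketch that clips distances at a hypothetical 40 Å, scales them to [0, 1], and zero-pads or crops the result to a fixed square size so it can be fed to a CNN as a one-channel image. Keeping only Cα atoms is one backbone-only representation. The function name `distance_matrix_image` and its parameters are illustrative, not taken from the study.

```python
import numpy as np


def distance_matrix_image(ca_coords: np.ndarray, size: int = 64, max_dist: float = 40.0) -> np.ndarray:
    """Encode a chain's C-alpha coordinates (N x 3, in Å) as a fixed-size 2D array.

    Assumptions: pairwise C-alpha distances, clipped at `max_dist`, scaled to [0, 1],
    then zero-padded (short chains) or cropped (long chains) to `size` x `size`.
    """
    diff = ca_coords[:, None, :] - ca_coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))  # symmetric N x N distance matrix
    scaled = np.clip(dist, 0.0, max_dist) / max_dist
    image = np.zeros((size, size), dtype=np.float32)
    n = min(size, scaled.shape[0])
    image[:n, :n] = scaled[:n, :n]
    return image


# Three residues on a straight line, 3.8 Å apart (a typical consecutive C-alpha spacing).
coords = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0]])
image = distance_matrix_image(coords, size=4)
```

A distance matrix is invariant to rotation and translation of the structure, which is one reason it is a convenient CNN input for coordinates that have no canonical orientation.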
In recent work we reported the vibrational spectra of more than 100,000 known protein structures, together with a self-consistent sonification method that renders these spectra in the audible frequency range (Extreme Mechanics Letters, 2019). Here we present a method that uses acoustic actuators to transform these molecular vibrations into materialized vibrations of thin water films, producing complex patterns of surface waves, and that feeds the resulting macroscopic images into deep convolutional neural networks for further processing. Specifically, the water-surface wave patterns for each protein structure are used to build training sets for neural networks that classify and further process the patterns. Once trained, the neural network model can discern different proteins solely by analyzing the macroscopic surface wave patterns in the water film. The method can distinguish not only different types of proteins (e.g., alpha-helical proteins vs. hybrids of alpha-helices and beta-sheets) but also different folding states of the same protein, or the binding of proteins to ligands. Using the DeepDream algorithm, instances of key features of the deep neural network can be made visible in a range of images. This allows us to explore the inner workings of neural networks trained on protein surface wave patterns, and to create new images by finding and highlighting features of protein molecular spectra in a range of photographic inputs. Integrating this water-based realization of cymatics with neural networks, and especially with generative methods, offers a new direction toward materiomusical Inceptionism as a possible avenue in nano-inspired art. The method could have applications in detecting different protein structures and the effects of mutations, and in medical imaging and diagnostics, with broad impact on nano-to-macro transitions.
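The sonification step is only cited here, not described. The following sketch shows one way to move molecular-scale frequencies into the audible range while staying "self-consistent": transpose the whole spectrum by a single shared power of two. This keeps every frequency ratio, and so every musical interval, intact. The function `transpose_to_audible`, the 20 Hz floor, and the 1 THz input are assumptions for illustration, not the published method.

```python
import math

import numpy as np


def transpose_to_audible(freqs_hz, low_hz: float = 20.0):
    """Scale all frequencies by one common factor 2**k so the lowest lands in [low_hz, 2 * low_hz).

    Hypothetical mapping: a single shared octave shift preserves all frequency ratios.
    Returns the transposed frequencies and the (usually negative) octave shift k.
    """
    freqs = np.asarray(freqs_hz, dtype=float)
    # Smallest integer k with min(freqs) * 2**k >= low_hz.
    k = math.ceil(math.log2(low_hz / freqs.min()))
    return freqs * 2.0 ** k, k


# Two made-up vibrational modes at 1 THz and 3 THz.
audible, shift = transpose_to_audible([1e12, 3e12])
```

A single shared shift cannot compress a spectrum that spans more than about ten octaves into the 20 Hz to 20 kHz band, so the highest modes of a broad spectrum would need separate handling.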
Protein secondary structure (SS) prediction is important for studying protein structure and function. When only sequence (profile) information is used as input, the best current predictors reach ~80% Q3 accuracy, a figure that has not improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a deep learning extension of Conditional Neural Fields (CNF), which integrates Conditional Random Fields (CRF) with shallow neural networks. Through its deep hierarchical architecture, DeepCNF can model not only the complex sequence-structure relationship but also the interdependency between adjacent SS labels, making it much more powerful than CNF. Experimental results on the CASP and CAMEO test proteins show that DeepCNF obtains ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.
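The "interdependency between adjacent SS labels" is what the CRF layer of a model like DeepCNF captures. At prediction time, a linear-chain CRF picks the label sequence with the best joint score, usually via Viterbi decoding. This is a minimal generic sketch, not DeepCNF's code. It assumes per-residue label scores from some deep network (`emissions`) and a label-to-label `transitions` matrix, both with made-up numbers here. `q3` computes the standard Q3 metric: the fraction of residues whose 3-state (H/E/C) label is correct.

```python
import numpy as np

LABELS = "HEC"  # 3-state secondary structure: helix, strand, coil


def viterbi(emissions: np.ndarray, transitions: np.ndarray) -> str:
    """Highest-scoring label path for a linear-chain CRF.

    emissions: (L, K) per-residue label scores, e.g. the output of a deep network.
    transitions: (K, K) score for moving from label a (row) to label b (column).
    """
    length, k = emissions.shape
    score = emissions[0].copy()  # best path score ending in each label at position 0
    backptr = np.zeros((length, k), dtype=int)
    for t in range(1, length):
        # candidate[a, b]: best path ending in a at t-1, then a -> b, plus the emission of b at t
        candidate = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = candidate.argmax(axis=0)
        score = candidate.max(axis=0)
    best = [int(score.argmax())]
    for t in range(length - 1, 0, -1):  # follow back-pointers from the last residue
        best.append(int(backptr[t, best[-1]]))
    return "".join(LABELS[i] for i in reversed(best))


def q3(predicted: str, observed: str) -> float:
    """Q3 accuracy: fraction of residues whose 3-state label is predicted correctly."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


# Made-up scores: residue 3 weakly prefers strand (E); label switches cost -1.
emissions = np.array([[2.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 0.5, 0.0], [2.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
transitions = np.where(np.eye(3) == 1, 0.0, -1.0)
path = viterbi(emissions, transitions)
```

With these scores, a per-residue argmax gives `HHEHH`, but the switching penalty makes Viterbi return `HHHHH`, smoothing away the single weakly supported strand residue. That is the kind of label coupling a network with independent per-residue outputs cannot express.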
We estimate the Lipschitz constants of the gradient of a deep neural network, and of the network itself, with respect to the full set of parameters. We first develop estimates for a deep feed-forward densely connected network and then, in a more general framework, for all neural networks that can be represented as solutions of controlled ordinary differential equations, in which time plays the role of continuous depth. These estimates can be used to set the step size of stochastic gradient descent methods, as we illustrate for one example method.
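The link between a gradient Lipschitz constant L and the step size is the classical rule that gradient descent on an L-smooth objective is safe with step 1/L. The abstract derives bounds on L for network gradients with respect to the parameters. This sketch instead uses a toy quadratic, f(x) = ½ xᵀAx, where L is known exactly: it is the largest eigenvalue of A. It is plain deterministic gradient descent, not the stochastic method illustrated in the work. `gradient_descent` is a hypothetical helper.

```python
import numpy as np


def gradient_descent(grad, x0, lipschitz: float, steps: int = 200) -> np.ndarray:
    """Plain gradient descent with the classical step size 1/L for an L-smooth objective."""
    x = np.asarray(x0, dtype=float)
    step = 1.0 / lipschitz
    for _ in range(steps):
        x = x - step * grad(x)
    return x


# Toy L-smooth objective f(x) = 0.5 * x^T A x (A symmetric positive definite).
# Its gradient A x is Lipschitz with constant L = largest eigenvalue of A.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
L = np.linalg.eigvalsh(A).max()
x_min = gradient_descent(lambda x: A @ x, x0=[1.0, -2.0], lipschitz=L)
```

For this quadratic, any step below 2/L converges, so 1/L is a safe rather than an optimal choice. An upper bound on L, like the estimates in the work, keeps 1/L on the safe side.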
Deep Gaussian processes (DGPs) have struggled for relevance in applications due to the challenges and cost associated with Bayesian inference. In this paper we propose a sparse variational approximation for DGPs whose approximate posterior mean has the same mathematical structure as a deep neural network (DNN). We make the forward pass through a DGP equivalent to that of a ReLU DNN by finding an interdomain transformation that represents the GP posterior mean as a sum of ReLU basis functions. This unification enables the DGP to be initialised and trained as a neural network, leveraging well-established practice in the deep learning community and thereby greatly aiding the inference task. Experiments demonstrate improved accuracy and faster training compared with current DGP methods, while retaining favourable predictive uncertainties.
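The sketch below does not reproduce the interdomain transformation. It only illustrates what "a sum of ReLU basis functions" looks like and why it has the same structure as a one-hidden-layer ReLU network. The technique is swapped in: a plain least-squares fit, not GP inference, on a toy 1-D target. The knots act as a fixed first layer, one ReLU unit per knot, and the fitted coefficients act as the output layer. The names `relu_features` and `knots` and the target function are illustrative assumptions.

```python
import numpy as np


def relu_features(x: np.ndarray, knots: np.ndarray) -> np.ndarray:
    """Design matrix with columns relu(x - c) for each knot c, plus a constant (bias) column."""
    hidden = np.maximum(x[:, None] - knots[None, :], 0.0)  # one ReLU "hidden unit" per knot
    return np.hstack([hidden, np.ones((x.shape[0], 1))])


knots = np.array([-1.0, 0.0, 1.0])
x = np.linspace(-2.0, 3.0, 51)
y = np.maximum(x, 0.0) + 0.5 * np.maximum(x - 1.0, 0.0)  # toy target, exactly representable

# Output-layer weights by least squares (a stand-in for the paper's variational posterior mean).
coef, *_ = np.linalg.lstsq(relu_features(x, knots), y, rcond=None)
prediction = relu_features(x, knots) @ coef  # identical to a one-hidden-layer ReLU net's forward pass
```

Because the target lies in the span of the basis, the fit recovers the weights 0, 1, and 0.5 for the three knots, with a zero bias. Stacking such layers gives the deep, DNN-shaped mean that the abstract describes.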