No Arabic abstract
The Machine Recognition of Crystallization Outcomes (MARCO) initiative has assembled roughly half a million annotated images of macromolecular crystallization experiments from various sources and setups. Here, state-of-the-art machine learning algorithms are trained and tested on different parts of this data set. We find that more than 94% of the test images can be correctly labeled, irrespective of their experimental origin. Because crystal recognition is key to high-density screening and the systematic analysis of crystallization experiments, this approach opens the door to both industrial and fundamental research applications.
Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.
Recently, deep convolutional neural networks have shown good results for image recognition. In this paper, we use convolutional neural networks with a finder module, which discovers the important region for recognition and extracts that region. We propose applying our method to the recognition of protein crystals for X-ray structural analysis. In this analysis, it is necessary to recognize states of protein crystallization from a large number of images. There are several methods that realize protein crystallization recognition by using convolutional neural networks. In each method, large-scale data sets are required to recognize with high accuracy. In our data set, the number of images is not good enough for training CNN. The amount of data for CNN is a serious issue in various fields. Our method realizes high accuracy recognition with few images by discovering the region where the crystallization drop exists. We compared our crystallization image recognition method with a high precision method using Inception-V3. We demonstrate that our method is effective for crystallization images using several experiments. Our method gained the AUC value that is about 5% higher than the compared method.
A deep neural network based architecture was constructed to predict amino acid side chain conformation with unprecedented accuracy. Amino acid side chain conformation prediction is essential for protein homology modeling and protein design. Current widely-adopted methods use physics-based energy functions to evaluate side chain conformation. Here, using a deep neural network architecture without physics-based assumptions, we have demonstrated that side chain conformation prediction accuracy can be improved by more than 25%, especially for aromatic residues compared with current standard methods. More strikingly, the prediction method presented here is robust enough to identify individual conformational outliers from high resolution structures in a protein data bank without providing its structural factors. We envisage that our amino acid side chain predictor could be used as a quality check step for future protein structure model validation and many other potential applications such as side chain assignment in Cryo-electron microscopy, crystallography model auto-building, protein folding and small molecule ligand docking.
We present a content-based automatic music tagging algorithm using fully convolutional neural networks (FCNs). We evaluate different architectures consisting of 2D convolutional layers and subsampling layers only. In the experiments, we measure the AUC-ROC scores of the architectures with different complexities and input types using the MagnaTagATune dataset, where a 4-layer architecture shows state-of-the-art performance with mel-spectrogram input. Furthermore, we evaluated the performances of the architectures with varying the number of layers on a larger dataset (Million Song Dataset), and found that deeper models outperformed the 4-layer architecture. The experiments show that mel-spectrogram is an effective time-frequency representation for automatic tagging and that more complex models benefit from more training data.
Classification of polarimetric synthetic aperture radar (PolSAR) images is an active research area with a major role in environmental applications. The traditional Machine Learning (ML) methods proposed in this domain generally focus on utilizing highly discriminative features to improve the classification performance, but this task is complicated by the well-known curse of dimensionality phenomena. Other approaches based on deep Convolutional Neural Networks (CNNs) have certain limitations and drawbacks, such as high computational complexity, an unfeasibly large training set with ground-truth labels, and special hardware requirements. In this work, to address the limitations of traditional ML and deep CNN based methods, a novel and systematic classification framework is proposed for the classification of PolSAR images, based on a compact and adaptive implementation of CNNs using a sliding-window classification approach. The proposed approach has three advantages. First, there is no requirement for an extensive feature extraction process. Second, it is computationally efficient due to utilized compact configurations. In particular, the proposed compact and adaptive CNN model is designed to achieve the maximum classification accuracy with minimum training and computational complexity. This is of considerable importance considering the high costs involved in labelling in PolSAR classification. Finally, the proposed approach can perform classification using smaller window sizes than deep CNNs. Experimental evaluations have been performed over the most commonly-used four benchmark PolSAR images: AIRSAR L-Band and RADARSAT-2 C-Band data of San Francisco Bay and Flevoland areas. Accordingly, the best obtained overall accuracies range between 92.33 - 99.39% for these benchmark study sites.