Recent computational advances in the accurate prediction of protein three-dimensional (3D) structures from amino acid sequences now present a unique opportunity to decipher the interrelationships between proteins. This task entails, but is not equivalent to, a problem of 3D structure comparison and classification. Historically, protein domain classification has been a largely manual and subjective activity, relying upon various heuristics. Databases such as CATH represent significant steps towards a more systematic (and automatable) approach, yet much room remains for the development of more scalable and quantitative classification methods grounded in machine learning. We suspect that re-examining these relationships via a Deep Learning (DL) approach may entail a large-scale restructuring of classification schemes, one that improves the interpretability of distant relationships between proteins. Here, we describe our training of DL models on protein domain structures (and their associated physicochemical properties) in order to evaluate classification properties at CATH's homologous superfamily (SF) level. To achieve this, we have devised and applied an extension of image-classification and image-segmentation techniques, utilizing a convolutional autoencoder model architecture. Our DL architecture allows models to learn structural features that, in a sense, define different homologous SFs. We evaluate and quantify pairwise distances between SFs by building one model per SF and comparing the loss functions of the models. Hierarchical clustering on these distance matrices provides a new view of protein interrelationships: a view that extends beyond simple structural/geometric similarity, and towards the realm of structure/function properties.
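The last two steps of the abstract above (pairwise SF distances derived from per-SF model losses, followed by hierarchical clustering) can be illustrated with a minimal sketch. The abstract does not say how losses are turned into distances, so everything specific here is an assumption made for illustration: the `cross_loss` matrix layout, the symmetrized "excess loss" distance, the choice of average linkage, the function names, the placeholder labels `SF_A`, `SF_B`, `SF_C`, and the toy numbers, which are not real results.

```python
from itertools import combinations


def symmetric_distances(cross_loss):
    """Turn a cross-evaluation loss matrix into a symmetric distance matrix.

    Assumed layout: cross_loss[i][j] is the mean reconstruction loss of the
    autoencoder trained on superfamily i when applied to domains from
    superfamily j. The distance used here (mean of the two excess losses over
    each model's self-loss, floored at zero) is an illustrative choice only.
    """
    n = len(cross_loss)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        excess_ij = cross_loss[i][j] - cross_loss[i][i]
        excess_ji = cross_loss[j][i] - cross_loss[j][j]
        dist[i][j] = dist[j][i] = max(0.0, 0.5 * (excess_ij + excess_ji))
    return dist


def average_linkage(dist, labels):
    """Agglomerative clustering with average linkage.

    Returns the merges in order as (members_a, members_b, height), where the
    height is the mean pairwise distance between the two merged clusters.
    """
    clusters = {k: [k] for k in range(len(labels))}  # cluster id -> member indices
    next_id = len(labels)
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
            height = sum(dist[i][j] for i, j in pairs) / len(pairs)
            if best is None or height < best[0]:
                best = (height, a, b)
        height, a, b = best
        merges.append((
            sorted(labels[i] for i in clusters[a]),
            sorted(labels[i] for i in clusters[b]),
            height,
        ))
        # Replace the two merged clusters with their union under a new id.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges


# Toy data: placeholder superfamily labels and made-up loss values.
labels = ["SF_A", "SF_B", "SF_C"]
cross_loss = [
    [0.10, 0.20, 0.90],
    [0.30, 0.10, 0.80],
    [0.95, 0.85, 0.15],
]

dist = symmetric_distances(cross_loss)
for left, right, height in average_linkage(dist, labels):
    print(f"merge {left} + {right} at height {height:.2f}")
```

With these toy numbers, the two hypothetical superfamilies whose models reconstruct each other's domains almost as well as their own (`SF_A` and `SF_B`) merge first, and `SF_C` joins last. For a real analysis, a library routine such as SciPy's hierarchical clustering would be the more usual choice; the pure-Python version is used here only to keep the sketch dependency-free.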
Motivation: To bridge the exponentially growing gap between the numbers of unlabeled and labeled proteins, a few studies have adopted semi-supervised learning for protein sequence modeling. They pre-train a model with a substantial amount of unlabe
Contact-assisted protein folding has made very good progress, but two challenges remain. One is accurate contact prediction for proteins lacking many sequence homologs, and the other is that time-consuming folding simulation is often needed to predict
We consider multi-chain protein native structures and propose a criterion that determines whether two chains in the system are entangled or not. The criterion is based on the behavior observed by pulling at both termini of each chain simultaneously in
Protein-RNA interactions are of vital importance to a variety of cellular activities. Both experimental and computational techniques have been developed to study the interactions. Due to the limitation of the previous database, especially the lack of
There is great interest in developing artificial intelligence-based protein-ligand affinity models because of their immense applications in drug discovery. In this paper, PointNet and PointTransformer, two pointwise multi-layer perceptrons, have been applied