Protein contacts contain important information for the study of protein structure and function, but contact prediction from sequence information remains very challenging. Recently, evolutionary coupling (EC) analysis, which predicts contacts by detecting co-evolved residues (or columns) in a multiple sequence alignment (MSA), has made good progress due to better statistical assessment techniques and high-throughput sequencing. Existing EC analysis methods predict only a single contact map for a given protein, which may have low accuracy, especially when the protein under prediction does not have a large number of sequence homologs. Analogous to ab initio folding, which usually predicts a few possible 3D models for a given protein sequence, this paper presents a novel structure learning method that can predict a set of diverse contact maps for a given protein sequence, in which the best solution usually has much better accuracy than the first one. Our experimental tests show that for many test proteins, the best of the 5 solutions generated by our method has an accuracy at least 0.1 higher than the first one when the top L/5 or L/10 (L is the sequence length) predicted long-range contacts are evaluated, especially for protein families with a small number of sequence homologs. Our best solutions also have better quality than those generated by the two popular EC methods EVfold and PSICOV.
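The evaluation criterion used above (accuracy of the top L/5 or L/10 predicted long-range contacts, first solution versus best of several) can be illustrated with a minimal sketch. The function names are hypothetical, and the sequence-separation threshold `min_sep=24` is an assumed convention for "long-range"; the abstract does not state the threshold the authors use.

```python
def top_long_range_accuracy(scores, native, divisor=5, min_sep=24):
    """Fraction of the top L/divisor predicted long-range pairs that are native contacts.

    scores: L x L nested list of predicted contact scores (higher = more confident).
    native: L x L nested list of booleans (True = contact in the known structure).
    min_sep: assumed minimum sequence separation for a pair to count as long-range.
    """
    L = len(scores)
    # Collect every residue pair (i, j) with j - i >= min_sep.
    pairs = [(scores[i][j], i, j) for i in range(L) for j in range(i + min_sep, L)]
    # Rank pairs by predicted score, highest first.
    pairs.sort(key=lambda t: -t[0])
    top = pairs[:max(1, L // divisor)]
    if not top:
        return 0.0  # sequence too short to contain any long-range pair
    hits = sum(1 for _, i, j in top if native[i][j])
    return hits / len(top)


def first_and_best(score_maps, native, divisor=5, min_sep=24):
    """Return (accuracy of the first map, best accuracy among all maps)."""
    accs = [top_long_range_accuracy(s, native, divisor, min_sep) for s in score_maps]
    return accs[0], max(accs)
```

Calling `first_and_best` on a list of five predicted maps gives the two numbers whose difference the abstract reports; note that picking the best map requires the native structure, so this is an evaluation measure rather than a selection procedure.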
Proteins are an important class of biomolecules that serve as essential building blocks of cells. Their three-dimensional structures are responsible for their functions. In this thesis, we investigate protein structures using a network-theoretical approach. To do so, we use a coarse-grained method, viz., complex network analysis. We model protein structures at two length scales, as Protein Contact Networks (PCNs) and as Long-range Interaction Networks (LINs). We found that proteins, by virtue of being characterised by a high degree of clustering, are small-world networks. Apart from the small-world nature, we found that proteins have another general property, viz., assortativity. This is an interesting and exceptional finding, as all other complex networks (except for social networks) are known to be disassortative. Importantly, we could identify one of the major topological determinants of assortativity by building appropriate controls.
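As a rough illustration of the quantities discussed above, the sketch below builds a contact network from residue coordinates and computes the degree assortativity coefficient (the Pearson correlation between the degrees at the two ends of each edge, following Newman's standard definition). The function names, the distance `cutoff`, and the `min_sep` parameter used to restrict the network to long-range pairs are assumptions for illustration; the abstract does not give the thesis's actual thresholds.

```python
import math


def contact_edges(coords, cutoff, min_sep=1):
    """Edges (i, j) between residues closer than `cutoff` and at least
    `min_sep` apart in sequence. A larger min_sep gives a long-range network."""
    n = len(coords)
    return [(i, j)
            for i in range(n)
            for j in range(i + min_sep, n)
            if math.dist(coords[i], coords[j]) < cutoff]


def degree_assortativity(edges):
    """Newman's degree assortativity coefficient r for an undirected edge list.

    r > 0: high-degree nodes tend to link to high-degree nodes (assortative).
    r < 0: high-degree nodes tend to link to low-degree nodes (disassortative).
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    m = len(edges)
    if m == 0:
        return float("nan")
    mean_prod = sum(degree[u] * degree[v] for u, v in edges) / m
    mean_half = sum(0.5 * (degree[u] + degree[v]) for u, v in edges) / m
    mean_sq = sum(0.5 * (degree[u] ** 2 + degree[v] ** 2) for u, v in edges) / m
    denom = mean_sq - mean_half ** 2
    if denom == 0:
        return float("nan")  # every edge end has the same degree; r is undefined
    return (mean_prod - mean_half ** 2) / denom
```

A star graph gives r = -1 (fully disassortative), while a network whose edges only join nodes of equal degree gives r = 1, which is a convenient sanity check before applying the function to real contact networks.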
In this paper, we study a two-lane totally asymmetric simple exclusion process (TASEP) coupled with random attachment and detachment of particles (Langmuir kinetics) in both lanes under open boundary conditions. Our model can describe the directed motion of molecular motors, attachment and detachment of motors, and free inter-lane transition of motors between filaments. We focus on some finite-size effects of the system because the sizes of most real systems are finite and small (e.g., size $\leq 10{,}000$). A special finite-size effect of the two-lane system has been observed: as the lane-changing rate increases, the density wall first moves left and then moves towards the right. We call this the jumping effect. We find that increasing the attachment and detachment rates weakens the jumping effect. We also confirm that when the size of the two-lane system is large enough, the jumping effect disappears, and the two-lane system has a density profile similar to that of a single-lane TASEP coupled with Langmuir kinetics. Increasing the lane-changing rate has little effect on density and current after the density reaches its maximum. Also, the lane-changing rate has no effect on the density profiles of a two-lane TASEP coupled with Langmuir kinetics at a large attachment/detachment rate and/or a large system size. A mean-field approximation is presented, and it agrees with our Monte Carlo simulations.
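A minimal Monte Carlo sketch of a two-lane TASEP with Langmuir kinetics is given below to make the ingredients concrete. It is not the authors' implementation: the function and parameter names are hypothetical, a random-sequential update is assumed, all rates are treated as probabilities per update attempt (so they must lie in [0, 1]), attachment and detachment are applied only to bulk sites, and a particle is assumed to attempt a lane change only when its forward site is blocked.

```python
import random


def _advance(cur, other, i, omega_lane, rng):
    """Hop forward if the next site is free; otherwise try a lane change
    (assumed rule: lane change only when blocked and the target site is free)."""
    if cur[i + 1] == 0:
        cur[i], cur[i + 1] = 0, 1
    elif other[i] == 0 and rng.random() < omega_lane:
        cur[i], other[i] = 0, 1


def simulate_two_lane_tasep_lk(n_sites, alpha, beta, omega_a, omega_d,
                               omega_lane, sweeps, seed=0):
    """Return time-averaged density profiles [lane0, lane1].

    alpha / beta: entry / exit probabilities at the open boundaries.
    omega_a / omega_d: bulk attachment / detachment probabilities.
    omega_lane: lane-changing probability.
    The first half of the sweeps is discarded as burn-in.
    """
    if n_sites < 2 or sweeps < 1:
        raise ValueError("need n_sites >= 2 and sweeps >= 1")
    rng = random.Random(seed)
    lanes = [[0] * n_sites for _ in range(2)]
    totals = [[0] * n_sites for _ in range(2)]
    burn_in = sweeps // 2
    for sweep in range(sweeps):
        # One sweep = one update attempt per site per lane, on average.
        for _ in range(2 * n_sites):
            lane = rng.randrange(2)
            i = rng.randrange(n_sites)
            cur, other = lanes[lane], lanes[1 - lane]
            if i == 0:
                if cur[0] == 0:
                    if rng.random() < alpha:  # particle enters at the left boundary
                        cur[0] = 1
                else:
                    _advance(cur, other, 0, omega_lane, rng)
            elif i == n_sites - 1:
                if cur[i] == 1 and rng.random() < beta:  # particle exits on the right
                    cur[i] = 0
            elif cur[i] == 0:
                if rng.random() < omega_a:  # Langmuir attachment
                    cur[i] = 1
            elif rng.random() < omega_d:  # Langmuir detachment
                cur[i] = 0
            else:
                _advance(cur, other, i, omega_lane, rng)
        if sweep >= burn_in:
            for lane in range(2):
                for i in range(n_sites):
                    totals[lane][i] += lanes[lane][i]
    measured = sweeps - burn_in
    return [[t / measured for t in lane_totals] for lane_totals in totals]
```

Running the function over a range of `omega_lane` values at fixed, small `n_sites` and locating the steepest rise in the profile is one way to look for a moving density wall; whether this sketch reproduces the jumping effect depends on how closely the assumed rules match the model in the paper.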
The global coronavirus disease (COVID-19) pandemic, caused by the newly identified SARS-CoV-2 coronavirus, continues to claim the lives of thousands of people worldwide. The unavailability of specific medications to treat COVID-19 has led to drug repositioning efforts using various approaches, including computational analyses. Such analyses mostly rely on molecular docking and require the 3D structure of the target protein to be available. In this study, we trained a set of machine learning algorithms on a dataset of RNA-dependent RNA polymerase (RdRp) inhibitors and used them to run inference on antiviral and anti-inflammatory drugs based solely on ligand information. We also performed virtual screening of the drug candidates predicted by the machine learning models and docked them against the active site of SARS-CoV-2 RdRp, a key component of the virus replication machinery. Based on the ligand information of RdRp inhibitors, the machine learning models were able to identify candidates such as remdesivir and baloxavir marboxil, molecules with documented activity against RdRp of the novel coronavirus. Among the other identified drug candidates were beclabuvir, a non-nucleoside inhibitor of the hepatitis C virus (HCV) RdRp enzyme, and the HCV protease inhibitors paritaprevir and faldaprevir. Further analysis of these candidates using molecular docking against the SARS-CoV-2 RdRp revealed low binding energies against the enzyme active site. Our approach also identified the anti-inflammatory drugs lupeol, lifitegrast, antrafenine, betulinic acid, and ursolic acid as having potential activity against SARS-CoV-2 RdRp. We propose that the results of this study be considered for further validation as potential therapeutic options against COVID-19.
The twenty protein-coding amino acids are found in proteomes with different relative abundances. The most abundant amino acid, leucine, is nearly an order of magnitude more prevalent than the least abundant amino acid, cysteine. Amino acid metabolic costs differ similarly, constraining their incorporation into proteins. On the other hand, sequence diversity is necessary for protein folding, function and evolution. Here we present a simple model for a cost-diversity trade-off, postulating that natural proteomes minimize amino acid metabolic flux while maximizing sequence entropy. The model explains the relative abundances of amino acids across a diverse set of proteomes. We found that the data are remarkably well explained when the cost function accounts for amino acid chemical decay. More than one hundred proteomes reach comparable solutions to the trade-off by different combinations of cost and diversity. Quantifying the interplay between proteome size and entropy shows that proteomes can get optimally large and diverse.
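One standard way to formalise a cost-diversity trade-off of this kind, offered here as an assumption since the abstract does not state the functional form, is to maximise $H(p) - \lambda \sum_i p_i c_i$ over abundance vectors $p$, where $H$ is the Shannon entropy, $c_i$ is the cost of amino acid $i$, and $\lambda$ weights cost against diversity. The maximiser is $p_i \propto e^{-\lambda c_i}$. The sketch below computes it; the function names and the cost values used in any call are hypothetical, not data from the study.

```python
import math


def max_entropy_abundances(costs, lam):
    """Abundances maximising entropy minus lam * mean cost: p_i ~ exp(-lam * c_i).

    costs: dict mapping amino acid label -> cost (illustrative values only).
    lam: trade-off weight; lam = 0 ignores cost and gives a uniform distribution.
    """
    weights = {aa: math.exp(-lam * c) for aa, c in costs.items()}
    z = sum(weights.values())  # normalising constant
    return {aa: w / z for aa, w in weights.items()}


def shannon_entropy(p):
    """Shannon entropy (in nats) of an abundance distribution."""
    return -sum(x * math.log(x) for x in p.values() if x > 0)
```

Under this form, cheaper amino acids are always more abundant, and increasing `lam` lowers the entropy, which is the qualitative trade-off the abstract describes.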