piSAAC: Extended notion of SAAC feature selection novel method for discrimination of Enzymes model using different machine learning algorithm

34 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Zaheer Ullah Khan

تاريخ النشر 2020

مجال البحث علم الأحياء الهندسة المعلوماتية

والبحث باللغة English

تأليف Zaheer Ullah Khan - Dechang Pi - Izhar Ahmed Khan

الجزيئات الحيوية التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Enzymes and proteins are live driven biochemicals, which has a dramatic impact over the environment, in which it is active. So, therefore, it is highly looked-for to build such a robust and highly accurate automatic and computational model to accurately predict enzymes nature. In this study, a novel split amino acid composition model named piSAAC is proposed. In this model, protein sequence is discretized in equal and balanced terminus to fully evaluate the intrinsic correlation properties of the sequence. Several state-of-the-art algorithms have been employed to evaluate the proposed model. A 10-folds cross-validation evaluation is used for finding out the authenticity and robust-ness of the model using different statistical measures e.g. Accuracy, sensitivity, specificity, F-measure and area un-der ROC curve. The experimental results show that, probabilistic neural network algorithm with piSAAC feature extraction yields an accuracy of 98.01%, sensitivity of 97.12%, specificity of 95.87%, f-measure of 0.9812and AUC 0.95812, over dataset S1, accuracy of 97.85%, sensitivity of 97.54%, specificity of 96.24%, f-measure of 0.9774 and AUC 0.9803 over dataset S2. Evident from these excellent empirical results, the proposed model would be a very useful tool for academic research and drug designing related application areas.

قيم البحث

271 - Sean Mullane , Ruoyan Chen , Sri Vaishnavi Vemulapalli 2019

The biological function of a protein stems from its 3-dimensional structure, which is thermodynamically determined by the energetics of interatomic forces between its amino acid building blocks (the order of amino acids, known as the sequence, define s a protein). Given the costs (time, money, human resources) of determining protein structures via experimental means such as X-ray crystallography, can we better describe and compare protein 3D structures in a robust and efficient manner, so as to gain meaningful biological insights? We begin by considering a relatively simple problem, limiting ourselves to just protein secondary structural elements. Historically, many computational methods have been devised to classify amino acid residues in a protein chain into one of several discrete secondary structures, of which the most well-characterized are the geometrically regular $alpha$-helix and $beta$-sheet; irregular structural patterns, such as turns and loops, are less understood. Here, we present a study of Deep Learning techniques to classify the loop-like end cap structures which delimit $alpha$-helices. Previous work used highly empirical and heuristic methods to manually classify helix capping motifs. Instead, we use structural data directly--including (i) backbone torsion angles computed from 3D structures, (ii) macromolecular feature sets (e.g., physicochemical properties), and (iii) helix cap classification data (from CAPS-DB)--as the ground truth to train a bidirectional long short-term memory (BiLSTM) model to classify helix cap residues. We tried different network architectures and scanned hyperparameters in order to train and assess several models; we also trained a Support Vector Classifier (SVC) to use as a baseline. Ultimately, we achieved 85% class-balanced accuracy with a deep BiLSTM model.

الجزيئات الحيوية التعلم الآلي الأساليب الكمية

A Novel Weighted Combination Method for Feature Selection using Fuzzy Sets

97 - Zixiao Shen , Xin Chen , Jonathan M. Garibaldi 2020

In this paper, we propose a novel weighted combination feature selection method using bootstrap and fuzzy sets. The proposed method mainly consists of three processes, including fuzzy sets generation using bootstrap, weighted combination of fuzzy set s and feature ranking based on defuzzification. We implemented the proposed method by combining four state-of-the-art feature selection methods and evaluated the performance based on three publicly available biomedical datasets using five-fold cross validation. Based on the feature selection results, our proposed method produced comparable (if not better) classification accuracies to the best of the individual feature selection methods for all evaluated datasets. More importantly, we also applied standard deviation and Pearsons correlation to measure the stability of the methods. Remarkably, our combination method achieved significantly higher stability than the four individual methods when variations and size reductions were introduced to the datasets.

التعلم الآلي التعلم الالي

Review of Machine-Learning Methods for RNA Secondary Structure Prediction

113 - Qi Zhao , Zheng Zhao , Xiaoya Fan 2020

Secondary structure plays an important role in determining the function of non-coding RNAs. Hence, identifying RNA secondary structures is of great value to research. Computational prediction is a mainstream approach for predicting RNA secondary stru cture. Unfortunately, even though new methods have been proposed over the past 40 years, the performance of computational prediction methods has stagnated in the last decade. Recently, with the increasing availability of RNA structure data, new methods based on machine-learning technologies, especially deep learning, have alleviated the issue. In this review, we provide a comprehensive overview of RNA secondary structure prediction methods based on machine-learning technologies and a tabularized summary of the most important methods in this field. The current pending issues in the field of RNA secondary structure prediction and future trends are also discussed.

الجزيئات الحيوية التعلم الآلي التعلم الالي

Extended Linear Response for Bioanalytical Applications Using Multiple Enzymes

439 - Vladimir Privman , Oleksandr Zavalov , Aleksandr Simonian 2013

We develop a framework for optimizing a novel approach to extending the linear range of bioanalytical systems and biosensors by utilizing two enzymes with different kinetic responses to the input chemical as their substrate. Data for the flow-injecti on amperometric system devised for detection of lysine based on the function of L-Lysine-alpha-Oxidase and Lysine-2-monooxygenase are analyzed. Lysine is a homotropic substrate for the latter enzyme. We elucidate the mechanism for extending the linear response range and develop optimization techniques for future applications of such systems.

الشبكات الجزيئية مادة مكثفة ناعمة الفيزياء الكيميائية

Machine learning a model for RNA structure prediction

89 - Nicola Calonaci , Alisha Jones , Francesca Cuturello 2020

RNA function crucially depends on its structure. Thermodynamic models currently used for secondary structure prediction rely on computing the partition function of folding ensembles, and can thus estimate minimum free-energy structures and ensemble p opulations. These models sometimes fail in identifying native structures unless complemented by auxiliary experimental data. Here, we build a set of models that combine thermodynamic parameters, chemical probing data (DMS, SHAPE), and co-evolutionary data (Direct Coupling Analysis, DCA) through a network that outputs perturbations to the ensemble free energy. Perturbations are trained to increase the ensemble populations of a representative set of known native RNA structures. In the chemical probing nodes of the network, a convolutional window combines neighboring reactivities, enlightening their structural information content and the contribution of local conformational ensembles. Regularization is used to limit overfitting and improve transferability. The most transferable model is selected through a cross-validation strategy that estimates the performance of models on systems on which they are not trained. With the selected model we obtain increased ensemble populations for native structures and more accurate predictions in an independent validation set. The flexibility of the approach allows the model to be easily retrained and adapted to incorporate arbitrary experimental information.

الجزيئات الحيوية الفيزياء البيولوجية تحليل البيانات والإحصاءات والاحتمال