
Using PCA and Factor Analysis for Dimensionality Reduction of Bio-informatics Data


Added by Shahzad Ahmed Mr.

Publication date 2017

fields Biology Informatics Engineering

Research language: English

Authors M. Usman Ali - Shahzad Ahmed - Javed Ferzund

Other Quantitative Biology Computational Engineering


Abstract

Large volumes of genomics data are produced daily owing to advances in sequencing technology. This data is of no value unless it is properly analysed. Different kinds of analytics are required to extract useful information from the raw data. Classification, prediction, clustering and pattern extraction are useful data mining techniques, and they require an appropriate selection of data attributes to give accurate results. However, bioinformatics data is high dimensional, usually having hundreds of attributes. Such a large number of attributes degrades the performance of the machine learning algorithms used for classification and prediction. Dimensionality reduction techniques are therefore required to reduce the number of attributes before further analysis. In this paper, Principal Component Analysis and Factor Analysis are used for dimensionality reduction of bioinformatics data. These techniques were applied to a leukaemia data set, and the number of attributes was reduced from … to ….
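As a rough illustration of the dimensionality reduction idea described in the abstract, the sketch below projects a small table of samples onto its first principal component. It is not the authors' implementation: the toy data, the function names (`center`, `covariance`, `first_component`, `project`) and the use of power iteration are all assumptions made for this example, and Factor Analysis is not covered.

```python
import math
import random


def center(rows):
    """Subtract each column's mean so every attribute has zero mean."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix (attributes x attributes) of centered data."""
    n = len(centered)
    cols = list(zip(*centered))
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in cols]
            for ci in cols]


def first_component(cov, iterations=200):
    """Power iteration: approximates the eigenvector with the largest eigenvalue."""
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    v = [rng.random() + 0.1 for _ in cov]
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(centered, component):
    """Reduce each sample to a single score along the given component."""
    return [sum(x * c for x, c in zip(row, component)) for row in centered]


# Hypothetical toy data (NOT the leukaemia data set): 5 samples, 3 attributes.
# The second attribute is roughly twice the first; the third is nearly constant.
samples = [
    [1.0, 2.0, 0.5],
    [2.0, 4.1, 0.4],
    [3.0, 5.9, 0.6],
    [4.0, 8.0, 0.5],
    [5.0, 10.1, 0.4],
]

centered = center(samples)
component = first_component(covariance(centered))
reduced = project(centered, component)  # 3 attributes -> 1 score per sample
```

Because the first two toy attributes are strongly correlated, a single component captures most of the variation here. On real data one would normally keep several components and use a tested library routine rather than hand-written power iteration.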


Analysis of Compression Techniques for DNA Sequence Data

Shakeela Bibi, Javed Iqbal, Adnan Iftekhar, 2020

Biological data mainly comprises deoxyribonucleic acid (DNA) and protein sequences. These biomolecules are present in all human cells. Owing to its self-replicating property, DNA is a key constituent of the genetic material that exists in all living creatures. It contains the genetic information required for the functioning and development of all living organisms. Storing the DNA data of a single person requires 10 CD-ROMs. Moreover, this volume is growing constantly as more and more sequences are added to public databases. This rapid growth in sequence data makes it challenging to extract precise information, since many data analysis and visualization tools cannot process such large amounts of data. To reduce the size of DNA and protein sequences, researchers have introduced various sequence compression algorithms, such as compress or gzip, Context Tree Weighting (CTW), Lempel-Ziv-Welch (LZW), arithmetic coding, run-length encoding and the substitution method. These techniques have contributed substantially to reducing the volume of biological datasets. On the other hand, traditional compression techniques are not well suited to this type of sequential data. In this paper, we explore diverse techniques for compressing large amounts of DNA sequence data. The analysis reveals that efficient techniques not only reduce the size of the sequence but also avoid any information loss. The review of existing studies also shows that compressing a DNA sequence is significant for understanding the critical characteristics of DNA data, in addition to improving storage efficiency and data transmission. In addition, compression of protein sequences remains a challenge for the research community. The major parameters for evaluating these compression algorithms include compression ratio and running time complexity, among others.
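Of the methods the abstract lists, run-length encoding is the simplest to show. The sketch below is a minimal, assumed illustration (the function names and the example sequence are hypothetical, not taken from the paper) of a lossless encode and decode round trip on a DNA string.

```python
from itertools import groupby


def rle_encode(sequence):
    """Collapse each run of identical bases into a (base, run_length) pair."""
    return [(base, len(list(run))) for base, run in groupby(sequence)]


def rle_decode(pairs):
    """Rebuild the original sequence from (base, run_length) pairs."""
    return "".join(base * count for base, count in pairs)


# Hypothetical example sequence, chosen to contain a few repeated runs.
dna = "AAAACCGTTT"
encoded = rle_encode(dna)
decoded = rle_decode(encoded)
```

Decoding returns the input exactly, which is the "no information loss" property the abstract highlights. Run-length encoding only shrinks data that contains long runs, so on sequences with few repeated bases the encoded form can be larger than the original.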

Other Quantitative Biology

Interactions of Fungi with Concrete: Significant Importance for Bio-Based Self-Healing Concrete

Jing Luo, Xiaobo Chen, Jada Crump, 2017

The goal of this study is to explore a new self-healing concept in which fungi are used as a self-healing agent to promote calcium mineral precipitation to fill the cracks in concrete. An initial screening of different species of fungi has been conducted. Fungal growth medium was overlaid onto a cured concrete plate, and mycelial discs were aseptically deposited at the plate center. The results showed that, owing to the dissolution of Ca(OH)2 from the concrete, the pH of the growth medium increased from its original value of 6.5 to 13.0. Despite the drastic pH increase, Trichoderma reesei (ATCC13631) spores germinated into hyphal mycelium and grew equally well with or without concrete. X-ray diffraction (XRD) and scanning electron microscopy (SEM) confirmed that the crystals precipitated on the fungal hyphae were composed of calcite. These results indicate that T. reesei has great potential to be used in bio-based self-healing concrete for sustainable infrastructure.

Other Quantitative Biology Applied Physics

ivis Dimensionality Reduction Framework for Biomacromolecular Simulations

Hao Tian, Peng Tao, 2020

Molecular dynamics (MD) simulations have been widely applied to study macromolecules, including proteins. However, the high dimensionality of the datasets produced by simulations makes thorough analysis difficult and hinders a deeper understanding of biomacromolecules. To gain more insight into protein structure-function relations, appropriate dimensionality reduction methods are needed to project simulations onto low-dimensional spaces. Linear dimensionality reduction methods, such as principal component analysis (PCA) and time-structure based independent component analysis (t-ICA), could not preserve sufficient structural information. Though better than linear methods, nonlinear methods, such as t-distributed stochastic neighbor embedding (t-SNE), still suffer from limitations in avoiding system noise and keeping inter-cluster relations. ivis is a novel deep learning-based dimensionality reduction method originally developed for single-cell datasets. Here we applied this framework to the study of the light, oxygen and voltage (LOV) domain of diatom Phaeodactylum tricornutum aureochrome 1a (PtAu1a). Compared with other methods, ivis is shown to be superior in constructing a Markov state model (MSM), preserving information on both local and global distances, and maintaining similarity between high and low dimensions with the least information loss. Moreover, the ivis framework is capable of providing new perspectives for deciphering residue-level protein allostery through the feature weights in the neural network. Overall, ivis is a promising member of the analysis toolbox for proteins.

Quantitative Methods

Open Source Software Sustainability Models: Initial White Paper from the Informatics Technology for Cancer Research Sustainability and Industry Partnership Work Group

Y. Ye, R. D. Boyce, M. K. Davis, 2019

The Sustainability and Industry Partnership Work Group (SIP-WG) is a part of the National Cancer Institute Informatics Technology for Cancer Research (ITCR) program. The charter of the SIP-WG is to investigate options of long-term sustainability of open source software (OSS) developed by the ITCR, in part by developing a collection of business model archetypes that can serve as sustainability plans for ITCR OSS development initiatives. The workgroup assembled models from the ITCR program, from other studies, and via engagement of its extensive network of relationships with other organizations (e.g., Chan Zuckerberg Initiative, Open Source Initiative and Software Sustainability Institute). This article reviews existing sustainability models and describes ten OSS use cases disseminated by the SIP-WG and others, and highlights five essential attributes (alignment with unmet scientific needs, dedicated development team, vibrant user community, feasible licensing model, and sustainable financial model) to assist academic software developers in achieving best practice in software sustainability.

Other Quantitative Biology Software Engineering

WI Fast Stats: a collection of web apps for the visualization and analysis of WI Fast Plants data

Yizhou Liu, Claudia Solis-Lemus, 2020

WI Fast Stats is the first and only dedicated tool tailored to the WI Fast Plants educational objectives. WI Fast Stats is an integrated animated web page with a collection of R-developed web apps that provide data visualization and data analysis tools for WI Fast Plants data. WI Fast Stats has a user-friendly, easy-to-use interface that will make data science accessible to K-16 teachers and students currently using WI Fast Plants lesson plans. Users do not need a strong programming or mathematical background to use WI Fast Stats, as the web apps are simple to use, well documented, and freely available.

Other Quantitative Biology
