Analysis of Compression Techniques for DNA Sequence Data

Published by: Adnan Iftekhar

Publication date: 2020

Research field: Biology

Language: English

Authors: Shakeela Bibi, Javed Iqbal, Adnan Iftekhar

Quantitative Biology

Abstract

Biological data mainly comprises deoxyribonucleic acid (DNA) and protein sequences. These are the biomolecules present in all cells of human beings. Because of its self-replicating property, DNA is a key constituent of the genetic material that exists in all living organisms. This biomolecule contains the genetic information required for the functioning and development of all living organisms. Storing the DNA data of a single person requires 10 CD-ROMs. Moreover, this volume is increasing constantly as more and more sequences are added to the public databases. This rapid growth in sequence data poses challenges for precise information extraction, since many data analysis and visualization tools do not support processing such a large amount of data. To reduce the size of DNA and protein sequences, many scientists have introduced various types of sequence compression algorithms, such as compress or gzip, Context Tree Weighting (CTW), Lempel-Ziv-Welch (LZW), arithmetic coding, run-length encoding, and the substitution method. These techniques have contributed substantially to minimizing the volume of biological datasets. However, traditional compression techniques are not well suited to the compression of this type of sequential data. In this paper, we explore diverse types of techniques for the compression of large amounts of DNA sequence data. The analysis reveals that efficient techniques not only reduce the size of the sequence but also avoid any information loss. The review of existing studies also shows that compression of a DNA sequence is significant for understanding the critical characteristics of DNA data, in addition to improving storage efficiency and data transmission. The compression of protein sequences remains a challenge for the research community. The main parameters for evaluating these compression algorithms include compression ratio and running-time complexity.
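To make the compression-ratio parameter concrete, the following is a minimal sketch, not a method taken from the paper. It assumes a sequence made up only of the four bases A, C, G, and T, packs it at 2 bits per base, and compares the result with general-purpose DEFLATE compression from Python's standard `zlib` module. The helper names `pack_2bit`, `unpack_2bit`, and `compression_ratio`, the bit mapping, and the sample sequence are all illustrative assumptions.

```python
import zlib

# Illustrative assumption: a fixed 2-bit code per base (not from the paper).
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def pack_2bit(seq: str) -> bytes:
    """Pack an A/C/G/T string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_BITS[base]
        # Left-align a short final chunk so bases always start at the top bits.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack_2bit(data: bytes, length: int) -> str:
    """Recover the first `length` bases from 2-bit packed data."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 3])
    return "".join(bases[:length])


def compression_ratio(original_size: int, compressed_size: int) -> float:
    """Original size divided by compressed size (higher is better)."""
    return original_size / compressed_size


# Hypothetical sample: a short repetitive sequence stored as 1 byte per base.
seq = "ACGTTGCAAC" * 100
raw = seq.encode("ascii")
packed = pack_2bit(seq)
deflated = zlib.compress(raw, 9)

# Both approaches are lossless: the original sequence is fully recoverable.
assert unpack_2bit(packed, len(seq)) == seq
assert zlib.decompress(deflated) == raw

print("2-bit packing ratio:", compression_ratio(len(raw), len(packed)))
print("zlib (DEFLATE) ratio:", compression_ratio(len(raw), len(deflated)))
```

The 2-bit packing always gives a ratio of 4 against one-byte-per-base text, whatever the sequence content, whereas the DEFLATE ratio depends on how repetitive the input is. Real sequence files also contain symbols such as N and header lines, which this sketch does not handle.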

Read also

Poincare recurrences of DNA sequence

K. M. Frahm, D. L. Shepelyansky (2011)

We analyze the statistical properties of Poincare recurrences of Homo sapiens, mammalian and other DNA sequences taken from the Ensembl Genome database with up to fifteen billion base pairs. We show that the probability of Poincare recurrences decays in an algebraic way with the Poincare exponent $\beta \approx 4$ even if oscillatory dependence is well pronounced. The correlations between recurrences decay with an exponent $\nu \approx 0.6$ that leads to an anomalous super-diffusive walk. However, for Homo sapiens sequences, with the largest available statistics, the diffusion coefficient converges to a finite value on distances larger than a million base pairs. We argue that the approach based on Poincare recurrences determines new proximity features between different species and sheds new light on their evolutionary history.

Genomics, Statistical Mechanics, Biological Physics

Using PCA and Factor Analysis for Dimensionality Reduction of Bio-informatics Data

M. Usman Ali, Shahzad Ahmed, Javed Ferzund (2017)

A large volume of genomics data is produced on a daily basis due to advances in sequencing technology. This data is of no value if it is not properly analysed. Different kinds of analytics are required to extract useful information from this raw data. Classification, prediction, clustering and pattern extraction are useful data mining techniques. These techniques require an appropriate selection of data attributes to give accurate results. However, bioinformatics data is high dimensional, usually having hundreds of attributes. Such a large number of attributes affects the performance of the machine learning algorithms used for classification and prediction. Dimensionality reduction techniques are therefore required to reduce the number of attributes used for further analysis. In this paper, Principal Component Analysis and Factor Analysis are used for dimensionality reduction of bioinformatics data. These techniques were applied on Leukaemia data set and the number of attributes was reduced from to.

Quantitative Biology; Computational Engineering, Finance, and Science

WI Fast Stats: a collection of web apps for the visualization and analysis of WI Fast Plants data

Yizhou Liu, Claudia Solis-Lemus (2020)

WI Fast Stats is the first and only dedicated tool tailored to the WI Fast Plants educational objectives. WI Fast Stats is an integrated animated web page with a collection of R-developed web apps that provide data visualization and data analysis tools for WI Fast Plants data. WI Fast Stats is a user-friendly, easy-to-use interface that will render data science accessible to K-16 teachers and students currently using WI Fast Plants lesson plans. Users do not need a strong programming or mathematical background to use WI Fast Stats, as the web apps are simple to use, well documented, and freely available.

Quantitative Biology

On the discrete Peyrard-Bishop model of DNA: stationary solutions and stability

Sara Cuenda, Angel Sanchez (2005)

As a first step toward an analytical study of the mechanical denaturation of DNA in terms of the sequence, we study stable, stationary solutions in the discrete, finite and homogeneous Peyrard-Bishop DNA model. We find and classify all the stationary solutions of the model, as well as analytic approximations of them, both in the continuum and in the discrete limits. Our results explain the structure of the solutions reported by Theodorakopoulos et al. [Phys. Rev. Lett. 93, 258101 (2004)] and provide a way to proceed to the analysis of the generalized version of the model incorporating the genetic information.

Quantitative Biology, Soft Condensed Matter, Pattern Formation and Solitons

Key-Point Sequence Lossless Compression for Intelligent Video Analysis

Weiyao Lin, Xiaoyi He, Wenrui Dai (2020)

Feature coding has been recently considered to facilitate intelligent video analysis for urban computing. Instead of raw videos, extracted features in the front-end are encoded and transmitted to the back-end for further processing. In this article, we present a lossless key-point sequence compression approach for efficient feature coding. The essence of this predict-and-encode strategy is to eliminate the spatial and temporal redundancies of key points in videos. Multiple prediction modes with an adaptive mode selection method are proposed to handle key-point sequences with various structures and motion. Experimental results validate the effectiveness of the proposed scheme on four types of widely used key-point sequences in video analysis.
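The predict-and-encode idea in the abstract above can be illustrated with a deliberately simple sketch. It is not the authors' scheme: it uses a single temporal prediction mode (each key point is predicted from the same key point in the previous frame) and stores only the residual, whereas the paper describes multiple modes with adaptive selection. The function names and sample coordinates are hypothetical, and every frame is assumed to carry the same number of integer-valued key points.

```python
from typing import List, Tuple

Point = Tuple[int, int]
Frame = List[Point]


def encode_residuals(frames: List[Frame]) -> List[Frame]:
    """Keep the first frame as-is; store later frames as differences
    from the previous frame (a simple temporal prediction)."""
    if not frames:
        return []
    encoded = [list(frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        encoded.append(
            [(x - px, y - py) for (x, y), (px, py) in zip(cur, prev)]
        )
    return encoded


def decode_residuals(encoded: List[Frame]) -> List[Frame]:
    """Invert encode_residuals by adding each residual to the
    previously reconstructed frame."""
    if not encoded:
        return []
    frames = [list(encoded[0])]
    for residuals in encoded[1:]:
        prev = frames[-1]
        frames.append(
            [(px + dx, py + dy) for (dx, dy), (px, py) in zip(residuals, prev)]
        )
    return frames


# Hypothetical key points: two points tracked over three frames.
frames = [
    [(10, 20), (30, 40)],
    [(11, 21), (30, 42)],
    [(13, 21), (29, 45)],
]
encoded = encode_residuals(frames)

# Lossless: decoding the residuals reproduces the original frames exactly.
assert decode_residuals(encoded) == frames
print(encoded)
```

When motion between frames is small, the residuals cluster near zero, so a subsequent entropy coder can represent them in fewer bits than the raw coordinates.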

Multimedia
