Do you want to publish a course? Click here

Probabilistic Approaches to Alignment with Tandem Repeats

185   0   0.0 ( 0 )
 Added by Aaron Darling
 Publication date 2013
  fields Biology
and research's language is English




Ask ChatGPT about the research

We propose a simple tractable pair hidden Markov model for pairwise sequence alignment that accounts for the presence of short tandem repeats. Using the framework of gain functions, we design several optimization criteria for decoding this model and describe the resulting decoding algorithms, ranging from the traditional Viterbi and posterior decoding to block-based decoding algorithms specialized for our model. We compare the accuracy of individual decoding algorithms on simulated data and find our approach superior to the classical three-state pair HMM in simulations.



rate research

Read More

We introduce the software tool NTRFinder to find the complex repetitive structure in DNA we call a nested tandem repeat (NTR). An NTR is a recurrence of two or more distinct tandem motifs interspersed with each other. We propose that nested tandem repeats can be used as phylogenetic and population markers. We have tested our algorithm on both real and simulated data, and present some real nested tandem repeats of interest. We discuss how the NTR found in the ribosomal DNA of taro (Colocasia esculenta) may assist in determining the cultivation prehistory of this ancient staple food crop. NTRFinder can be downloaded from http://www.maths.otago.ac.nz/? aamatroud/.
Recent technological advances in Next Generation Sequencing tools have led to increasing speeds of DNA sample collection, preparation, and sequencing. One instrument can produce over 600 Gb of genetic sequence data in a single run. This creates new opportunities to efficiently handle the increasing workload. We propose a new method of fast genetic sequence analysis using the Dynamic Distributed Dimensional Data Model (D4M) - an associative array environment for MATLAB developed at MIT Lincoln Laboratory. Based on mathematical and statistical properties, the method leverages big data techniques and the implementation of an Apache Acculumo database to accelerate computations one-hundred fold over other methods. Comparisons of the D4M method with the current gold-standard for sequence analysis, BLAST, show the two are comparable in the alignments they find. This paper will present an overview of the D4M genetic sequence algorithm and statistical comparisons with BLAST.
96 - Etay Ziv 2004
Exploiting recent developments in information theory, we propose, illustrate, and validate a principled information-theoretic algorithm for module discovery and resulting measure of network modularity. This measure is an order parameter (a dimensionless number between 0 and 1). Comparison is made to other approaches to module-discovery and to quantifying network modularity using Monte Carlo generated Erdos-like modular networks. Finally, the Network Information Bottleneck (NIB) algorithm is applied to a number of real world networks, including the social network of coauthors at the APS March Meeting 2004.
PURPOSE: The popularity of germline genetic panel testing has led to a vast accumulation of variant-level data. Variant names are not always consistent across laboratories and not easily mappable to public variant databases such as ClinVar. A tool that can automate the process of variants harmonization and mapping is needed to help clinicians ensure their variant interpretations are accurate. METHODS: We present a Python-based tool, Ask2Me VarHarmonizer, that incorporates data cleaning, name harmonization, and a four-attempt mapping to ClinVar procedure. We applied this tool to map variants from a pilot dataset collected from 11 clinical practices. Mapping results were evaluated with and without the transcript information. RESULTS: Using Ask2Me VarHarmonizer, 4728 out of 6027 variant entries (78%) were successfully mapped to ClinVar, corresponding to 3699 mappable unique variants. With the addition of 1099 unique unmappable variants, a total of 4798 unique variants were eventually identified. 427 (9%) of these had multiple names, of which 343 (7%) had multiple names within-practice. 99% mapping consistency was observed with and without transcript information. CONCLUSION: Ask2Me VarHarmonizer aggregates and structures variant data, harmonizes names, and maps variants to ClinVar. Performing harmonization removes the ambiguity and redundancy of variants from different sources.
Untargeted metabolomic studies are revealing large numbers of naturally occurring metabolites that cannot be characterized because their chemical structures and MS/MS spectra are not available in databases. Here we present iMet, a computational tool based on experimental tandem mass spectrometry that could potentially allow the annotation of metabolites not discovered previously. iMet uses MS/MS spectra to identify metabolites structurally similar to an unknown metabolite, and gives a net atomic addition or removal that converts the known metabolite into the unknown one. We validate the algorithm with 148 metabolites, and show that for 89% of them at least one of the top four matches identified by iMet enables the proper annotation of the unknown metabolite. iMet is freely available at http://imet.seeslab.net.
comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا