Research papers and master's and doctoral theses published by Chi Nguyen

SPATA: A Seeding and Patching Algorithm for Hybrid Transcriptome Assembly

Tin Chi Nguyen, Zhiyu Zhao, Dongxiao Zhu (2013)

Transcriptome assembly from RNA-Seq reads is an active area of bioinformatics research. The ever-declining cost and the increasing depth of RNA-Seq have provided unprecedented opportunities to better identify expressed transcripts. However, the nonlinear transcript structures and the ultra-high throughput of RNA-Seq reads pose significant algorithmic and computational challenges to existing transcriptome assembly approaches, whether reference-guided or de novo. While reference-guided approaches offer good sensitivity, they rely on the alignment results of splice-aware aligners and are thus unsuitable for species with incomplete reference genomes. In contrast, de novo approaches do not depend on the reference genome but face a computationally daunting task arising from the complexity of the graph built for the whole transcriptome. In response to these challenges, we present a hybrid approach that exploits an incomplete reference genome without relying on splice-aware aligners. We have designed a split-and-align procedure to efficiently localize the reads to individual genomic loci, followed by an accurate de novo assembly of the reads falling into each locus. Using extensive simulation data, we demonstrate high accuracy and precision in transcriptome reconstruction in comparison with selected transcriptome assembly tools. Our method is implemented in assemblySAM, a GUI software freely available at http://sammate.sourceforge.net.

Computational Engineering, Finance, and Science; Genomics

SASeq: A Selective and Adaptive Shrinkage Approach to Detect and Quantify Active Transcripts using RNA-Seq

Tin Chi Nguyen, Nan Deng, Dongxiao Zhu (2012)

Identification and quantification of condition-specific transcripts using RNA-Seq is vital in transcriptomics research. While initial efforts using mathematical or statistical modeling of read counts or per-base exonic signal have been successful, they may suffer from model overfitting since not all the reference transcripts in a database are expressed under a specific biological condition. Standard shrinkage approaches, such as Lasso, shrink all the transcript abundances toward zero in a non-discriminative manner. Thus they do not necessarily yield the set of condition-specific transcripts. Informed shrinkage approaches, using the observed exonic coverage signal, are thus desirable. Motivated by the ubiquitous uncovered exonic regions in RNA-Seq data, termed naked exons, we propose a new computational approach that first filters out the reference transcripts not supported by splicing and paired-end reads, and then fits a new mathematical model of the per-base exonic coverage signal and the underlying transcript structure. We introduce a tuning parameter to penalize the specific regions of the selected transcripts that were not supported by the naked exons. Our approach compares favorably with the selected competing methods in terms of both time complexity and accuracy using simulated and real-world data. Our method is implemented in SAMMate, a GUI software suite freely available from http://sammate.sourceforge.net.

Quantitative Methods; Computational Engineering, Finance, and Science; Genomics
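
The SASeq abstract contrasts standard Lasso shrinkage, which treats every transcript alike, with informed shrinkage. The sketch below illustrates only that general contrast and is not the SASeq model: it uses the textbook soft-thresholding form of the Lasso estimate (valid under an orthonormal design), and the function names, abundance values, and per-transcript penalties are all hypothetical.

```python
def soft_threshold(value, penalty):
    """L1 (Lasso) shrinkage of a single coefficient, assuming an orthonormal design."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def uniform_shrinkage(abundances, penalty):
    # Standard Lasso: the same penalty is applied to every transcript.
    return [soft_threshold(a, penalty) for a in abundances]


def weighted_shrinkage(abundances, penalties):
    # Informed variant (illustrative assumption): each transcript has its own
    # penalty, e.g. larger where the coverage evidence is weaker.
    return [soft_threshold(a, p) for a, p in zip(abundances, penalties)]


# Hypothetical unpenalized abundance estimates for three transcripts.
estimates = [5.0, 1.0, 0.5]

uniform = uniform_shrinkage(estimates, 1.0)
weighted = weighted_shrinkage(estimates, [0.0, 0.0, 2.0])
```

With a uniform penalty, the second transcript is zeroed out along with the third even if the evidence supports it; with per-transcript penalties, only the transcript assigned a large penalty is removed.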
