
Mass spectrometry based protein identification with accurate statistical significance assignment

Published by Yi-Kuo Yu
Publication date: 2014
Research field: Biology
Paper language: English





Motivation: Assigning statistical significance accurately has become increasingly important as metadata of many types, often assembled in hierarchies, are constructed and combined for further biological analyses. Statistical inaccuracy of metadata at any level may propagate to downstream analyses, undermining the validity of the scientific conclusions drawn from them. In mass spectrometry based proteomics, accurate statistics for peptide identification can now be achieved, but accurate protein-level statistics remain challenging. Results: We have constructed a protein ID method that combines the peptide evidence for a candidate protein using a rigorous formula derived earlier; in this formula the database $P$-value of every peptide is weighted, prior to the final combination, according to the number of proteins it maps to. We have also shown that this protein ID method provides accurate protein-level $E$-values, eliminating the need for empirical post-processing methods for type-I error control. Using a known protein mixture, we find that this protein ID method, when combined with the Sorić formula, yields accurate values for the proportion of false discoveries. In terms of retrieval efficacy, the results from our method are comparable with those of the other methods tested. Availability: The source code, implemented in C++ on a Linux system, is available for download at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbp/qmbp_ms/RAId/RAId_Linux_64Bit
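The abstract does not reproduce the combination formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the peptide $P$-values are independent and uniform under the null hypothesis and uses Good's classical closed form for the tail probability of a weighted product of $P$-values, which is valid only when all weights are distinct. The function names, the example weights, and the simple estimate "E-value cutoff divided by the number of proteins reported at that cutoff" for the proportion of false discoveries are all assumptions made for illustration.

```python
import math


def combine_weighted_p_values(p_values, weights):
    """Tail probability of Q = prod(p_i ** w_i) for independent uniform p_i.

    Uses Good's closed form, which requires pairwise distinct positive weights:
        P = sum_i [ prod_{j != i} w_i / (w_i - w_j) ] * Q ** (1 / w_i)
    """
    if not p_values or len(p_values) != len(weights):
        raise ValueError("need one weight per P-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")
    if any(w <= 0.0 for w in weights):
        raise ValueError("weights must be positive")
    if len(set(weights)) != len(weights):
        raise ValueError("this closed form needs pairwise distinct weights")

    # Work with log(Q) to avoid underflow when many small P-values are multiplied.
    log_q = sum(w * math.log(p) for p, w in zip(p_values, weights))

    total = 0.0
    for i, w_i in enumerate(weights):
        coefficient = 1.0
        for j, w_j in enumerate(weights):
            if j != i:
                coefficient *= w_i / (w_i - w_j)
        total += coefficient * math.exp(log_q / w_i)

    # Clamp tiny floating-point excursions outside [0, 1].
    return min(1.0, max(0.0, total))


def proportion_of_false_discoveries(e_values, e_cutoff):
    """Assumed estimate: expected false hits (about e_cutoff) over reported hits."""
    reported = sum(1 for e in e_values if e <= e_cutoff)
    if reported == 0:
        return 0.0
    return min(1.0, e_cutoff / reported)


if __name__ == "__main__":
    # Hypothetical protein supported by three peptides; the weights are
    # arbitrary distinct values chosen only to satisfy the formula's condition.
    combined = combine_weighted_p_values([0.01, 0.2, 0.05], [1.0, 0.5, 0.25])
    print("combined P-value:", combined)

    # Hypothetical protein-level E-values and a cutoff of 0.5.
    print("estimated PFD:",
          proportion_of_false_discoveries([0.001, 0.01, 0.5, 2.0], 0.5))
```

Because the coefficients contain differences of weights in the denominator, this closed form becomes numerically unstable when two weights are nearly equal. Peptides that map to the same number of proteins would receive identical weights under a weighting by protein count, so a practical implementation needs a formula that also covers repeated weights.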



Read also

Gelio Alves, Aleksey Ogurtsov, 2008
Summary: In anticipation of the individualized proteomics era and the need to integrate knowledge from disease studies, we have augmented our peptide identification software RAId DbS to take into account annotated single amino acid polymorphisms, post-translational modifications, and their documented disease associations while analyzing a tandem mass spectrum. To facilitate new discoveries, RAId DbS allows users to conduct searches permitting novel polymorphisms. Availability: The webserver link is http://www.ncbi.nlm.nih.gov/ /CBBResearch/qmbp/raid dbs/index.html. The relevant databases and binaries of RAId DbS for Linux, Windows, and Mac OS X are available from the same web page. Contact: [email protected]
The common techniques for studying protein-protein proximity in vivo are not well adapted to the capabilities and expertise of a standard proteomics laboratory, which typically relies on mass spectrometry. To close this gap, we have developed PUB-MS (Proximity Utilizing Biotinylation and Mass Spectrometry), an approach for monitoring protein-protein proximity based on biotinylation of a protein fused to a biotin-acceptor peptide (BAP) by a biotin ligase, BirA, fused to its interaction partner. The biotinylation status of the BAP can then be detected by either Western analysis or mass spectrometry. The BAP sequence was redesigned for easy monitoring of the biotinylation status by LC-MS/MS. In several experimental models, we demonstrate that biotinylation in vivo is specifically enhanced when the BAP- and BirA-fused proteins are in proximity to each other. The advantage of mass spectrometry is demonstrated by using BAPs with different sequences in a single experiment (allowing multiplex analysis) and by the use of stable isotopes. Finally, we show that our methodology can also be used to study a specific subfraction of a protein of interest that was in proximity with another protein at a predefined time before the analysis.
Native electrospray ionization/ion mobility-mass spectrometry (ESI/IM-MS) allows an accurate determination of low-resolution structural features of proteins. Yet the presence of proton dynamics, which we have already observed for DNA in the gas phase, and its impact on protein structural determinants have not been investigated so far. Here, we address this issue with a multi-step simulation strategy applied to a pharmacologically relevant peptide, the N-terminal residues of the amyloid-beta peptide (Abeta(1-16)). Our calculations reproduce the experimental maximum charge state from ESI-MS and are also in fair agreement with collision cross section (CCS) data measured here by ESI/IM-MS. Although the main structural features are preserved, subtle conformational changes do take place in the first ~0.1 ms of dynamics. In addition, intramolecular proton dynamics processes occur on the ps timescale in the gas phase, as shown by quantum mechanics/molecular mechanics (QM/MM) simulations at the B3LYP level of theory. We conclude that proton transfer phenomena occur frequently during the flight time in ESI-MS experiments (typically on the ms timescale). However, the structural changes associated with the process do not significantly affect the structural determinants.
Yisu Peng, 2020
Motivation: Accurate estimation of the false discovery rate (FDR) of spectral identification is a central problem in mass spectrometry-based proteomics. Over the past two decades, target-decoy approaches (TDAs) and decoy-free approaches (DFAs) have been widely used to estimate FDR. TDAs use a database of decoy species to faithfully model score distributions of incorrect peptide-spectrum matches (PSMs). DFAs, on the other hand, fit two-component mixture models to learn the parameters of correct and incorrect PSM score distributions. While conceptually straightforward, both approaches lead to problems in practice, particularly in experiments that push instrumentation to the limit and generate spectra with low fragmentation efficiency and low signal-to-noise ratio. Results: We introduce a new decoy-free framework for FDR estimation that generalizes present DFAs while exploiting more search data in a manner similar to TDAs. Our approach relies on multi-component mixtures, in which score distributions corresponding to the correct PSMs, best incorrect PSMs, and second-best incorrect PSMs are modeled by the skew normal family. We derive EM algorithms to estimate parameters of these distributions from the scores of best and second-best PSMs associated with each experimental spectrum. We evaluate our models on multiple proteomics datasets and a HeLa cell digest case study consisting of more than a million spectra in total. We provide evidence of improved performance over existing DFAs and improved stability and speed over TDAs without any performance degradation. We propose that the new strategy has the potential to extend beyond peptide identification and reduce the need for TDA on all analytical platforms.
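To make the decoy-free idea concrete, here is a minimal sketch of the simpler two-component baseline that the abstract says existing DFAs use, not the multi-component skew normal model the paper proposes. It swaps in ordinary Gaussian components for brevity, fits them by EM to simulated scores, and reads off an FDR estimate above a score threshold. All function names, the synthetic data, and the quartile-based initialization are assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def normal_sf(x, mean, sd):
    """Survival function P(X > x) of a normal distribution."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))


def fit_two_gaussians(scores, iterations=100):
    """EM for a two-component Gaussian mixture (incorrect = lower, correct = higher)."""
    ordered = sorted(scores)
    n = len(ordered)
    # Crude initialization from the quartiles and the overall score range.
    mean0, mean1 = ordered[n // 4], ordered[(3 * n) // 4]
    sd0 = sd1 = (ordered[-1] - ordered[0]) / 4.0 or 1.0
    pi1 = 0.5
    for _ in range(iterations):
        # E-step: posterior probability that each score comes from the correct component.
        resp = []
        for x in scores:
            a = (1.0 - pi1) * normal_pdf(x, mean0, sd0)
            b = pi1 * normal_pdf(x, mean1, sd1)
            resp.append(b / (a + b) if a + b > 0.0 else 0.5)
        w1 = sum(resp)
        w0 = n - w1
        if w0 < 1e-9 or w1 < 1e-9:
            break  # one component has collapsed; keep the last valid estimate
        # M-step: responsibility-weighted means, standard deviations, and mixing weight.
        mean0 = sum((1.0 - r) * x for r, x in zip(resp, scores)) / w0
        mean1 = sum(r * x for r, x in zip(resp, scores)) / w1
        var0 = sum((1.0 - r) * (x - mean0) ** 2 for r, x in zip(resp, scores)) / w0
        var1 = sum(r * (x - mean1) ** 2 for r, x in zip(resp, scores)) / w1
        sd0 = max(1e-6, math.sqrt(var0))
        sd1 = max(1e-6, math.sqrt(var1))
        pi1 = w1 / n
    return {"mean_incorrect": mean0, "sd_incorrect": sd0,
            "mean_correct": mean1, "sd_correct": sd1, "pi_correct": pi1}


def estimated_fdr(threshold, model):
    """Expected fraction of incorrect matches among scores above the threshold."""
    incorrect = (1.0 - model["pi_correct"]) * normal_sf(
        threshold, model["mean_incorrect"], model["sd_incorrect"])
    correct = model["pi_correct"] * normal_sf(
        threshold, model["mean_correct"], model["sd_correct"])
    total = incorrect + correct
    return incorrect / total if total > 0.0 else 0.0


if __name__ == "__main__":
    random.seed(0)
    # Synthetic PSM scores: 500 incorrect matches near 0, 500 correct matches near 5.
    scores = ([random.gauss(0.0, 1.0) for _ in range(500)]
              + [random.gauss(5.0, 1.0) for _ in range(500)])
    model = fit_two_gaussians(scores)
    print(model)
    print("estimated FDR above score 3:", estimated_fdr(3.0, model))
```

Real score distributions are often skewed, which is one reason the abstract's authors move to the skew normal family and add a component for second-best matches. The Gaussian version above only illustrates the mechanics of fitting the mixture and converting it into an FDR estimate.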
BACKGROUND: One of the most evident achievements of bioinformatics is the development of methods that transfer biological knowledge from characterised proteins to uncharacterised sequences. This mode of protein function assignment is mostly based on the detection of sequence similarity and the premise that functional properties are conserved during evolution. Most automatic approaches developed to date rely on identifying clusters of homologous proteins, which are expected to share functional characteristics, and mapping new proteins onto these clusters. RESULTS: Here, we invert the logic of this process by mapping sequences directly to a functional classification instead of mapping functions to a sequence clustering. In this mode, the starting point is a database of proteins labelled according to a functional classification scheme, and sequence similarity is then used to define the membership of new proteins in these functional classes. In this framework, we define the Correspondence Indicators as measures of the relationship between sequence and function, and we formulate two Bayesian approaches to estimate the probability that a sequence of unknown function belongs to a functional class. This approach allows the parametrisation of different sequence search strategies and provides a direct measure of annotation error rates. We validate this approach with a database of enzymes labelled by their corresponding four-digit EC numbers and analyse specific cases. CONCLUSION: The performance of this method is significantly higher than that of the simple strategy of transferring the annotation from the highest-scoring BLAST match, and the method is expected to find applications in automated functional annotation pipelines.