We develop a novel peak detection algorithm for the analysis of comprehensive two-dimensional gas chromatography time-of-flight mass spectrometry (GC$\times$GC-TOF MS) data using normal-exponential-Bernoulli (NEB) and mixture probability models. The algorithm first performs baseline correction and denoising simultaneously using the NEB model, which also defines the peak regions. Peaks are then picked using mixtures of probability distributions to handle co-eluting peaks. Peak merging is further carried out based on the mass spectral similarities among the peaks within the same peak group. The algorithm is evaluated on experimental data to study the effect of different cutoffs of the conditional Bayes factors and the effect of different mixture models, including Poisson, truncated Gaussian, Gaussian, Gamma, and exponentially modified Gaussian (EMG) distributions, and the optimal version is selected using a trial-and-error approach. We then compare the new algorithm with two existing algorithms in terms of compound identification. Data analysis shows that the developed algorithm detects peaks with lower false discovery rates than the existing algorithms, and that a less complicated peak picking model is a promising alternative to the more complicated and widely used EMG mixture models.
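To give a flavor of the mixture-based peak picking step, the sketch below fits a two-component exponentially modified Gaussian (EMG) mixture to a one-dimensional chromatographic signal in order to separate two co-eluting peaks. This is a minimal illustration using SciPy's `exponnorm` parameterization, not the authors' NEB-based implementation; the function and parameter names are hypothetical.

```python
# Illustrative sketch only: resolving two co-eluting peaks by fitting a
# two-component EMG mixture to a 1-D chromatographic signal.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import exponnorm

def emg_peak(t, area, loc, sigma, tau):
    """Single EMG peak: area times the exponnorm density (shape K = tau / sigma)."""
    return area * exponnorm.pdf(t, tau / sigma, loc=loc, scale=sigma)

def two_emg_mixture(t, a1, mu1, s1, tau1, a2, mu2, s2, tau2):
    """Sum of two EMG peaks modelling a pair of co-eluting compounds."""
    return emg_peak(t, a1, mu1, s1, tau1) + emg_peak(t, a2, mu2, s2, tau2)

# Synthetic example: two overlapping peaks plus a small amount of noise.
t = np.linspace(0.0, 10.0, 500)
true_signal = two_emg_mixture(t, 1.0, 4.0, 0.3, 0.4, 0.6, 5.2, 0.3, 0.5)
rng = np.random.default_rng(0)
signal = true_signal + rng.normal(0.0, 0.005, t.size)

p0 = [1.0, 4.0, 0.3, 0.3, 0.5, 5.0, 0.3, 0.3]  # rough initial guesses
params, _ = curve_fit(two_emg_mixture, t, signal, p0=p0, maxfev=20000)
print("Fitted peak locations:", params[1], params[5])
```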
Numerous studies have been carried out to characterize the chemical composition of laboratory analogues of Titan aerosols (tholins), but their molecular composition and structure are still poorly known. Although pyrolysis gas chromatography mass spectrometry (pyr-GCMS) has been used for years to give clues about this composition, the highly disparate results obtained can be attributed to the analytical conditions used and/or to differences in the nature of the analogues studied. In order to obtain a better description of the molecular composition of Titan tholins, we carried out a systematic analysis of these materials using pyr-GCMS with two major objectives: (i) exploring the analytical parameters to estimate the biases this technique can induce and to find an optimum for analyses that allows the detection of a wide range of compounds, and thus as comprehensive a characterization of the tholins' composition as possible, and (ii) highlighting the role of the CH4 ratio in the gaseous reactive medium on the tholins' molecular structure. With this aim, we used a radio-frequency plasma discharge to synthesize tholins with different concentrations of CH4 diluted in N2. The samples were systematically pyrolyzed from 200 to 600°C. The extracted gases were then analyzed by GCMS for molecular identification.
Motivation: Time course data obtained from biological samples subjected to specific treatments can be very useful for revealing complex and novel biological phenomena. Although an increasing number of time course microarray datasets have become available, most of them contain few biological replicates and time points. So far, few computational methods can effectively reveal differentially expressed genes and their patterns in such data. Results: We propose a new two-step nonparametric statistical procedure, LRSA, to reveal differentially expressed genes and their expression trends in temporal microarray data. We also employ external controls as a surrogate to estimate false discovery rates and thus to guide the discovery of differentially expressed genes. Our results show that LRSA reveals substantially more differentially expressed genes and has much lower false discovery rates than two other methods, STEM and ANOVA, in both real and simulated data. Our computational results are confirmed using real-time PCR. Contact:
[email protected]
Data competitions rely on real-time leaderboards to rank competitor entries and stimulate algorithm improvement. While such competitions have become quite popular and prevalent, particularly in supervised learning formats, their implementations by the host are highly variable. Without careful planning, a supervised learning competition is vulnerable to overfitting, where the winning solutions are so closely tuned to the particular set of provided data that they cannot generalize to the underlying problem of interest to the host. Based on our experience, this paper outlines some important considerations for strategically designing relevant and informative data sets to maximize the learning outcome from hosting a competition. It also describes a post-competition analysis that enables robust and efficient assessment of the strengths and weaknesses of solutions from different competitors, as well as greater understanding of which regions of the input space are well solved. The post-competition analysis, which complements the leaderboard, uses exploratory data analysis and generalized linear models (GLMs). The GLMs not only expand the range of results we can explore, but also provide more detailed analysis of individual sub-questions, including similarities and differences between algorithms across different types of scenarios, universally easy or hard regions of the input space, and different learning objectives. When coupled with a strategically planned data generation approach, the methods provide richer and more informative summaries that enhance the interpretation of results beyond the rankings on the leaderboard. The methods are illustrated with a recently completed competition to evaluate algorithms capable of detecting, identifying, and locating radioactive materials in an urban environment.
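The sketch below gives a flavor of the kind of post-competition GLM described above: a binomial (logistic) GLM relating per-run detection success to the competitor and to scenario characteristics, fit with statsmodels. The data frame, column names, source categories, and simulated outcomes are all hypothetical and are not the competition's actual schema.

```python
# Illustrative sketch: a binomial GLM for post-competition analysis of
# detection success by competitor, source type, and shielding.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 300
runs = pd.DataFrame({
    "competitor": rng.choice(["A", "B", "C"], n),
    "source":     rng.choice(["source1", "source2", "source3"], n),
    "shielded":   rng.integers(0, 2, n),
})
# Simulated binary outcome: detection is harder when the source is shielded,
# and one competitor is slightly better than the others.
logit = 1.0 - 1.5 * runs["shielded"] + (runs["competitor"] == "B") * 0.8
runs["detected"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Main-effects model: which competitors, sources, and shielding levels are
# associated with higher detection probability?
model = smf.glm("detected ~ C(competitor) + C(source) + shielded",
                data=runs, family=sm.families.Binomial()).fit()
print(model.summary())
```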
The MEG inverse problem refers to the reconstruction of the neural activity of the brain from magnetoencephalography (MEG) measurements. We propose a two-way regularization (TWR) method to solve the MEG inverse problem under the assumptions that only a small number of locations in space are responsible for the measured signals (focality), and each source time course is smooth in time (smoothness). The focality and smoothness of the reconstructed signals are ensured respectively by imposing a sparsity-inducing penalty and a roughness penalty in the data fitting criterion. A two-stage algorithm is developed for fast computation, where a raw estimate of the source time course is obtained in the first stage and then refined in the second stage by the two-way regularization. The proposed method is shown to be effective on both synthetic and real-world examples.
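One plausible way to write the two-way regularized fitting criterion (a sketch of the general form under standard notation, not necessarily the authors' exact formulation): let $Y$ be the sensor-by-time MEG data matrix, $X$ the lead-field (forward) matrix, and $B$ the source-by-time activity matrix with rows $b_i$ (the time course of source $i$); then
$$
\hat{B} \;=\; \arg\min_{B}\; \|Y - XB\|_F^2 \;+\; \lambda_1 \sum_{i} \|b_{i}\|_2 \;+\; \lambda_2 \sum_{i} \|D\, b_{i}\|_2^2 ,
$$
where the group-lasso term $\sum_i \|b_{i}\|_2$ drives entire source locations to zero (focality), $D$ is a temporal difference operator so the quadratic term penalizes roughness (smoothness), and $\lambda_1$, $\lambda_2$ trade off the data fit against the two penalties.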
We describe mass spectrometry-based purity measurements of the natural and enriched xenon stockpiles used by the EXO-200 double beta decay experiment. The sensitivity of the spectrometer is enhanced by several orders of magnitude by the presence of a liquid nitrogen cold trap, and many impurity species of interest can be detected at the level of one part per billion or better. We have used the technique to screen the EXO-200 xenon before, during, and after its use in our detector, and these measurements have proven useful. This is the first application of the cold trap mass spectrometry technique to an operating physics experiment.