
Gains in Power from Structured Two-Sample Tests of Means on Graphs

Published by Laurent Jacob
Publication date: 2010
Research language: English





We consider multivariate two-sample tests of means, where the location shift between the two populations is expected to be related to a known graph structure. An important application of such tests is the detection of differentially expressed genes between two patient populations, as shifts in expression levels are expected to be coherent with the structure of graphs reflecting gene properties such as biological process, molecular function, regulation, or metabolism. For a fixed graph of interest, we demonstrate that accounting for graph structure can yield more powerful tests under the assumption of a smooth distribution shift on the graph. We also investigate the identification of non-homogeneous subgraphs of a given large graph, which poses both computational and multiple testing problems. The relevance and benefits of the proposed approach are illustrated on synthetic data and on breast cancer gene expression data analyzed in the context of KEGG pathways.
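
One way to make the idea of a "smooth shift on the graph" concrete is sketched below. It is an illustrative assumption, not necessarily the authors' exact procedure: the data are projected onto the k eigenvectors of the graph Laplacian with the smallest eigenvalues (the smoothest directions on the graph), and a standard pooled-covariance Hotelling T^2 statistic is computed in that reduced space. The function names, the path-graph helper, and the choice of k are hypothetical.

```python
import numpy as np


def path_graph_laplacian(p):
    """Combinatorial Laplacian L = D - A of a path graph on p nodes (toy example graph)."""
    A = np.zeros((p, p))
    idx = np.arange(p - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A


def graph_hotelling_t2(X1, X2, laplacian, k):
    """Hotelling T^2 restricted to the k smoothest Laplacian eigenvectors.

    X1, X2: arrays of shape (n1, p) and (n2, p), one row per sample.
    Returns (t2, f_stat). Under Gaussian data with equal covariances and
    no mean shift, f_stat follows an F(k, n1 + n2 - k - 1) distribution.
    """
    n1, n2 = X1.shape[0], X2.shape[0]
    # eigh returns eigenvalues in ascending order, so the first k columns
    # are the directions that vary least across graph edges.
    _, U = np.linalg.eigh(laplacian)
    Uk = U[:, :k]
    Z1, Z2 = X1 @ Uk, X2 @ Uk
    diff = Z1.mean(axis=0) - Z2.mean(axis=0)
    # Pooled sample covariance in the reduced space.
    S1 = np.atleast_2d(np.cov(Z1, rowvar=False))
    S2 = np.atleast_2d(np.cov(Z2, rowvar=False))
    S = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)
    t2 = (n1 * n2) / (n1 + n2) * diff @ np.linalg.solve(S, diff)
    f_stat = (n1 + n2 - k - 1) / ((n1 + n2 - 2) * k) * t2
    return t2, f_stat
```

With k equal to the number of nodes, the projection is an orthogonal change of basis and the statistic coincides with the usual Hotelling T^2. Taking k smaller trades the ability to detect non-smooth shifts for fewer degrees of freedom, which is where a power gain can come from when the true shift is smooth on the graph.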




Read also

Observed gonorrhea case rates (number of positive tests per 100,000 individuals) increased by 75 percent in the United States between 2009 and 2017, predominantly among men. However, testing recommendations by the Centers for Disease Control and Prevention (CDC) have also changed over this period, with more frequent screening for sexually transmitted infections (STIs) recommended among men who have sex with men (MSM) who are sexually active. In this and similar disease surveillance settings, a common question is whether observed increases in the overall proportion of positive tests over time are due only to increased testing of diseased individuals, increased underlying disease, or both. By placing this problem within a counterfactual framework, we can carefully consider untestable assumptions under which this question may be answered and, in turn, a principled approach to statistical analysis. This report outlines this thought process.
Two-sample and independence tests with the kernel-based MMD and HSIC have shown remarkable results on i.i.d. data and stationary random processes. However, these statistics are not directly applicable to non-stationary random processes, a prevalent form of data in many scientific disciplines. In this work, we extend the application of MMD and HSIC to non-stationary settings by assuming access to independent realisations of the underlying random process. These realisations - in the form of non-stationary time-series measured on the same temporal grid - can then be viewed as i.i.d. samples from a multivariate probability distribution, to which MMD and HSIC can be applied. We further show how to choose suitable kernels over these high-dimensional spaces by maximising the estimated test power with respect to the kernel hyper-parameters. In experiments on synthetic data, we demonstrate superior performance of our proposed approaches in terms of test power when compared to current state-of-the-art functional or multivariate two-sample and independence tests. Finally, we employ our methods on a real socio-economic dataset as an example application.
We consider testing for two-sample means of high dimensional populations by thresholding. Two tests are investigated, which are designed for better power performance when the two population mean vectors differ only in sparsely populated coordinates. The first test is constructed by carrying out thresholding to remove the non-signal bearing dimensions. The second test combines data transformation via the precision matrix with the thresholding. The benefits of the thresholding and the data transformations are shown by a reduced variance of the test thresholding statistics, the improved power and a wider detection region of the tests. Simulation experiments and an empirical study are performed to confirm the theoretical findings and to demonstrate the practical implementations.
State-space models are important tools for quality control of error-prone animal movement data. The near real-time (within 24 h) capability of the Argos satellite system aids dynamic ocean management of human activities by informing when animals enter intensive use zones. This capability also facilitates use of ocean observations from animal-borne sensors in operational ocean forecasting models. Such near real-time data provision requires rapid, reliable quality control to deal with error-prone Argos locations. We formulate a continuous-time state-space model for the three types of Argos location data (Least-Squares, Kalman filter, and Kalman smoother), accounting for irregular timing of observations. Our model is deliberately simple to ensure speed and reliability for automated, near real-time quality control of Argos data. We validate the model by fitting to Argos data collected from 61 individuals across 7 marine vertebrates and compare model-estimated locations to GPS locations. Estimation accuracy varied among species with median Root Mean Squared Errors usually < 5 km and decreased with increasing data sampling rate and precision of Argos locations. Including a model parameter to inflate Argos error ellipse sizes resulted in more accurate location estimates. In some cases, the model appreciably improved the accuracy of the Argos Kalman smoother locations, which should not be possible if the smoother uses all available information. Our model provides quality-controlled locations from Argos Least-Squares or Kalman filter data with slightly better accuracy than Argos Kalman smoother data that are only available via reprocessing. Simplicity and ease of use make the model suitable both for automated quality control of near real-time Argos data and for manual use by researchers working with historical Argos data.
Alzheimer's disease (AD) and Parkinson's disease (PD) are the two most common neurodegenerative disorders in humans. Because a significant percentage of patients have clinical and pathological features of both diseases, it has been hypothesized that the patho-cascades of the two diseases overlap. Despite this evidence, these two diseases are rarely studied in a joint manner. In this paper, we utilize clinical, imaging, genetic, and biospecimen features to cluster AD and PD patients into the same feature space. By training a machine learning classifier on the combined feature space, we predict the disease stage of patients two years after their baseline visits. We observed a considerable improvement in the prediction accuracy of Parkinson's dementia patients due to combined training on Alzheimer's and Parkinson's patients, thereby affirming the claim that these two diseases can be jointly studied.