
Homogeneity Test for Functional Data based on Data-Depth Plots

Published by: Alejandro Calle-Saldarriaga
Publication date: 2020
Research field: Mathematical Statistics
Research language: English





One of the classic concerns in statistics is determining whether two samples come from the same population, i.e. homogeneity testing. In this paper, we propose a homogeneity test in the context of Functional Data Analysis, adopting an idea from multivariate data analysis: the data depth plot (DD-plot). The DD-plot is a generalization of the univariate Q-Q plot (quantile-quantile plot). We propose some statistics based on these DD-plots, and we use bootstrapping techniques to estimate their distributions. We estimate the finite-sample size and power of our test via simulation, obtaining better results than other homogeneity tests proposed in the literature. Finally, we illustrate the procedure on samples of real heterogeneous data and get consistent results.
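
The general idea in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the curves are assumed to be observed on a common grid, the depth is a Fraiman-Muniz-type integrated depth, the statistic is the mean absolute distance of the DD-plot points from the diagonal, and the null distribution is approximated by resampling from the pooled sample. The names `fm_depth`, `dd_statistic`, and `bootstrap_pvalue` are hypothetical, and the statistics proposed in the paper may differ.

```python
import numpy as np

def fm_depth(curves, reference):
    """Integrated (Fraiman-Muniz-type) depth of each row of `curves`
    with respect to the sample `reference`.
    Both arrays have shape (n_curves, n_gridpoints) on a common grid."""
    # ecdf[i, t] = fraction of reference curves with value <= curves[i, t]
    ecdf = (reference[None, :, :] <= curves[:, None, :]).mean(axis=1)
    # Pointwise univariate depth 1 - |1/2 - F|, averaged over the grid
    return (1.0 - np.abs(0.5 - ecdf)).mean(axis=1)

def dd_statistic(x, y):
    """Illustrative DD-plot statistic (an assumption, not the paper's):
    mean absolute distance between the two depth coordinates."""
    pooled = np.vstack([x, y])
    depth_wrt_x = fm_depth(pooled, x)  # horizontal DD-plot coordinate
    depth_wrt_y = fm_depth(pooled, y)  # vertical DD-plot coordinate
    return np.mean(np.abs(depth_wrt_x - depth_wrt_y))

def bootstrap_pvalue(x, y, n_boot=200, seed=0):
    """Approximate the null distribution of the statistic by drawing
    both samples with replacement from the pooled curves."""
    rng = np.random.default_rng(seed)
    observed = dd_statistic(x, y)
    pooled = np.vstack([x, y])
    n, total = len(x), len(x) + len(y)
    null_stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, total, size=total)
        null_stats[b] = dd_statistic(pooled[idx[:n]], pooled[idx[n:]])
    # Add-one correction keeps the p-value strictly positive
    p_value = (1 + np.sum(null_stats >= observed)) / (n_boot + 1)
    return observed, p_value
```

Under homogeneity the DD-plot points should concentrate near the diagonal, so large values of the statistic count as evidence against the null hypothesis. Calling `bootstrap_pvalue(x, y)` on two arrays of discretized curves returns the observed statistic and an approximate p-value.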




Read also

Positron Emission Tomography (PET) is an imaging technique which can be used to investigate chemical changes in human biological processes such as cancer development or neurochemical reactions. Most dynamic PET scans are currently analyzed based on the assumption that linear first order kinetics can be used to adequately describe the system under observation. However, there has recently been strong evidence that this is not the case. In order to provide an analysis of PET data which is free from this compartmental assumption, we propose a nonparametric deconvolution and analysis model for dynamic PET data based on functional principal component analysis. This yields flexibility in the possible deconvolved functions while still performing well when a linear compartmental model setup is the true data generating mechanism. As the deconvolution needs to be performed on only a relatively small number of basis functions rather than voxel by voxel in the entire 3-D volume, the methodology is both robust to typical brain imaging noise levels and computationally efficient. The new methodology is investigated through simulations in both 1-D functions and 2-D images and also applied to a neuroimaging study whose goal is the quantification of opioid receptor concentration in the brain.
Competing risks data are common in medical studies, and the sub-distribution hazard (SDH) ratio is considered an appropriate measure. However, because the hazard itself is not easy to interpret clinically and because the SDH ratio is valid only under the proportional SDH assumption, this article introduces an alternative index under competing risks, named restricted mean time lost (RMTL). Several test procedures are also constructed based on RMTL. First, we introduce the definition and estimation of RMTL based on Aalen-Johansen cumulative incidence functions. Then, we consider several combined tests based on the SDH and the RMTL difference (RMTLd). The statistical properties of the methods are evaluated using simulations and are applied to two examples. The type I errors of the combined tests are close to the nominal level. All combined tests show acceptable power in all situations. In conclusion, RMTL can meaningfully summarize treatment effects for clinical decision making, and three combined tests have robust power under various conditions, which can be considered for statistical inference in real data analysis.
Evolutionary models of languages are usually considered to take the form of trees. With the development of so-called tree constraints the plausibility of the tree model assumptions can be addressed by checking whether the moments of observed variables lie within regions consistent with trees. In our linguistic application, the data set comprises acoustic samples (audio recordings) from speakers of five Romance languages or dialects. We wish to assess these functional data for compatibility with a hereditary tree model at the language level. A novel combination of canonical function analysis (CFA) with a separable covariance structure provides a method for generating a representative basis for the data. This resulting basis is formed of components which emphasize language differences whilst maintaining the integrity of the observational language-groupings. A previously unexploited Gaussian tree constraint is then applied to component-by-component projections of the data to investigate adherence to an evolutionary tree. The results indicate that while a tree model is unlikely to be suitable for modeling all aspects of the acoustic linguistic data, certain features of the spoken Romance languages highlighted by the separable-CFA basis may indeed be suitably modeled as a tree.
The problem of estimating missing fragments of curves from a functional sample has been widely considered in the literature. However, a majority of the reconstruction methods rely on estimating the covariance matrix or the components of its eigendecomposition, a task that may be difficult. In particular, the accuracy of the estimation might be affected by the complexity of the covariance function and the poor availability of complete functional data. We introduce a non-parametric alternative based on a novel concept of depth for partially observed functional data. Our simulations point out that the available methods are unbeatable when the covariance function is stationary and there is a large proportion of complete data. However, our approach was superior when considering non-stationary covariance functions or when the proportion of complete functions is small. Moreover, even in the most severe case of having all the functions incomplete, our method performs well, whereas the competitors are unable to operate. The methodology is illustrated with two real data sets: the Spanish daily temperatures observed in different weather stations and the age-specific mortality by prefectures in Japan.
Quantitatively predicting phenotype variables from the expression changes in a set of candidate genes is of great interest in molecular biology, but it is also a challenging task for several reasons. First, the collected biological observations might be heterogeneous and correspond to different biological mechanisms. Secondly, the gene expression variables used to predict the phenotype are potentially highly correlated since genes interact through unknown regulatory networks. In this paper, we present a novel approach designed to predict quantitative traits from transcriptomic data, taking into account the heterogeneity in biological samples and the hidden gene regulatory networks underlying different biological mechanisms. The proposed model performs well on prediction, and it is also fully parametric, which facilitates the downstream biological interpretation. The model provides clusters of individuals based on the relation between gene expression data and the phenotype, and also allows a gene regulatory network to be inferred for each cluster of individuals. We perform numerical simulations to demonstrate that our model is competitive with other prediction models, and we demonstrate the predictive performance and the interpretability of our model by predicting alcohol sensitivity from transcriptomic data, using real data from the Drosophila Melanogaster Genetic Reference Panel (DGRP).