ترغب بنشر مسار تعليمي؟ اضغط هنا

Regression Analysis for Multivariate Dependent Count Data Using Convolved Gaussian Processes

95   0   0.0 ( 0 )
 نشر من قبل Jian Shi
 تاريخ النشر 2017
  مجال البحث الاحصاء الرياضي
والبحث باللغة English




اسأل ChatGPT حول البحث

Research on Poisson regression analysis for dependent data has been developed rapidly in the last decade. One of difficult problems in a multivariate case is how to construct a cross-correlation structure and at the meantime make sure that the covariance matrix is positive definite. To address the issue, we propose to use convolved Gaussian process (CGP) in this paper. The approach provides a semi-parametric model and offers a natural framework for modeling common mean structure and covariance structure simultaneously. The CGP enables the model to define different covariance structure for each component of the response variables. This flexibility ensures the model to cope with data coming from different resources or having different data structures, and thus to provide accurate estimation and prediction. In addition, the model is able to accommodate large-dimensional covariates. The definition of the model, the inference and the implementation, as well as its asymptotic properties, are discussed. Comprehensive numerical examples with both simulation studies and real data are presented.



قيم البحث

اقرأ أيضاً

In this article, we consider a non-parametric Bayesian approach to multivariate quantile regression. The collection of related conditional distributions of a response vector Y given a univariate covariate X is modeled using a Dependent Dirichlet Proc ess (DDP) prior. The DDP is used to introduce dependence across x. As the realizations from a Dirichlet process prior are almost surely discrete, we need to convolve it with a kernel. To model the error distribution as flexibly as possible, we use a countable mixture of multidimensional normal distributions as our kernel. For posterior computations, we use a truncated stick-breaking representation of the DDP. This approximation enables us to deal with only a finitely number of parameters. We use a Block Gibbs sampler for estimating the model parameters. We illustrate our method with simulation studies and real data applications. Finally, we provide a theoretical justification for the proposed method through posterior consistency. Our proposed procedure is new even when the response is univariate.
Exploiting the theory of state space models, we derive the exact expressions of the information transfer, as well as redundant and synergistic transfer, for coupled Gaussian processes observed at multiple temporal scales. All of the terms, constituti ng the frameworks known as interaction information decomposition and partial information decomposition, can thus be analytically obtained for different time scales from the parameters of the VAR model that fits the processes. We report the application of the proposed methodology firstly to benchmark Gaussian systems, showing that this class of systems may generate patterns of information decomposition characterized by mainly redundant or synergistic information transfer persisting across multiple time scales or even by the alternating prevalence of redundant and synergistic source interaction depending on the time scale. Then, we apply our method to an important topic in neuroscience, i.e., the detection of causal interactions in human epilepsy networks, for which we show the relevance of partial information decomposition to the detection of multiscale information transfer spreading from the seizure onset zone.
Functional data are defined as realizations of random functions (mostly smooth functions) varying over a continuum, which are usually collected with measurement errors on discretized grids. In order to accurately smooth noisy functional observations and deal with the issue of high-dimensional observation grids, we propose a novel Bayesian method based on the Bayesian hierarchical model with a Gaussian-Wishart process prior and basis function representations. We first derive an induced model for the basis-function coefficients of the functional data, and then use this model to conduct posterior inference through Markov chain Monte Carlo. Compared to the standard Bayesian inference that suffers serious computational burden and unstableness for analyzing high-dimensional functional data, our method greatly improves the computational scalability and stability, while inheriting the advantage of simultaneously smoothing raw observations and estimating the mean-covariance functions in a nonparametric way. In addition, our method can naturally handle functional data observed on random or uncommon grids. Simulation and real studies demonstrate that our method produces similar results as the standard Bayesian inference with low-dimensional common grids, while efficiently smoothing and estimating functional data with random and high-dimensional observation grids where the standard Bayesian inference fails. In conclusion, our method can efficiently smooth and estimate high-dimensional functional data, providing one way to resolve the curse of dimensionality for Bayesian functional data analysis with Gaussian-Wishart processes.
Canonical correlation analysis (CCA) is a classical and important multivariate technique for exploring the relationship between two sets of continuous variables. CCA has applications in many fields, such as genomics and neuroimaging. It can extract m eaningful features as well as use these features for subsequent analysis. Although some sparse CCA methods have been developed to deal with high-dimensional problems, they are designed specifically for continuous data and do not consider the integer-valued data from next-generation sequencing platforms that exhibit very low counts for some important features. We propose a model-based probabilistic approach for correlation and canonical correlation estimation for two sparse count data sets (PSCCA). PSCCA demonstrates that correlations and canonical correlations estimated at the natural parameter level are more appropriate than traditional estimation methods applied to the raw data. We demonstrate through simulation studies that PSCCA outperforms other standard correlation approaches and sparse CCA approaches in estimating the true correlations and canonical correlations at the natural parameter level. We further apply the PSCCA method to study the association of miRNA and mRNA expression data sets from a squamous cell lung cancer study, finding that PSCCA can uncover a large number of strongly correlated pairs than standard correlation and other sparse CCA approaches.
Count data are collected in many scientific and engineering tasks including image processing, single-cell RNA sequencing and ecological studies. Such data sets often contain missing values, for example because some ecological sites cannot be reached in a certain year. In addition, in many instances, side information is also available, for example covariates about ecological sites or species. Low-rank methods are popular to denoise and impute count data, and benefit from a substantial theoretical background. Extensions accounting for covariates have been proposed, but to the best of our knowledge their theoretical and empirical properties have not been thoroughly studied, and few softwares are available for practitioners. We propose a complete methodology called LORI (Low-Rank Interaction), including a Poisson model, an algorithm, and automatic selection of the regularization parameter, to analyze count tables with covariates. We also derive an upper bound on the estimation error. We provide a simulation study with synthetic data, revealing empirically that LORI improves on state of the art methods in terms of estimation and imputation of the missing values. We illustrate how the method can be interpreted through visual displays with the analysis of a well-know plant abundance data set, and show that the LORI outputs are consistent with known results. Finally we demonstrate the relevance of the methodology by analyzing a water-birds abundance table from the French national agency for wildlife and hunting management (ONCFS). The method is available in the R package lori on the Comprehensive Archive Network (CRAN).
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا