ترغب بنشر مسار تعليمي؟ اضغط هنا

Spatially Clustered Regression

61   0   0.0 ( 0 )
 نشر من قبل Shonosuke Sugasawa
 تاريخ النشر 2020
  مجال البحث الاحصاء الرياضي
والبحث باللغة English




اسأل ChatGPT حول البحث

Spatial regression or geographically weighted regression models have been widely adopted to capture the effects of auxiliary information on a response variable of interest over a region. In contrast, relationships between response and auxiliary variables are expected to exhibit complex spatial patterns in many applications. This paper proposes a new approach for spatial regression, called spatially clustered regression, to estimate possibly clustered spatial patterns of the relationships. We combine K-means-based clustering formulation and penalty function motivated from a spatial process known as Potts model for encouraging similar clustering in neighboring locations. We provide a simple iterative algorithm to fit the proposed method, scalable for large spatial datasets. Through simulation studies, the proposed method demonstrates its superior performance to existing methods even under the true structure does not admit spatial clustering. Finally, the proposed method is applied to crime event data in Tokyo and produces interpretable results for spatial patterns. The R code is available at https://github.com/sshonosuke/SCR.



قيم البحث

اقرأ أيضاً

This paper develops an incremental learning algorithm based on quadratic inference function (QIF) to analyze streaming datasets with correlated outcomes such as longitudinal data and clustered data. We propose a renewable QIF (RenewQIF) method within a paradigm of renewable estimation and incremental inference, in which parameter estimates are recursively renewed with current data and summary statistics of historical data, but with no use of any historical subject-level raw data. We compare our renewable estimation method with both offline QIF and offline generalized estimating equations (GEE) approach that process the entire cumulative subject-level data, and show theoretically and numerically that our renewable procedure enjoys statistical and computational efficiency. We also propose an approach to diagnose the homogeneity assumption of regression coefficients via a sequential goodness-of-fit test as a screening procedure on occurrences of abnormal data batches. We implement the proposed methodology by expanding existing Sparks Lambda architecture for the operation of statistical inference and data quality diagnosis. We illustrate the proposed methodology by extensive simulation studies and an analysis of streaming car crash datasets from the National Automotive Sampling System-Crashworthiness Data System (NASS CDS).
A population-averaged additive subdistribution hazard model is proposed to assess the marginal effects of covariates on the cumulative incidence function to analyze correlated failure time data subject to competing risks. This approach extends the po pulation-averaged additive hazard model by accommodating potentially dependent censoring due to competing events other than the event of interest. Assuming an independent working correlation structure, an estimating equations approach is considered to estimate the regression coefficients and a sandwich variance estimator is proposed. The sandwich variance estimator accounts for both the correlations between failure times as well as the those between the censoring times, and is robust to misspecification of the unknown dependency structure within each cluster. We further develop goodness-of-fit tests to assess the adequacy of the additive structure of the subdistribution hazard for each covariate, as well as for the overall model. Simulation studies are carried out to investigate the performance of the proposed methods in finite samples; and we illustrate our methods by analyzing the STrategies to Reduce Injuries and Develop confidence in Elders (STRIDE) study.
We propose a change-point detection method for large scale multiple testing problems with data having clustered signals. Unlike the classic change-point setup, the signals can vary in size within a cluster. The clustering structure on the signals ena bles us to effectively delineate the boundaries between signal and non-signal segments. New test statistics are proposed for observations from one and/or multiple realizations. Their asymptotic distributions are derived. We also study the associated variance estimation problem. We allow the variances to be heteroscedastic in the multiple realization case, which substantially expands the applicability of the proposed method. Simulation studies demonstrate that the proposed approach has a favorable performance. Our procedure is applied to {an array based Comparative Genomic Hybridization (aCGH)} dataset.
This article concerns a class of generalized linear mixed models for clustered data, where the random effects are mapped uniquely onto the grouping structure and are independent between groups. We derive necessary and sufficient conditions that enabl e the marginal likelihood of such class of models to be expressed in closed-form. Illustrations are provided using the Gaussian, Poisson, binomial and gamma distributions. These models are unified under a single umbrella of conjugate generalized linear mixed models, where conjugate refers to the fact that the marginal likelihood can be expressed in closed-form, rather than implying inference via the Bayesian paradigm. Having an explicit marginal likelihood means that these models are more computationally convenient, which can be important in big data contexts. Except for the binomial distribution, these models are able to achieve simultaneous conjugacy, and thus able to accommodate both unit and group level covariates.
Radiomics involves the study of tumor images to identify quantitative markers explaining cancer heterogeneity. The predominant approach is to extract hundreds to thousands of image features, including histogram features comprised of summaries of the marginal distribution of pixel intensities, which leads to multiple testing problems and can miss out on insights not contained in the selected features. In this paper, we present methods to model the entire marginal distribution of pixel intensities via the quantile function as functional data, regressed on a set of demographic, clinical, and genetic predictors. We call this approach quantile functional regression, regressing subject-specific marginal distributions across repeated measurements on a set of covariates, allowing us to assess which covariates are associated with the distribution in a global sense, as well as to identify distributional features characterizing these differences, including mean, variance, skewness, and various upper and lower quantiles. To account for smoothness in the quantile functions, we introduce custom basis functions we call quantlets that are sparse, regularized, near-lossless, and empirically defined, adapting to the features of a given data set. We fit this model using a Bayesian framework that uses nonlinear shrinkage of quantlet coefficients to regularize the functional regression coefficients and provides fully Bayesian inference after fitting a Markov chain Monte Carlo. We demonstrate the benefit of the basis space modeling through simulation studies, and apply the method to Magnetic resonance imaging (MRI) based radiomic dataset from Glioblastoma Multiforme to relate imaging-based quantile functions to demographic, clinical, and genetic predictors, finding specific differences in tumor pixel intensity distribution between males and females and between tumors with and without DDIT3 mutations.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا