ﻻ يوجد ملخص باللغة العربية
Although a number of studies have developed fast geographically weighted regression (GWR) algorithms for large samples, none of them has achieved linear-time estimation, which is considered a requisite for big data analysis in machine learning, geostatistics, and related domains. Against this backdrop, this study proposes a scalable GWR (ScaGWR) for large datasets. The key improvement is the calibration of the model through a pre-compression of the matrices and vectors whose size depends on the sample size, prior to the leave-one-out cross-validation, which is the heaviest computational step in conventional GWR. This pre-compression allows us to run the proposed GWR extension so that its computation time increases linearly with the sample size. With this improvement, the ScaGWR can be calibrated with one million observations without parallelization. Moreover, the ScaGWR estimator can be regarded as an empirical Bayesian estimator that is more stable than the conventional GWR estimator. We compare the ScaGWR with the conventional GWR in terms of estimation accuracy and computational efficiency using a Monte Carlo simulation. Then, we apply these methods to a US income analysis. The code for ScaGWR is available in the R package scgwr. The code is embedded into C++ code and implemented in another R package, GWmodel.
We develop a new robust geographically weighted regression method in the presence of outliers. We embed the standard geographically weighted regression in robust objective function based on $gamma$-divergence. A novel feature of the proposed approach
We introduce a new approach to a linear-circular regression problem that relates multiple linear predictors to a circular response. We follow a modeling approach of a wrapped normal distribution that describes angular variables and angular distributi
Multi-task learning is increasingly used to investigate the association structure between multiple responses and a single set of predictor variables in many applications. In the era of big data, the coexistence of incomplete outcomes, large number of
With the rapid development of data collection and aggregation technologies in many scientific disciplines, it is becoming increasingly ubiquitous to conduct large-scale or online regression to analyze real-world data and unveil real-world evidence. I
The cost of both generalized least squares (GLS) and Gibbs sampling in a crossed random effects model can easily grow faster than $N^{3/2}$ for $N$ observations. Ghosh et al. (2020) develop a backfitting algorithm that reduces the cost to $O(N)$. Her