This paper introduces a new way to calculate distance-based statistics, particularly when the data are multivariate. The main idea is to pre-calculate the optimal projection directions given the variable dimension, and to project multidimensional variables onto these pre-specified projection directions; by subsequently applying the fast algorithm developed in Huo and Szekely [2016] for univariate variables, the computational complexity can be improved from $O(m^2)$ to $O(n m \cdot \log(m))$, where $n$ is the number of projection directions and $m$ is the sample size. When $n \ll m/\log(m)$, computational savings can be achieved. The key challenge is how to find the optimal pre-specified projection directions. These can be obtained by minimizing the worst-case difference between the true distance and the approximated distance, which can be formulated as a nonconvex optimization problem in a general setting. In this paper, we show that the exact solution of the nonconvex optimization problem can be derived in two special cases: when the dimension of the data is equal to either $2$ or the number of projection directions. In the generic setting, we propose an algorithm to find approximate solutions. Simulations confirm the advantage of our method over the pure Monte Carlo approach, in which the directions are randomly selected rather than pre-calculated.
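The projection idea can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it approximates only the sum of pairwise Euclidean distances (a building block of distance-based statistics), not a full distance covariance. It uses the standard identity $E|u^\top v| = C_p \|v\|$ for $u$ uniform on the unit sphere in $\mathbb{R}^p$, with $C_p = \Gamma(p/2) / (\sqrt{\pi}\,\Gamma((p+1)/2))$. The function names are hypothetical, and the equally spaced directions used for dimension $2$ are an assumption made for illustration, not necessarily the optimum derived in the paper.

```python
import math
import numpy as np


def pairwise_abs_sum_sorted(x):
    """Sum over i < j of |x_i - x_j| in O(m log m), using a sort.

    After sorting ascending, the k-th value (0-indexed) is added k times
    and subtracted (m - 1 - k) times, giving the weight 2k - m + 1.
    """
    xs = np.sort(np.asarray(x, dtype=float))
    m = xs.shape[0]
    weights = 2 * np.arange(m) - m + 1
    return float(np.dot(weights, xs))


def projection_constant(p):
    """C_p = E|u^T v| for u uniform on the unit sphere in R^p, v a unit vector."""
    return math.gamma(p / 2) / (math.sqrt(math.pi) * math.gamma((p + 1) / 2))


def equally_spaced_directions_2d(n):
    """n unit vectors in R^2 at angles k*pi/n (illustrative assumption)."""
    angles = np.pi * np.arange(n) / n
    return np.column_stack([np.cos(angles), np.sin(angles)])


def random_directions(n, p, rng):
    """n directions drawn uniformly from the unit sphere (Monte Carlo baseline)."""
    g = rng.standard_normal((n, p))
    return g / np.linalg.norm(g, axis=1, keepdims=True)


def approx_pairwise_distance_sum(X, directions):
    """Approximate sum_{i<j} ||X_i - X_j|| in O(n m log m) via projections."""
    c = projection_constant(X.shape[1])
    total = sum(pairwise_abs_sum_sorted(X @ u) for u in directions)
    return total / (c * len(directions))


def exact_pairwise_distance_sum(X):
    """Reference O(m^2) computation of sum_{i<j} ||X_i - X_j||."""
    diff = X[:, None, :] - X[None, :, :]
    return float(np.sqrt((diff ** 2).sum(axis=-1)).sum() / 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 2))
    exact = exact_pairwise_distance_sum(X)
    fixed = approx_pairwise_distance_sum(X, equally_spaced_directions_2d(16))
    mc = approx_pairwise_distance_sum(X, random_directions(16, 2, rng))
    print("relative error, fixed directions: ", abs(fixed - exact) / exact)
    print("relative error, random directions:", abs(mc - exact) / exact)
```

Each projection costs one sort, so $n$ directions cost $O(n m \log m)$ in total, against $O(m^2)$ for the direct computation. Running the script prints the relative error of both direction choices on one simulated sample, which gives a rough feel for the comparison with pure Monte Carlo that the abstract describes.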
In some cases, computational benefit can be gained by exploring the hyper parameter space using a deterministic set of grid points instead of a Markov chain. We view this as a numerical integration problem and make three unique contributions. First,
When a Genetic Algorithm (GA), or a stochastic algorithm in general, is employed in a statistical problem, the obtained result is affected by both variability due to sampling, which refers to the fact that only a sample is observed, and variability du
Statistical Data Assimilation (SDA) is the transfer of information from field or laboratory observations to a user selected model of the dynamical system producing those observations. The data is noisy and the model has errors; the information transf
The information-based optimal subdata selection (IBOSS) is a computationally efficient method to select informative data points from large data sets through processing full data by columns. However, when the volume of a data set is too large to be pr
It is useful to have mathematical criteria for evaluating errors in map projections. The Chebyshev criterion for minimizing rms (root mean square) local scale factor errors for conformal maps has been useful in developing conformal map projections of