The quality of datasets is a critical issue in big data mining: higher-quality datasets yield more interesting findings. Missing values in geographical data degrade the quality of big datasets. To improve data quality, missing values generally need to be estimated using machine learning algorithms or mathematical methods such as approximation and interpolation. In this paper, we propose an adaptive Radial Basis Function (RBF) interpolation algorithm for estimating missing values in geographical data. In the proposed method, samples with known values are treated as data points, and samples with missing values are treated as interpolated points. For each interpolated point, a local set of data points is first determined adaptively. The missing value of the interpolated point is then imputed by RBF interpolation over this local set. Moreover, the shape factors of the RBF are also determined adaptively from the distribution of the local set of data points. To evaluate the proposed method, we conduct three groups of benchmark experiments comparing it with the commonly used k Nearest Neighbors (kNN) interpolation and Adaptive Inverse Distance Weighted (AIDW) methods. Experimental results indicate that the proposed method is more accurate than kNN interpolation and AIDW but less efficient than both.
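The local interpolation step described above can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the function name `local_rbf_impute`, the fixed neighbour count `k`, the multiquadric kernel, and the rule that sets the shape factor to the mean nearest-neighbour spacing of the local set are all assumptions, since the abstract does not state how the local set or the shape factors are adapted.

```python
import numpy as np

def local_rbf_impute(known_xy, known_vals, query_xy, k=8):
    """Estimate values at query_xy from the k nearest known samples.

    Hypothetical sketch: uses a multiquadric RBF and a fixed k; the
    adaptive rules of the paper are not reproduced here.
    """
    known_xy = np.asarray(known_xy, dtype=float)
    known_vals = np.asarray(known_vals, dtype=float)
    query_xy = np.atleast_2d(np.asarray(query_xy, dtype=float))
    k = min(k, len(known_xy))
    estimates = np.empty(len(query_xy))
    for i, q in enumerate(query_xy):
        # Local set of data points: the k nearest samples with known values.
        dist = np.linalg.norm(known_xy - q, axis=1)
        idx = np.argsort(dist)[:k]
        pts, vals = known_xy[idx], known_vals[idx]
        # Pairwise distances inside the local set.
        r = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
        # Assumed shape factor: mean nearest-neighbour spacing of the local set.
        if k > 1:
            nearest = np.min(r + np.diag(np.full(k, np.inf)), axis=1)
            c = float(np.mean(nearest))
        else:
            c = 1.0
        # Solve the small multiquadric system, then evaluate at the query point.
        A = np.sqrt(r**2 + c**2)
        weights = np.linalg.solve(A, vals)
        estimates[i] = np.sqrt(dist[idx]**2 + c**2) @ weights
    return estimates
```

Called as `local_rbf_impute(pts, vals, [[0.5, 0.5]])`, the function returns a one-element array with the estimate at that location. Because each query solves only a k-by-k system, the cost per interpolated point stays small, although it is still higher than a plain weighted average of neighbours.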
Many tensor-based data completion methods aim to solve image and video in-painting problems. However, all of these methods were developed for a single dataset. In most real applications, more than one dataset reflecting the same phenomenon is usually available, and the datasets are mutually related in some sense. This raises the question of whether such a relationship can improve the performance of data completion. In this paper, we propose a novel and efficient method that exploits the relationship among datasets for multi-video data completion. Numerical results show that the proposed method significantly improves the performance of video in-painting, particularly when the missing percentage is very high.
The partition of unity (PU) method, performed with local radial basis function (RBF) approximants, has proved to be an effective tool for solving interpolation or collocation problems on large data sets. It decomposes the original domain into several subdomains or patches so that only linear systems of relatively small size need to be solved. In the literature on partition of unity methods, these subdomains are usually spherical patches of a fixed radius. However, for particular data sets, such as track data, ellipsoidal patches seem to be more suitable. Therefore, in this paper, we propose a scheme based on a priori error estimates for selecting the sizes of such variable ellipsoidal subdomains. We jointly solve for both these domain decomposition parameters and the anisotropic RBF shape parameters on each subdomain to achieve superior accuracy in comparison to the standard partition of unity method.
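For orientation, the standard partition of unity construction that the abstract takes as its baseline can be sketched in one dimension as below. This is only an illustration under stated assumptions: the patches are fixed-radius intervals (not the variable ellipsoidal patches proposed in the paper), the local approximant is a Gaussian RBF with a hand-picked shape parameter `eps`, the weights are Shepard weights built from a Wendland function, and the names `wendland` and `pu_rbf_interpolate` are hypothetical. The patches are assumed to cover every evaluation point.

```python
import numpy as np

def wendland(r):
    """Compactly supported Wendland C2 function, zero for r >= 1."""
    return np.where(r < 1.0, (1.0 - r)**4 * (4.0 * r + 1.0), 0.0)

def pu_rbf_interpolate(x, f, centers, radius, x_eval, eps=20.0):
    """1D partition of unity interpolation with local Gaussian RBF fits.

    Assumes every point of x_eval lies strictly inside at least one patch.
    """
    x = np.asarray(x, dtype=float)
    f = np.asarray(f, dtype=float)
    x_eval = np.asarray(x_eval, dtype=float)
    num = np.zeros_like(x_eval, dtype=float)
    den = np.zeros_like(x_eval, dtype=float)
    for c in centers:
        inside = np.abs(x - c) < radius       # data sites in this patch
        ev = np.abs(x_eval - c) < radius      # evaluation points in this patch
        if not inside.any() or not ev.any():
            continue
        xs = x[inside]
        # Small local interpolation system for this patch only.
        A = np.exp(-(eps * (xs[:, None] - xs[None, :]))**2)
        coef = np.linalg.solve(A, f[inside])
        B = np.exp(-(eps * (x_eval[ev][:, None] - xs[None, :]))**2)
        # Shepard weighting: blend local fits with compactly supported weights.
        w = wendland(np.abs(x_eval[ev] - c) / radius)
        num[ev] += w * (B @ coef)
        den[ev] += w
    return num / den
```

Each patch contributes a linear system whose size depends only on the number of data sites it contains, which is the property that makes the method attractive for large data sets. The default `eps` is tuned to a data spacing of about 0.05 and would need to change with the spacing.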
Data sites selected from modeling high-dimensional problems often appear scattered in irregular, pattern-free ways. Except for sporadic clustering at some spots, they become relatively far apart as the dimension of the ambient space grows. These features defy any theoretical treatment that requires local or global quasi-uniformity of the distribution of data sites. Incorporating a recently developed application of integral operator theory in machine learning, we propose and study a new framework for analyzing kernel interpolation of high-dimensional data, which bounds the stochastic approximation error by a hybrid (discrete and continuous) $K$-functional tied to the spectrum of the underlying kernel matrix. Both theoretical analysis and numerical simulations show that spectra of kernel matrices are reliable and stable barometers for gauging the performance of kernel-interpolation methods for high-dimensional data.
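As a small illustration of the object the abstract refers to, the spectrum of a kernel matrix on scattered high-dimensional sites can be computed as below. The Gaussian kernel, the `bandwidth` parameter, and the function name `gaussian_kernel_spectrum` are assumptions made for this sketch; the abstract does not specify a kernel, and the sketch does not reproduce the paper's $K$-functional bound.

```python
import numpy as np

def gaussian_kernel_spectrum(X, bandwidth=1.0):
    """Eigenvalues (descending) of the Gaussian kernel matrix on the rows of X."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X**2, axis=1)
    # Squared pairwise distances, clipped at zero to absorb rounding error.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    K = np.exp(-d2 / (2.0 * bandwidth**2))
    # K is symmetric, so eigvalsh applies; it returns ascending order.
    return np.linalg.eigvalsh(K)[::-1]
```

The ratio of the largest to the smallest eigenvalue gives the condition number of the interpolation system, one simple quantity that can be read off the spectrum.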
We formulate an oversampled radial basis function generated finite difference (RBF-FD) method to solve time-dependent nonlinear conservation laws. The analytic solutions of these problems are known to be discontinuous, which leads to non-physical oscillations (the Gibbs phenomenon) that pollute the numerical solutions and can make them unstable. We address these difficulties using a residual-based artificial viscosity stabilization, where the residual of the conservation law indicates the approximate location of the shocks. This location is then used to locally apply an upwind viscosity term, which suppresses the Gibbs oscillations without smearing the solution away from the shocks. The proposed method is tested numerically and proves to be robust and accurate when solving scalar conservation laws and systems of conservation laws, such as the compressible Euler equations.
A main drawback of classical Tikhonov regularization is that the parameters required to apply theoretical results, e.g., the smoothness of the sought-after solution and the noise level, are often unknown in practice. In this paper we investigate in detail the residuals in Tikhonov regularization viewed as functions of the regularization parameter. We show that the residual carries, with some restrictions, information on both the unknown solution and the noise level. By calculating approximate solutions for a large range of regularization parameters, we can extract both parameters from the residual given only one set of noisy data and the forward operator. The smoothness of the residual allows us to revisit parameter choice rules and relate a-priori, a-posteriori, and heuristic rules in a novel way that blurs the classical division between these types of rules. All results are accompanied by numerical experiments.
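The residual curve discussed above can be computed for a finite-dimensional problem as in the following sketch. It assumes a matrix forward operator `A`, the standard Tikhonov solution $x_\alpha = (A^\top A + \alpha I)^{-1} A^\top y$, and a hypothetical function name `tikhonov_residuals`. It only evaluates the residual norm over a grid of regularization parameters and does not implement the parameter extraction proposed in the paper.

```python
import numpy as np

def tikhonov_residuals(A, y, alphas):
    """Residual norms ||A x_alpha - y|| for each regularization parameter alpha."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    # One SVD serves all values of alpha.
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ y
    # Part of y outside the range of A; it is unaffected by alpha.
    out_of_range = np.linalg.norm(y - U @ beta)**2
    residuals = []
    for a in alphas:
        # Residual filter factors alpha / (sigma_i^2 + alpha).
        filt = a / (s**2 + a)
        residuals.append(np.sqrt(np.sum((filt * beta)**2) + out_of_range))
    return np.array(residuals)
```

Evaluating on a logarithmic grid such as `np.logspace(-6, 2, 30)` gives the residual as a function of the regularization parameter from a single set of noisy data. The curve is nondecreasing in the parameter, since every filter factor grows with it.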