Robust Gaussian Process Regression Based on Iterative Trimming

70 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Zhaozhou Li

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية فيزياء

والبحث باللغة English

تأليف Zhao-Zhou Li - Lu Li - Zhengyi Shao

التعلم الآلي الأجهزة والأساليب للزيئات الفيزياء الفلكية التعلم الالي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

The Gaussian process (GP) regression can be severely biased when the data are contaminated by outliers. This paper presents a new robust GP regression algorithm that iteratively trims the most extreme data points. While the new algorithm retains the attractive properties of the standard GP as a nonparametric and flexible regression method, it can greatly improve the model accuracy for contaminated data even in the presence of extreme or abundant outliers. It is also easier to implement compared with previous robust GP variants that rely on approximate inference. Applied to a wide range of experiments with different contamination levels, the proposed method significantly outperforms the standard GP and the popular robust GP variant with the Student-t likelihood in most test cases. In addition, as a practical example in the astrophysical study, we show that this method can precisely determine the main-sequence ridge line in the color-magnitude diagram of star clusters.

قيم البحث

60 - Chiwoo Park , David J. Borth , Nicholas S. Wilson 2020

This paper presents a new approach to a robust Gaussian process (GP) regression. Most existing approaches replace an outlier-prone Gaussian likelihood with a non-Gaussian likelihood induced from a heavy tail distribution, such as the Laplace distribu tion and Student-t distribution. However, the use of a non-Gaussian likelihood would incur the need for a computationally expensive Bayesian approximate computation in the posterior inferences. The proposed approach models an outlier as a noisy and biased observation of an unknown regression function, and accordingly, the likelihood contains bias terms to explain the degree of deviations from the regression function. We entail how the biases can be estimated accurately with other hyperparameters by a regularized maximum likelihood estimation. Conditioned on the bias estimates, the robust GP regression can be reduced to a standard GP regression problem with analytical forms of the predictive mean and variance estimates. Therefore, the proposed approach is simple and very computationally attractive. It also gives a very robust and accurate GP estimate for many tested scenarios. For the numerical evaluation, we perform a comprehensive simulation study to evaluate the proposed approach with the comparison to the existing robust GP approaches under various simulated scenarios of different outlier proportions and different noise levels. The approach is applied to data from two measurement systems, where the predictors are based on robust environmental parameter measurements and the response variables utilize more complex chemical sensing methods that contain a certain percentage of outliers. The utility of the measurement systems and value of the environmental data are improved through the computationally efficient GP regression and bias model.

التعلم الآلي المنهجية التعلم الالي

Temporal Gaussian Process Regression in Logarithmic Time

76 - Adrien Corenflos , Zheng Zhao , Simo Sarkka 2021

The aim of this article is to present a novel parallelization method for temporal Gaussian process (GP) regression problems. The method allows for solving GP regression problems in logarithmic O(log N) time, where N is the number of time steps. Our a pproach uses the state-space representation of GPs which in its original form allows for linear O(N) time GP regression by leveraging the Kalman filtering and smoothing methods. By using a recently proposed parallelization method for Bayesian filters and smoothers, we are able to reduce the linear computational complexity of the temporal GP regression problems into logarithmic span complexity. This ensures logarithmic time complexity when run on parallel hardware such as a graphics processing unit (GPU). We experimentally demonstrate the computational benefits on simulated and real datasets via our open-source implementation leveraging the GPflow framework.

التعلم الآلي حساب المنهجية

Component-wise Adaptive Trimming For Robust Mixture Regression

187 - Wennan Chang , Xinyu Zhou , Yong Zang 2020

Parameter estimation of mixture regression model using the expectation maximization (EM) algorithm is highly sensitive to outliers. Here we propose a fast and efficient robust mixture regression algorithm, called Component-wise Adaptive Trimming (CAT ) method. We consider simultaneous outlier detection and robust parameter estimation to minimize the effect of outlier contamination. Robust mixture regression has many important applications including in human cancer genomics data, where the population often displays strong heterogeneity added by unwanted technological perturbations. Existing robust mixture regression methods suffer from outliers as they either conduct parameter estimation in the presence of outliers, or rely on prior knowledge of the level of outlier contamination. CAT was implemented in the framework of classification expectation maximization, under which a natural definition of outliers could be derived. It implements a least trimmed squares (LTS) approach within each exclusive mixing component, where the robustness issue could be transformed from the mixture case to simple linear regression case. The high breakdown point of the LTS approach allows us to avoid the pre-specification of trimming parameter. Compared with multiple existing algorithms, CAT is the most competitive one that can handle and adaptively trim off outliers as well as heavy tailed noise, in different scenarios of simulated data and real genomic data. CAT has been implemented in an R package `RobMixReg available in CRAN.

المنهجية التعلم الآلي

Improving gravitational-wave parameter estimation using Gaussian process regression

123 - Christopher J. Moore , Christopher P. L. Berry , Alvin J. K. Chua andn Jonathan R. Gair 2015

Folding uncertainty in theoretical models into Bayesian parameter estimation is necessary in order to make reliable inferences. A general means of achieving this is by marginalizing over model uncertainty using a prior distribution constructed using Gaussian process regression (GPR). As an example, we apply this technique to the measurement of chirp mass using (simulated) gravitational-wave signals from binary black holes that could be observed using advanced-era gravitational-wave detectors. Unless properly accounted for, uncertainty in the gravitational-wave templates could be the dominant source of error in studies of these systems. We explain our approach in detail and provide proofs of various features of the method, including the limiting behavior for high signal-to-noise, where systematic model uncertainties dominate over noise errors. We find that the marginalized likelihood constructed via GPR offers a significant improvement in parameter estimation over the standard, uncorrected likelihood both in our simple one-dimensional study, and theoretically in general. We also examine the dependence of the method on the size of training set used in the GPR; on the form of covariance function adopted for the GPR, and on changes to the detector noise power spectral density.

النسبية العامة وهدية الكونيات الكم الأجهزة والأساليب للزيئات الفيزياء الفلكية

Latent Gaussian Process Regression

101 - Erik Bodin , Neill D. F. Campbell , Carl Henrik Ek 2017

We introduce Latent Gaussian Process Regression which is a latent variable extension allowing modelling of non-stationary multi-modal processes using GPs. The approach is built on extending the input space of a regression problem with a latent variab le that is used to modulate the covariance function over the training data. We show how our approach can be used to model multi-modal and non-stationary processes. We exemplify the approach on a set of synthetic data and provide results on real data from motion capture and geostatistics.

التعلم الالي التعلم الآلي