Sibling Regression for Generalized Linear Models

Published by Shiv Shankar

Publication date: 2021

Research field: Mathematical Statistics, Informatics Engineering

Language: English

Authors: Shiv Shankar, Daniel Sheldon

Methodology, Machine Learning

Abstract

Field observations form the basis of many scientific studies, especially in ecological and social sciences. Despite efforts to conduct such surveys in a standardized way, observations can be prone to systematic measurement errors. The removal of systematic variability introduced by the observation process, if possible, can greatly increase the value of this data. Existing non-parametric techniques for correcting such errors assume linear additive noise models. This leads to biased estimates when applied to generalized linear models (GLM). We present an approach based on residual functions to address this limitation. We then demonstrate its effectiveness on synthetic data and show it reduces systematic detection variability in moth surveys.

Shiv Shankar, Daniel Sheldon, Tao Sun (2020)

Many ecological studies and conservation policies are based on field observations of species, which can be affected by systematic variability introduced by the observation process. A recently introduced causal modeling technique called half-sibling regression can detect and correct for systematic errors in measurements of multiple independent random variables. However, it will remove intrinsic variability if the variables are dependent, and therefore does not apply to many situations, including modeling of species counts that are controlled by common causes. We present a technique called three-quarter sibling regression to partially overcome this limitation. It can filter the effect of systematic noise when the latent variables have observed common causes. We provide theoretical justification of this approach, demonstrate its effectiveness on synthetic data, and show that it reduces systematic detection variability due to moon brightness in moth surveys.

Methodology, Machine Learning
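The half-sibling idea described in the abstract above can be illustrated with a minimal sketch. It assumes the simplest setting: additive systematic noise, independent latent variables, and a plain least-squares regressor. All names (`fit_line`, `half_sibling_correct`, the noise loadings) are hypothetical and chosen for illustration; this is not the authors' implementation, and it does not cover the three-quarter sibling extension or the GLM case.

```python
import random

def fit_line(x, y):
    """Ordinary least squares for y ~ a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def half_sibling_correct(y_target, y_sibling):
    """Return y_target minus its linear prediction from y_sibling."""
    a, b = fit_line(y_sibling, y_target)
    return [yt - (a + b * ys) for yt, ys in zip(y_target, y_sibling)]

def mse_to_centered(estimate, truth):
    """Mean squared error between the two series after centering each."""
    n = len(truth)
    me, mt = sum(estimate) / n, sum(truth) / n
    return sum(((e - me) - (t - mt)) ** 2 for e, t in zip(estimate, truth)) / n

# Synthetic data (assumed setup): two independent latent variables that
# share one systematic error term introduced by the observation process.
rng = random.Random(0)
n = 5000
noise = [rng.gauss(0, 1) for _ in range(n)]        # shared systematic error
z_target = [rng.gauss(0, 1) for _ in range(n)]     # latent quantity of interest
z_sibling = [rng.gauss(0, 0.3) for _ in range(n)]  # independent latent sibling
y_target = [z + 2.0 * e for z, e in zip(z_target, noise)]
y_sibling = [z + e for z, e in zip(z_sibling, noise)]

# Whatever the sibling can predict about the target is attributed to the
# shared noise, because the latent variables themselves are independent.
z_hat = half_sibling_correct(y_target, y_sibling)
mse_raw = mse_to_centered(y_target, z_target)
mse_corrected = mse_to_centered(z_hat, z_target)
print(f"MSE raw: {mse_raw:.3f}, MSE corrected: {mse_corrected:.3f}")
```

If the two latent variables shared a common cause, the same subtraction would also remove part of the signal of interest, which is the limitation the abstract describes.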

A Mixture of Linear-Linear Regression Models for Linear-Circular Regression

Ali Esmaieeli Sikaroudi, Chiwoo Park (2016)

We introduce a new approach to a linear-circular regression problem that relates multiple linear predictors to a circular response. We follow a modeling approach of a wrapped normal distribution that describes angular variables and angular distributions, and advance it for linear-circular regression analysis. Some previous works model a circular variable as the projection of a bivariate Gaussian random vector on the unit square, and the statistical inference of the resulting model involves complicated sampling steps. The proposed model treats circular responses as the result of the modulo operation on unobserved linear responses. The resulting model is a mixture of multiple linear-linear regression models. We present two EM algorithms for maximum likelihood estimation of the mixture model, one for a parametric model and another for a non-parametric model. The estimation algorithms provide a good trade-off between computation and estimation accuracy, which is shown numerically using five examples. The proposed approach was applied to a problem of estimating wind directions, which typically exhibit complex patterns with large variation and circularity.

Methodology
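The modulo construction mentioned in the abstract above can be sketched as follows. The example assumes a single predictor, Gaussian noise, and a wrap onto [0, 2π); the coefficient values and all names are hypothetical, and the snippet only simulates the data-generating idea. It does not implement the EM algorithms.

```python
import math
import random

TWO_PI = 2 * math.pi

def wrap(value):
    """Map a linear response onto the circle, i.e. into [0, 2*pi)."""
    return value % TWO_PI

rng = random.Random(1)
intercept, slope, sigma = 0.5, 1.5, 0.2  # hypothetical parameters

xs = [rng.uniform(0, 10) for _ in range(200)]
# Unobserved linear responses.
latent = [intercept + slope * x + rng.gauss(0, sigma) for x in xs]
# Observed circular responses: the latent value modulo 2*pi.
angles = [wrap(v) for v in latent]
# Number of full turns lost in the wrap. It is not observed in practice.
wraps = [round((v - a) / TWO_PI) for v, a in zip(latent, angles)]

print(sorted(set(wraps)))
```

Each distinct wrap count corresponds to one shifted copy of the same linear relationship, which gives an intuition for why the observed data can be read as a mixture of linear-linear regressions with the wrap count as the latent component label.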

Stochastic Search for Semiparametric Linear Regression Models

Lutz Duembgen, Dominic Schuhmacher, Richard Samworth (2011)

This paper introduces and analyzes a stochastic search method for parameter estimation in linear regression models in the spirit of Beran and Millar (1987). The idea is to generate a random finite subset of a parameter space which will automatically contain points which are very close to an unknown true parameter. The motivation for this procedure comes from recent work of Duembgen, Samworth and Schuhmacher (2011) on regression models with log-concave error distributions.

Methodology, Statistics Theory

Tests for High Dimensional Generalized Linear Models

Song Xi Chen, Bin Guo (2014)

We consider testing regression coefficients in high dimensional generalized linear models. An investigation of the test of Goeman et al. (2011) is conducted, which reveals that if the inverse of the link function is unbounded, the high dimensionality of the covariates can adversely affect the power of the test. We propose a test formation which can avoid the adverse impact of the high dimensionality. When the inverse of the link function is bounded, as in logistic or probit regression, the proposed test is as good as the test of Goeman et al. (2011). The proposed tests provide p-values for testing the significance of gene-sets, as demonstrated in a case study on an acute lymphoblastic leukemia dataset.

Methodology

Estimation and Feature Selection in Mixtures of Generalized Linear Experts Models

Bao Tuyen Huynh, Faicel Chamroukhi (2019)

Mixtures-of-Experts (MoE) are conditional mixture models that have shown their performance in modeling heterogeneity in data in many statistical learning approaches for prediction, including regression and classification, as well as for clustering. Their estimation in high-dimensional problems is, however, still challenging. We consider the problem of parameter estimation and feature selection in MoE models with different generalized linear experts models, and propose a regularized maximum likelihood estimation that efficiently encourages sparse solutions for heterogeneous data with high-dimensional predictors. The developed proximal-Newton EM algorithm includes proximal Newton-type procedures to update the model parameters by monotonically maximizing the objective function, and allows efficient estimation and feature selection. An experimental study shows the good performance of the algorithms in terms of recovering the actual sparse solutions, parameter estimation, and clustering of heterogeneous regression data, compared to the main state-of-the-art competitors.

Methodology, Machine Learning, Statistics Applications