Do you want to publish a course? Click here

Generalized Isotonic Regression

164   0   0.0 ( 0 )
 Added by Ronny Luss
 Publication date 2011
and research's language is English




Ask ChatGPT about the research

We present a computational and statistical approach for fitting isotonic models under convex differentiable loss functions. We offer a recursive partitioning algorithm which provably and efficiently solves isotonic regression under any such loss function. Models along the partitioning path are also isotonic and can be viewed as regularized solutions to the problem. Our approach generalizes and subsumes two previous results: the well-known work of Barlow and Brunk (1972) on fitting isotonic regressions subject to specially structured loss functions, and a recursive partitioning algorithm (Spouge et al 2003) for the case of standard (l2-loss) isotonic regression. We demonstrate the advantages of our generalized algorithm on both real and simulated data in two settings: fitting count data using negative Poisson log-likelihood loss, and fitting robust isotonic regression using Hubers loss.



rate research

Read More

We consider the online version of the isotonic regression problem. Given a set of linearly ordered points (e.g., on the real line), the learner must predict labels sequentially at adversarially chosen positions and is evaluated by her total squared loss compared against the best isotonic (non-decreasing) function in hindsight. We survey several standard online learning algorithms and show that none of them achieve the optimal regret exponent; in fact, most of them (including Online Gradient Descent, Follow the Leader and Exponential Weights) incur linear regret. We then prove that the Exponential Weights algorithm played over a covering net of isotonic functions has a regret bounded by $Obig(T^{1/3} log^{2/3}(T)big)$ and present a matching $Omega(T^{1/3})$ lower bound on regret. We provide a computationally efficient version of this algorithm. We also analyze the noise-free case, in which the revealed labels are isotonic, and show that the bound can be improved to $O(log T)$ or even to $O(1)$ (when the labels are revealed in isotonic order). Finally, we extend the analysis beyond squared loss and give bounds for entropic loss and absolute loss.
Isotonic regression is a nonparametric approach for fitting monotonic models to data that has been widely studied from both theoretical and practical perspectives. However, this approach encounters computational and statistical overfitting issues in higher dimensions. To address both concerns, we present an algorithm, which we term Isotonic Recursive Partitioning (IRP), for isotonic regression based on recursively partitioning the covariate space through solution of progressively smaller best cut subproblems. This creates a regularized sequence of isotonic models of increasing model complexity that converges to the global isotonic regression solution. The models along the sequence are often more accurate than the unregularized isotonic regression model because of the complexity control they offer. We quantify this complexity control through estimation of degrees of freedom along the path. Success of the regularized models in prediction and IRPs favorable computational properties are demonstrated through a series of simulated and real data experiments. We discuss application of IRP to the problem of searching for gene--gene interactions and epistasis, and demonstrate it on data from genome-wide association studies of three common diseases.
Field observations form the basis of many scientific studies, especially in ecological and social sciences. Despite efforts to conduct such surveys in a standardized way, observations can be prone to systematic measurement errors. The removal of systematic variability introduced by the observation process, if possible, can greatly increase the value of this data. Existing non-parametric techniques for correcting such errors assume linear additive noise models. This leads to biased estimates when applied to generalized linear models (GLM). We present an approach based on residual functions to address this limitation. We then demonstrate its effectiveness on synthetic data and show it reduces systematic detection variability in moth surveys.
127 - Ding Xiang , Galin L. Jones 2017
We consider penalized regression models under a unified framework where the particular method is determined by the form of the penalty term. We propose a fully Bayesian approach that incorporates both sparse and dense settings and show how to use a type of model averaging approach to eliminate the nuisance penalty parameters and perform inference through the marginal posterior distribution of the regression coefficients. We establish tail robustness of the resulting estimator as well as conditional and marginal posterior consistency. We develop an efficient component-wise Markov chain Monte Carlo algorithm for sampling. Numerical results show that the method tends to select the optimal penalty and performs well in both variable selection and prediction and is comparable to, and often better than alternative methods. Both simulated and real data examples are provided.
Record linkage (de-duplication or entity resolution) is the process of merging noisy databases to remove duplicate entities. While record linkage removes duplicate entities from such databases, the downstream task is any inferential, predictive, or post-linkage task on the linked data. One goal of the downstream task is obtaining a larger reference data set, allowing one to perform more accurate statistical analyses. In addition, there is inherent record linkage uncertainty passed to the downstream task. Motivated by the above, we propose a generalized Bayesian record linkage method and consider multiple regression analysis as the downstream task. Records are linked via a random partition model, which allows for a wide class to be considered. In addition, we jointly model the record linkage and downstream task, which allows one to account for the record linkage uncertainty exactly. Moreover, one is able to generate a feedback propagation mechanism of the information from the proposed Bayesian record linkage model into the downstream task. This feedback effect is essential to eliminate potential biases that can jeopardize resulting downstream task. We apply our methodology to multiple linear regression, and illustrate empirically that the feedback effect is able to improve the performance of record linkage.
comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا