Efficient regularized isotonic regression with application to gene–gene interaction search

Added by: Ronny Luss

Publication date: 2011

Fields: Mathematical Statistics, Informatics Engineering

Research language: English

Authors: Ronny Luss, Saharon Rosset, Moni Shahar

Abstract

Isotonic regression is a nonparametric approach for fitting monotonic models to data that has been widely studied from both theoretical and practical perspectives. However, this approach encounters computational difficulties and statistical overfitting in higher dimensions. To address both concerns, we present an algorithm, which we term Isotonic Recursive Partitioning (IRP), for isotonic regression based on recursively partitioning the covariate space through the solution of progressively smaller best-cut subproblems. This creates a regularized sequence of isotonic models of increasing complexity that converges to the global isotonic regression solution. The models along the sequence are often more accurate than the unregularized isotonic regression model because of the complexity control they offer. We quantify this complexity control by estimating degrees of freedom along the path. The predictive success of the regularized models and IRP's favorable computational properties are demonstrated through a series of experiments on simulated and real data. We discuss the application of IRP to the search for gene–gene interactions and epistasis, and demonstrate it on data from genome-wide association studies of three common diseases.
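To make the recursive-partitioning idea concrete, here is a minimal Python sketch of the simplest case only: a single covariate with distinct values (observations already sorted by it) and squared-error loss. It is an illustration under those assumptions, not the authors' IRP implementation. In one dimension the upper sets of a block are just its suffixes, so the best-cut subproblem reduces to a linear scan, whereas the paper solves it over a multidimensional partial order. The cut criterion (how far the upper part's responses sit above the block mean), the function names, and the rule of splitting the block with the largest gain first are all hypothetical choices made for this sketch.

```python
from typing import List, Tuple


def best_cut(y: List[float], lo: int, hi: int) -> Tuple[float, int]:
    """Best squared-error cut of the sorted block y[lo:hi] (1-D sketch).

    In one dimension the upper sets of a block are its suffixes, so we score
    every proper suffix y[k:hi] by sum(y_i - block_mean) and keep the best.
    Returns (gain, k); gain == 0.0 means no cut improves the fit.
    """
    mean = sum(y[lo:hi]) / (hi - lo)
    best_gain, best_k = 0.0, hi  # k == hi stands for "no split"
    running = 0.0
    for k in range(hi - 1, lo, -1):  # suffixes y[k:hi] with lo < k < hi
        running += y[k] - mean
        if running > best_gain + 1e-12:
            best_gain, best_k = running, k
    return best_gain, best_k


def fitted(y: List[float], blocks: List[Tuple[int, int]]) -> List[float]:
    """Piecewise-constant fit: every block is replaced by its mean."""
    out = [0.0] * len(y)
    for lo, hi in blocks:
        m = sum(y[lo:hi]) / (hi - lo)
        out[lo:hi] = [m] * (hi - lo)
    return out


def irp_path_1d(y: List[float]) -> List[List[float]]:
    """Sequence of increasingly complex monotone fits (hypothetical helper).

    Stopping early gives a coarser, more regularized model; the loop ends
    when no current block can be split profitably.
    """
    if not y:
        return [[]]
    blocks = [(0, len(y))]
    path = [fitted(y, blocks)]
    while True:
        # Solve the cut subproblem on every current block with >= 2 points.
        scored = [(best_cut(y, lo, hi), lo, hi)
                  for lo, hi in blocks if hi - lo > 1]
        if not scored:
            break
        (gain, k), lo, hi = max(scored)  # split the most profitable block
        if gain <= 0.0:
            break
        blocks.remove((lo, hi))
        blocks += [(lo, k), (k, hi)]
        path.append(fitted(y, blocks))
    return path
```

For example, `irp_path_1d([1.0, 3.0, 2.0])` returns `[[2.0, 2.0, 2.0], [1.0, 2.5, 2.5]]`: the first fit is the overall mean, the single split separates the low first point from the rest, and the final fit coincides with the ordinary isotonic (pool-adjacent-violators) solution for this input.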

A Paradigmatic Regression Algorithm for Gene Selection Problems

Stephane Guerrier (2015)

Motivation: Gene selection has become a common task in most gene expression studies. The objective of such research is often to identify the smallest possible set of genes that can still achieve good predictive performance. The problem of assigning tumours to a known class is a particularly important example that has received considerable attention in the last ten years. Many of the classification methods proposed recently require some form of dimension reduction of the problem. These methods provide a single model as output and, in most cases, rely on the likelihood function to achieve variable selection. Results: We propose a prediction-based objective function that can be tailored to the requirements of practitioners and can be used to assess and interpret a given problem. Directly optimizing such a function can be very difficult because the problem is potentially discontinuous and nonconvex. We therefore propose a general procedure for variable selection that resembles importance sampling to explore the feature space. Our proposal compares favorably with competing alternatives on two cancer data sets, in that it yields smaller models with better, or at least comparable, classification error. Furthermore, by providing a set of selected models instead of a single one, we construct a network of possible models for a target prediction accuracy level.

Methodology

Generalized Isotonic Regression

Ronny Luss, Saharon Rosset (2011)

We present a computational and statistical approach for fitting isotonic models under convex differentiable loss functions. We offer a recursive partitioning algorithm that provably and efficiently solves isotonic regression under any such loss function. Models along the partitioning path are also isotonic and can be viewed as regularized solutions to the problem. Our approach generalizes and subsumes two previous results: the well-known work of Barlow and Brunk (1972) on fitting isotonic regressions subject to specially structured loss functions, and a recursive partitioning algorithm (Spouge et al., 2003) for the case of standard (l2-loss) isotonic regression. We demonstrate the advantages of our generalized algorithm on both real and simulated data in two settings: fitting count data using the negative Poisson log-likelihood loss, and fitting robust isotonic regression using Huber's loss.

Methodology
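The abstract above describes isotonic fitting under general convex losses such as Huber's. As a point of comparison, here is a minimal Python sketch of a different, classical method: the pool-adjacent-violators algorithm (PAVA) for a single covariate with a separable convex loss. It is not the authors' recursive partitioning algorithm. It assumes the observations are already sorted by the covariate and that the loss is a convex function of the residual r = y - mu minimized at r = 0 (true of squared error and Huber's loss); each pooled block's fitted value is found by bisection on the loss derivative. The function names and the default Huber threshold `delta = 1.0` are hypothetical choices for illustration.

```python
from typing import Callable, List

Grad = Callable[[float], float]  # derivative of the loss w.r.t. the residual


def huber_grad(r: float, delta: float = 1.0) -> float:
    """Derivative of Huber's loss: linear near zero, clipped at +/- delta."""
    return max(-delta, min(delta, r))


def block_minimizer(ys: List[float], grad: Grad, iters: int = 100) -> float:
    """Minimize sum(loss(y - mu)) over mu by bisection.

    The objective's derivative in mu is -sum(grad(y - mu)), which is
    nondecreasing for a convex loss, so its sign tells us which way to move.
    For losses minimized at a zero residual, the optimum lies in [min, max].
    """
    lo, hi = min(ys), max(ys)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(grad(y - mid) for y in ys) > 0:  # objective still decreasing
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pava(y: List[float], grad: Grad) -> List[float]:
    """Nondecreasing fit of y (sorted by covariate) under a convex loss."""
    blocks: List[list] = []  # each block: [fitted value, observations]
    for yi in y:
        blocks.append([yi, [yi]])
        # Pool adjacent blocks while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            pooled = blocks[-2][1] + blocks[-1][1]
            blocks[-2:] = [[block_minimizer(pooled, grad), pooled]]
    return [value for value, obs in blocks for _ in obs]
```

For example, `pava([3.0, 1.0, 0.0], huber_grad)` returns values within about 1e-9 of `[1.0, 1.0, 1.0]`, while squared error (`pava([3.0, 1.0, 0.0], lambda r: r)`) pools the same points to their mean, 4/3. The difference shows Huber's loss capping the influence of the large residual from the first point.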

Combined Hypothesis Testing on Graphs with Applications to Gene Set Enrichment Analysis

115 - Shulei Wang , Ming Yuan 2016

Motivated by gene set enrichment analysis, we investigate the problem of combined hypothesis testing on a graph. We introduce a general framework to effectively use the structural information of the underlying graph when testing multivariate means. A new testing procedure is proposed within this framework. We show that the test is optimal in that it can consistently detect departure from the collective null at a rate that no other test could improve, for almost all graphs. We also provide general performance bounds for the proposed test under any specific graph, and illustrate their utility through several common types of graphs. Numerical experiments are presented to further demonstrate the merits of our approach.

Methodology, Statistics Theory, Applications

Regularization Strategies for Hyperplane Classifiers: Application to Cancer Classification with Gene Expression Data

Erik Andries (2006)

Linear discrimination, from the point of view of numerical linear algebra, can be treated as solving an ill-posed system of linear equations. In order to generate a solution that is robust in the presence of noise, these problems require regularization. Here, we examine the ill-posedness involved in the linear discrimination of cancer gene expression data with respect to outcome and tumor subclasses. We show that a filter factor representation, based upon Singular Value Decomposition, yields insight into the numerical ill-posedness of the hyperplane-based separation when applied to gene expression data. We also show that this representation yields useful diagnostic tools for guiding the selection of classifier parameters, thus leading to improved performance.

Genomics
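For reference, the filter-factor representation mentioned in the abstract above is the standard way to write a regularized least-squares solution through the singular value decomposition. The display below is a generic sketch, not the paper's specific classifier: A is assumed to be the data matrix, b the vector of class labels or targets, and the Tikhonov and truncated-SVD factors are two common choices shown for illustration.

```latex
A = U \Sigma V^{\top} = \sum_{i=1}^{r} \sigma_i \, u_i v_i^{\top}, \qquad
x_{\mathrm{reg}} = \sum_{i=1}^{r} f_i \, \frac{u_i^{\top} b}{\sigma_i} \, v_i

f_i^{\mathrm{Tikhonov}} = \frac{\sigma_i^2}{\sigma_i^2 + \lambda^2}, \qquad
f_i^{\mathrm{TSVD}} = \begin{cases} 1, & i \le k \\ 0, & i > k \end{cases}
```

Setting every f_i = 1 recovers the unregularized (pseudoinverse) least-squares solution. Because each coefficient u_i^T b is divided by sigma_i, noise along directions with small singular values is strongly amplified; the filter factors damp exactly those terms, which is the ill-posedness the abstract refers to.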

Interaction Networks from Discrete Event Data by Poisson Multivariate Mutual Information Estimation and Information Flow with Applications from Gene Expression Data

Jeremie Fish, Jie Sun, Erik Bollt (2020)

In this work, we introduce a new methodology for inferring the interaction structure of discrete-valued time series that are Poisson distributed. While most related methods are premised on continuous-state stochastic processes, discrete, counting-event-oriented stochastic processes, so-called time-point processes (TPP), are in fact natural and common. An important application that we focus on here is gene expression. Nonparametric methods such as the popular k-nearest neighbors (KNN) estimator converge slowly for discrete processes and are thus data hungry. With the new multivariate Poisson estimator developed here as the core computational engine, the causation entropy (CSE) principle, together with the associated greedy search algorithm, optimal CSE (oCSE), allows us to efficiently infer the true network structure for this class of stochastic processes, which was previously impractical. We illustrate the power of our method first by benchmarking on synthetic data and then by inferring the genetic factor network from a breast cancer microRNA (miRNA) sequence count data set. We show that Poisson oCSE gives the best performance among the tested methods and discovers previously known interactions on the breast cancer data set.

Methodology; Data Analysis, Statistics and Probability
