
Estimating Multiple Precision Matrices with Cluster Fusion Regularization

Posted by Bradley Price
Publication date: 2020
Paper language: English





We propose a penalized likelihood framework for estimating multiple precision matrices from different classes. Most existing methods either incorporate no information on relationships between the precision matrices, or require that this information be known a priori. The framework proposed in this article allows the precision matrices and the relationships between them to be estimated jointly. Sparse and non-sparse estimators are proposed, both of which require solving a non-convex optimization problem. To compute our proposed estimators, we use an iterative algorithm which alternates between a convex optimization problem solved by blockwise coordinate descent and a k-means clustering problem. Blockwise updates for computing the sparse estimator require solving an elastic net penalized precision matrix estimation problem, which we solve using a proximal gradient descent algorithm. We prove that this subalgorithm has a linear rate of convergence. In simulation studies and two real data applications, we show that our method can outperform competitors that ignore relevant relationships between precision matrices, and that it performs similarly to methods which use prior information that is often unknown in practice.
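To make the alternating structure concrete, the sketch below (Python) alternates between a simplified ridge-type precision update with a closed-form solution and a k-means step over the vectorised estimates. The `ridge_precision` helper, the specific penalty, and all constants are illustrative assumptions, not the estimators proposed in the paper; in particular, the paper's blockwise coordinate descent and the elastic-net/proximal-gradient subproblem for the sparse estimator are replaced here by a single closed-form update.

```python
import numpy as np
from sklearn.cluster import KMeans

def ridge_precision(S, alpha, target):
    # Closed-form minimiser of tr(S @ Omega) - logdet(Omega) + (alpha/2) * ||Omega - target||_F^2,
    # used as a simplified stand-in for the convex blockwise subproblem described above.
    A = S - alpha * target
    lam, Q = np.linalg.eigh((A + A.T) / 2.0)
    d = (-lam + np.sqrt(lam ** 2 + 4.0 * alpha)) / (2.0 * alpha)
    return (Q * d) @ Q.T

def cluster_fusion_precisions(S_list, n_clusters=2, alpha=0.5, n_iter=20, seed=0):
    # Alternate between (1) precision updates shrunk toward their cluster centre and
    # (2) k-means clustering of the vectorised precision estimates.
    p = S_list[0].shape[0]
    Omegas = [ridge_precision(S, alpha, np.eye(p)) for S in S_list]
    labels = np.zeros(len(S_list), dtype=int)
    for _ in range(n_iter):
        V = np.stack([O.ravel() for O in Omegas])
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(V)
        labels, centres = km.labels_, km.cluster_centers_.reshape(n_clusters, p, p)
        Omegas = [ridge_precision(S, alpha, centres[labels[k]]) for k, S in enumerate(S_list)]
    return Omegas, labels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # two groups of classes sharing a precision structure
    base = [np.eye(5), 0.5 * np.eye(5) + 0.5 * np.ones((5, 5))]
    S_list = [np.cov(rng.multivariate_normal(np.zeros(5), np.linalg.inv(base[g]), 200).T)
              for g in (0, 0, 1, 1)]
    Omegas, labels = cluster_fusion_precisions(S_list)
    print("recovered cluster labels:", labels)
```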


Read also

We propose a penalized likelihood method to jointly estimate multiple precision matrices for use in quadratic discriminant analysis and model based clustering. A ridge penalty and a ridge fusion penalty are used to introduce shrinkage and promote similarity between precision matrix estimates. Block-wise coordinate descent is used for optimization, and validation likelihood is used for tuning parameter selection. Our method is applied in quadratic discriminant analysis and semi-supervised model based clustering.
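As a sketch of the kind of objective such a method minimises (notation assumed here rather than quoted from the paper), with sample covariance matrices $S_c$ and sample sizes $n_c$ for classes $c = 1, \dots, C$:

\[
\min_{\Omega_1,\dots,\Omega_C \succ 0} \; \sum_{c=1}^{C} n_c\left[\operatorname{tr}(S_c\Omega_c) - \log\det\Omega_c\right]
+ \frac{\lambda_1}{2}\sum_{c=1}^{C}\lVert\Omega_c\rVert_F^2
+ \frac{\lambda_2}{2}\sum_{c < c'}\lVert\Omega_c - \Omega_{c'}\rVert_F^2,
\]

where $\lambda_1$ controls the ridge shrinkage of each estimate and $\lambda_2$ controls how strongly the class-specific estimates are fused toward one another.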
165 - Yunzhang Zhu, Renxiong Liu 2021
We establish an equivalence between the $\ell_2$-regularized solution path for a convex loss function and the solution of an ordinary differential equation (ODE). Importantly, this equivalence reveals that the solution path can be viewed as the flow of a hybrid of gradient descent and Newton's method applied to the empirical loss, which is similar to a widely used optimization technique called the trust region method. This provides an interesting algorithmic view of $\ell_2$ regularization, and is in contrast to the conventional view that the $\ell_2$ regularization solution path is similar to the gradient flow of the empirical loss. New path-following algorithms based on homotopy methods and numerical ODE solvers are proposed to numerically approximate the solution path. In particular, we consider Newton's method and gradient descent, respectively, as the basis algorithm for the homotopy method, and establish their approximation error rates over the solution path. Importantly, our theory suggests novel schemes to choose grid points that guarantee an arbitrarily small suboptimality for the solution path. In terms of computational cost, we prove that in order to achieve an $\epsilon$-suboptimality for the entire solution path, the number of Newton steps required for Newton's method is $\mathcal{O}(\epsilon^{-1/2})$, while the number of gradient steps required for gradient descent is $\mathcal{O}\left(\epsilon^{-1} \ln(\epsilon^{-1})\right)$. Finally, we use $\ell_2$-regularized logistic regression as an illustrating example to demonstrate the effectiveness of the proposed path-following algorithms.
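A minimal sketch of the homotopy idea for $\ell_2$-regularized logistic regression: warm-started Newton steps along a decreasing grid of penalties. The grid and the fixed number of Newton steps per grid point are arbitrary choices for illustration, not the error-controlled schemes derived in the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ridge_logistic_path(X, y, lambdas, newton_steps=3):
    # Warm-started Newton iterations along a decreasing penalty grid,
    # approximating the l2-regularized solution path theta(lambda).
    n, p = X.shape
    theta = np.zeros(p)
    path = []
    for lam in lambdas:
        for _ in range(newton_steps):
            mu = sigmoid(X @ theta)
            grad = X.T @ (mu - y) / n + lam * theta
            H = (X.T * (mu * (1 - mu))) @ X / n + lam * np.eye(p)
            theta = theta - np.linalg.solve(H, grad)
        path.append(theta.copy())
    return np.array(path)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = (X @ np.array([1.0, -1.0, 0.5, 0.0, 0.0]) + 0.3 * rng.normal(size=200) > 0).astype(float)
    path = ridge_logistic_path(X, y, np.geomspace(10.0, 1e-3, 30))
    print("theta at largest and smallest penalty:", path[0], path[-1], sep="\n")
```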
66 - Dominik Janzing 2019
I argue that regularizing terms in standard regression methods not only help against overfitting finite data, but sometimes also yield better causal models in the infinite sample regime. I first consider a multi-dimensional variable linearly influencing a target variable with some multi-dimensional unobserved common cause, where the confounding effect can be decreased by keeping the penalizing term in Ridge and Lasso regression even in the population limit. Choosing the size of the penalizing term is, however, challenging, because cross validation is pointless. Here it is done by first estimating the strength of confounding via a method proposed earlier, which yielded some reasonable results for simulated and real data. Further, I prove a 'causal generalization bound' which states (subject to a particular model of confounding) that the error made by interpreting any non-linear regression as a causal model can be bounded from above whenever functions are taken from a not too rich class. In other words, the bound guarantees generalization from observational to interventional distributions, which is usually not a subject of statistical learning theory (and is only possible due to the underlying symmetries of the confounder model).
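One way to write the kind of confounded linear model this argument refers to (notation assumed here, not quoted from the paper): an unobserved common cause $Z$ drives both the covariates and the target,

\[
X = M Z + E, \qquad Y = \beta^{\top} X + c^{\top} Z + \varepsilon,
\]

so that the population regression of $Y$ on $X$ differs from the causal coefficient $\beta$; retaining the ridge or Lasso penalty even in the population limit can move the estimate back toward $\beta$ under suitable assumptions on $M$ and $c$.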
An Euler discretization of the Langevin diffusion is known to converge to the global minimizers of certain convex and non-convex optimization problems. We show that this property holds for any suitably smooth diffusion and that different diffusions are suitable for optimizing different classes of convex and non-convex functions. This allows us to design diffusions suitable for globally optimizing convex and non-convex functions not covered by the existing Langevin theory. Our non-asymptotic analysis delivers computable optimization and integration error bounds based on easily accessed properties of the objective and chosen diffusion. Central to our approach are new explicit Stein factor bounds on the solutions of Poisson equations. We complement these results with improved optimization guarantees for targets other than the standard Gibbs measure.
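The Euler discretization mentioned above is the unadjusted Langevin algorithm. A minimal sketch for a one-dimensional double-well objective (the objective, step size, and inverse temperature are chosen purely for illustration and are not taken from the paper):

```python
import numpy as np

def langevin_minimize(grad_f, x0, step=1e-3, beta=10.0, n_steps=20000, seed=0):
    # Euler-Maruyama discretization of the Langevin diffusion
    #   dX_t = -grad f(X_t) dt + sqrt(2 / beta) dB_t,
    # whose stationary distribution concentrates near global minimizers of f
    # when the inverse temperature beta is large.
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x - step * grad_f(x) + np.sqrt(2.0 * step / beta) * rng.normal(size=x.shape)
    return x

if __name__ == "__main__":
    # Non-convex double well f(x) = (x^2 - 1)^2 + 0.3 x, global minimum near x = -1.
    grad_f = lambda x: 4.0 * x * (x ** 2 - 1.0) + 0.3
    print(langevin_minimize(grad_f, x0=np.array([2.0])))
```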
In unsupervised classification, Hidden Markov Models (HMM) are used to account for a neighborhood structure between observations. The emission distributions are often supposed to belong to some parametric family. In this paper, a semiparametric modeling where the emission distributions are a mixture of parametric distributions is proposed to get a higher flexibility. We show that the classical EM algorithm can be adapted to infer the model parameters. For the initialisation step, starting from a large number of components, a hierarchical method to combine them into the hidden states is proposed. Three likelihood-based criteria to select the components to be combined are discussed. To estimate the number of hidden states, BIC-like criteria are derived. A simulation study is carried out both to determine the best combination between the merging criteria and the model selection criteria and to evaluate the accuracy of classification. The proposed method is also illustrated using a biological dataset from the model plant Arabidopsis thaliana. An R package, HMMmix, is freely available on CRAN.
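The R package HMMmix implements the method above; as a rough analogue in Python, hmmlearn's GMMHMM fits an HMM whose emission distributions are finite Gaussian mixtures by EM (Baum-Welch), which is close in spirit to the mixture-emission model described here, though it is not the authors' semiparametric procedure or its hierarchical initialisation.

```python
import numpy as np
from hmmlearn.hmm import GMMHMM  # pip install hmmlearn (assumed available)

# Simulated 1-D observations from two regimes, the second one bimodal.
rng = np.random.default_rng(0)
seq1 = rng.normal(0.0, 1.0, size=(300, 1))
seq2 = np.concatenate([rng.normal(4.0, 0.5, size=(150, 1)),
                       rng.normal(6.0, 0.5, size=(150, 1))])
X = np.vstack([seq1, seq2])
lengths = [len(seq1), len(seq2)]

# HMM with Gaussian-mixture emissions, fitted by EM.
model = GMMHMM(n_components=2, n_mix=3, covariance_type="diag",
               n_iter=200, random_state=0)
model.fit(X, lengths)
states = model.predict(X, lengths)
print("inferred state counts:", np.bincount(states))
```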
