Generalized Majorization-Minimization

Published by: Sobhan Naderi Parizi

Publication date: 2015

Research field: Informatics Engineering

Paper language: English

Authors: Sobhan Naderi Parizi, Kun He, Reza Aghajani

Non-convex optimization is ubiquitous in machine learning. Majorization-Minimization (MM) is a powerful iterative procedure for optimizing non-convex functions that works by optimizing a sequence of bounds on the function. In MM, the bound at each iteration is required to touch the objective function at the optimizer of the previous bound. We show that this touching constraint is unnecessary and overly restrictive. We generalize MM by relaxing this constraint, and propose a new optimization framework, named Generalized Majorization-Minimization (G-MM), that is more flexible. For instance, G-MM can incorporate application-specific biases into the optimization procedure without changing the objective function. We derive G-MM algorithms for several latent variable models and show empirically that they consistently outperform their MM counterparts in optimizing non-convex objectives. In particular, G-MM algorithms appear to be less sensitive to initialization.
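To make the "sequence of bounds" idea concrete, here is a minimal sketch of standard MM (with the touching constraint), not of G-MM itself, whose bound-selection rule is not specified in the abstract. The toy objective f(x) = x^4/4 - x^2, the function names, and the starting point are all assumptions chosen for illustration. The bound is built by linearizing the concave term -x^2 at the previous iterate, so it lies above f everywhere and touches it at that iterate.

```python
import math

def f(x):
    """Toy non-convex objective (assumed for illustration): x^4/4 - x^2."""
    return x**4 / 4 - x**2

def surrogate(x, x_prev):
    """Upper bound on f that touches f at x_prev.

    x^2 is convex, so it lies above its tangent at x_prev; replacing
    -x^2 by minus that tangent therefore gives a bound g >= f with
    g(x_prev) = f(x_prev).
    """
    return x**4 / 4 - (x_prev**2 + 2 * x_prev * (x - x_prev))

def argmin_surrogate(x_prev):
    """Closed-form minimizer of the surrogate: solve x^3 = 2 * x_prev."""
    return math.copysign(abs(2 * x_prev) ** (1 / 3), x_prev)

def mm(x0, iters=60):
    """Run MM: repeatedly minimize the bound built at the latest iterate."""
    xs = [x0]
    for _ in range(iters):
        xs.append(argmin_surrogate(xs[-1]))
    return xs

trajectory = mm(0.5)
values = [f(x) for x in trajectory]
```

Starting from 0.5, the iterates approach the local minimizer sqrt(2), and the objective values never increase, which is the descent property that the touching constraint guarantees. A generalized scheme in the spirit of the abstract would relax the requirement that the bound equal f at the previous iterate.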

Jonathan Tuck, David Hallac, Stephen Boyd, 2018

We consider the problem of minimizing a block separable convex function (possibly nondifferentiable, and including constraints) plus Laplacian regularization, a problem that arises in applications including model fitting, regularizing stratified models, and multi-period portfolio optimization. We develop a distributed majorization-minimization method for this general problem, and derive a complete, self-contained, general, and simple proof of convergence. Our method is able to scale to very large problems, and we illustrate our approach on two applications, demonstrating its scalability and accuracy.

Optimization and Control

Block Alternating Bregman Majorization Minimization with Extrapolation

Le Thi Khanh Hien, Duy Nhat Phan, Nicolas Gillis, 2021

In this paper, we consider a class of nonsmooth nonconvex optimization problems whose objective is the sum of a block relative smooth function and a proper and lower semicontinuous block separable function. Although the analysis of block proximal gradient (BPG) methods for the class of block $L$-smooth functions has been successfully extended to Bregman BPG methods that deal with the class of block relative smooth functions, accelerated Bregman BPG methods are scarce and challenging to design. Taking our inspiration from Nesterov-type acceleration and the majorization-minimization scheme, we propose a block alternating Bregman Majorization-Minimization framework with Extrapolation (BMME). We prove subsequential convergence of BMME to a first-order stationary point under mild assumptions, and study its global convergence under stronger conditions. We illustrate the effectiveness of BMME on the penalized orthogonal nonnegative matrix factorization problem.

Optimization and Control, Numerical Analysis, Signal Processing

Stochastic Majorization-Minimization Algorithms for Large-Scale Optimization

Julien Mairal, 2013

Majorization-minimization algorithms consist of iteratively minimizing a majorizing surrogate of an objective function. Because of its simplicity and its wide applicability, this principle has been very popular in statistics and in signal processing. In this paper, we intend to make this principle scalable. We introduce a stochastic majorization-minimization scheme which is able to deal with large-scale or possibly infinite data sets. When applied to convex optimization problems under suitable assumptions, we show that it achieves an expected convergence rate of $O(1/\sqrt{n})$ after $n$ iterations, and of $O(1/n)$ for strongly convex functions. Equally important, our scheme almost surely converges to stationary points for a large class of non-convex problems. We develop several efficient algorithms based on our framework. First, we propose a new stochastic proximal gradient method, which experimentally matches state-of-the-art solvers for large-scale $\ell_1$-logistic regression. Second, we develop an online DC programming algorithm for non-convex sparse estimation. Finally, we demonstrate the effectiveness of our approach for solving large-scale structured matrix factorization problems.
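As a rough illustration of "minimizing a majorizing surrogate", the sketch below shows a deterministic proximal gradient iteration read as MM. It is not the stochastic scheme of the paper, and it uses an $\ell_1$-penalized least-squares problem in one variable rather than logistic regression. The data, the constant L, and all function names are assumptions made for this example. At each step the smooth part is replaced by a quadratic upper bound with curvature L, and the resulting surrogate is minimized in closed form by soft-thresholding.

```python
def soft_threshold(z, t):
    """Proximal operator of t * |w|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def objective(w, a, y, lam):
    """0.5 * sum_i (w * a_i - y_i)^2 + lam * |w| for a scalar weight w."""
    return 0.5 * sum((w * ai - yi) ** 2 for ai, yi in zip(a, y)) + lam * abs(w)

def prox_grad_mm(a, y, lam, w0=0.0, iters=100):
    """Proximal gradient viewed as MM with a quadratic surrogate."""
    # L deliberately over-estimates the true curvature sum(a_i^2), so the
    # quadratic model is an upper bound on the smooth part everywhere.
    L = 2.0 * sum(ai * ai for ai in a)
    w = w0
    history = [objective(w, a, y, lam)]
    for _ in range(iters):
        grad = sum(ai * (w * ai - yi) for ai, yi in zip(a, y))
        # Minimizer of: smooth(w_t) + grad * (w - w_t) + L/2 * (w - w_t)^2 + lam * |w|
        w = soft_threshold(w - grad / L, lam / L)
        history.append(objective(w, a, y, lam))
    return w, history

# Hypothetical toy data, chosen only for illustration.
a = [1.0, 2.0, 3.0]
y = [1.0, 2.5, 2.0]
lam = 0.5
w_hat, history = prox_grad_mm(a, y, lam)
```

For this toy data the exact minimizer is (12 - 0.5) / 14, which the iterates approach while the objective decreases monotonically. A stochastic variant would build each surrogate from a single sample or mini-batch instead of the full sum.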

Machine Learning, Optimization and Control

Action and Perception as Divergence Minimization

Danijar Hafner, Pedro A. Ortega, Jimmy Ba, 2020

We introduce a unified objective for action and perception of intelligent agents. Extending representation learning and control, we minimize the joint divergence between the combined system of agent and environment and a target distribution. Intuitively, such agents use perception to align their beliefs with the world, and use actions to align the world with their beliefs. Minimizing the joint divergence to an expressive target maximizes the mutual information between the agent's representations and inputs, thus inferring representations that are informative of past inputs and exploring future inputs that are informative of the representations. This lets us explain intrinsic objectives, such as representation learning, information gain, empowerment, and skill discovery from minimal assumptions. Moreover, interpreting the target distribution as a latent variable model suggests powerful world models as a path toward highly adaptive agents that seek large niches in their environments, rendering task rewards optional. The framework provides a common language for comparing a wide range of objectives, advances the understanding of latent variables for decision making, and offers a recipe for designing novel objectives. We recommend deriving future agent objectives from the joint divergence to facilitate comparison, to point out the agent's target distribution, and to identify the intrinsic objective terms needed to reach that distribution.

Artificial Intelligence, Information Theory, Machine Learning

Relative Entropy and Catalytic Relative Majorization

Soorya Rethinasamy, Mark M. Wilde, 2019

Given two pairs of quantum states, a fundamental question in the resource theory of asymmetric distinguishability is to determine whether there exists a quantum channel converting one pair to the other. In this work, we reframe this question in such a way that a catalyst can be used to help perform the transformation, with the only constraint on the catalyst being that its reduced state is returned unchanged, so that it can be used again to assist a future transformation. What we find here, for the special case in which the states in a given pair are commuting, and thus quasi-classical, is that this catalytic transformation can be performed if and only if the relative entropy of one pair of states is larger than that of the other pair. This result endows the relative entropy with a fundamental operational meaning that goes beyond its traditional interpretation in the setting of independent and identical resources. Our finding thus has an immediate application and interpretation in the resource theory of asymmetric distinguishability, and we expect it to find application in other domains.

Quantum Physics, Information Theory