Faster Rates for training Max-Margin Markov Networks

153 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Ankan Saha

تاريخ النشر 2010

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Xinhua Zhang Purduen University

التعلم الآلي التعقيد الحسابي

قم بزيارة صفحتنا على فيسبوك

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Structured output prediction is an important machine learning problem both in theory and practice, and the max-margin Markov network (mcn) is an effective approach. All state-of-the-art algorithms for optimizing mcn objectives take at least $O(1/epsilon)$ number of iterations to find an $epsilon$ accurate solution. Recent results in structured optimization suggest that faster rates are possible by exploiting the structure of the objective function. Towards this end citet{Nesterov05} proposed an excessive gap reduction technique based on Euclidean projections which converges in $O(1/sqrt{epsilon})$ iterations on strongly convex functions. Unfortunately when applied to mcn s, this approach does not admit graphical model factorization which, as in many existing algorithms, is crucial for keeping the cost per iteration tractable. In this paper, we present a new excessive gap reduction technique based on Bregman projections which admits graphical model factorization naturally, and converges in $O(1/sqrt{epsilon})$ iterations. Compared with existing algorithms, the convergence rate of our method has better dependence on $epsilon$ and other parameters of the problem, and can be easily kernelized.

قيم البحث

61 - Alex Nowak-Vila , Francis Bach , Alessandro Rudi 2020

Max-margin methods for binary classification such as the support vector machine (SVM) have been extended to the structured prediction setting under the name of max-margin Markov networks ($M^3N$), or more generally structural SVMs. Unfortunately, the se methods are statistically inconsistent when the relationship between inputs and labels is far from deterministic. We overcome such limitations by defining the learning problem in terms of a max-min margin formulation, naming the resulting method max-min margin Markov networks ($M^4N$). We prove consistency and finite sample generalization bounds for $M^4N$ and provide an explicit algorithm to compute the estimator. The algorithm achieves a generalization error of $O(1/sqrt{n})$ for a total cost of $O(n)$ projection-oracle calls (which have at most the same cost as the max-oracle from $M^3N$). Experiments on multi-class classification, ordinal regression, sequence prediction and ranking demonstrate the effectiveness of the proposed method.

التعلم الآلي التعلم الالي

Max-Margin is Dead, Long Live Max-Margin!

172 - Alex Nowak-Vila , Alessandro Rudi , Francis Bach 2021

The foundational concept of Max-Margin in machine learning is ill-posed for output spaces with more than two labels such as in structured prediction. In this paper, we show that the Max-Margin loss can only be consistent to the classification task un der highly restrictive assumptions on the discrete loss measuring the error between outputs. These conditions are satisfied by distances defined in tree graphs, for which we prove consistency, thus being the first losses shown to be consistent for Max-Margin beyond the binary setting. We finally address these limitations by correcting the concept of Max-Margin and introducing the Restricted-Max-Margin, where the maximization of the loss-augmented scores is maintained, but performed over a subset of the original domain. The resulting loss is also a generalization of the binary support vector machine and it is consistent under milder conditions on the discrete loss.

التعلم الآلي التعلم الالي

Learning convex polytopes with margin

237 - Lee-Ad Gottlieb , Eran Kaufman , Aryeh Kontorovich 2018

We present an improved algorithm for properly learning convex polytopes in the realizable PAC setting from data with a margin. Our learning algorithm constructs a consistent polytope as an intersection of about $t log t$ halfspaces with margins in ti me polynomial in $t$ (where $t$ is the number of halfspaces forming an optimal polytope). We also identify distinct generalizations of the notion of margin from hyperplanes to polytopes and investigate how they relate geometrically; this result may be of interest beyond the learning setting.

التعلم الآلي التعقيد الحسابي الهندسة الحسابية

Beyond Max-Margin: Class Margin Equilibrium for Few-shot Object Detection

323 - Bohao Li , Boyu Yang , Chang Liu 2021

Few-shot object detection has made substantial progressby representing novel class objects using the feature representation learned upon a set of base class objects. However,an implicit contradiction between novel class classification and representat ion is unfortunately ignored. On the one hand, to achieve accurate novel class classification, the distributions of either two base classes must be far away fromeach other (max-margin). On the other hand, to precisely represent novel classes, the distributions of base classes should be close to each other to reduce the intra-class distance of novel classes (min-margin). In this paper, we propose a class margin equilibrium (CME) approach, with the aim to optimize both feature space partition and novel class reconstruction in a systematic way. CME first converts the few-shot detection problem to the few-shot classification problem by using a fully connected layer to decouple localization features. CME then reserves adequate margin space for novel classes by introducing simple-yet-effective class margin loss during feature learning. Finally, CME pursues margin equilibrium by disturbing the features of novel class instances in an adversarial min-max fashion. Experiments on Pascal VOC and MS-COCO datasets show that CME significantly improves upon two baseline detectors (up to $3sim 5%$ in average), achieving state-of-the-art performance. Code is available at https://github.com/Bohao-Lee/CME .

الرؤية الحاسوبية وتمييز الأنماط

Max-Plus Matching Pursuit for Deterministic Markov Decision Processes

114 - Francis Bach 2019

We consider deterministic Markov decision processes (MDPs) and apply max-plus algebra tools to approximate the value iteration algorithm by a smaller-dimensional iteration based on a representation on dictionaries of value functions. The setup natura lly leads to novel theoretical results which are simply formulated due to the max-plus algebra structure. For example, when considering a fixed (non adaptive) finite basis, the computational complexity of approximating the optimal value function is not directly related to the number of states, but to notions of covering numbers of the state space. In order to break the curse of dimensionality in factored state-spaces, we consider adaptive basis that can adapt to particular problems leading to an algorithm similar to matching pursuit from signal processing. They currently come with no theoretical guarantees but work empirically well on simple deterministic MDPs derived from low-dimensional continuous control problems. We focus primarily on deterministic MDPs but note that the framework can be applied to all MDPs by considering measure-based formulations.

التعلم الآلي التعلم الالي