أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Massimiliano Pontil

Online Model Selection: a Rested Bandit Formulation

224 - Leonardo Cella , Claudio Gentile , Massimiliano Pontil 2020

Motivated by a natural problem in online model selection with bandit information, we introduce and analyze a best arm identification problem in the rested bandit setting, wherein arm expected losses decrease with the number of times the arm has been played. The shape of the expected loss functions is similar across arms, and is assumed to be available up to unknown parameters that have to be learned on the fly. We define a novel notion of regret for this problem, where we compare to the policy that always plays the arm having the smallest expected loss at the end of the game. We analyze an arm elimination algorithm whose regret vanishes as the time horizon increases. The actual rate of convergence depends in a detailed way on the postulated functional form of the expected losses. Unlike known model selection efforts in the recent bandit literature, our algorithm exploits the specific structure of the problem to learn the unknown parameters of the expected loss function so as to identify the best arm as quickly as possible. We complement our analysis with a lower bound, indicating strengths and limitations of the proposed solution.

التعلم الالي التعلم الآلي

On the Iteration Complexity of Hypergradient Computation

74 - Riccardo Grazzi , Luca Franceschi , Massimiliano Pontil 2020

We study a general class of bilevel problems, consisting in the minimization of an upper-level objective which depends on the solution to a parametric fixed-point equation. Important instances arising in machine learning include hyperparameter optimi zation, meta-learning, and certain graph and recurrent neural networks. Typically the gradient of the upper-level objective (hypergradient) is hard or even impossible to compute exactly, which has raised the interest in approximation methods. We investigate some popular approaches to compute the hypergradient, based on reverse mode iterative differentiation and approximate implicit differentiation. Under the hypothesis that the fixed point equation is defined by a contraction mapping, we present a unified analysis which allows for the first time to quantitatively compare these methods, providing explicit bounds for their iteration complexity. This analysis suggests a hierarchy in terms of computational efficiency among the above methods, with approximate implicit differentiation based on conjugate gradient performing best. We present an extensive experimental comparison among the methods which confirm the theoretical findings.

التعلم الالي التعلم الآلي

MARTHE: Scheduling the Learning Rate Via Online Hypergradients

71 - Michele Donini , Luca Franceschi , Massimiliano Pontil 2019

We study the problem of fitting task-specific learning rate schedules from the perspective of hyperparameter optimization, aiming at good generalization. We describe the structure of the gradient of a validation error w.r.t. the learning rate schedul e -- the hypergradient. Based on this, we introduce MARTHE, a novel online algorithm guided by cheap approximations of the hypergradient that uses past information from the optimization trajectory to simulate future behaviour. It interpolates between two recent techniques, RTHO (Franceschi et al., 2017) and HD (Baydin et al. 2018), and is able to produce learning rate schedules that are more stable leading to models that generalize better.

التعلم الآلي التعلم الالي

Learning Discrete Structures for Graph Neural Networks

228 - Luca Franceschi , Mathias Niepert , Massimiliano Pontil 2019

Graph neural networks (GNNs) are a popular class of machine learning models whose major advantage is their ability to incorporate a sparse and discrete dependency structure between data points. Unfortunately, GNNs can only be used when such a graph-s tructure is available. In practice, however, real-world graphs are often noisy and incomplete or might not be available at all. With this work, we propose to jointly learn the graph structure and the parameters of graph convolutional networks (GCNs) by approximately solving a bilevel program that learns a discrete probability distribution on the edges of the graph. This allows one to apply GCNs not only in scenarios where the given graph is incomplete or corrupted but also in those where a graph is not available. We conduct a series of experiments that analyze the behavior of the proposed method and demonstrate that it outperforms related methods by a significant margin.

التعلم الآلي التعلم الالي

Far-HO: A Bilevel Programming Package for Hyperparameter Optimization and Meta-Learning

217 - Luca Franceschi , Riccardo Grazzi , Massimiliano Pontil 2018

In (Franceschi et al., 2018) we proposed a unified mathematical framework, grounded on bilevel programming, that encompasses gradient-based hyperparameter optimization and meta-learning. We formulated an approximate version of the problem where the i nner objective is solved iteratively, and gave sufficient conditions ensuring convergence to the exact problem. In this work we show how to optimize learning rates, automatically weight the loss of single examples and learn hyper-representations with Far-HO, a software package based on the popular deep learning framework TensorFlow that allows to seamlessly tackle both HO and ML problems.

البرمجيات الرياضية التعلم الآلي التعلم الالي

New Perspectives on k-Support and Cluster Norms

268 - Andrew M. McDonald , Massimiliano Pontil , Dimitris Stamos 2014

The $k$-support norm is a regularizer which has been successfully applied to sparse vector prediction problems. We show that it belongs to a general class of norms which can be formulated as a parameterized infimum over quadratics. We further extend the $k$-support norm to matrices, and we observe that it is a special case of the matrix cluster norm. Using this formulation we derive an efficient algorithm to compute the proximity operator of both norms. This improves upon the standard algorithm for the $k$-support norm and allows us to apply proximal gradient methods to the cluster norm. We also describe how to solve regularization problems which employ center

التعلم الالي

Modeling transition dynamics in MDPs with RKHS embeddings of conditional distributions

135 - Steffen Grunewalder , Luca Baldassarre , Massimiliano Pontil 2011

التعلم الآلي

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد