Nonsmooth Implicit Differentiation for Machine Learning and Optimization

247 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Edouard Pauwels

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Jer^ome Bolte

التعلم الآلي التحسين والتحكم

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

In view of training increasingly complex learning architectures, we establish a nonsmooth implicit function theorem with an operational calculus. Our result applies to most practical problems (i.e., definable problems) provided that a nonsmooth form of the classical invertibility condition is fulfilled. This approach allows for formal subdifferentiation: for instance, replacing derivatives by Clarke Jacobians in the usual differentiation formulas is fully justified for a wide class of nonsmooth problems. Moreover this calculus is entirely compatible with algorithmic differentiation (e.g., backpropagation). We provide several applications such as training deep equilibrium networks, training neural nets with conic optimization layers, or hyperparameter-tuning for nonsmooth Lasso-type models. To show the sharpness of our assumptions, we present numerical experiments showcasing the extremely pathological gradient dynamics one can encounter when applying implicit algorithmic differentiation without any hypothesis.

قيم البحث

102 - Kaiyi Ji 2021

Bilevel optimization has become a powerful framework in various machine learning applications including meta-learning, hyperparameter optimization, and network architecture search. There are generally two classes of bilevel optimization formulations for machine learning: 1) problem-based bilevel optimization, whose inner-level problem is formulated as finding a minimizer of a given loss function; and 2) algorithm-based bilevel optimization, whose inner-level solution is an output of a fixed algorithm. For the first class, two popular types of gradient-based algorithms have been proposed for hypergradient estimation via approximate implicit differentiation (AID) and iterative differentiation (ITD). Algorithms for the second class include the popular model-agnostic meta-learning (MAML) and almost no inner loop (ANIL). However, the convergence rate and fundamental limitations of bilevel optimization algorithms have not been well explored. This thesis provides a comprehensive convergence rate analysis for bilevel algorithms in the aforementioned two classes. We further propose principled algorithm designs for bilevel optimization with higher efficiency and scalability. For the problem-based formulation, we provide a convergence rate analysis for AID- and ITD-based bilevel algorithms. We then develop acceleration bilevel algorithms, for which we provide shaper convergence analysis with relaxed assumptions. We also provide the first lower bounds for bilevel optimization, and establish the optimality by providing matching upper bounds under certain conditions. We finally propose new stochastic bilevel optimization algorithms with lower complexity and higher efficiency in practice. For the algorithm-based formulation, we develop a theoretical convergence for general multi-step MAML and ANIL, and characterize the impact of parameter selections and loss geometries on the their complexities.

التعلم الآلي التحسين والتحكم التعلم الالي

Implicit Deep Learning

95 - Laurent El Ghaoui , Fangda Gu , Bertrand Travacca 2019

Implicit deep learning prediction rules generalize the recursive rules of feedforward neural networks. Such rules are based on the solution of a fixed-point equation involving a single vector of hidden features, which is thus only implicitly defined. The implicit framework greatly simplifies the notation of deep learning, and opens up many new possibilities, in terms of novel architectures and algorithms, robustness analysis and design, interpretability, sparsity, and network architecture optimization.

التعلم الآلي التحسين والتحكم التعلم الالي

Ecole: A Gym-like Library for Machine Learning in Combinatorial Optimization Solvers

75 - Antoine Prouvost , Justin Dumouchelle , Lara Scavuzzo 2020

We present Ecole, a new library to simplify machine learning research for combinatorial optimization. Ecole exposes several key decision tasks arising in general-purpose combinatorial optimization solvers as control problems over Markov decision proc esses. Its interface mimics the popular OpenAI Gym library and is both extensible and intuitive to use. We aim at making this library a standardized platform that will lower the bar of entry and accelerate innovation in the field. Documentation and code can be found at https://www.ecole.ai.

التعلم الآلي التحسين والتحكم

Optimization under Uncertainty in the Era of Big Data and Deep Learning: When Machine Learning Meets Mathematical Programming

72 - Chao Ning , Fengqi You 2019

This paper reviews recent advances in the field of optimization under uncertainty via a modern data lens, highlights key research challenges and promise of data-driven optimization that organically integrates machine learning and mathematical program ming for decision-making under uncertainty, and identifies potential research opportunities. A brief review of classical mathematical programming techniques for hedging against uncertainty is first presented, along with their wide spectrum of applications in Process Systems Engineering. A comprehensive review and classification of the relevant publications on data-driven distributionally robust optimization, data-driven chance constrained program, data-driven robust optimization, and data-driven scenario-based optimization is then presented. This paper also identifies fertile avenues for future research that focuses on a closed-loop data-driven optimization framework, which allows the feedback from mathematical programming to machine learning, as well as scenario-based optimization leveraging the power of deep learning techniques. Perspectives on online learning-based data-driven multistage optimization with a learning-while-optimizing scheme is presented.

التعلم الآلي التحسين والتحكم التعلم الالي

Demystifying Batch Normalization in ReLU Networks: Equivalent Convex Optimization Models and Implicit Regularization

103 - Tolga Ergen , Arda Sahiner , Batu Ozturkler 2021

Batch Normalization (BN) is a commonly used technique to accelerate and stabilize training of deep neural networks. Despite its empirical success, a full theoretical understanding of BN is yet to be developed. In this work, we analyze BN through the lens of convex optimization. We introduce an analytic framework based on convex duality to obtain exact convex representations of weight-decay regularized ReLU networks with BN, which can be trained in polynomial-time. Our analyses also show that optimal layer weights can be obtained as simple closed-form formulas in the high-dimensional and/or overparameterized regimes. Furthermore, we find that Gradient Descent provides an algorithmic bias effect on the standard non-convex BN network, and we design an approach to explicitly encode this implicit regularization into the convex objective. Experiments with CIFAR image classification highlight the effectiveness of this explicit regularization for mimicking and substantially improving the performance of standard BN networks.

التعلم الآلي التحسين والتحكم التعلم الالي