
A Generic First-Order Algorithmic Framework for Bi-Level Programming Beyond Lower-Level Singleton

Posted by: Risheng Liu
Publication date: 2020
Research field: Informatics engineering
Paper language: English





In recent years, a variety of gradient-based first-order methods have been developed to solve bi-level optimization problems for learning applications. However, the theoretical guarantees of these existing approaches rely heavily on the simplifying assumption that, for each fixed upper-level variable, the lower-level solution must be a singleton (a.k.a. Lower-Level Singleton, LLS). In this work, we first design a counter-example showing that this LLS condition can fail. Then, by formulating BLPs from the viewpoint of optimistic bi-level optimization and aggregating hierarchical objective information, we establish Bi-level Descent Aggregation (BDA), a flexible and modularized algorithmic framework for generic bi-level optimization. Theoretically, we derive a new methodology to prove the convergence of BDA without the LLS condition. Our investigations also demonstrate that BDA is compatible with a variety of particular first-order computation modules. Additionally, as an interesting byproduct, we also improve conventional first-order bi-level schemes (under the LLS simplification); in particular, we establish their convergence under weaker assumptions. Extensive experiments justify our theoretical results and demonstrate the superiority of the proposed BDA on different tasks, including hyper-parameter optimization and meta-learning.
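
To make the aggregation idea concrete, below is a minimal numpy sketch on a hypothetical toy problem whose lower-level solution set is a whole line (so LLS fails): the inner update descends a convex combination of the upper- and lower-level gradients with a decaying weight mu_k. The objectives, step size, and weight schedule are illustrative assumptions, not the paper's exact scheme.

import numpy as np

def grad_y_f(x, y):          # lower level: f(x, y) = 0.5*(y[0] + y[1] - x)**2
    r = y[0] + y[1] - x      # every point on the line y[0] + y[1] = x is optimal
    return np.array([r, r])

def grad_y_F(x, y):          # upper level: F(x, y) = 0.5*(y[0] - 1)**2 + 0.5*y[1]**2
    return np.array([y[0] - 1.0, y[1]])

def bda_inner(x, y0, K=2000, s=0.1):
    """Aggregated descent: mu_k decays so the lower-level objective dominates
    asymptotically, while F selects among the lower-level minimizers."""
    y = y0.copy()
    for k in range(K):
        mu = 1.0 / np.sqrt(k + 2.0)  # decaying aggregation weight (illustrative)
        y = y - s * (mu * grad_y_F(x, y) + (1.0 - mu) * grad_y_f(x, y))
    return y

print(bda_inner(x=2.0, y0=np.zeros(2)))  # tends toward (1.5, 0.5), the minimizer
                                         # of F on the solution line y[0]+y[1]=2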


Read also

In recent years, gradient-based methods for solving bi-level optimization tasks have drawn a great deal of interest from the machine learning community. However, to calculate the gradient of the best response, existing research always relies on the lower-level solution set being a singleton (a.k.a. Lower-Level Singleton, LLS). In this work, by formulating bi-level models from an optimistic bi-level viewpoint, we first establish a novel Bi-level Descent Aggregation (BDA) framework, which aggregates the hierarchical objectives of both the upper and lower levels. The flexibility of our framework comes from its embedded, replaceable, task-tailored iteration-dynamics modules, allowing it to capture a wide range of bi-level learning tasks. Theoretically, we derive a new methodology to prove the convergence of the BDA framework without the LLS restriction. Moreover, the new proof recipe we propose is also used to improve the convergence results of conventional gradient-based bi-level methods under the LLS simplification. Furthermore, we employ a one-stage technique to numerically accelerate the back-propagation calculation. Extensive experiments justify our theoretical results and demonstrate the superiority of the proposed algorithm for hyper-parameter optimization and meta-learning tasks.
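
For reference, the back-propagation that such one-stage techniques accelerate is the standard reverse-mode differentiation through the unrolled inner loop. Below is a minimal sketch of the standard (unaccelerated) version on a toy problem with a known answer; the problem, step size, and loop length are our own illustrative assumptions.

# Toy: f(x, y) = 0.5*(y - x)**2, F(x, y) = 0.5*(y - 1)**2 + 0.5*x**2,
# so y*(x) = x and the exact hypergradient is dF/dx = 2x - 1.
def hypergrad_unrolled(x, y0=0.0, K=100, s=0.5):
    ys = [y0]
    for _ in range(K):                   # forward pass: store the trajectory
        ys.append(ys[-1] - s * (ys[-1] - x))   # y <- y - s * df/dy
    g = x                                # direct partial dF/dx (here: x)
    p = ys[-1] - 1.0                     # adjoint: dF/dy at y_K
    for y_k in reversed(ys[:-1]):        # backward pass through y_{k+1} = Phi(x, y_k)
        g += p * s                       # dPhi/dx = -s * d2f/dydx(x, y_k) = s here
        p *= (1.0 - s)                   # dPhi/dy = 1 - s * d2f/dy2(x, y_k) = 1 - s here
    return g

x = 0.7
print(hypergrad_unrolled(x), 2 * x - 1)  # the two values nearly agree

For a general lower-level objective, the stored iterates y_k would enter the second derivatives in the backward pass, which is exactly why the naive scheme needs memory proportional to the loop length.
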
Bi-Level Optimization (BLO) originated in the area of economic game theory and was later introduced into the optimization community. BLO can handle problems with a hierarchical structure, involving two levels of optimization tasks, where one task is nested inside the other. In machine learning and computer vision, despite their different motivations and mechanisms, many complex problems, such as hyper-parameter optimization, multi-task and meta-learning, neural architecture search, adversarial learning and deep reinforcement learning, all contain a series of closely related subproblems. In this paper, we first uniformly express these complex learning and vision problems from the perspective of BLO. Then we construct a best-response-based single-level reformulation and establish a unified algorithmic framework to understand and formulate mainstream gradient-based BLO methodologies, covering aspects ranging from fundamental automatic differentiation schemes to various accelerations, simplifications, extensions and their convergence and complexity properties. Last but not least, we discuss the potential of our unified BLO framework for designing new algorithms and point out some promising directions for future research.
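
One classical instance of such a best-response-based single-level reformulation is the implicit-function-theorem hypergradient dF/dx = ∂F/∂x - (∂²f/∂x∂y)^T (∂²f/∂y²)^(-1) ∂F/∂y evaluated at y*(x). The sketch below works it out on a hypothetical quadratic toy (our own choice of A, b and objectives) where every term is analytic.

import numpy as np

# lower level: f(x, y) = 0.5*||y - A @ x||^2  -> best response y*(x) = A @ x
# upper level: F(x, y) = 0.5*||y - b||^2 + 0.5*||x||^2
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 2)); b = rng.normal(size=3); x = rng.normal(size=2)

y_star = A @ x                 # exact lower-level solution
dF_dy = y_star - b             # partial of F in y
dF_dx = x                      # direct partial of F in x
d2f_dydy = np.eye(3)           # Hessian of f in y
d2f_dxdy = -A                  # mixed derivative d/dx (df/dy)

hypergrad = dF_dx - d2f_dxdy.T @ np.linalg.solve(d2f_dydy, dF_dy)
print(hypergrad)               # for this toy, equals x + A.T @ (A @ x - b)
print(x + A.T @ (A @ x - b))
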
Approximate bi-level optimization (ABLO) consists of (outer-level) optimization problems involving numerical (inner-level) optimization loops. While ABLO has many applications across deep learning, it suffers from time and memory complexity proportional to the length $r$ of its inner optimization loop. To address this complexity, an earlier first-order method (FOM) was proposed as a heuristic that omits second-derivative terms, yielding significant speed gains and requiring only constant memory. Despite FOM's popularity, there is a lack of theoretical understanding of its convergence properties. We contribute by theoretically characterizing FOM's gradient bias under mild assumptions. We further demonstrate a rich family of examples where FOM-based SGD does not converge to a stationary point of the ABLO objective. We address this concern by proposing an unbiased FOM (UFOM) enjoying constant memory complexity as a function of $r$. We characterize the introduced time-variance tradeoff, demonstrate convergence bounds, and find an optimal UFOM for a given ABLO problem. Finally, we propose an efficient adaptive UFOM scheme.
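
Concretely, FOM keeps only the direct partial ∂F/∂x at the inner-loop output and drops all second-derivative terms. The following toy (our own illustrative choice, not one of the paper's examples) makes the resulting bias visible.

# Toy: f(x, y) = 0.5*(y - x)**2 so y*(x) = x, and F(x, y) = 0.5*(x*y - 1)**2,
# so the exact hypergradient is d/dx F(x, y*(x)) = 2x(x^2 - 1).
def fom_hypergrad(x, y0=0.0, r=100, s=0.5):
    y = y0
    for _ in range(r):             # inner loop of length r
        y = y - s * (y - x)        # gradient step on f
    return y * (x * y - 1.0)       # direct partial dF/dx only (indirect term dropped)

x = 1.5
print(fom_hypergrad(x))            # ~ x*(x^2 - 1)  = 1.875 (biased)
print(2 * x * (x**2 - 1))          # exact hypergradient = 3.75

Here the dropped term dy*/dx * ∂F/∂y accounts for exactly half of the true gradient, so FOM-based descent follows a systematically wrong direction.
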
Combinatorial Optimization (CO) has been a long-standing, challenging research topic characterized by its NP-hard nature. Traditionally, such problems are approximately solved with heuristic algorithms, which are usually fast but may sacrifice solution quality. Currently, machine learning for combinatorial optimization (MLCO) has become a trending research topic, but most existing MLCO methods treat CO as a single-level optimization by directly learning end-to-end solutions, which are hard to scale up and mostly limited by the capacity of ML models given the high complexity of CO. In this paper, we propose a hybrid approach to combine the best of both worlds, in which a bi-level framework is developed with an upper-level learning method that optimizes the graph (e.g., adds, deletes or modifies edges), fused with a lower-level heuristic algorithm solving on the optimized graph. Such a bi-level approach simplifies learning on the original hard CO problem and can effectively mitigate the demand for model capacity. Experiments on several popular CO problems, including Directed Acyclic Graph scheduling, Graph Edit Distance and the Hamiltonian Cycle Problem, show its effectiveness over manually designed heuristics and single-level learning methods.
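
A toy sketch of this division of labor is given below: the upper level edits the graph the heuristic sees, while candidate tours are always scored on the original distances. The random hill-climbing upper level and nearest-neighbor lower level are illustrative stand-ins for the learned upper level and problem-specific heuristics in the paper.

import numpy as np

rng = np.random.default_rng(1)
n = 12
pts = rng.random((n, 2))
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)   # true distances

def greedy_tour(dist):                 # lower level: nearest-neighbor heuristic
    tour, left = [0], set(range(1, len(dist)))
    while left:
        nxt = min(left, key=lambda j: dist[tour[-1], j])
        tour.append(nxt); left.remove(nxt)
    return tour

def tour_len(tour, dist):              # always evaluated on the ORIGINAL graph
    return sum(dist[tour[i], tour[(i + 1) % len(tour)]] for i in range(len(tour)))

best_E, best_len = np.zeros_like(D), tour_len(greedy_tour(D), D)
for _ in range(200):                   # upper level: hill-climb over edge edits
    E = best_E + 0.05 * rng.normal(size=D.shape)
    cand = tour_len(greedy_tour(D + E), D)
    if cand < best_len:
        best_E, best_len = E, cand

print(tour_len(greedy_tour(D), D), "->", best_len)
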
We review some of the features of the ProjectQ software framework and quantify their impact on the resulting circuits. The concise high-level language facilitates implementing even complex algorithms in a very time-efficient manner while, at the same time, providing the compiler with additional information for optimization through code annotation - so-called meta-instructions. We investigate the impact of these annotations for the example of Shor's algorithm in terms of logical gate counts. Furthermore, we analyze the effect of different intermediate gate sets for optimization and how the dimensions of the resulting circuit depend on a smart choice thereof. Finally, we demonstrate the benefits of a modular compilation framework by implementing mapping procedures for one- and two-dimensional nearest-neighbor architectures, which we then compare in terms of overhead for different problem sizes.
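
For instance, ProjectQ's Compute/Uncompute and Control meta-instructions annotate code regions so the compiler can automatically invert the compute section and optimize across it. Below is a minimal circuit of our own devising, using only the public ProjectQ API.

from projectq import MainEngine
from projectq.ops import H, CNOT, Measure
from projectq.meta import Compute, Uncompute, Control

eng = MainEngine()
ctrl = eng.allocate_qubit()
work = eng.allocate_qubit()
target = eng.allocate_qubit()

with Compute(eng):          # "compute" section: prepare a temporary value
    H | work
    CNOT | (work, target)
with Control(eng, ctrl):    # controlled region, another meta-instruction
    H | target
Uncompute(eng)              # compiler re-emits the inverse of the compute section

Measure | ctrl
Measure | work
Measure | target
eng.flush()
print(int(ctrl), int(work), int(target))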

