Hybrid Models for Learning to Branch

205 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Prateek Gupta

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Prateek Gupta - Maxime Gasse - Elias B. Khalil

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

A recent Graph Neural Network (GNN) approach for learning to branch has been shown to successfully reduce the running time of branch-and-bound algorithms for Mixed Integer Linear Programming (MILP). While the GNN relies on a GPU for inference, MILP solvers are purely CPU-based. This severely limits its application as many practitioners may not have access to high-end GPUs. In this work, we ask two key questions. First, in a more realistic setting where only a CPU is available, is the GNN model still competitive? Second, can we devise an alternate computationally inexpensive model that retains the predictive power of the GNN architecture? We answer the first question in the negative, and address the second question by proposing a new hybrid architecture for efficient branching on CPU machines. The proposed architecture combines the expressive power of GNNs with computationally inexpensive multi-layer perceptrons (MLP) for branching. We evaluate our methods on four classes of MILP problems, and show that they lead to up to 26% reduction in solver running time compared to state-of-the-art methods without a GPU, while extrapolating to harder problems than it was trained on. The code for this project is publicly available at https://github.com/pg2455/Hybrid-learn2branch.

قيم البحث

180 - Nhan H. Pham , Lam M. Nguyen , Dzung T. Phan 2020

We propose a novel hybrid stochastic policy gradient estimator by combining an unbiased policy gradient estimator, the REINFORCE estimator, with another biased one, an adapted SARAH estimator for policy optimization. The hybrid policy gradient estima tor is shown to be biased, but has variance reduced property. Using this estimator, we develop a new Proximal Hybrid Stochastic Policy Gradient Algorithm (ProxHSPGA) to solve a composite policy optimization problem that allows us to handle constraints or regularizers on the policy parameters. We first propose a single-looped algorithm then introduce a more practical restarting variant. We prove that both algorithms can achieve the best-known trajectory complexity $mathcal{O}left(varepsilon^{-3}right)$ to attain a first-order stationary point for the composite problem which is better than existing REINFORCE/GPOMDP $mathcal{O}left(varepsilon^{-4}right)$ and SVRPG $mathcal{O}left(varepsilon^{-10/3}right)$ in the non-composite setting. We evaluate the performance of our algorithm on several well-known examples in reinforcement learning. Numerical results show that our algorithm outperforms two existing methods on these examples. Moreover, the composite settings indeed have some advantages compared to the non-composite ones on certain problems.

التعلم الآلي التحسين والتحكم

Continuous Online Learning and New Insights to Online Imitation Learning

105 - Jonathan Lee , Ching-An Cheng , Ken Goldberg 2019

Online learning is a powerful tool for analyzing iterative algorithms. However, the classic adversarial setup sometimes fails to capture certain regularity in online problems in practice. Motivated by this, we establish a new setup, called Continuous Online Learning (COL), where the gradient of online loss function changes continuously across rounds with respect to the learners decisions. We show that COL covers and more appropriately describes many interesting applications, from general equilibrium problems (EPs) to optimization in episodic MDPs. Using this new setup, we revisit the difficulty of achieving sublinear dynamic regret. We prove that there is a fundamental equivalence between achieving sublinear dynamic regret in COL and solving certain EPs, and we present a reduction from dynamic regret to both static regret and convergence rate of the associated EP. At the end, we specialize these new insights into online imitation learning and show improved understanding of its learning stability.

التعلم الآلي التحسين والتحكم التعلم الالي

A Hybrid Gradient Method to Designing Bayesian Experiments for Implicit Models

71 - Jiaxin Zhang , Sirui Bi , Guannan Zhang 2021

Bayesian experimental design (BED) aims at designing an experiment to maximize the information gathering from the collected data. The optimal design is usually achieved by maximizing the mutual information (MI) between the data and the model paramete rs. When the analytical expression of the MI is unavailable, e.g., having implicit models with intractable data distributions, a neural network-based lower bound of the MI was recently proposed and a gradient ascent method was used to maximize the lower bound. However, the approach in Kleinegesse et al., 2020 requires a pathwise sampling path to compute the gradient of the MI lower bound with respect to the design variables, and such a pathwise sampling path is usually inaccessible for implicit models. In this work, we propose a hybrid gradient approach that leverages recent advances in variational MI estimator and evolution strategies (ES) combined with black-box stochastic gradient ascent (SGA) to maximize the MI lower bound. This allows the design process to be achieved through a unified scalable procedure for implicit models without sampling path gradients. Several experiments demonstrate that our approach significantly improves the scalability of BED for implicit models in high-dimensional design space.

التعلم الآلي المنهجية التعلم الالي

Tune smarter not harder: A principled approach to tuning learning rates for shallow nets

77 - Thulasi Tholeti , Sheetal Kalyani 2020

Effective hyper-parameter tuning is essential to guarantee the performance that neural networks have come to be known for. In this work, a principled approach to choosing the learning rate is proposed for shallow feedforward neural networks. We assoc iate the learning rate with the gradient Lipschitz constant of the objective to be minimized while training. An upper bound on the mentioned constant is derived and a search algorithm, which always results in non-divergent traces, is proposed to exploit the derived bound. It is shown through simulations that the proposed search method significantly outperforms the existing tuning methods such as Tree Parzen Estimators (TPE). The proposed method is applied to three different existing applications: a) channel estimation in OFDM systems, b) prediction of the exchange currency rates and c) offset estimation in OFDM receivers, and it is shown to pick better learning rates than the existing methods using the same or lesser compute power.

التعلم الآلي التحسين والتحكم التعلم الالي

Self-training Converts Weak Learners to Strong Learners in Mixture Models

117 - Spencer Frei , Difan Zou , Zixiang Chen 2021

We consider a binary classification problem when the data comes from a mixture of two rotationally symmetric distributions satisfying concentration and anti-concentration properties enjoyed by log-concave distributions among others. We show that ther e exists a universal constant $C_{mathrm{err}}>0$ such that if a pseudolabeler $boldsymbol{beta}_{mathrm{pl}}$ can achieve classification error at most $C_{mathrm{err}}$, then for any $varepsilon>0$, an iterative self-training algorithm initialized at $boldsymbol{beta}_0 := boldsymbol{beta}_{mathrm{pl}}$ using pseudolabels $hat y = mathrm{sgn}(langle boldsymbol{beta}_t, mathbf{x}rangle)$ and using at most $tilde O(d/varepsilon^2)$ unlabeled examples suffices to learn the Bayes-optimal classifier up to $varepsilon$ error, where $d$ is the ambient dimension. That is, self-training converts weak learners to strong learners using only unlabeled examples. We additionally show that by running gradient descent on the logistic loss one can obtain a pseudolabeler $boldsymbol{beta}_{mathrm{pl}}$ with classification error $C_{mathrm{err}}$ using only $O(d)$ labeled examples (i.e., independent of $varepsilon$). Together our results imply that mixture models can be learned to within $varepsilon$ of the Bayes-optimal accuracy using at most $O(d)$ labeled examples and $tilde O(d/varepsilon^2)$ unlabeled examples by way of a semi-supervised self-training algorithm.

التعلم الآلي التحسين والتحكم التعلم الالي