Optimizing Discrete Spaces via Expensive Evaluations: A Learning to Search Framework

324   0   0.0 ( 0 )
 نشر من قبل Aryan Deshwal
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

We consider the problem of optimizing expensive black-box functions over discrete spaces (e.g., sets, sequences, graphs). The key challenge is to select a sequence of combinatorial structures to evaluate, in order to identify high-performing structures as quickly as possible. Our main contribution is to introduce and evaluate a new learning-to-search framework for this problem called L2S-DISCO. The key insight is to employ search procedures guided by control knowledge at each step to select the next structure and to improve the control knowledge as new function evaluations are observed. We provide a concrete instantiation of L2S-DISCO for local search procedure and empirically evaluate it on diverse real-world benchmarks. Results show the efficacy of L2S-DISCO over state-of-the-art algorithms in solving complex optimization problems.



قيم البحث

اقرأ أيضاً

251 - Sungsu Lim , Ajin Joseph , Lei Le 2018
Q-learning can be difficult to use in continuous action spaces, because an optimization has to be solved to find the maximal action for the action-values. A common strategy has been to restrict the functional form of the action-values to be concave i n the actions, to simplify the optimization. Such restrictions, however, can prevent learning accurate action-values. In this work, we propose a new policy search objective that facilitates using Q-learning and a framework to optimize this objective, called Actor-Expert. The Expert uses Q-learning to update the action-values towards optimal action-values. The Actor learns the maximal actions over time for these changing action-values. We develop a Cross Entropy Method (CEM) for the Actor, where such a global optimization approach facilitates use of generically parameterized action-values. This method - which we call Conditional CEM - iteratively concentrates density around maximal actions, conditioned on state. We prove that this algorithm tracks the expected CEM update, over states with changing action-values. We demonstrate in a toy environment that previous methods that restrict the action-value parameterization fail whereas Actor-Expert with a more general action-value parameterization succeeds. Finally, we demonstrate that Actor-Expert performs as well as or better than competitors on four benchmark continuous-action environments.
Machine learning has shown potential for optimizing existing molecules with more desirable properties, a critical step towards accelerating new chemical discovery. In this work, we propose QMO, a generic query-based molecule optimization framework th at exploits latent embeddings from a molecule autoencoder. QMO improves the desired properties of an input molecule based on efficient queries, guided by a set of molecular property predictions and evaluation metrics. We show that QMO outperforms existing methods in the benchmark tasks of optimizing molecules for drug likeliness and solubility under similarity constraints. We also demonstrate significant property improvement using QMO on two new and challenging tasks that are also important in real-world discovery problems: (i) optimizing existing SARS-CoV-2 Main Protease inhibitors toward higher binding affinity; and (ii) improving known antimicrobial peptides towards lower toxicity. Results from QMO show high consistency with external validations, suggesting effective means of facilitating molecule optimization problems with design constraints.
Robust Policy Search is the problem of learning policies that do not degrade in performance when subject to unseen environment model parameters. It is particularly relevant for transferring policies learned in a simulation environment to the real wor ld. Several existing approaches involve sampling large batches of trajectories which reflect the differences in various possible environments, and then selecting some subset of these to learn robust policies, such as the ones that result in the worst performance. We propose an active learning based framework, EffAcTS, to selectively choose model parameters for this purpose so as to collect only as much data as necessary to select such a subset. We apply this framework to an existing method, namely EPOpt, and experimentally validate the gains in sample efficiency and the performance of our approach on standard continuous control tasks. We also present a Multi-Task Learning perspective to the problem of Robust Policy Search, and draw connections from our proposed framework to existing work on Multi-Task Learning.
The promise of machine learning has been explored in a variety of scientific disciplines in the last few years, however, its application on first-principles based computationally expensive tools is still in nascent stage. Even with the advances in co mputational resources and power, transient simulations of large-scale dynamic systems using a variety of the first-principles based computational tools are still limited. In this work, we propose an ensemble approach where we combine one such computationally expensive tool, called discrete element method (DEM), with a time-series forecasting method called auto-regressive integrated moving average (ARIMA) and machine-learning methods to significantly reduce the computational burden while retaining model accuracy and performance. The developed machine-learning model shows good predictability and agreement with the literature, demonstrating its tremendous potential in scientific computing.
Predicting clinical outcome is remarkably important but challenging. Research efforts have been paid on seeking significant biomarkers associated with the therapy response or/and patient survival. However, these biomarkers are generally costly and in vasive, and possibly dissatifactory for novel therapy. On the other hand, multi-modal, heterogeneous, unaligned temporal data is continuously generated in clinical practice. This paper aims at a unified deep learning approach to predict patient prognosis and therapy response, with easily accessible data, e.g., radiographics, laboratory and clinical information. Prior arts focus on modeling single data modality, or ignore the temporal changes. Importantly, the clinical time series is asynchronous in practice, i.e., recorded with irregular intervals. In this study, we formalize the prognosis modeling as a multi-modal asynchronous time series classification task, and propose a MIA-Prognosis framework with Measurement, Intervention and Assessment (MIA) information to predict therapy response, where a Simple Temporal Attention (SimTA) module is developed to process the asynchronous time series. Experiments on synthetic dataset validate the superiory of SimTA over standard RNN-based approaches. Furthermore, we experiment the proposed method on an in-house, retrospective dataset of real-world non-small cell lung cancer patients under anti-PD-1 immunotherapy. The proposed method achieves promising performance on predicting the immunotherapy response. Notably, our predictive model could further stratify low-risk and high-risk patients in terms of long-term survival.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا