Design by adaptive sampling

163 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل David Brookes

تاريخ النشر 2018

مجال البحث الهندسة المعلوماتية علم الأحياء

والبحث باللغة English

تأليف David H. Brookes - Jennifer Listgarten

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We present a probabilistic modeling framework and adaptive sampling algorithm wherein unsupervised generative models are combined with black box predictive models to tackle the problem of input design. In input design, one is given one or more stochastic oracle predictive functions, each of which maps from the input design space (e.g. DNA sequences or images) to a distribution over a property of interest (e.g. protein fluorescence or image content). Given such stochastic oracles, the problem is to find an input that is expected to maximize one or more properties, or to achieve a specified value of one or more properties, or any combination thereof. We demonstrate experimentally that our approach substantially outperforms other recently presented methods for tackling a specific version of this problem, namely, maximization when the oracle is assumed to be deterministic and unbiased. We also demonstrate that our method can tackle more gener

قيم البحث

78 - David H. Brookes , Hahnbeom Park , Jennifer Listgarten 2019

We present a new method for design problems wherein the goal is to maximize or specify the value of one or more properties of interest. For example, in protein design, one may wish to find the protein sequence that maximizes fluorescence. We assume a ccess to one or more, potentially black box, stochastic oracle predictive functions, each of which maps from input (e.g., protein sequences) design space to a distribution over a property of interest (e.g. protein fluorescence). At first glance, this problem can be framed as one of optimizing the oracle(s) with respect to the input. However, many state-of-the-art predictive models, such as neural networks, are known to suffer from pathologies, especially for data far from the training distribution. Thus we need to modulate the optimization of the oracle inputs with prior knowledge about what makes `realistic inputs (e.g., proteins that stably fold). Herein, we propose a new method to solve this problem, Conditioning by Adaptive Sampling, which yields state-of-the-art results on a protein fluorescence problem, as compared to other recently published approaches. Formally, our method achieves its success by using model-based adaptive sampling to estimate the conditional distribution of the input sequences given the desired properties.

التعلم الآلي التعلم الالي

Autofocused oracles for model-based design

64 - Clara Fannjiang , Jennifer Listgarten 2020

Data-driven design is making headway into a number of application areas, including protein, small-molecule, and materials engineering. The design goal is to construct an object with desired properties, such as a protein that binds to a therapeutic ta rget, or a superconducting material with a higher critical temperature than previously observed. To that end, costly experimental measurements are being replaced with calls to high-capacity regression models trained on labeled data, which can be leveraged in an in silico search for design candidates. However, the design goal necessitates moving into regions of the design space beyond where such models were trained. Therefore, one can ask: should the regression model be altered as the design algorithm explores the design space, in the absence of new data? Herein, we answer this question in the affirmative. In particular, we (i) formalize the data-driven design problem as a non-zero-sum game, (ii) develop a principled strategy for retraining the regression model as the design algorithm proceeds---what we refer to as autofocusing, and (iii) demonstrate the promise of autofocusing empirically.

التعلم الآلي الأساليب الكمية التعلم الالي

Adaptive Importance Sampling for Finite-Sum Optimization and Sampling with Decreasing Step-Sizes

124 - Ayoub El Hanchi , David A. Stephens 2021

Reducing the variance of the gradient estimator is known to improve the convergence rate of stochastic gradient-based optimization and sampling algorithms. One way of achieving variance reduction is to design importance sampling strategies. Recently, the problem of designing such schemes was formulated as an online learning problem with bandit feedback, and algorithms with sub-linear static regret were designed. In this work, we build on this framework and propose Avare, a simple and efficient algorithm for adaptive importance sampling for finite-sum optimization and sampling with decreasing step-sizes. Under standard technical conditions, we show that Avare achieves $mathcal{O}(T^{2/3})$ and $mathcal{O}(T^{5/6})$ dynamic regret for SGD and SGLD respectively when run with $mathcal{O}(1/t)$ step sizes. We achieve this dynamic regret bound by leveraging our knowledge of the dynamics defined by the algorithm, and combining ideas from online learning and variance-reduced stochastic optimization. We validate empirically the performance of our algorithm and identify settings in which it leads to significant improvements.

التعلم الآلي التحسين والتحكم التعلم الالي

Layered Adaptive Importance Sampling

491 - L. Martino , V. Elvira , D. Luengo 2015

Monte Carlo methods represent the de facto standard for approximating complicated integrals involving multidimensional target distributions. In order to generate random realizations from the target distribution, Monte Carlo techniques use simpler pro posal probability densities to draw candidate samples. The performance of any such method is strictly related to the specification of the proposal distribution, such that unfortunate choices easily wreak havoc on the resulting estimators. In this work, we introduce a layered (i.e., hierarchical) procedure to generate samples employed within a Monte Carlo scheme. This approach ensures that an appropriate equivalent proposal density is always obtained automatically (thus eliminating the risk of a catastrophic performance), although at the expense of a moderate increase in the complexity. Furthermore, we provide a general unified importance sampling (IS) framework, where multiple proposal densities are employed and several IS schemes are introduced by applying the so-called deterministic mixture approach. Finally, given these schemes, we also propose a novel class of adaptive importance samplers using a population of proposals, where the adaptation is driven by independent parallel or interacting Markov Chain Monte Carlo (MCMC) chains. The resulting algorithms efficiently combine the benefits of both IS and MCMC methods.

حساب التعلم الآلي التعلم الالي

Advance Prediction of Ventricular Tachyarrhythmias using Patient Metadata and Multi-Task Networks

86 - Marek Rei , Joshua Oppenheimer , Marek Sirendi 2018

We describe a novel neural network architecture for the prediction of ventricular tachyarrhythmias. The model receives input features that capture the change in RR intervals and ectopic beats, along with features based on heart rate variability and f requency analysis. Patient age is also included as a trainable embedding, while the whole network is optimized with multi-task objectives. Each of these modifications provides a consistent improvement to the model performance, achieving 74.02% prediction accuracy and 77.22% specificity 60 seconds in advance of the episode.

التعلم الآلي الأساليب الكمية التعلم الالي