
Procrastinating with Confidence: Near-Optimal, Anytime, Adaptive Algorithm Configuration

Published by: Devon Graham
Publication date: 2019
Research field: Informatics Engineering
Language: English





Algorithm configuration methods optimize the performance of a parameterized heuristic algorithm on a given distribution of problem instances. Recent work introduced an algorithm configuration procedure (Structured Procrastination) that provably achieves near-optimal performance with high probability and with nearly minimal runtime in the worst case. It also offers an $\textit{anytime}$ property: it keeps tightening its optimality guarantees the longer it is run. Unfortunately, Structured Procrastination is not $\textit{adaptive}$ to characteristics of the parameterized algorithm: it treats every input like the worst case. Follow-up work (LeapsAndBounds) achieves adaptivity but trades away the anytime property. This paper introduces a new algorithm, Structured Procrastination with Confidence, that preserves the near-optimality and anytime properties of Structured Procrastination while adding adaptivity. In particular, the new algorithm will perform dramatically faster in settings where many algorithm configurations perform poorly. We show empirically both that such settings arise frequently in practice and that the anytime property is useful for finding good configurations quickly.
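To ground the setting, here is a minimal, hypothetical sketch of the capped-run idea this family of methods builds on: every (configuration, instance) pair is run under a runtime cap, a run that times out is requeued with a doubled cap (deferred, i.e. "procrastinated on"), and at any moment the configuration with the best runtime estimate so far can be returned, which is what makes the search anytime. The queue discipline, the doubling rule, and the crude mean-based estimate below are illustrative assumptions, not the authors' actual procedure.

```python
import heapq
import random
import time

def configure(configs, instances, run, budget_s, kappa0=0.1):
    """Toy anytime configuration via capped runs (illustrative only).

    `run(config, instance, cap)` must return the runtime in seconds if
    the run finishes within `cap` seconds, and None on a timeout.
    Configs must be hashable.
    """
    # Pending jobs ordered by cap: always execute the cheapest job next
    # and "procrastinate" on expensive retries.
    queue = [(kappa0, random.random(), c, i) for c in configs for i in instances]
    heapq.heapify(queue)
    runtimes = {c: [] for c in configs}

    deadline = time.monotonic() + budget_s
    while queue and time.monotonic() < deadline:
        cap, _, c, i = heapq.heappop(queue)
        t = run(c, i, cap)
        if t is None:                       # timeout: censored observation,
            runtimes[c].append(cap)         # retry later with a doubled cap
            heapq.heappush(queue, (2 * cap, random.random(), c, i))
        else:
            runtimes[c].append(t)

    # Anytime answer: the incumbent under the current (crude) estimates.
    return min(configs, key=lambda c: sum(runtimes[c]) / max(len(runtimes[c]), 1))
```

Structured Procrastination with Confidence refines this kind of loop with lower confidence bounds on each configuration's runtime, so that obviously poor configurations receive little measurement effort; the sketch above omits that machinery.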


Read also

Gyuwan Kim, Kyunghyun Cho (2020)
Despite transformers' impressive accuracy, their computational cost is often prohibitive with limited computational resources. Most previous approaches to improving inference efficiency require a separate model for each possible computational budget. In this paper, we extend PoWER-BERT (Goyal et al., 2020) and propose Length-Adaptive Transformer, which can be used for various inference scenarios after one-shot training. We train a transformer with LengthDrop, a structural variant of dropout that stochastically determines a sequence length at each layer. We then conduct a multi-objective evolutionary search to find a length configuration that maximizes accuracy and minimizes the efficiency metric under any given computational budget. Additionally, we significantly extend the applicability of PoWER-BERT beyond sequence-level classification to token-level classification with a Drop-and-Restore process that drops word vectors temporarily in intermediate layers and restores them at the last layer if necessary. We empirically verify the utility of the proposed approach by demonstrating a superior accuracy-efficiency trade-off under various setups, including span-based question answering and text classification. Code is available at https://github.com/clovaai/length-adaptive-transformer.
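For intuition, here is a hypothetical sketch of a LengthDrop-style step: at each layer the sequence is stochastically shortened by keeping only the highest-scoring word vectors. The norm-based score is a stand-in of my own; PoWER-BERT uses attention-based significance scores, and the `min_keep_ratio` parameter is an assumption, not from the paper.

```python
import torch

def length_drop(hidden, score_fn=None, min_keep_ratio=0.5):
    """Illustrative LengthDrop step: stochastically shorten the sequence
    at a layer by keeping only the highest-scoring word vectors.

    hidden: (batch, seq_len, dim) tensor of word vectors.
    """
    batch, seq_len, dim = hidden.shape
    # Sample a retained length for this layer (training-time randomness).
    keep = max(1, torch.randint(int(seq_len * min_keep_ratio), seq_len + 1, (1,)).item())
    scores = hidden.norm(dim=-1) if score_fn is None else score_fn(hidden)
    idx = scores.topk(keep, dim=1).indices.sort(dim=1).values  # preserve order
    return hidden.gather(1, idx.unsqueeze(-1).expand(-1, -1, dim))

# Example: a 3-layer pass where each layer may shorten the sequence.
x = torch.randn(2, 16, 32)
for _ in range(3):
    x = length_drop(x)
print(x.shape)  # sequence length is now at most 16
```

The evolutionary search described in the abstract would then pick one fixed length per layer at inference time, rather than sampling randomly as in training.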
When data is collected in an adaptive manner, even simple methods like ordinary least squares can exhibit non-normal asymptotic behavior. As an undesirable consequence, hypothesis tests and confidence intervals based on asymptotic normality can lead to erroneous results. We propose an online debiasing estimator to correct these distributional anomalies in least squares estimation. Our proposed method takes advantage of the covariance structure present in the dataset and provides sharper estimates in directions for which more information has accrued. We establish an asymptotic normality property for our proposed online debiasing estimator under mild conditions on the data collection process, and provide asymptotically exact confidence intervals. We additionally prove a minimax lower bound for the adaptive linear regression problem, thereby providing a baseline by which to compare estimators. There are various conditions under which our proposed estimator achieves the minimax lower bound up to logarithmic factors. We demonstrate the usefulness of our theory via applications to multi-armed bandits, autoregressive time series estimation, and active learning with exploration.
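As a toy illustration of the opening claim (a hypothetical simulation of my own, not an experiment from the paper): when arms are chosen adaptively, e.g. by an epsilon-greedy rule, the per-arm sample mean (which is exactly what least squares reduces to in this two-arm setup) is systematically biased, typically downward, so confidence intervals that assume the usual asymptotic normality can mislead.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy_means(n=200, eps=0.1, trials=2000):
    """Sample mean of arm 0 when both arms have true mean 0 and the arm
    with the higher running mean is pulled with probability 1 - eps."""
    estimates = []
    for _ in range(trials):
        sums, counts = rng.normal(size=2), np.ones(2)  # one forced pull each
        for _ in range(n):
            a = np.argmax(sums / counts) if rng.random() > eps else rng.integers(2)
            sums[a] += rng.normal()
            counts[a] += 1
        estimates.append(sums[0] / counts[0])
    return np.array(estimates)

est = epsilon_greedy_means()
print(f"mean estimate of arm 0 (true value 0): {est.mean():+.3f}")  # typically < 0
```

Arms that look bad early are sampled less and never corrected, which is the source of the bias; the paper's online debiasing estimator is designed to remove distortions of this kind.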
Aurélien Garivier (2016)
We give a complete characterization of the complexity of best-arm identification in one-parameter bandit problems. We prove a new, tight lower bound on the sample complexity. We propose the Track-and-Stop strategy, which we prove to be asymptotically optimal. It consists of a new sampling rule (which tracks the optimal proportions of arm draws highlighted by the lower bound) and a stopping rule named after Chernoff, for which we give a new analysis.
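For reference, the tight lower bound in this line of work (reconstructed here from memory of Garivier and Kaufmann's 2016 analysis, so the notation should be checked against the paper) states that any $\delta$-correct strategy satisfies

```latex
\mathbb{E}_{\mu}[\tau_\delta] \;\ge\; T^*(\mu)\,\mathrm{kl}(\delta, 1-\delta),
\qquad
T^*(\mu)^{-1} \;=\; \sup_{w \in \Sigma_K} \;
  \inf_{\lambda \in \mathrm{Alt}(\mu)} \;
  \sum_{a=1}^{K} w_a \,\mathrm{KL}(\mu_a, \lambda_a),
```

where $\Sigma_K$ is the simplex of sampling proportions over the $K$ arms, $\mathrm{Alt}(\mu)$ is the set of models whose best arm differs from that of $\mu$, and $\mathrm{kl}$ is the binary relative entropy. Track-and-Stop's sampling rule steers the empirical draw proportions toward the maximizing $w$, which is what yields asymptotic optimality.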
Dynamic Algorithm Configuration (DAC) aims to dynamically control a target algorithm's hyperparameters in order to improve its performance. Several theoretical and empirical results have demonstrated the benefits of dynamically controlling hyperparameters in domains like evolutionary computation, AI planning, and deep learning. Replicating these results, as well as studying new methods for DAC, is difficult, however, since existing benchmarks are often specialized and use incompatible interfaces. To facilitate benchmarking, and thus research on DAC, we propose DACBench, a benchmark library that seeks to collect and standardize existing DAC benchmarks from different AI domains, as well as provide a template for new ones. In designing DACBench, we focused on important desiderata such as (i) flexibility, (ii) reproducibility, (iii) extensibility, and (iv) automatic documentation and visualization. To show the potential, broad applicability, and challenges of DAC, we explore how a set of six initial benchmarks compare along several dimensions of difficulty.
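DACBench exposes its benchmarks through gym-style environments, so a DAC control loop looks like standard reinforcement-learning code. The sketch below reflects my recollection of the library's README (the `SigmoidBenchmark` import path, `get_environment()`, and the four-tuple `step` return are assumptions to verify against the documentation).

```python
# Assumed API from my recollection of DACBench's README; check the docs.
from dacbench.benchmarks import SigmoidBenchmark

bench = SigmoidBenchmark()
env = bench.get_environment()        # gym-style DAC environment

state = env.reset()
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()        # stand-in for a learned policy
    state, reward, done, info = env.step(action)
    total_reward += reward
print("episode return:", total_reward)
```

Replacing the random `action_space.sample()` with a learned policy is exactly the DAC problem the benchmark library is built to study.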
In the setting of high-dimensional linear models with Gaussian noise, we investigate the possibility of confidence statements connected to model selection. Although there exist numerous procedures for adaptive point estimation, the construction of adaptive confidence regions is severely limited (cf. Li, 1989). The present paper sheds new light on this gap. We develop exact and adaptive confidence sets for the best approximating model in terms of risk. One of our constructions is based on a multiscale procedure and a particular coupling argument. Utilizing exponential inequalities for noncentral chi-squared distributions, we show that the risk and quadratic loss of all models within our confidence region are uniformly bounded by the minimal risk times a factor close to one.

