New Tricks for Estimating Gradients of Expectations

Added by: Dr Christian Walder

Publication date: 2019

Fields: Informatics Engineering, Mathematical Statistics

Language: English

Authors: Christian J. Walder, Paul Roussel, Richard Nock

Machine Learning

We derive a family of Monte Carlo estimators for gradients of expectations which is related to the log-derivative trick, but involves pairwise interactions between samples. The first of these comes from either a) introducing and approximating an integral representation based on the fundamental theorem of calculus, or b) applying the reparameterisation trick to an implicit parameterisation under infinitesimal perturbation of the parameters. From the former perspective we generalise to a reproducing kernel Hilbert space representation, giving rise to a locality parameter in the pairwise interactions mentioned above. The resulting estimators are unbiased and shown to offer an independent component of useful information in comparison with the log-derivative estimator. Promising analytical and numerical examples confirm the intuitions behind the new estimators.
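For readers unfamiliar with the two baseline techniques the abstract names, the following is a minimal sketch of the log-derivative (score function) estimator and the reparameterisation estimator for a one-dimensional Gaussian. It does not implement the paper's pairwise estimators. The test function f(x) = x^2, the parameter values, and all function names are assumptions chosen for illustration; for this choice the exact gradient with respect to the mean is 2 * mu.

```python
import numpy as np

def score_function_grad(f, mu, sigma, n, rng):
    """Log-derivative estimate of d/dmu E[f(x)], x ~ N(mu, sigma^2)."""
    x = rng.normal(mu, sigma, size=n)
    score = (x - mu) / sigma**2          # d/dmu log N(x; mu, sigma^2)
    return np.mean(f(x) * score)

def reparam_grad(df, mu, sigma, n, rng):
    """Reparameterisation estimate: x = mu + sigma * eps, so dx/dmu = 1."""
    eps = rng.standard_normal(n)
    x = mu + sigma * eps
    return np.mean(df(x))                # chain rule: f'(x) * dx/dmu

# Illustrative setup (assumed, not from the paper).
rng = np.random.default_rng(0)
mu, sigma, n = 1.5, 1.0, 200_000
f = lambda x: x**2
df = lambda x: 2.0 * x

g_score = score_function_grad(f, mu, sigma, n, rng)
g_reparam = reparam_grad(df, mu, sigma, n, rng)
exact = 2.0 * mu
print(g_score, g_reparam, exact)
```

Both estimates are unbiased for this example, but the reparameterisation estimate typically has much lower variance here because it uses the derivative of f, whereas the log-derivative estimate needs only function values.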

Old Dog Learns New Tricks: Randomized UCB for Bandit Problems

Sharan Vaswani, Abbas Mehrabian, Audrey Durand (2019)

We propose $\tt RandUCB$, a bandit strategy that builds on theoretically derived confidence intervals similar to upper confidence bound (UCB) algorithms, but akin to Thompson sampling (TS), it uses randomization to trade off exploration and exploitation. In the $K$-armed bandit setting, we show that there are infinitely many variants of $\tt RandUCB$, all of which achieve the minimax-optimal $\widetilde{O}(\sqrt{K T})$ regret after $T$ rounds. Moreover, for a specific multi-armed bandit setting, we show that both UCB and TS can be recovered as special cases of $\tt RandUCB$. For structured bandits, where each arm is associated with a $d$-dimensional feature vector and rewards are distributed according to a linear or generalized linear model, we prove that $\tt RandUCB$ achieves the minimax-optimal $\widetilde{O}(d \sqrt{T})$ regret even in the case of infinitely many arms. Through experiments in both the multi-armed and structured bandit settings, we demonstrate that $\tt RandUCB$ matches or outperforms TS and other randomized exploration strategies. Our theoretical and empirical results together imply that $\tt RandUCB$ achieves the best of both worlds.
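The general idea of scaling a confidence width by a random factor can be sketched as follows. This is a hypothetical illustration, not the algorithm from the paper: the UCB1-style width, the uniform sampling distribution on [0, 1], the Bernoulli rewards, and the function name run_bandit are all assumptions made for the example.

```python
import numpy as np

def run_bandit(means, horizon, rng, randomize):
    """Play a Bernoulli bandit with a (possibly randomized) UCB-style index."""
    k = len(means)
    counts = np.zeros(k)   # number of pulls per arm
    sums = np.zeros(k)     # total reward per arm
    regret = 0.0
    best = max(means)
    for t in range(horizon):
        if t < k:
            arm = t        # pull every arm once to initialise
        else:
            width = np.sqrt(2.0 * np.log(t + 1) / counts)
            # Deterministic UCB uses a fixed multiplier; the randomized
            # variant draws one multiplier per round (assumed uniform).
            z = rng.uniform(0.0, 1.0) if randomize else 1.0
            arm = int(np.argmax(sums / counts + z * width))
        reward = float(rng.random() < means[arm])
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret, counts

means = [0.2, 0.8]  # assumed arm means for the toy problem
regret_ucb, counts_ucb = run_bandit(means, 2000, np.random.default_rng(0), False)
regret_rand, counts_rand = run_bandit(means, 2000, np.random.default_rng(0), True)
print(regret_ucb, regret_rand)
```

In this toy setting both variants concentrate their pulls on the better arm; the sketch says nothing about the regret guarantees proved in the paper.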

Machine Learning

Estimating Gradients for Discrete Random Variables by Sampling without Replacement

Wouter Kool, Herke van Hoof, Max Welling (2020)

We derive an unbiased estimator for expectations over discrete random variables based on sampling without replacement, which reduces variance as it avoids duplicate samples. We show that our estimator can be derived as the Rao-Blackwellization of three different estimators. Combining our estimator with REINFORCE, we obtain a policy gradient estimator and we reduce its variance using a built-in control variate which is obtained without additional model evaluations. The resulting estimator is closely related to other gradient estimators. Experiments with a toy problem, a categorical Variational Auto-Encoder and a structured prediction problem show that our estimator is the only estimator that is consistently among the best estimators in both high and low entropy settings.
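As background for the REINFORCE-with-control-variate idea mentioned in the abstract, here is a minimal sketch of REINFORCE for a categorical distribution with a leave-one-out baseline. It samples with replacement, so it is a simpler related estimator and not the without-replacement estimator derived in the paper. The logits theta, the reward vector f, and the function names are assumptions for illustration.

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def exact_grad(theta, f):
    """Exact gradient of E_{x ~ softmax(theta)}[f(x)] w.r.t. theta."""
    p = softmax(theta)
    return p * (f - p @ f)

def reinforce_loo(theta, f, k, rng):
    """REINFORCE with a leave-one-out baseline from k i.i.d. samples."""
    p = softmax(theta)
    x = rng.choice(len(p), size=k, p=p)      # sampling WITH replacement
    fx = f[x]
    baseline = (fx.sum() - fx) / (k - 1)     # mean of the other samples
    score = np.eye(len(p))[x] - p            # grad of log p(x) w.r.t. theta
    return ((fx - baseline)[:, None] * score).mean(axis=0)

# Illustrative values (assumed).
theta = np.array([0.5, -0.3, 0.1])
f = np.array([1.0, 3.0, -2.0])
rng = np.random.default_rng(0)

estimate = np.mean([reinforce_loo(theta, f, 4, rng) for _ in range(20_000)], axis=0)
print(estimate, exact_grad(theta, f))
```

The baseline for each sample is computed from the other samples only, so it is independent of that sample and leaves the estimator unbiased while requiring no extra evaluations of f.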

Machine Learning

Bag of Tricks for Adversarial Training

Tianyu Pang, Xiao Yang, Yinpeng Dong (2020)

Adversarial training (AT) is one of the most effective strategies for promoting model robustness. However, recent benchmarks show that most of the proposed improvements on AT are less effective than simply early stopping the training procedure. This counter-intuitive fact motivates us to investigate the implementation details of tens of AT methods. Surprisingly, we find that the basic settings (e.g., weight decay, training schedule, etc.) used in these methods are highly inconsistent. In this work, we provide comprehensive evaluations on CIFAR-10, focusing on the effects of mostly overlooked training tricks and hyperparameters for adversarially trained models. Our empirical observations suggest that adversarial robustness is much more sensitive to some basic training settings than previously thought. For example, a slightly different value of weight decay can reduce the model's robust accuracy by more than 7%, which is likely to outweigh the potential improvement from the proposed methods. We identify a baseline training setting and re-implement previous defenses to achieve new state-of-the-art results. These findings also call for more attention to overlooked confounders when benchmarking defenses.

Machine Learning, Computer Vision and Pattern Recognition

Estimating Q(s,s') with Deep Deterministic Dynamics Gradients

Ashley D. Edwards, Himanshu Sahni, Rosanne Liu (2020)

In this paper, we introduce a novel form of value function, $Q(s, s')$, that expresses the utility of transitioning from a state $s$ to a neighboring state $s'$ and then acting optimally thereafter. In order to derive an optimal policy, we develop a forward dynamics model that learns to make next-state predictions that maximize this value. This formulation decouples actions from values while still learning off-policy. We highlight the benefits of this approach in terms of value function transfer, learning within redundant action spaces, and learning off-policy from state observations generated by sub-optimal or completely random policies. Code and videos are available at http://sites.google.com/view/qss-paper.

Machine Learning, Artificial Intelligence

WaveGrad: Estimating Gradients for Waveform Generation

Nanxin Chen, Yu Zhang, Heiga Zen (2020)

This paper introduces WaveGrad, a conditional model for waveform generation which estimates gradients of the data density. The model is built on prior work on score matching and diffusion probabilistic models. It starts from a Gaussian white noise signal and iteratively refines the signal via a gradient-based sampler conditioned on the mel-spectrogram. WaveGrad offers a natural way to trade inference speed for sample quality by adjusting the number of refinement steps, and bridges the gap between non-autoregressive and autoregressive models in terms of audio quality. We find that it can generate high fidelity audio samples using as few as six iterations. Experiments reveal WaveGrad to generate high fidelity audio, outperforming adversarial non-autoregressive baselines and matching a strong likelihood-based autoregressive baseline using fewer sequential operations. Audio samples are available at https://wavegrad.github.io/.
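The notion of starting from Gaussian noise and refining it with a gradient-based sampler can be illustrated on a toy problem. The sketch below runs unadjusted Langevin dynamics using the analytically known score (gradient of the log density) of a one-dimensional Gaussian target. It is not WaveGrad's sampler: there is no learned network, no conditioning on a mel-spectrogram, and no noise schedule. The target parameters, step size, and function name are assumptions.

```python
import numpy as np

def langevin_sample(score, n_samples, n_steps, step, rng):
    """Unadjusted Langevin dynamics starting from standard Gaussian noise."""
    x = rng.standard_normal(n_samples)
    for _ in range(n_steps):
        noise = rng.standard_normal(n_samples)
        # Move along the score, then add fresh noise scaled by sqrt(step).
        x = x + 0.5 * step * score(x) + np.sqrt(step) * noise
    return x

# Assumed toy target: N(3.0, 0.5^2), whose score is known in closed form.
target_mean, target_std = 3.0, 0.5
score = lambda x: -(x - target_mean) / target_std**2

rng = np.random.default_rng(0)
samples = langevin_sample(score, n_samples=5000, n_steps=500, step=0.01, rng=rng)
print(samples.mean(), samples.std())
```

Using fewer steps makes sampling cheaper but leaves the samples further from the target, which mirrors in miniature the speed versus quality trade-off the abstract describes.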

Audio and Speech Processing, Machine Learning, Sound
