
Gated Linear Networks

Posted by: Jianan Wang
Publication date: 2019
Research field: Informatics Engineering
Paper language: English





This paper presents a new family of backpropagation-free neural architectures, Gated Linear Networks (GLNs). What distinguishes GLNs from contemporary neural networks is the distributed and local nature of their credit assignment mechanism; each neuron directly predicts the target, forgoing the ability to learn feature representations in favor of rapid online learning. Individual neurons can model nonlinear functions via the use of data-dependent gating in conjunction with online convex optimization. We show that this architecture gives rise to universal learning capabilities in the limit, with effective model capacity increasing as a function of network size in a manner comparable to deep ReLU networks. Furthermore, we demonstrate that the GLN learning mechanism possesses extraordinary resilience to catastrophic forgetting, performing comparably to an MLP with dropout and Elastic Weight Consolidation on standard benchmarks. These desirable theoretical and empirical properties position GLNs as a complementary technique to contemporary offline deep learning methods.
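
The core mechanism described in the abstract, where side information selects one of several weight vectors per neuron (data-dependent gating) and the neuron then geometrically mixes its input probabilities under online convex optimization, can be illustrated with a minimal NumPy sketch. The class name, the halfspace gating scheme, and all hyperparameters below are illustrative assumptions, not the paper's reference implementation:

```python
import numpy as np

def logit(p):
    return np.log(p / (1 - p))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GLNNeuron:
    """Illustrative single gated geometric-mixing neuron (a sketch,
    not the authors' code). A halfspace gate over side information z
    picks one weight vector; the neuron outputs a geometric mixture
    of its input probabilities and is trained by online gradient
    descent on the log loss, which is convex in the weights."""

    def __init__(self, n_inputs, side_dim, n_halfspaces=4, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.hyperplanes = rng.standard_normal((n_halfspaces, side_dim))
        # one weight vector per gating context (2**n_halfspaces contexts)
        self.weights = np.full((2 ** n_halfspaces, n_inputs), 1.0 / n_inputs)
        self.lr = lr

    def _context(self, z):
        bits = (self.hyperplanes @ z > 0).astype(int)
        return int(bits @ (2 ** np.arange(len(bits))))

    def predict(self, p, z):
        w = self.weights[self._context(z)]
        return sigmoid(w @ logit(np.clip(p, 1e-6, 1 - 1e-6)))

    def update(self, p, z, target):
        c = self._context(z)
        x = logit(np.clip(p, 1e-6, 1 - 1e-6))
        pred = sigmoid(self.weights[c] @ x)
        # gradient of the log loss w.r.t. w is (pred - target) * x
        self.weights[c] -= self.lr * (pred - target) * x
        # keep weights in a bounded hypercube, as the paper prescribes
        np.clip(self.weights[c], -5.0, 5.0, out=self.weights[c])
        return pred
```

Because every neuron predicts the target directly, training is purely local: no gradients flow between layers, which is what enables the fast online learning the abstract emphasizes.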




Read also

We propose the Gaussian Gated Linear Network (G-GLN), an extension to the recently proposed GLN family of deep neural networks. Instead of using backpropagation to learn features, GLNs have a distributed and local credit assignment mechanism based on optimizing a convex objective. This gives rise to many desirable properties including universality, data-efficient online learning, trivial interpretability and robustness to catastrophic forgetting. We extend the GLN framework from classification to multiple regression and density modelling by generalizing geometric mixing to a product of Gaussian densities. The G-GLN achieves competitive or state-of-the-art performance on several univariate and multivariate regression benchmarks, and we demonstrate its applicability to practical tasks including online contextual bandits and density estimation via denoising.
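
The generalization of geometric mixing to a product of Gaussian densities mentioned above has a convenient closed form: a weighted product of Gaussians is again Gaussian, with precisions adding and the mean given by a precision-weighted average. A minimal sketch, with a hypothetical helper name of my own choosing:

```python
import numpy as np

def gaussian_geometric_mixture(mus, sigmas, w):
    """Weighted product of 1-D Gaussian densities (illustrative helper).

    prod_i N(x; mu_i, sigma_i^2)^{w_i} is proportional to a Gaussian
    whose precision is the weighted sum of the input precisions and
    whose mean is the precision-weighted average of the input means.
    """
    precisions = w / sigmas ** 2            # weighted precisions
    prec = precisions.sum()
    mu = (precisions * mus).sum() / prec
    return mu, np.sqrt(1.0 / prec)

# e.g. combining two expert predictions, the second weighted more heavily
mu, sigma = gaussian_geometric_mixture(np.array([0.0, 2.0]),
                                       np.array([1.0, 0.5]),
                                       np.array([0.5, 1.0]))
```

Since the mixture stays Gaussian, each G-GLN neuron can emit a density rather than a point estimate while keeping the same convex, local update structure as the original GLN.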
Transfer learning has emerged as a powerful technique for improving the performance of machine learning models on new domains where labeled training data may be scarce. In this approach, a model trained for a source task, where plenty of labeled training data is available, is used as a starting point for training a model on a related target task with only a few labeled training examples. Despite the recent empirical success of transfer learning approaches, the benefits and fundamental limits of transfer learning are poorly understood. In this paper we develop a statistical minimax framework to characterize the fundamental limits of transfer learning in the context of regression with linear and one-hidden-layer neural network models. Specifically, we derive a lower bound for the target generalization error achievable by any algorithm as a function of the number of labeled source and target data points, as well as appropriate notions of similarity between the source and target tasks. Our lower bound provides new insights into the benefits and limitations of transfer learning. We further corroborate our theoretical findings with various experiments.
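
The linear-regression transfer setting the abstract opens with can be made concrete with a toy sketch: fit on plentiful source data, then shrink the scarce-data target fit toward the source solution. The anchored-ridge formulation and all constants here are my own illustrative choices, not the paper's estimator:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_src = rng.standard_normal(10)
theta_tgt = theta_src + 0.1 * rng.standard_normal(10)   # a related task

Xs = rng.standard_normal((1000, 10)); ys = Xs @ theta_src  # plentiful source
Xt = rng.standard_normal((5, 10));    yt = Xt @ theta_tgt  # scarce target

# source fit by ordinary least squares
theta_hat_src = np.linalg.lstsq(Xs, ys, rcond=None)[0]

# target fit shrunk toward the source solution:
# minimize ||Xt t - yt||^2 + lam * ||t - theta_hat_src||^2
lam = 1.0                                   # illustrative regularizer
A = Xt.T @ Xt + lam * np.eye(10)
theta_hat_tgt = np.linalg.solve(A, Xt.T @ yt + lam * theta_hat_src)
```

The paper's contribution is to bound how well any such procedure can do, as a function of the sample sizes and of how far apart theta_src and theta_tgt are allowed to be.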
Heavy-tailed distributions naturally occur in many real-life problems. Unfortunately, it is typically not possible to compute inference in closed form in graphical models which involve such heavy-tailed distributions. In this work, we propose a novel, simple linear graphical model for independent latent random variables, called the linear characteristic model (LCM), defined in the characteristic function domain. Using stable distributions, a heavy-tailed family of distributions which generalizes the Cauchy, Lévy and Gaussian distributions, we show for the first time how to compute both exact and approximate inference in such a linear multivariate graphical model. LCMs are not limited to stable distributions; in fact, LCMs are defined for any random variables (discrete, continuous or a mixture of both). We provide a realistic problem from the field of computer networks to demonstrate the applicability of our construction. Another potential application is iterative decoding of linear channels with non-Gaussian noise.
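
The property that makes the characteristic-function view tractable is closure: a linear combination of independent alpha-stable variables is again alpha-stable, with parameters that combine in closed form even though the densities themselves generally have none. A small sketch of this (for the symmetric case) using SciPy's `levy_stable`; the parameter values are illustrative:

```python
import numpy as np
from scipy.stats import levy_stable

alpha, beta = 1.5, 0.0        # illustrative stability and skew parameters
x = levy_stable.rvs(alpha, beta, size=100_000, random_state=0)
y = levy_stable.rvs(alpha, beta, size=100_000, random_state=1)

# For Z = aX + bY with unit-scale symmetric stable X and Y, the
# characteristic functions multiply, so the scale of Z is
#   scale_Z = (|a|^alpha + |b|^alpha)^(1/alpha).
a, b = 2.0, 3.0
z = a * x + b * y
predicted_scale = (abs(a) ** alpha + abs(b) ** alpha) ** (1 / alpha)
```

An LCM exploits exactly this: linear message passing over latent variables reduces to simple parameter updates in the characteristic function domain.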
We study the problem of best arm identification in linear bandits in the fixed-budget setting. By leveraging properties of the G-optimal design and incorporating it into the arm allocation rule, we design a parameter-free algorithm, Optimal Design-based Linear Best Arm Identification (OD-LinBAI). We provide a theoretical analysis of the failure probability of OD-LinBAI. While the performance of existing methods (e.g., BayesGap) depends on all the optimality gaps, OD-LinBAI depends only on the gaps of the top $d$ arms, where $d$ is the effective dimension of the linear bandit instance. Furthermore, we present a minimax lower bound for this problem. Together, the upper and lower bounds show that OD-LinBAI is minimax optimal up to multiplicative factors in the exponent. Finally, numerical experiments corroborate our theoretical findings.
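
The G-optimal design the abstract leans on can be approximated over a finite arm set with a standard Frank-Wolfe (Fedorov-Wynn) iteration; the sketch below is that generic procedure, not the paper's exact allocation rule, and the function name and iteration count are illustrative:

```python
import numpy as np

def g_optimal_design(arms, n_iter=200):
    """Frank-Wolfe sketch of a G-optimal design over a finite arm set.

    arms: (K, d) array of arm feature vectors that span R^d.
    Returns a distribution lam over arms approximately minimizing the
    worst-case variance max_a a^T A(lam)^{-1} a, which by the
    Kiefer-Wolfowitz theorem equals d at the optimum.
    """
    K, d = arms.shape
    lam = np.full(K, 1.0 / K)
    for t in range(n_iter):
        A = arms.T @ (lam[:, None] * arms)              # design matrix A(lam)
        Ainv = np.linalg.inv(A)
        # quadratic form a^T A^{-1} a for every arm
        g = np.einsum('kd,de,ke->k', arms, Ainv, arms)
        j = np.argmax(g)                                # most uncertain arm
        step = 2.0 / (t + 2)                            # standard FW step size
        lam = (1 - step) * lam
        lam[j] += step
    return lam
```

Pulling arms in proportion to such a design spreads the budget so that the least-squares estimate of the unknown parameter is accurate in every arm's direction, which is what drives the gap-dependent failure probability bound.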
Sequences play an important role in many engineering applications and systems. Searching for sequences with desired properties has long been an interesting but challenging research topic. This article proposes a novel method, called HpGAN, to search for desired sequences algorithmically using generative adversarial networks (GANs). HpGAN is based on the idea of a zero-sum game used to train a generative model, which can generate sequences with characteristics similar to the training sequences. In HpGAN, we design a Hopfield network as an encoder to avoid the limitations of GANs in generating discrete data. Compared with traditional sequence construction by algebraic tools, HpGAN is particularly suitable for intractable problems with complex objectives which prevent mathematical analysis. We demonstrate the search capabilities of HpGAN in two applications: 1) HpGAN successfully found many different mutually orthogonal complementary code sets (MOCCSs) and optimal odd-length Z-complementary pairs (OB-ZCPs) which are not part of the training set; in the literature, both MOCCSs and OB-ZCPs have found wide applications in wireless communications. 2) HpGAN found new sequences which achieve a four-fold increase in the signal-to-interference ratio of a mismatched filter (MMF) estimator in pulse compression radar systems, benchmarked against the well-known Legendre sequence. These sequences also outperform those found by AlphaSeq.
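
The Legendre sequence used as the radar benchmark above is easy to construct, and sidelobe quality is judged through the aperiodic autocorrelation. A short sketch with hypothetical helper names (the sign convention for index 0 is a common but not universal choice):

```python
import numpy as np

def legendre_sequence(p):
    """Binary Legendre sequence of prime length p (s[0] set to +1).

    s[i] = +1 if i is a nonzero quadratic residue mod p, else -1.
    """
    residues = {(i * i) % p for i in range(1, p)}
    return np.array([1] + [1 if i in residues else -1 for i in range(1, p)])

def aperiodic_autocorrelation(s):
    """R(k) = sum_i s[i] * s[i+k] for shifts k = 0 .. N-1."""
    n = len(s)
    return np.array([np.dot(s[: n - k], s[k:]) for k in range(n)])

s = legendre_sequence(31)                 # example prime length
R = aperiodic_autocorrelation(s)
sidelobe_energy = np.sum(R[1:] ** 2)      # lower is better for radar use
```

Searches like HpGAN's aim to find sequences whose sidelobe-type objectives beat such algebraic baselines in regimes where no closed-form construction is known.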

