Natural Wake-Sleep Algorithm


Published by Csongor Várady

Publication date: 2020

Research field: Informatics Engineering, Mathematical Statistics

Language: English

Authors: Csongor Várady

Machine Learning


Abstract

The benefits of using the natural gradient are well known in a wide range of optimization problems. However, for the training of common neural networks, the resulting increase in computational complexity limits its practical application. Helmholtz Machines are a particular type of generative model composed of two Sigmoid Belief Networks (SBNs), acting as an encoder and a decoder, commonly trained using the Wake-Sleep (WS) algorithm and its reweighted version, RWS. For SBNs, it has been shown how the locality of the connections in the graphical structure induces sparsity in the Fisher information matrix. The resulting block-diagonal structure can be efficiently exploited to reduce the computational complexity of the Fisher matrix inversion and thus compute the natural gradient exactly, without the need for approximations. We present a geometric adaptation of well-known methods from the literature, introducing the Natural Wake-Sleep (NWS) and the Natural Reweighted Wake-Sleep (NRWS) algorithms. We present an experimental analysis of the novel geometrical algorithms based on the convergence speed and the value of the log-likelihood, both with respect to the number of iterations and the time complexity, and demonstrate improvements in these aspects over their respective non-geometric baselines.
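The point about exploiting a block-diagonal Fisher matrix can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `natural_gradient_blockwise`, the block sizes, and the random test matrices are all hypothetical, and the sketch only shows that solving each block separately gives the same natural-gradient direction as solving with the full assembled matrix.

```python
import numpy as np

def natural_gradient_blockwise(fisher_blocks, grad_blocks):
    """Compute F^{-1} g one block at a time for a block-diagonal F.

    Hypothetical helper: each entry of fisher_blocks is a square,
    invertible matrix, and grad_blocks holds the matching gradient slices.
    """
    return [np.linalg.solve(F, g) for F, g in zip(fisher_blocks, grad_blocks)]

rng = np.random.default_rng(0)
sizes = [3, 4, 2]  # assumed block sizes, chosen only for illustration

# Build symmetric positive definite blocks as stand-ins for Fisher blocks.
fisher_blocks = []
for n in sizes:
    A = rng.normal(size=(n, n))
    fisher_blocks.append(A @ A.T + np.eye(n))
grad_blocks = [rng.normal(size=n) for n in sizes]

blockwise = np.concatenate(natural_gradient_blockwise(fisher_blocks, grad_blocks))

# Reference: assemble the full block-diagonal matrix and solve once.
total = sum(sizes)
F_full = np.zeros((total, total))
offset = 0
for F in fisher_blocks:
    n = F.shape[0]
    F_full[offset:offset + n, offset:offset + n] = F
    offset += n
dense = np.linalg.solve(F_full, np.concatenate(grad_blocks))

print(np.allclose(blockwise, dense))
```

Solving a dense system of size n costs on the order of n^3 operations, so splitting one large system into many small independent blocks is where the saving comes from in this sketch.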


Read also

Reweighted Wake-Sleep

Jorg Bornschein, Yoshua Bengio, 2014

Training deep directed graphical models with many hidden variables and performing inference remains a major challenge. Helmholtz machines and deep belief networks are such models, and the wake-sleep algorithm has been proposed to train them. The wake-sleep algorithm relies on training not just the directed generative model but also a conditional generative model (the inference network) that runs backward from visible to latent, estimating the posterior distribution of latent given visible. We propose a novel interpretation of the wake-sleep algorithm which suggests that better estimators of the gradient can be obtained by sampling latent variables multiple times from the inference network. This view is based on importance sampling as an estimator of the likelihood, with the approximate inference network as a proposal distribution. This interpretation is confirmed experimentally, showing that better likelihood can be achieved with this reweighted wake-sleep procedure. Based on this interpretation, we propose that a sigmoidal belief network is not sufficiently powerful for the layers of the inference network in order to recover a good estimator of the posterior distribution of latent variables. Our experiments show that using a more powerful layer model, such as NADE, yields substantially better generative models.
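The idea of importance sampling as a likelihood estimator, with an approximate posterior as the proposal distribution, can be sketched on a toy model. Everything below is an illustrative assumption and not taken from the paper: a single binary latent `h`, a single binary visible `x`, the probability values, and the helper names `joint_prob` and `importance_estimate`.

```python
import numpy as np

def joint_prob(x, h, prior=0.3, p_x1_given_h=(0.2, 0.9)):
    """p(x, h) for a toy model with one binary latent and one binary visible."""
    p_h = np.where(h == 1, prior, 1.0 - prior)
    p_x1 = np.where(h == 1, p_x1_given_h[1], p_x1_given_h[0])
    p_x = p_x1 if x == 1 else 1.0 - p_x1
    return p_h * p_x

def importance_estimate(x, q_h1, num_samples, rng):
    """Estimate p(x) and p(h=1 | x) using samples from a proposal q(h | x)."""
    # Draw latent samples from the (assumed) proposal q(h=1 | x) = q_h1.
    h = (rng.random(num_samples) < q_h1).astype(int)
    q = np.where(h == 1, q_h1, 1.0 - q_h1)
    # Importance weights p(x, h) / q(h | x); their mean estimates p(x).
    weights = joint_prob(x, h) / q
    likelihood = weights.mean()
    # Self-normalized weights give an estimate of the posterior over h.
    normalized = weights / weights.sum()
    posterior_h1 = normalized[h == 1].sum()
    return likelihood, posterior_h1

rng = np.random.default_rng(0)
likelihood, posterior_h1 = importance_estimate(x=1, q_h1=0.5, num_samples=20000, rng=rng)

# Exact values for this toy model, computed by summing over both latent states.
exact_likelihood = 0.3 * 0.9 + 0.7 * 0.2
exact_posterior_h1 = 0.3 * 0.9 / exact_likelihood
print(likelihood, exact_likelihood)
print(posterior_h1, exact_posterior_h1)
```

Drawing more samples from the proposal reduces the variance of both estimates, which matches the abstract's remark that sampling latent variables multiple times yields better estimators.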

Machine Learning

Low-Power Status Updates via Sleep-Wake Scheduling

Ahmed M. Bedewy, Yin Sun, Rahul Singh, 2021

We consider the problem of optimizing the freshness of status updates that are sent from a large number of low-power sources to a common access point. The source nodes utilize carrier sensing to reduce collisions and adopt an asynchronized sleep-wake scheduling strategy to achieve a target network lifetime (e.g., 10 years). We use age of information (AoI) to measure the freshness of status updates, and design sleep-wake parameters for minimizing the weighted-sum peak AoI of the sources, subject to per-source battery lifetime constraints. When the sensing time (i.e., the time duration of carrier sensing) is zero, this sleep-wake design problem can be solved by resorting to a two-layer nested convex optimization procedure; however, for positive sensing times, the problem is non-convex. We devise a low-complexity solution to solve this problem and prove that, for practical sensing times that are short, the solution is within a small gap from the optimum AoI performance. When the mean transmission time of status-update packets is unknown, we devise a reinforcement learning algorithm that adaptively performs the following two tasks in an "efficient" way: a) it learns the unknown parameter, and b) it generates efficient controls that make channel access decisions. We analyze its performance by quantifying its "regret", i.e., the sub-optimality gap between its average performance and the average performance of a controller that knows the mean transmission time. Our numerical and NS-3 simulation results show that our solution can indeed extend the battery lifetime of information sources, while providing a competitive AoI performance.

Information Theory

Deep Transfer Learning for Single-Channel Automatic Sleep Staging with Channel Mismatch

Huy Phan, Oliver Y. Chen, Philipp Koch, 2019

Many sleep studies suffer from insufficient data to fully utilize deep neural networks, because different labs use different recording setups. Automated algorithms must therefore be trained on rather small databases, whereas large annotated databases exist but cannot be directly included in these studies for data compensation due to channel mismatch. This work presents a deep transfer learning approach to overcome the channel mismatch problem and transfer knowledge from a large dataset to a small cohort to study automatic sleep staging with single-channel input. We employ the state-of-the-art SeqSleepNet and train the network in the source domain, i.e., the large dataset. Afterwards, the pretrained network is finetuned in the target domain, i.e., the small cohort, to complete knowledge transfer. We study two transfer learning scenarios with slight and heavy channel mismatch between the source and target domains. We also investigate whether, and if so how, finetuning the pretrained network entirely or partially affects the performance of sleep staging on the target domain. Using the Montreal Archive of Sleep Studies (MASS) database consisting of 200 subjects as the source domain and the Sleep-EDF Expanded database consisting of 20 subjects as the target domain, our experimental results show significant performance improvement on sleep staging achieved with the proposed deep transfer learning approach. Furthermore, these results reveal that finetuning the feature-learning parts of the pretrained network is essential to bypass the channel mismatch problem.

Machine Learning

Hybrid Memoised Wake-Sleep: Approximate Inference at the Discrete-Continuous Interface

Tuan Anh Le, Katherine M. Collins, Luke Hewitt, 2021

Modeling complex phenomena typically involves the use of both discrete and continuous variables. Such a setting applies across a wide range of problems, from identifying trends in time-series data to performing effective compositional scene understanding in images. Here, we propose Hybrid Memoised Wake-Sleep (HMWS), an algorithm for effective inference in such hybrid discrete-continuous models. Prior approaches to learning suffer as they need to perform repeated expensive inner-loop discrete inference. We build on a recent approach, Memoised Wake-Sleep (MWS), which alleviates part of the problem by memoising discrete variables, and extend it to allow for a principled and effective way to handle continuous variables by learning a separate recognition model used for importance-sampling based approximate inference and marginalization. We evaluate HMWS in the GP-kernel learning and 3D scene understanding domains, and show that it outperforms current state-of-the-art inference methods.

Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning

Noisy Natural Gradient as Variational Inference

Guodong Zhang, Shengyang Sun, David Duvenaud, 2017

Variational Bayesian neural nets combine the flexibility of deep learning with Bayesian uncertainty estimation. Unfortunately, there is a tradeoff between cheap but simple variational families (e.g., fully factorized) and expensive and complicated inference procedures. We show that natural gradient ascent with adaptive weight noise implicitly fits a variational posterior to maximize the evidence lower bound (ELBO). This insight allows us to train full-covariance, fully factorized, or matrix-variate Gaussian variational posteriors using noi

Machine Learning