Learning Hierarchical Information Flow with Recurrent Neural Modules

71 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Danijar Hafner

تاريخ النشر 2017

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Danijar Hafner - Alex Irpan - James Davidson

التعلم الآلي الذكاء الاصطناعي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We propose ThalNet, a deep learning model inspired by neocortical communication via the thalamus. Our model consists of recurrent neural modules that send features through a routing center, endowing the modules with the flexibility to share features over multiple time steps. We show that our model learns to route information hierarchically, processing input data by a chain of modules. We observe common architectures, such as feed forward neural networks and skip connections, emerging as special cases of our architecture, while novel connectivity patterns are learned for the text8 compression task. Our model outperforms standard recurrent neural networks on several sequential benchmarks.

قيم البحث

324 - Peng Xiao , Samuel Cheng 2020

In federated learning, models trained on local clients are distilled into a global model. Due to the permutation invariance arises in neural networks, it is necessary to match the hidden neurons first when executing federated learning with neural net works. Through the Bayesian nonparametric framework, Probabilistic Federated Neural Matching (PFNM) matches and fuses local neural networks so as to adapt to varying global model size and the heterogeneity of the data. In this paper, we propose a new method which extends the PFNM with a Kullback-Leibler (KL) divergence over neural components product, in order to make inference exploiting posterior information in both local and global levels. We also show theoretically that The additional part can be seamlessly concatenated into the match-and-fuse progress. Through a series of simulations, it indicates that our new method outperforms popular state-of-the-art federated learning methods in both single communication round and additional communication rounds situation.

التعلم الآلي الذكاء الاصطناعي

Learning Compact Recurrent Neural Networks

118 - Zhiyun Lu , Vikas Sindhwani , Tara N. Sainath 2016

Recurrent neural networks (RNNs), including long short-term memory (LSTM) RNNs, have produced state-of-the-art results on a variety of speech recognition tasks. However, these models are often too large in size for deployment on mobile devices with m emory and latency constraints. In this work, we study mechanisms for learning compact RNNs and LSTMs via low-rank factorizations and parameter sharing schemes. Our goal is to investigate redundancies in recurrent architectures where compression can be admitted without losing performance. A hybrid strategy of using structured matrices in the bottom layers and shared low-rank factors on the top layers is found to be particularly effective, reducing the parameters of a standard LSTM by 75%, at a small cost of 0.3% increase in WER, on a 2,000-hr English Voice Search task.

التعلم الآلي الحساب واللغة الحوسبة العصبية والتطورية

Block-Sparse Recurrent Neural Networks

112 - Sharan Narang , Eric Undersander , Gregory Diamos 2017

Recurrent Neural Networks (RNNs) are used in state-of-the-art models in domains such as speech recognition, machine translation, and language modelling. Sparsity is a technique to reduce compute and memory requirements of deep learning models. Sparse RNNs are easier to deploy on devices and high-end server processors. Even though sparse operations need less compute and memory relative to their dense counterparts, the speed-up observed by using sparse operations is less than expected on different hardware platforms. In order to address this issue, we investigate two different approaches to induce block sparsity in RNNs: pruning blocks of weights in a layer and using group lasso regularization to create blocks of weights with zeros. Using these techniques, we demonstrate that we can create block-sparse RNNs with sparsity ranging from 80% to 90% with small loss in accuracy. This allows us to reduce the model size by roughly 10x. Additionally, we can prune a larger dense network to recover this loss in accuracy while maintaining high block sparsity and reducing the overall parameter count. Our technique works with a variety of block sizes up to 32x32. Block-sparse RNNs eliminate overheads related to data storage and irregular memory accesses while increasing hardware efficiency compared to unstructured sparsity.

التعلم الآلي الذكاء الاصطناعي التعلم الالي

Hierarchical Neural Dynamic Policies

87 - Shikhar Bahl , Abhinav Gupta , Deepak Pathak 2021

We tackle the problem of generalization to unseen configurations for dynamic tasks in the real world while learning from high-dimensional image input. The family of nonlinear dynamical system-based methods have successfully demonstrated dynamic robot behaviors but have difficulty in generalizing to unseen configurations as well as learning from image inputs. Recent works approach this issue by using deep network policies and reparameterize actions to embed the structure of dynamical systems but still struggle in domains with diverse configurations of image goals, and hence, find it difficult to generalize. In this paper, we address this dichotomy by leveraging embedding the structure of dynamical systems in a hierarchical deep policy learning framework, called Hierarchical Neural Dynamical Policies (H-NDPs). Instead of fitting deep dynamical systems to diverse data directly, H-NDPs form a curriculum by learning local dynamical system-based policies on small regions in state-space and then distill them into a global dynamical system-based policy that operates only from high-dimensional images. H-NDPs additionally provide smooth trajectories, a strong safety benefit in the real world. We perform extensive experiments on dynamic tasks both in the real world (digit writing, scooping, and pouring) and simulation (catching, throwing, picking). We show that H-NDPs are easily integrated with both imitation as well as reinforcement learning setups and achieve state-of-the-art results. Video results are at https://shikharbahl.github.io/hierarchical-ndps/

التعلم الآلي الذكاء الاصطناعي الرؤية الحاسوبية وتمييز الأنماط

Improving Botnet Detection with Recurrent Neural Network and Transfer Learning

492 - Jeeyung Kim , Alex Sim , Jinoh Kim 2021

Botnet detection is a critical step in stopping the spread of botnets and preventing malicious activities. However, reliable detection is still a challenging task, due to a wide variety of botnets involving ever-increasing types of devices and attack vectors. Recent approaches employing machine learning (ML) showed improved performance than earlier ones, but these ML- based approaches still have significant limitations. For example, most ML approaches can not incorporate sequential pattern analysis techniques key to detect some classes of botnets. Another common shortcoming of ML-based approaches is the need to retrain neural networks in order to detect the evolving botnets; however, the training process is time-consuming and requires significant efforts to label the training data. For fast-evolving botnets, it might take too long to create sufficient training samples before the botnets have changed again. To address these challenges, we propose a novel botnet detection method, built upon Recurrent Variational Autoencoder (RVAE) that effectively captures sequential characteristics of botnet activities. In the experiment, this semi-supervised learning method achieves better detection accuracy than similar learning methods, especially on hard to detect classes. Additionally, we devise a transfer learning framework to learn from a well-curated source data set and transfer the knowledge to a target problem domain not seen before. Tests show that the true-positive rate (TPR) with transfer learning is higher than the RVAE semi-supervised learning method trained using the target data set (91.8% vs. 68.3%).

التعلم الآلي التشفير والأمن بنية الشبكات والإنترنت