
Improved Variational Autoencoders for Text Modeling using Dilated Convolutions

Posted by: Zichao Yang
Publication date: 2017
Research field: Informatics Engineering
Paper language: English





Recent work on generative modeling of text has found that variational auto-encoders (VAE) incorporating LSTM decoders perform worse than simpler LSTM language models (Bowman et al., 2015). This negative result is so far poorly understood, but has been attributed to the propensity of LSTM decoders to ignore conditioning information from the encoder. In this paper, we experiment with a new type of decoder for VAE: a dilated CNN. By changing the decoder's dilation architecture, we control the effective context from previously generated words. In experiments, we find that there is a trade-off between the contextual capacity of the decoder and the amount of encoding information used. We show that with the right decoder, VAE can outperform LSTM language models. We demonstrate perplexity gains on two datasets, representing the first positive experimental result on the use of VAE for generative modeling of text. Further, we conduct an in-depth investigation of the use of VAE (with our new decoding architecture) for semi-supervised and unsupervised labeling tasks, demonstrating gains over several strong baselines.
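The abstract centers on replacing the LSTM decoder with a dilated causal CNN whose dilation schedule controls how much previously generated text the decoder can see. The following is a minimal PyTorch sketch of that idea; the layer sizes, dilation schedule, and the way the latent code z is injected at every position are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of a dilated causal CNN decoder for a text VAE (assumed
# hyperparameters, not the paper's exact architecture).
import torch
import torch.nn as nn

class DilatedCNNDecoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, latent_dim=32,
                 channels=256, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.z_proj = nn.Linear(latent_dim, channels)
        self.in_proj = nn.Conv1d(emb_dim, channels, kernel_size=1)
        # Stacked causal convolutions; the dilation schedule sets the
        # effective context over previously generated words.
        self.convs = nn.ModuleList(
            nn.Conv1d(channels, channels, kernel_size=2, dilation=d)
            for d in dilations
        )
        self.dilations = dilations
        self.out = nn.Conv1d(channels, vocab_size, kernel_size=1)

    def forward(self, tokens, z):
        # tokens: (batch, seq_len) previous words; z: (batch, latent_dim)
        h = self.in_proj(self.embed(tokens).transpose(1, 2))
        h = h + self.z_proj(z).unsqueeze(-1)      # condition every position on z
        for conv, d in zip(self.convs, self.dilations):
            pad = (conv.kernel_size[0] - 1) * d   # left-pad to keep the conv causal
            h = h + torch.relu(conv(nn.functional.pad(h, (pad, 0))))
        return self.out(h)                        # per-position vocabulary logits
```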


Read also

We explore the application of end-to-end stateless temporal modeling to small-footprint keyword spotting, as opposed to recurrent networks that model long-term temporal dependencies using internal states. We propose a model inspired by the recent success of dilated convolutions in sequence modeling applications, allowing us to train deeper architectures in resource-constrained configurations. Gated activations and residual connections are also added, following a configuration similar to WaveNet. In addition, we apply a custom target labeling that back-propagates loss from specific frames of interest, yielding higher accuracy and requiring only detection of the end of the keyword. Our experimental results show that our model outperforms a max-pooling loss trained recurrent neural network using LSTM cells, with a significant decrease in false rejection rate. The underlying dataset - Hey Snips utterances recorded by over 2.2K different speakers - has been made publicly available to establish an open reference for wake-word detection.
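This keyword-spotting abstract describes a WaveNet-like stack of dilated convolutions with gated activations and residual connections. Below is a minimal PyTorch sketch of one such block; the channel count, kernel size, and dilation value are illustrative assumptions rather than the paper's configuration.

```python
# Minimal sketch of one WaveNet-style block: a dilated causal convolution
# with a gated activation and a residual connection (assumed sizes).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedResidualBlock(nn.Module):
    def __init__(self, channels=64, kernel_size=3, dilation=2):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation   # left padding keeps the conv causal
        self.filter = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.gate = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.res = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, x):
        # x: (batch, channels, frames) of acoustic features
        h = F.pad(x, (self.pad, 0))
        out = torch.tanh(self.filter(h)) * torch.sigmoid(self.gate(h))  # gated activation
        return x + self.res(out)                                        # residual connection
```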
Syntactic information contains structures and rules about how text sentences are arranged. Incorporating syntax into text modeling methods can potentially benefit both representation learning and generation. Variational autoencoders (VAEs) are deep generative models that provide a probabilistic way to describe observations in the latent space. When applied to text data, the latent representations are often unstructured. We propose syntax-aware variational autoencoders (SAVAEs) that dedicate a subspace in the latent dimensions, dubbed the syntactic latent, to represent syntactic structures of sentences. SAVAEs are trained to infer the syntactic latent from either text inputs or parsed syntax results, as well as to reconstruct the original text with the inferred latent variables. Experiments show that SAVAEs are able to achieve lower reconstruction loss on four different data sets. Furthermore, they are capable of generating examples with modified target syntax.
We introduce an improved variational autoencoder (VAE) for text modeling with topic information explicitly modeled as a Dirichlet latent variable. By providing the proposed model with topic awareness, it is better at reconstructing input texts. Furthermore, due to the inherent interactions between the newly introduced Dirichlet variable and the conventional multivariate Gaussian variable, the model is less prone to KL divergence vanishing. We derive the variational lower bound for the new model and conduct experiments on four different data sets. The results show that the proposed model is superior at text reconstruction across the latent space, and that classification on the learned representations achieves higher test accuracies.
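The topic-aware VAE above pairs a Dirichlet latent variable with the usual multivariate Gaussian one. A minimal PyTorch sketch of that latent structure follows; the encoder heads, the symmetric Dirichlet prior, and concatenating the two samples for the decoder are assumptions made for illustration, not the paper's exact formulation.

```python
# Minimal sketch: encoder heads producing a Gaussian latent and a Dirichlet
# topic latent, with the combined KL term (assumed formulation).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Dirichlet, Normal, kl_divergence

class TopicAwareLatent(nn.Module):
    def __init__(self, hidden=512, gauss_dim=32, n_topics=50):
        super().__init__()
        self.mu = nn.Linear(hidden, gauss_dim)
        self.logvar = nn.Linear(hidden, gauss_dim)
        self.alpha = nn.Linear(hidden, n_topics)              # Dirichlet concentrations
        self.register_buffer("prior_alpha", torch.ones(n_topics))  # symmetric prior

    def forward(self, h):
        # h: (batch, hidden) encoder summary of the input text
        q_z = Normal(self.mu(h), (0.5 * self.logvar(h)).exp())
        q_t = Dirichlet(F.softplus(self.alpha(h)) + 1e-3)
        z, t = q_z.rsample(), q_t.rsample()                    # both are reparameterizable
        kl = kl_divergence(q_z, Normal(0.0, 1.0)).sum(-1) \
             + kl_divergence(q_t, Dirichlet(self.prior_alpha))
        return torch.cat([z, t], dim=-1), kl                   # decoder input and KL term
```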
Variational autoencoders (VAEs) have been widely applied for text modeling. In practice, however, they are troubled by two challenges: information underrepresentation and posterior collapse. The former arises because only the last hidden state of the LSTM encoder is transformed into the latent space, which is generally insufficient to summarize the data. The latter is a long-standing problem during the training of VAEs, as the optimization is trapped in a disastrous local optimum. In this paper, we propose the Discrete Auto-regressive Variational Attention Model (DAVAM) to address these challenges. Specifically, we introduce an auto-regressive variational attention approach to enrich the latent space by effectively capturing the semantic dependencies of the input. We further design a discrete latent space for the variational attention and mathematically show that our model is free from posterior collapse. Extensive experiments on language modeling tasks demonstrate the superiority of DAVAM against several VAE counterparts.
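DAVAM's discrete auto-regressive variational attention is specific to that paper; the sketch below only illustrates one standard device such designs can rely on, a Gumbel-softmax (relaxed categorical) latent that keeps a discrete code differentiable during training. It is an assumption-laden illustration, not the DAVAM construction itself.

```python
# Minimal sketch of a differentiable discrete latent via Gumbel-softmax
# (a common device for discrete latent spaces, assumed here for illustration).
import torch
import torch.nn.functional as F

def sample_discrete_latent(logits, tau=0.5, hard=True):
    # logits: (batch, seq_len, n_codes) unnormalized scores over discrete codes.
    # Returns (approximately) one-hot samples with usable gradients.
    return F.gumbel_softmax(logits, tau=tau, hard=hard, dim=-1)

logits = torch.randn(4, 16, 32)           # toy scores: 4 sentences, 16 steps, 32 codes
codes = sample_discrete_latent(logits)    # (4, 16, 32), one-hot along the last dim
```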
We extend variational autoencoders (VAEs) to collaborative filtering for implicit feedback. This non-linear probabilistic model enables us to go beyond the limited modeling capacity of linear factor models, which still largely dominate collaborative filtering research. We introduce a generative model with multinomial likelihood and use Bayesian inference for parameter estimation. Despite widespread use in language modeling and economics, the multinomial likelihood receives less attention in the recommender systems literature. We introduce a different regularization parameter for the learning objective, which proves to be crucial for achieving competitive performance. Remarkably, there is an efficient way to tune the parameter using annealing. The resulting model and learning algorithm have information-theoretic connections to maximum entropy discrimination and the information bottleneck principle. Empirically, we show that the proposed approach significantly outperforms several state-of-the-art baselines, including two recently-proposed neural network approaches, on several real-world datasets. We also provide extended experiments comparing the multinomial likelihood with other commonly used likelihood functions in the latent factor collaborative filtering literature and show favorable results. Finally, we identify the pros and cons of employing a principled Bayesian inference approach and characterize settings where it provides the most significant improvements.
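The collaborative-filtering VAE above trains with a multinomial likelihood and a separately weighted KL term whose weight is tuned by annealing. A minimal PyTorch sketch of that objective follows; variable names and the annealing schedule are illustrative, not the authors' code.

```python
# Minimal sketch of a multinomial-likelihood VAE objective with a
# beta-weighted, annealed KL term (assumed names and schedule).
import torch
import torch.nn.functional as F

def multvae_loss(logits, x, mu, logvar, beta):
    # logits: (batch, n_items) decoder outputs; x: (batch, n_items) click counts
    log_softmax = F.log_softmax(logits, dim=-1)
    neg_ll = -(x * log_softmax).sum(dim=-1).mean()                      # multinomial likelihood
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1).mean()
    return neg_ll + beta * kl                                           # beta-weighted KL term

def beta_at(step, total_anneal_steps=200000, beta_cap=0.2):
    # Example annealing: increase beta linearly up to a cap during training.
    return min(beta_cap, step / total_anneal_steps)
```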

