
Latent Alignment and Variational Attention

Posted by Yuntian Deng
Publication date: 2018
Paper language: English





Neural attention has become central to many state-of-the-art models in natural language processing and related domains. Attention networks are an easy-to-train and effective method for softly simulating alignment; however, the approach does not marginalize over latent alignments in a probabilistic sense. This property makes it difficult to compare attention to other alignment approaches, to compose it with probabilistic models, and to perform posterior inference conditioned on observed data. A related latent approach, hard attention, fixes these issues, but is generally harder to train and less accurate. This work considers variational attention networks, alternatives to soft and hard attention for learning latent variable alignment models, with tighter approximation bounds based on amortized variational inference. We further propose methods for reducing the variance of gradients to make these approaches computationally feasible. Experiments show that for machine translation and visual question answering, inefficient exact latent variable models outperform standard neural attention, but these gains go away when using hard-attention-based training. On the other hand, variational attention retains most of the performance gain but with training speed comparable to neural attention.
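To make the mechanism concrete, below is a minimal PyTorch sketch of one decoder step with a categorical latent alignment. It is a hypothetical illustration, not the authors' released implementation: `soft_attention`, `variational_attention_loss`, and `log_like_fn` are assumed names, and shapes are simplified to a single unbatched step. Soft attention returns the expected context under the prior p(z|x); the variational version samples z from an inference distribution q(z|x,y), optimizes a one-sample ELBO, and uses the soft-attention likelihood as a baseline, one simple variance-reduction choice, for the score-function gradient.

```python
# Hypothetical sketch of categorical variational attention for one
# decoder step (not the authors' released code). Soft attention takes an
# expectation over encoder states; variational attention samples a hard
# alignment z ~ q(z|x,y) and trains with a one-sample ELBO, using a
# score-function (REINFORCE) gradient with a baseline to reduce variance.
import torch
import torch.nn.functional as F

def soft_attention(query, keys, values):
    """Standard soft attention: expected value under the prior p(z|x)."""
    scores = keys @ query                 # (src_len, d) @ (d,) -> (src_len,)
    p = F.softmax(scores, dim=-1)         # prior alignment p(z|x)
    return p @ values, p                  # context = E_p[values]

def variational_attention_loss(query, keys, values, q_logits, log_like_fn):
    """Surrogate negative ELBO for a categorical latent alignment.

    q_logits:    variational logits for q(z|x,y), shape (src_len,)
    log_like_fn: maps a context vector to the scalar log p(y|z,x)
    """
    _, p = soft_attention(query, keys, values)
    q = F.softmax(q_logits, dim=-1)

    # Sample a hard alignment z ~ q and fetch the attended encoder state.
    z = torch.distributions.Categorical(probs=q).sample()
    context = values[z]

    log_py = log_like_fn(context)                 # log p(y|z,x)
    kl = (q * (q.log() - p.log())).sum()          # KL(q || p), both learned

    # Baseline: likelihood under the soft-attention context, a simple
    # control variate; subtracting it leaves the estimator unbiased.
    with torch.no_grad():
        baseline = log_like_fn(p @ values)
        advantage = log_py - baseline

    # Sampling blocks pathwise gradients to q, so this surrogate term
    # supplies the REINFORCE gradient through log q(z) instead.
    surrogate = advantage * q.log()[z]
    return -(log_py + surrogate - kl)             # minimize negative ELBO
```

Note that the returned value is a surrogate loss: its gradient matches the ELBO gradient in expectation, but its numeric value includes the REINFORCE term and is not the ELBO itself.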




Read also

Normalizing flows are a powerful class of generative models for continuous random variables, showing both strong model flexibility and the potential for non-autoregressive generation. These benefits are also desired when modeling discrete random variables such as text, but directly applying normalizing flows to discrete sequences poses significant additional challenges. We propose a VAE-based generative model which jointly learns a normalizing flow-based distribution in the latent space and a stochastic mapping to an observed discrete space. In this setting, we find that it is crucial for the flow-based distribution to be highly multimodal. To capture this property, we propose several normalizing flow architectures to maximize model flexibility. Experiments consider common discrete sequence tasks of character-level language modeling and polyphonic music generation. Our results indicate that an autoregressive flow-based model can match the performance of a comparable autoregressive baseline, and a non-autoregressive flow-based model can improve generation speed with a penalty to performance.
This paper focuses on two key problems in audio-visual emotion recognition in video. One is the temporal alignment of the audio and visual streams for feature-level fusion. The other is locating and re-weighting perceptual attention over the whole audio-visual stream for better recognition. A Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) is employed as the main classification architecture. First, a soft attention mechanism aligns the audio and visual streams. Second, seven emotion embedding vectors, each corresponding to one emotion class, are added to locate the perceptual attention. The locating and re-weighting process is also based on the soft attention mechanism. Experimental results on the EmotiW2015 dataset and a qualitative analysis show the effectiveness of the two proposed techniques.
We present a model that can automatically learn alignments between high-dimensional data in an unsupervised manner. Our proposed method casts alignment learning in a framework where both alignment and data are modelled simultaneously. Further, we automatically infer groupings of different types of sequences within the same dataset. We derive a probabilistic model built on non-parametric priors that allows for flexible warps while also providing a means to specify interpretable constraints. We demonstrate the efficacy of our approach with quantitative performance superior to state-of-the-art approaches, and provide examples illustrating the versatility of our model: automatic inference of sequence groupings, absent from previous approaches, and easy specification of high-level priors for different modalities of data.
We develop new models and algorithms for learning the temporal dynamics of topic polytopes and related geometric objects that arise in topic-model-based inference. Our model is nonparametric Bayesian, and the corresponding inference algorithm is able to discover new topics as time progresses. By exploiting the connection between the modeling of topic polytope evolution, the Beta-Bernoulli process, and the Hungarian matching algorithm, our method is shown to be several orders of magnitude faster than existing topic modeling approaches, as demonstrated by experiments working with several million documents in under two dozen minutes.
Diversity plays a vital role in many text-generation applications. In recent years, Conditional Variational Autoencoders (CVAE) have shown promising performance on this task. However, they often encounter the so-called KL-vanishing problem. Previous works mitigated this problem with heuristics such as strengthening the encoder or weakening the decoder while optimizing the CVAE objective function. Nevertheless, the optimizing direction of these methods is implicit, and it is hard to find an appropriate degree to which they should be applied. In this paper, we propose an explicit optimization objective that complements the CVAE to directly pull away from KL-vanishing. In effect, this objective term guides the encoder toward the best encoder for the decoder, enhancing expressiveness. A labeling network is introduced to estimate the best encoder. It provides a continuous label in the latent space of the CVAE to help build a close connection between latent variables and targets. The whole proposed method is named Self-Labeling CVAE (SLCVAE). To accelerate research on diverse text generation, we also propose a large native one-to-many dataset. Extensive experiments are conducted on two tasks, which show that our method largely improves generation diversity while achieving comparable accuracy to state-of-the-art algorithms. (The standard CVAE objective and the KL term it refers to are sketched after this list.)
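For reference, here is a minimal sketch of the standard CVAE loss that the KL-vanishing discussion above concerns. It is not the SLCVAE objective: it shows the negative ELBO for a diagonal-Gaussian prior p(z|c) and posterior q(z|x,c), with a `kl_weight` annealing knob standing in for the heuristic mitigations mentioned. All names are illustrative.

```python
# Hypothetical sketch of the standard CVAE objective (negative ELBO):
# reconstruction term plus KL(q(z|x,c) || p(z|c)). KL-vanishing is the
# failure mode where a powerful decoder reconstructs x on its own and
# the KL term collapses toward zero, so z carries no information.
import torch

def reparameterize(mu, logvar):
    """Differentiable sample z = mu + sigma * eps, eps ~ N(0, I)."""
    return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

def cvae_loss(recon_log_prob, mu_q, logvar_q, mu_p, logvar_p, kl_weight=1.0):
    """Negative ELBO for diagonal Gaussians q(z|x,c) and p(z|c).

    recon_log_prob: scalar log p(x|z,c) from the decoder
    kl_weight:      annealing coefficient, one common heuristic mitigation
    """
    var_q, var_p = logvar_q.exp(), logvar_p.exp()
    # Closed-form KL between two diagonal Gaussians, summed over dims.
    kl = 0.5 * (
        logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    ).sum(dim=-1)
    # If kl stays near zero throughout training, the latent variable is
    # being ignored: that is the KL-vanishing symptom discussed above.
    return -recon_log_prob + kl_weight * kl
```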
