
Inducing Meaningful Units from Character Sequences with Slot Attention

Posted by: Melika Behjati
Publication date: 2021
Research field: Informatics engineering
Paper language: English


Characters do not convey meaning, but sequences of characters do. We propose an unsupervised distributional method to learn the abstract meaning-bearing units in a sequence of characters. Rather than segmenting the sequence, this model discovers continuous representations of the objects in the sequence, using a recently proposed architecture for object discovery in images called Slot Attention. We train our model on different languages and evaluate the quality of the obtained representations with probing classifiers. Our experiments show promising results in the ability of our units to capture meaning at a higher level of abstraction.
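For readers unfamiliar with the mechanism, the sketch below illustrates how Slot Attention could be applied to a sequence of character embeddings: a small set of slots iteratively competes for characters through attention normalized over the slots, so each slot ends up holding a continuous representation of one meaning-bearing unit. This is a minimal illustration based on the published Slot Attention architecture, not the authors' code; the dimensions, slot count, and iteration count are arbitrary assumptions.

```python
# Minimal Slot Attention sketch over character embeddings (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SlotAttention(nn.Module):
    def __init__(self, num_slots=8, dim=64, iters=3):
        super().__init__()
        self.num_slots, self.dim, self.iters = num_slots, dim, iters
        self.slots_mu = nn.Parameter(torch.randn(1, 1, dim))
        self.slots_logsigma = nn.Parameter(torch.zeros(1, 1, dim))
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        self.gru = nn.GRUCell(dim, dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.norm_inputs = nn.LayerNorm(dim)
        self.norm_slots = nn.LayerNorm(dim)
        self.norm_mlp = nn.LayerNorm(dim)

    def forward(self, inputs):                       # inputs: (B, T, dim) char embeddings
        B = inputs.size(0)
        inputs = self.norm_inputs(inputs)
        k, v = self.to_k(inputs), self.to_v(inputs)
        # Initial slots are sampled from a learned Gaussian.
        slots = self.slots_mu + self.slots_logsigma.exp() * torch.randn(
            B, self.num_slots, self.dim, device=inputs.device)
        for _ in range(self.iters):
            slots_prev = slots
            q = self.to_q(self.norm_slots(slots))
            # Softmax over the slot axis makes slots compete for each character.
            attn = F.softmax(q @ k.transpose(1, 2) / self.dim ** 0.5, dim=1)
            attn = attn / (attn.sum(dim=-1, keepdim=True) + 1e-8)   # weighted mean over inputs
            updates = attn @ v                                      # (B, num_slots, dim)
            slots = self.gru(updates.reshape(-1, self.dim),
                             slots_prev.reshape(-1, self.dim)).reshape(B, self.num_slots, self.dim)
            slots = slots + self.mlp(self.norm_mlp(slots))
        return slots  # continuous representations of the induced units
```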


Read also

Recently, it has been argued that encoder-decoder models can be made more interpretable by replacing the softmax function in the attention with its sparse variants. In this work, we introduce a novel, simple method for achieving sparsity in attention: we replace the softmax activation with a ReLU, and show that sparsity naturally emerges from such a formulation. Training stability is achieved with layer normalization with either a specialized initialization or an additional gating function. Our model, which we call Rectified Linear Attention (ReLA), is easy to implement and more efficient than previously proposed sparse attention mechanisms. We apply ReLA to the Transformer and conduct experiments on five machine translation tasks. ReLA achieves translation performance comparable to several strong baselines, with training and decoding speed similar to that of vanilla attention. Our analysis shows that ReLA delivers a high sparsity rate and head diversity, and the induced cross attention achieves better accuracy with respect to source-target word alignment than recent sparsified softmax-based models. Intriguingly, ReLA heads also learn to attend to nothing (i.e. switch off) for some queries, which is not possible with sparsified softmax alternatives.
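As a rough illustration of the idea (not the paper's implementation), the sketch below replaces the softmax over attention scores with a ReLU and normalizes the output with LayerNorm; the specialized initialization and gating variants mentioned in the abstract are omitted, and all shapes are assumptions.

```python
# Rectified Linear Attention sketch: ReLU instead of softmax, LayerNorm for stability.
import torch
import torch.nn as nn
import torch.nn.functional as F

def rectified_linear_attention(q, k, v, norm):
    """q: (B, Tq, d); k, v: (B, Tk, d); norm: nn.LayerNorm(d)."""
    scores = q @ k.transpose(1, 2) / q.size(-1) ** 0.5   # (B, Tq, Tk)
    weights = F.relu(scores)      # negative scores become exact zeros (sparsity)
    out = weights @ v             # a query with all-zero weights "attends to nothing"
    return norm(out)              # layer normalization keeps the output scale stable

B, Tq, Tk, d = 2, 5, 7, 64
norm = nn.LayerNorm(d)
q, k, v = torch.randn(B, Tq, d), torch.randn(B, Tk, d), torch.randn(B, Tk, d)
print(rectified_linear_attention(q, k, v, norm).shape)   # torch.Size([2, 5, 64])
```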
We explore the suitability of self-attention models for character-level neural machine translation. We test the standard transformer model, as well as a novel variant in which the encoder block combines information from nearby characters using convolutions. We perform extensive experiments on WMT and UN datasets, testing both bilingual and multilingual translation to English using up to three input languages (French, Spanish, and Chinese). Our transformer variant consistently outperforms the standard transformer at the character level and converges faster while learning more robust character-level alignments.
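The sketch below shows one plausible form of such an encoder block, where a 1D convolution mixes each character embedding with its neighbours before standard self-attention; the kernel size, dimensions, and residual layout are assumptions, not the paper's exact design.

```python
# Assumed character-level encoder block: local convolution, then self-attention.
import torch
import torch.nn as nn

class ConvCharEncoderBlock(nn.Module):
    def __init__(self, dim=256, heads=4, kernel_size=5):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x):                               # x: (B, T, dim) character embeddings
        # Convolution over the time axis mixes each character with nearby characters.
        local = self.conv(x.transpose(1, 2)).transpose(1, 2)
        x = self.norm1(x + local)
        attended, _ = self.attn(x, x, x)
        return self.norm2(x + attended)

block = ConvCharEncoderBlock()
chars = torch.randn(8, 120, 256)                        # batch of 120-character sequences
print(block(chars).shape)                               # torch.Size([8, 120, 256])
```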
LSTMs and other RNN variants have shown strong performance on character-level language modeling. These models are typically trained using truncated backpropagation through time, and it is common to assume that their success stems from their ability to remember long-term contexts. In this paper, we show that a deep (64-layer) transformer model with fixed context outperforms RNN variants by a large margin, achieving state of the art on two popular benchmarks: 1.13 bits per character on text8 and 1.06 on enwik8. To get good results at this depth, we show that it is important to add auxiliary losses, both at intermediate network layers and intermediate sequence positions.
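The auxiliary-loss idea can be illustrated as follows: next-character cross-entropy is computed not only from the final layer but also from intermediate layers and added to the main loss. The choice of layers and the 0.5 weighting below are illustrative assumptions, not the paper's exact schedule.

```python
# Sketch of auxiliary character-prediction losses at intermediate transformer layers.
import torch
import torch.nn as nn
import torch.nn.functional as F

def lm_loss_with_aux(hidden_states, targets, proj):
    """hidden_states: list of (B, T, d) tensors, one per layer.
    targets: (B, T) next-character ids.  proj: shared nn.Linear(d, vocab_size)."""
    main = hidden_states[-1]
    loss = F.cross_entropy(proj(main).flatten(0, 1), targets.flatten())
    # Down-weighted auxiliary losses at a subset of intermediate layers.
    for h in hidden_states[:-1:2]:
        loss = loss + 0.5 * F.cross_entropy(proj(h).flatten(0, 1), targets.flatten())
    return loss
```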
Intent detection and slot filling are two fundamental tasks for building a spoken language understanding (SLU) system. Multiple deep learning-based joint models have demonstrated excellent results on the two tasks. In this paper, we propose a new joint model with a wheel-graph attention network (Wheel-GAT) which is able to model interrelated connections directly for intent detection and slot filling. To construct a graph structure for utterances, we create intent nodes, slot nodes, and directed edges. Intent nodes can provide utterance-level semantic information for slot filling, while slot nodes can also provide local keyword information for intent. Experiments show that our model outperforms multiple baselines on two public datasets. We also demonstrate that using the Bidirectional Encoder Representations from Transformers (BERT) model further boosts performance on the SLU task.
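As a rough sketch of the graph construction (the exact edge layout in the paper may differ), the utterance graph can be built with one intent node as the hub, one slot node per token on the rim, and directed edges between the hub and every slot node as well as between neighbouring slot nodes:

```python
# Assumed wheel-shaped utterance graph: intent node as hub, slot nodes on the rim.
import numpy as np

def build_wheel_graph(num_tokens):
    """Node 0 is the intent node; nodes 1..num_tokens are slot nodes.
    Returns a directed adjacency matrix."""
    n = num_tokens + 1
    adj = np.zeros((n, n), dtype=int)
    for i in range(1, n):
        adj[0, i] = adj[i, 0] = 1               # hub <-> every slot node
        if i + 1 < n:
            adj[i, i + 1] = adj[i + 1, i] = 1   # neighbouring slot nodes
    return adj

print(build_wheel_graph(4))
```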
Richard A. Moy, 2017
Given a set of integers containing no 3-term arithmetic progressions, one constructs a Stanley sequence by choosing integers greedily without forming such a progression. Independent Stanley sequences are a well-structured class of Stanley sequences with two main parameters: the character $\lambda(A)$ and the repeat factor $\rho(A)$. Rolnick conjectured that for every $\lambda \in \mathbb{N}_0 \backslash \{1, 3, 5, 9, 11, 15\}$, there exists an independent Stanley sequence $S(A)$ such that $\lambda(A) = \lambda$. This paper demonstrates that $\lambda(A) \notin \{1, 3, 5, 9, 11, 15\}$ for any independent Stanley sequence $S(A)$.
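For concreteness, the greedy construction of a Stanley sequence can be sketched as follows, assuming the seed set itself contains no 3-term arithmetic progression:

```python
# Greedy Stanley sequence: repeatedly append the smallest integer that keeps
# the set free of 3-term arithmetic progressions (a, b, 2b - a).
def stanley_sequence(seed, length):
    seq = sorted(seed)
    while len(seq) < length:
        candidate = seq[-1] + 1
        while any(2 * b - a == candidate
                  for i, a in enumerate(seq) for b in seq[i + 1:]):
            candidate += 1
        seq.append(candidate)
    return seq

# S({0, 1}) begins 0, 1, 3, 4, 9, 10, 12, 13 (numbers with only digits 0 and 1 in base 3).
print(stanley_sequence({0, 1}, 8))
```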
