
Self-Attentional Models for Lattice Inputs

Published by: Matthias Sperber
Publication date: 2019
Research field: Informatics Engineering
Paper language: English





Lattices are an efficient and effective method to encode ambiguity of upstream systems in natural language processing tasks, for example to compactly capture multiple speech recognition hypotheses, or to represent multiple linguistic analyses. Previous work has extended recurrent neural networks to model lattice inputs and achieved improvements in various tasks, but these models suffer from very slow computation speeds. This paper extends the recently proposed paradigm of self-attention to handle lattice inputs. Self-attention is a sequence modeling technique that relates inputs to one another by computing pairwise similarities and has gained popularity for both its strong results and its computational efficiency. To extend such models to handle lattices, we introduce probabilistic reachability masks that incorporate lattice structure into the model and support lattice scores if available. We also propose a method for adapting positional embeddings to lattice structures. We apply the proposed model to a speech translation task and find that it outperforms all examined baselines while being much faster to compute than previous neural lattice models during both training and inference.
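To make the masking idea concrete, here is a rough NumPy-only sketch (not the authors' implementation) of single-head self-attention whose logits are biased by the log of a lattice reachability matrix; the names `lattice_self_attention` and `reach` are invented for this example, and the paper's probabilistic reachability masks and lattice-adapted positional embeddings are more involved than this.

```python
import numpy as np

def lattice_self_attention(x, reach, w_q, w_k, w_v):
    """x: (n, d) lattice-node embeddings; reach: (n, n) reachability probabilities."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]
    logits = (q @ k.T) / np.sqrt(d_k)
    # Add log reachability: probability 0 -> -inf logit -> zero attention weight.
    logits = logits + np.log(reach + 1e-9)
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Toy usage: 4 lattice nodes with a triangular (DAG-like) reachability matrix.
n, d = 4, 8
rng = np.random.default_rng(0)
x = rng.normal(size=(n, d))
reach = np.triu(np.ones((n, n)))   # in this toy case node i "reaches" nodes j >= i
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out = lattice_self_attention(x, reach, w_q, w_k, w_v)   # shape (4, 8)
```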




Read also

Dual encoders perform retrieval by encoding documents and queries into dense low-dimensional vectors, scoring each document by its inner product with the query. We investigate the capacity of this architecture relative to sparse bag-of-words models and attentional neural networks. Using both theoretical and empirical analysis, we establish connections between the encoding dimension, the margin between gold and lower-ranked documents, and the document length, suggesting limitations in the capacity of fixed-length encodings to support precise retrieval of long documents. Building on these insights, we propose a simple neural model that combines the efficiency of dual encoders with some of the expressiveness of more costly attentional architectures, and explore sparse-dense hybrids to capitalize on the precision of sparse retrieval. These models outperform strong alternatives in large-scale retrieval.
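The dual-encoder scoring step described in that abstract is simple enough to sketch. The toy snippet below (random vectors standing in for learned encoders, names invented for the example) just ranks documents by the inner product between their encodings and the query encoding.

```python
import numpy as np

def rank_by_inner_product(query_vec, doc_matrix):
    """query_vec: (d,); doc_matrix: (num_docs, d). Returns document indices, best first."""
    scores = doc_matrix @ query_vec          # one inner-product score per document
    return np.argsort(-scores)

# Toy usage with random encodings in place of a trained dual encoder.
rng = np.random.default_rng(0)
d, num_docs = 16, 5
query_vec = rng.normal(size=d)
doc_matrix = rng.normal(size=(num_docs, d))
ranking = rank_by_inner_product(query_vec, doc_matrix)
```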
Few-shot Knowledge Graph (KG) completion is a focus of current research, where each task aims at querying unseen facts of a relation given its few-shot reference entity pairs. Recent attempts solve this problem by learning static representations of entities and references, ignoring their dynamic properties, i.e., entities may exhibit diverse roles within task relations, and references may make different contributions to queries. This work proposes an adaptive attentional network for few-shot KG completion by learning adaptive entity and reference representations. Specifically, entities are modeled by an adaptive neighbor encoder to discern their task-oriented roles, while references are modeled by an adaptive query-aware aggregator to differentiate their contributions. Through the attention mechanism, both entities and references can capture their fine-grained semantic meanings, and thus render more expressive representations. This will be more predictive for knowledge acquisition in the few-shot scenario. Evaluation in link prediction on two public datasets shows that our approach achieves new state-of-the-art results with different few-shot sizes.
The ability to transfer knowledge to novel environments and tasks is a sensible desideratum for general learning agents. Despite the apparent promise, transfer in RL is still an open and little-exploited research area. In this paper, we take a brand-new perspective on transfer: we suggest that the ability to assign credit unveils structural invariants in the tasks that can be transferred to make RL more sample-efficient. Our main contribution is SECRET, a novel approach to transfer learning for RL that uses a backward-view credit assignment mechanism based on a self-attentive architecture. Two aspects are key to its generality: it learns to assign credit as a separate offline supervised process and exclusively modifies the reward function. Consequently, it can be supplemented by transfer methods that do not modify the reward function and it can be plugged on top of any RL algorithm.
In this work, we propose a new language modeling paradigm that has the ability to perform both prediction and moderation of information flow at multiple granularities: neural lattice language models. These models construct a lattice of possible paths through a sentence and marginalize across this lattice to calculate sequence probabilities or optimize parameters. This approach allows us to seamlessly incorporate linguistic intuitions - including polysemy and existence of multi-word lexical items - into our language model. Experiments on multiple language modeling tasks show that English neural lattice language models that utilize polysemous embeddings are able to improve perplexity by 9.95% relative to a word-level baseline, and that a Chinese model that handles multi-character tokens is able to improve perplexity by 20.94% relative to a character-level baseline.
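To make the path-marginalization idea in that abstract concrete, here is a small self-contained sketch (not the paper's neural model) that sums the probability of all paths through a toy lattice with a log-space forward recursion; the edge log-probabilities and node ids are invented for the example.

```python
import math
from collections import defaultdict

def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == float("-inf"):
        return b
    if b == float("-inf"):
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def lattice_log_prob(edges, start, end):
    """edges: (src, dst, log_p) triples; node ids are assumed topologically ordered."""
    alpha = defaultdict(lambda: float("-inf"))  # log-probability of reaching each node
    alpha[start] = 0.0
    for src, dst, log_p in sorted(edges):       # sorting by src id visits edges in order
        alpha[dst] = logaddexp(alpha[dst], alpha[src] + log_p)
    return alpha[end]

# Toy lattice with two paths, 0->1->3 and 0->2->3.
edges = [(0, 1, math.log(0.6)), (0, 2, math.log(0.4)),
         (1, 3, math.log(0.5)), (2, 3, math.log(0.5))]
total = lattice_log_prob(edges, 0, 3)           # log(0.6*0.5 + 0.4*0.5) = log(0.5)
```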
Second order beta-decay processes with and without neutrinos in the final state are key probes of nuclear physics and of the nature of neutrinos. Neutrinoful double-beta decay is the rarest Standard Model process that has been observed and provides a unique test of the understanding of weak nuclear interactions. Observation of neutrinoless double-beta decay would reveal that neutrinos are Majorana fermions and that lepton number conservation is violated in nature. While significant progress has been made in phenomenological approaches to understanding these processes, establishing a connection between these processes and the physics of the Standard Model and beyond is a critical task as it will provide input into the design and interpretation of future experiments. The strong-interaction contributions to double-beta decay processes are non-perturbative and can only be addressed systematically through a combination of lattice Quantum Chromodynamics (LQCD) and nuclear many-body calculations. In this review, current efforts to establish the LQCD connection are discussed for both neutrinoful and neutrinoless double-beta decay. LQCD calculations of the hadronic contributions to the neutrinoful process $nn \to pp \, e^- e^- \bar{\nu}_e \bar{\nu}_e$ and to various neutrinoless pionic transitions are reviewed, and the connections of these calculations to the phenomenology of double-beta decay through the use of effective field theories (EFTs) are highlighted. At present, LQCD calculations are limited to small nuclear systems, and to pionic subsystems, and require matching to appropriate EFTs to have direct phenomenological impact. However, these calculations have already revealed qualitatively that there are terms in the EFTs that can only be constrained from double-beta decay processes themselves or using inputs from LQCD. Future prospects for direct calculations in larger nuclei are also discussed.