Position representation is crucial for building position-aware representations in Transformers. Existing position representations suffer from a lack of generalization to test data with unseen lengths or high computational cost. We investigate shifted absolute position embedding (SHAPE) to address both issues. The basic idea of SHAPE is to achieve shift invariance, which is a key property of recent successful position representations, by randomly shifting absolute positions during training. We demonstrate that SHAPE is empirically comparable to its counterpart while being simpler and faster.
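A minimal sketch of the idea described above, in PyTorch: during training, every absolute position index of a sequence is offset by a single random integer before the position embedding is computed, so the model cannot rely on exact absolute positions; at test time the ordinary indices 0..L-1 are used. The sinusoidal encoding, the function names, and the maximum offset max_shift are illustrative assumptions rather than the authors' exact implementation.

    import torch

    def sinusoidal_embedding(positions, d_model):
        # Standard sinusoidal embedding evaluated at arbitrary integer positions.
        # positions: (batch, seq_len) -> returns (batch, seq_len, d_model)
        half = d_model // 2
        freqs = torch.exp(-torch.arange(half, dtype=torch.float32)
                          * torch.log(torch.tensor(10000.0)) / half)
        angles = positions.unsqueeze(-1).float() * freqs
        return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)

    def shape_positions(batch_size, seq_len, max_shift=100, training=True):
        # SHAPE: shift all positions of each training sequence by one random offset;
        # at inference time fall back to the usual absolute positions 0..seq_len-1.
        positions = torch.arange(seq_len).unsqueeze(0).expand(batch_size, seq_len)
        if training:
            offsets = torch.randint(0, max_shift + 1, (batch_size, 1))
            positions = positions + offsets
        return positions

    # Usage: add the (randomly shifted) position embeddings to the token embeddings.
    # token_emb: (batch, seq_len, d_model)
    # pos = shape_positions(token_emb.size(0), token_emb.size(1), training=True)
    # x = token_emb + sinusoidal_embedding(pos, token_emb.size(-1))

Because the offset is shared within a sequence, relative distances between tokens are unchanged, which is what makes the representation behave in a shift-invariant way.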
Transformers are arguably the main workhorse in recent Natural Language Processing research. By definition, a Transformer is invariant with respect to reordering of the input. However, language is inherently sequential, and word order is essential to …
Neural machine translation has achieved remarkable empirical performance on standard benchmark datasets, yet recent evidence suggests that the models can still fail easily when dealing with substandard inputs such as misspelled words. To overcome this …
A major challenge in deploying transformer models is their prohibitive inference cost, which scales quadratically with the input sequence length. This makes it especially difficult to use transformers for processing long sequences. To address this, …
Low-resource language translation is a challenging but socially valuable NLP task. Building on recent work adapting the Transformer's normalization to this setting, we propose QKNorm, a normalization technique that modifies the attention mechanism to …
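The abstract is truncated before the mechanism itself; going by the published description of QKNorm (an assumption, since it is not stated in the visible text), queries and keys are L2-normalized along the head dimension so that the attention logits become cosine similarities, which are then rescaled by a learned parameter g in place of the usual 1/sqrt(d) factor. A minimal sketch under that assumption:

    import torch
    import torch.nn.functional as F

    def qknorm_attention(q, k, v, g):
        # q, k, v: (batch, heads, seq_len, head_dim); g: learned scalar that
        # replaces the usual 1/sqrt(head_dim) scaling of the attention logits.
        q = F.normalize(q, p=2, dim=-1)                      # unit-norm queries
        k = F.normalize(k, p=2, dim=-1)                      # unit-norm keys
        scores = g * torch.matmul(q, k.transpose(-2, -1))    # rescaled cosine similarities
        return torch.matmul(torch.softmax(scores, dim=-1), v)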
Despite the great promise of Transformers in many sequence modeling tasks (e.g., machine translation), their deterministic nature hinders them from generalizing to high-entropy tasks such as dialogue response generation. Previous work proposes to cap…