Do you want to publish a course? Click here

Character-Level Translation with Self-attention

63   0   0.0 ( 0 )
 Added by Nikola Nikolov
 Publication date 2020
and research's language is English




Ask ChatGPT about the research

We explore the suitability of self-attention models for character-level neural machine translation. We test the standard transformer model, as well as a novel variant in which the encoder block combines information from nearby characters using convolutions. We perform extensive experiments on WMT and UN datasets, testing both bilingual and multilingual translation to English using up to three input languages (French, Spanish, and Chinese). Our transformer variant consistently outperforms the standard transformer at the character-level and converges faster while learning more robust character-level alignments.



rate research

Read More

LSTMs and other RNN variants have shown strong performance on character-level language modeling. These models are typically trained using truncated backpropagation through time, and it is common to assume that their success stems from their ability to remember long-term contexts. In this paper, we show that a deep (64-layer) transformer model with fixed context outperforms RNN variants by a large margin, achieving state of the art on two popular benchmarks: 1.13 bits per character on text8 and 1.06 on enwik8. To get good results at this depth, we show that it is important to add auxiliary losses, both at intermediate network layers and intermediate sequence positions.
Neural machine translation (NMT) is nowadays commonly applied at the subword level, using byte-pair encoding. A promising alternative approach focuses on character-level translation, which simplifies processing pipelines in NMT considerably. This approach, however, must consider relatively longer sequences, rendering the training process prohibitively expensive. In this paper, we discuss a novel, Transformer-based approach, that we compare, both in speed and in quality to the Transformer at subword and character levels, as well as previously developed character-level models. We evaluate our models on 4 language pairs from WMT15: DE-EN, CS-EN, FI-EN and RU-EN. The proposed novel architecture can be trained on a single GPU and is 34% percent faster than the character-level Transformer; still, the obtained results are at least on par with it. In addition, our proposed model outperforms the subword-level model in FI-EN and shows close results in CS-EN. To stimulate further research in this area and close the gap with subword-level NMT, we make all our code and models publicly available.
Chest radiography is a general method for diagnosing a patients condition and identifying important information; therefore, radiography is used extensively in routine medical practice in various situations, such as emergency medical care and medical checkup. However, a high level of expertise is required to interpret chest radiographs. Thus, medical specialists spend considerable time in diagnosing such huge numbers of radiographs. In order to solve these problems, methods for generating findings have been proposed. However, the study of generating chest radiograph findings has primarily focused on the English language, and to the best of our knowledge, no studies have studied Japanese data on this subject. There are two challenges involved in generating findings in the Japanese language. The first challenge is that word splitting is difficult because the boundaries of Japanese word are not clear. The second challenge is that there are numerous orthographic variants. For deal with these two challenges, we proposed an end-to-end model that generates Japanese findings at the character-level from chest radiographs. In addition, we introduced the attention mechanism to improve not only the accuracy, but also the interpretation ability of the results. We evaluated the proposed method using a public dataset with Japanese findings. The effectiveness of the proposed method was confirmed using the Bilingual Evaluation Understudy score. And, we were confirmed from the generated findings that the proposed method was able to consider the orthographic variants. Furthermore, we confirmed via visual inspection that the attention mechanism captures the features and positional information of radiographs.
167 - Kaitao Song , Xu Tan , Furong Peng 2018
The encoder-decoder is the typical framework for Neural Machine Translation (NMT), and different structures have been developed for improving the translation performance. Transformer is one of the most promising structures, which can leverage the self-attention mechanism to capture the semantic dependency from global view. However, it cannot distinguish the relative position of different tokens very well, such as the tokens located at the left or right of the current token, and cannot focus on the local information around the current token either. To alleviate these problems, we propose a novel attention mechanism named Hybrid Self-Attention Network (HySAN) which accommodates some specific-designed masks for self-attention network to extract various semantic, such as the global/local information, the left/right part context. Finally, a squeeze gate is introduced to combine different kinds of SANs for fusion. Experimental results on three machine translation tasks show that our proposed framework outperforms the Transformer baseline significantly and achieves superior results over state-of-the-art NMT systems.
55 - Frederick Liu , Han Lu , Chieh Lo 2017
Previous work has modeled the compositionality of words by creating character-level models of meaning, reducing problems of sparsity for rare words. However, in many writing systems compositionality has an effect even on the character-level: the meaning of a character is derived by the sum of its parts. In this paper, we model this effect by creating embeddings for characters based on their visual characteristics, creating an image for the character and running it through a convolutional neural network to produce a visual character embedding. Experiments on a text classification task demonstrate that such model allows for better processing of instances with rare characters in languages such as Chinese, Japanese, and Korean. Additionally, qualitative analyses demonstrate that our proposed model learns to focus on the parts of characters that carry semantic content, resulting in embeddings that are coherent in visual space.
comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا