
Can the Transformer Be Used as a Drop-in Replacement for RNNs in Text-Generating GANs?

Added by Andrei Kucharavy
Publication date: 2021
Language: English





In this paper we address the problem of fine-tuned text generation with a limited computational budget. To that end, we take a well-performing text generative adversarial network (GAN) architecture, the Diversity-Promoting GAN (DPGAN), and attempt a drop-in replacement of its LSTM layer with a self-attention-based Transformer layer in order to leverage the latter's computational efficiency. The resulting Self-Attention DPGAN (SADPGAN) was evaluated for the quality and diversity of the generated text and for training stability. Computational experiments suggest that the Transformer architecture cannot serve as a drop-in replacement for the LSTM layer: it under-performs during the pre-training phase and undergoes a complete mode collapse during the GAN tuning phase. Our results suggest that the Transformer architecture needs to be adapted before it can replace RNNs in text-generating GANs.
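The swap the abstract describes can be illustrated with a short sketch. Below is a minimal, hypothetical PyTorch generator in which a self-attention Transformer encoder takes the place of the LSTM; the layer sizes, masking choices, and module names are illustrative assumptions, not the authors' exact DPGAN configuration.

```python
import torch.nn as nn
from torch import Tensor

class SketchGenerator(nn.Module):
    """Illustrative generator body showing the LSTM -> Transformer swap.
    Not the authors' DPGAN code; sizes and hyperparameters are assumed."""
    def __init__(self, vocab_size: int = 5000, d_model: int = 256,
                 use_transformer: bool = True):
        super().__init__()
        self.use_transformer = use_transformer
        self.embed = nn.Embedding(vocab_size, d_model)
        if use_transformer:
            layer = nn.TransformerEncoderLayer(d_model, nhead=4,
                                               batch_first=True)
            self.body = nn.TransformerEncoder(layer, num_layers=2)
        else:
            self.body = nn.LSTM(d_model, d_model, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, tokens: Tensor) -> Tensor:
        h = self.embed(tokens)                      # (batch, time, d_model)
        if self.use_transformer:
            # Causal mask keeps the self-attention layer autoregressive,
            # mirroring the left-to-right generation of the LSTM it replaces.
            mask = nn.Transformer.generate_square_subsequent_mask(
                tokens.size(1))
            h = self.body(h, mask=mask)
        else:
            h, _ = self.body(h)
        return self.out(h)                          # per-step vocab logits
```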



Related research

Despite the widespread application of recurrent neural networks (RNNs) across a variety of tasks, a unified understanding of how RNNs solve these tasks remains elusive. In particular, it is unclear what dynamical patterns arise in trained RNNs, and how those patterns depend on the training dataset or task. This work addresses these questions in the context of a specific natural language processing task: text classification. Using tools from dynamical systems analysis, we study recurrent networks trained on a battery of both natural and synthetic text classification tasks. We find the dynamics of these trained RNNs to be both interpretable and low-dimensional. Specifically, across architectures and datasets, RNNs accumulate evidence for each class as they process the text, using a low-dimensional attractor manifold as the underlying mechanism. Moreover, the dimensionality and geometry of the attractor manifold are determined by the structure of the training dataset; in particular, we describe how simple word-count statistics computed on the training dataset can be used to predict these properties. Our observations span multiple architectures and datasets, reflecting a common mechanism RNNs employ to perform text classification. To the degree that integration of evidence towards a decision is a common computational primitive, this work lays the foundation for using dynamical systems techniques to study the inner workings of RNNs.
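The analysis recipe this abstract relies on (run a trained RNN over text, collect its hidden states, and test whether they occupy a low-dimensional subspace) can be sketched as follows. The untrained GRU and random inputs here are stand-ins for the authors' trained classifiers and datasets.

```python
import torch
from sklearn.decomposition import PCA

# Stand-in for a trained text classifier; in practice, load trained weights
# and feed embedded token sequences from the evaluation set.
rnn = torch.nn.GRU(input_size=64, hidden_size=256, batch_first=True)
x = torch.randn(32, 50, 64)                 # (batch, time, embedding)

states, _ = rnn(x)                          # (batch, time, hidden)
flat = states.reshape(-1, 256).detach().numpy()

# If a handful of principal components capture nearly all the variance,
# the hidden-state dynamics are effectively low-dimensional.
pca = PCA(n_components=10).fit(flat)
print(pca.explained_variance_ratio_.cumsum())
```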
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus," we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
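The text-to-text casting is concrete in the released T5 checkpoints: every task becomes a string-in, string-out problem distinguished by a task prefix. A minimal usage sketch with the Hugging Face transformers library, assuming it is installed and can download the public t5-small checkpoint:

```python
from transformers import T5ForConditionalGeneration, T5TokenizerFast

# Every task is cast to text-to-text; the prefix tells the model which task.
tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

prompts = [
    "translate English to German: The house is wonderful.",
    "summarize: Transfer learning has emerged as a powerful technique ...",
]
for prompt in prompts:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=40)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```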
Recurrent Neural Networks (RNNs), more specifically their Long Short-Term Memory (LSTM) variants, have been widely used as a deep learning tool for tackling sequence-based learning tasks in text and speech. Training of such LSTM applications is computationally intensive due to the recurrent nature of hidden state computation that repeats for each time step. While sparsity in deep neural nets has been widely seen as an opportunity for reducing computation time in both training and inference phases, the use of non-ReLU activations in LSTM RNNs renders such dynamic sparsity, associated with neuron activations and gradient values, limited or non-existent. In this work, we identify dropout-induced sparsity for LSTMs as a suitable mode of computation reduction. Dropout is a widely used regularization mechanism that randomly drops computed neuron values during each training iteration. We propose to structure dropout patterns by dropping out the same set of physical neurons within a batch, resulting in column (row) level hidden state sparsity, which is amenable to computation reduction at run-time on general-purpose SIMD hardware as well as systolic arrays. We conduct our experiments on three representative NLP tasks: language modelling on the PTB dataset, OpenNMT-based machine translation using the IWSLT De-En and En-Vi datasets, and named entity recognition sequence labelling using the CoNLL-2003 shared task. We demonstrate that our proposed approach can translate dropout-based computation reduction into reduced training time, with improvements ranging from 1.23x to 1.64x, without sacrificing the target metric.
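The core idea, sharing one dropout mask across the batch so that whole columns of the hidden-state matrix become zero, can be sketched in a few lines of PyTorch; the function name and rate are illustrative, and the authors' exact batching scheme may differ.

```python
import torch

def column_dropout(h: torch.Tensor, p: float = 0.3,
                   training: bool = True) -> torch.Tensor:
    """Drop the same hidden units for every sequence in the batch,
    zeroing whole columns of the (batch, hidden) state matrix so the
    skipped computation is structured rather than scattered."""
    if not training or p == 0.0:
        return h
    # One mask row, broadcast over the batch dimension.
    keep = (torch.rand(1, h.size(-1), device=h.device) > p).float()
    return h * keep / (1.0 - p)    # inverted-dropout rescaling

h = torch.randn(16, 256)           # batch of hidden states
print(column_dropout(h).eq(0).all(dim=0).sum())  # whole columns zeroed
```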
Fungicide mixtures produced by the agrochemical industry often contain low-risk fungicides, to which fungal pathogens are fully sensitive, together with high-risk fungicides known to be prone to fungicide resistance. Can these mixtures provide adequate disease control while minimizing the risk for the development of resistance? We present a population dynamics model to address this question. We found that the fitness cost of resistance is a crucial parameter to determine the outcome of competition between the sensitive and resistant pathogen strains and to assess the usefulness of a mixture. If fitness costs are absent, then the use of the high-risk fungicide in a mixture selects for resistance and the fungicide eventually becomes nonfunctional. If there is a cost of resistance, then an optimal ratio of fungicides in the mixture can be found, at which selection for resistance is expected to vanish and the level of disease control can be optimized.
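The abstract does not give the model's equations, but the qualitative setup can be illustrated with a generic two-strain competition model. Every functional form and parameter below is an assumption made for illustration, not the authors' model: the sensitive strain is suppressed by both mixture components, while the resistant strain escapes the high-risk component but pays a growth-rate cost.

```python
import numpy as np
from scipy.integrate import odeint

def two_strain(y, t, beta, cost, dose_lo, dose_hi, eff_lo, eff_hi):
    """Illustrative logistic competition between a fungicide-sensitive
    strain S and a resistant strain R under a two-fungicide mixture."""
    S, R = y
    crowding = 1.0 - (S + R)
    dS = (beta * crowding - eff_lo * dose_lo - eff_hi * dose_hi) * S
    dR = (beta * (1.0 - cost) * crowding - eff_lo * dose_lo) * R
    return [dS, dR]

t = np.linspace(0.0, 200.0, 1000)
traj = odeint(two_strain, [0.01, 1e-4], t,
              args=(0.5, 0.1, 0.5, 0.5, 0.3, 0.3))
print("final S, R:", traj[-1])  # with a cost, tuning doses can keep R low
```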
Financial decisions impact our lives, and thus everyone from the regulator to the consumer is interested in fair, sound, and explainable decisions. There is an increasing competitive desire and regulatory incentive to deploy AI mindfully within financial services. An important mechanism towards that end is to explain AI decisions to various stakeholders. State-of-the-art explainable AI systems mostly serve AI engineers and offer little to no value to business decision makers, customers, and other stakeholders. Towards addressing this gap, in this work we consider the scenario of explaining loan denials. We build a first-of-its-kind dataset of loan-applicant-friendly explanations. We design a novel Generative Adversarial Network (GAN) that can accommodate smaller datasets and generate user-friendly textual explanations. We demonstrate how our system can also generate explanations serving different purposes: those that help educate loan applicants, or help them take appropriate action towards a future approval.


