Adversarial regularization can improve model generalization in many natural language processing tasks. However, conventional approaches are computationally expensive, since they need to generate a perturbation for each sample in each epoch. We propose a new adversarial regularization method, ARCH (adversarial regularization with caching), where perturbations are generated and cached only once every several epochs. Since caching all the perturbations raises memory-usage concerns, we adopt a K-nearest-neighbors-based strategy that requires caching only a small number of perturbations, without adding training time. We evaluate the proposed method on a set of neural machine translation and natural language understanding tasks. We observe that ARCH significantly eases the computational burden, saving up to 70% of the computation time compared with conventional approaches. More surprisingly, by reducing the variance of stochastic gradients, ARCH yields notably better (on most tasks) or comparable model generalization. Our code is publicly available.
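To make the mechanism concrete, below is a minimal Python sketch of the caching idea, assuming perturbations come from a one-step FGSM-style inner maximization in embedding space; the PerturbationCache class, its API, and the refresh flag are illustrative assumptions, not the paper's exact procedure.

    import torch
    import torch.nn.functional as F

    class PerturbationCache:
        # Hypothetical K-nearest-neighbor cache of adversarial perturbations.
        # Keys are detached sample embeddings; values are the perturbations
        # last generated for them. Capping the size keeps memory usage small.
        def __init__(self, max_entries=1024):
            self.max_entries = max_entries
            self.keys, self.values = [], []

        def add(self, emb, delta):
            if len(self.keys) < self.max_entries:
                self.keys.append(emb.flatten())
                self.values.append(delta)

        def lookup(self, emb):
            # Nearest cached key by Euclidean distance in embedding space.
            dists = torch.stack([(emb.flatten() - k).norm() for k in self.keys])
            return self.values[int(dists.argmin())]

    def adversarial_loss(model, emb, target, cache, refresh, eps=1e-2):
        # Adversarial regularization term; `model` is assumed to map
        # embeddings to logits, and inputs are assumed to share one shape.
        if refresh or not cache.keys:
            # Refresh epoch: one-step inner maximization, then cache it.
            e = emb.clone().detach().requires_grad_(True)
            F.cross_entropy(model(e), target).backward()
            delta = eps * e.grad.sign()
            cache.add(emb.detach(), delta)
        else:
            # Cached epoch: reuse the nearest neighbor's perturbation.
            delta = cache.lookup(emb.detach())
        return F.cross_entropy(model(emb + delta), target)

In cached epochs this skips the extra forward-backward pass that a conventional method would spend regenerating the perturbation, which is where the reported time savings would come from.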
Data augmentation has proved extremely useful for alleviating overfitting and improving the generalization performance of deep neural networks by increasing the variance of the training data. In medical image analysis, a well-designed augmentation policy usually requires …
We propose an effective consistency training framework that enforces a training model's predictions on original and perturbed inputs to be similar, by adding a discrete noise that would incur the highest divergence between the predictions. This virtual …
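The consistency objective this snippet describes can be sketched as follows, using a continuous VAT-style perturbation and a KL divergence between the two prediction distributions; note that the paper's actual noise is discrete, so this continuous version is only an illustrative stand-in.

    import torch
    import torch.nn.functional as F

    def consistency_loss(model, x, eps=1e-2, xi=1e-6):
        # x is assumed to have shape (batch, features).
        # Reference predictions on the clean input (no gradient needed).
        with torch.no_grad():
            p = F.softmax(model(x), dim=-1)

        # One power-iteration step toward the direction that most increases
        # the divergence between clean and perturbed predictions.
        d = torch.randn_like(x).requires_grad_(True)
        div = F.kl_div(F.log_softmax(model(x + xi * d), dim=-1), p,
                       reduction="batchmean")
        div.backward()
        r_adv = eps * F.normalize(d.grad.flatten(1), dim=1).reshape(x.shape)

        # Penalize divergence under the (approximately) worst-case noise.
        q_adv = F.log_softmax(model(x + r_adv), dim=-1)
        return F.kl_div(q_adv, p, reduction="batchmean")

In practice, the parameter gradients accumulated by the inner backward() call are cleared before the outer optimization step.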
Unsupervised neural machine translation (UNMT) has recently attracted great interest in the machine translation community. The main advantage of UNMT lies in the ease of collecting the large volume of training sentences it requires, with only a slightly …
Automatically generating coherent and semantically meaningful text has many applications in machine translation, dialogue systems, image captioning, etc. Recently, by combining with policy gradient, Generative Adversarial Nets (GANs) that use a discriminative model …
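A bare-bones sketch of what "combining with policy gradient" typically means here: sequences sampled from the generator are scored by the discriminator, and that score serves as the REINFORCE reward. The generator.sample and discriminator interfaces below are hypothetical.

    import torch

    def policy_gradient_step(generator, discriminator, optimizer,
                             batch_size=16, max_len=20):
        # Sample sequences and their per-token log-probabilities from the
        # generator (hypothetical API returning two (B, T) tensors).
        tokens, log_probs = generator.sample(batch_size, max_len)

        # The discriminator's probability that each sequence is real acts
        # as a sequence-level reward; no gradient flows through it.
        with torch.no_grad():
            reward = discriminator(tokens)          # shape (B,)

        # REINFORCE: weight the sampled tokens' log-likelihood by reward.
        loss = -(reward.unsqueeze(1) * log_probs).sum(dim=1).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

Real systems refine this with baselines and Monte Carlo rollouts that assign per-step rather than per-sequence rewards.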
This paper explores the use of adversarial examples in training speech recognition systems to increase the robustness of deep neural network acoustic models. During training, the fast gradient sign method is used to generate adversarial examples augmenting …
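For reference, the fast gradient sign method mentioned above amounts to a single gradient-sign step. A minimal sketch, where the eps value and cross-entropy loss follow common practice rather than the paper's exact setup:

    import torch
    import torch.nn.functional as F

    def fgsm_augment(model, features, labels, eps=0.3):
        # Perturb the input features in the direction that increases the
        # loss, yielding adversarial variants of the training examples.
        features = features.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(features), labels)
        loss.backward()
        return (features + eps * features.grad.sign()).detach()

The adversarial batch can then be mixed with the clean batch, e.g. training on torch.cat([features, fgsm_augment(model, features, labels)]).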