Poison Attacks against Text Datasets with Conditional Adversarially Regularized Autoencoder

86 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Alvin Chan

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Alvin Chan - Yi Tay - Yew-Soon Ong

الحساب واللغة الذكاء الاصطناعي الحوسبة العصبية والتطورية

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

This paper demonstrates a fatal vulnerability in natural language inference (NLI) and text classification systems. More concretely, we present a backdoor poisoning attack on NLP models. Our poisoning attack utilizes conditional adversarially regularized autoencoder (CARA) to generate poisoned training samples by poison injection in latent space. Just by adding 1% poisoned data, our experiments show that a victim BERT finetuned classifiers predictions can be steered to the poison target class with success rates of >80% when the input hypothesis is injected with the poison signature, demonstrating that NLI and text classification systems face a huge security risk.

قيم البحث

226 - Chuan Guo , Alexandre Sablayrolles , Herve Jegou 2021

We propose the first general-purpose gradient-based attack against transformer models. Instead of searching for a single adversarial example, we search for a distribution of adversarial examples parameterized by a continuous-valued matrix, hence enab ling gradient-based optimization. We empirically demonstrate that our white-box attack attains state-of-the-art attack performance on a variety of natural language tasks. Furthermore, we show that a powerful black-box transfer attack, enabled by sampling from the adversarial distribution, matches or exceeds existing methods, while only requiring hard-label outputs.

الحساب واللغة الذكاء الاصطناعي التشفير والأمن

Emotion-Regularized Conditional Variational Autoencoder for Emotional Response Generation

109 - Yu-Ping Ruan , , Zhen-Hua Ling 2021

This paper presents an emotion-regularized conditional variational autoencoder (Emo-CVAE) model for generating emotional conversation responses. In conventional CVAE-based emotional response generation, emotion labels are simply used as additional co nditions in prior, posterior and decoder networks. Considering that emotion styles are naturally entangled with semantic contents in the language space, the Emo-CVAE model utilizes emotion labels to regularize the CVAE latent space by introducing an extra emotion prediction network. In the training stage, the estimated latent variables are required to predict the emotion labels and token sequences of the input responses simultaneously. Experimental results show that our Emo-CVAE model can learn a more informative and structured latent space than a conventional CVAE model and output responses with better content and emotion performance than baseline CVAE and sequence-to-sequence (Seq2Seq) models.

الحساب واللغة

Autoencoder Regularized Network For Driving Style Representation Learning

62 - Weishan Dong , Ting Yuan , Kai Yang 2017

In this paper, we study learning generalized driving style representations from automobile GPS trip data. We propose a novel Autoencoder Regularized deep neural Network (ARNet) and a trip encoding framework trip2vec to learn drivers driving styles di rectly from GPS records, by combining supervised and unsupervised feature learning in a unified architecture. Experiments on a challenging driver number estimation problem and the driver identification problem show that ARNet can learn a good generalized driving style representation: It significantly outperforms existing methods and alternative architectures by reaching the least estimation error on average (0.68, less than one driver) and the highest identification accuracy (by at least 3% improvement) compared with traditional supervised learning methods.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي الحوسبة العصبية والتطورية

Jacobian Adversarially Regularized Networks for Robustness

95 - Alvin Chan , Yi Tay , Yew Soon Ong 2019

Adversarial examples are crafted with imperceptible perturbations with the intent to fool neural networks. Against such attacks, adversarial training and its variants stand as the strongest defense to date. Previous studies have pointed out that robu st models that have undergone adversarial training tend to produce more salient and interpretable Jacobian matrices than their non-robust counterparts. A natural question is whether a model trained with an objective to produce salient Jacobian can result in better robustness. This paper answers this question with affirmative empirical results. We propose Jacobian Adversarially Regularized Networks (JARN) as a method to optimize the saliency of a classifiers Jacobian by adversarially regularizing the models Jacobian to resemble natural training images. Image classifiers trained with JARN show improved robust accuracy compared to standard models on the MNIST, SVHN and CIFAR-10 datasets, uncovering a new angle to boost robustness without using adversarial training examples.

الرؤية الحاسوبية وتمييز الأنماط التشفير والأمن الحوسبة العصبية والتطورية

Dirichlet Variational Autoencoder for Text Modeling

199 - Yijun Xiao , Tiancheng Zhao , William Yang Wang 2018

We introduce an improved variational autoencoder (VAE) for text modeling with topic information explicitly modeled as a Dirichlet latent variable. By providing the proposed model topic awareness, it is more superior at reconstructing input texts. Fur thermore, due to the inherent interactions between the newly introduced Dirichlet variable and the conventional multivariate Gaussian variable, the model is less prone to KL divergence vanishing. We derive the variational lower bound for the new model and conduct experiments on four different data sets. The results show that the proposed model is superior at text reconstruction across the latent space and classifications on learned representations have higher test accuracies.

الحساب واللغة الذكاء الاصطناعي التعلم الآلي