Overcomplete Representations Against Adversarial Videos

Submitted by Shao-Yuan Lo

Publication date: 2020

Research field: Information Engineering

Language: English

Authors: Shao-Yuan Lo, Jeya Maria Jose Valanarasu, Vishal M. Patel

Subjects: Computer Vision and Pattern Recognition; Machine Learning; Image and Video Processing

Abstract

Adversarial robustness of deep neural networks is an extensively studied problem in the literature, and various methods have been proposed to defend against adversarial images. However, only a handful of defense methods have been developed for defending against attacked videos. In this paper, we propose a novel Over-and-Under complete restoration network for Defending against adversarial videos (OUDefend). Most restoration networks adopt an encoder-decoder architecture that first shrinks the spatial dimension and then expands it back. This approach learns undercomplete representations, which have large receptive fields that collect global information but overlook local details. Overcomplete representations, which first expand the spatial dimension, have the opposite properties: smaller receptive fields that preserve local details. Hence, OUDefend is designed to balance local and global features by learning both representations. We attach OUDefend to target video recognition models as a feature restoration block and train the entire network end-to-end. Experimental results show that defenses focusing on images may be ineffective for videos, while OUDefend enhances robustness against different types of adversarial videos, ranging from additive and multiplicative attacks to physically realizable attacks. Code: https://github.com/shaoyuanlo/OUDefend
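
The receptive-field trade-off described above can be checked with the standard recursion for stacked layers: a layer with kernel size k grows the receptive field r by (k - 1) times the current jump j (the spacing, in input pixels, between adjacent feature positions), while downsampling multiplies j and upsampling divides it. The sketch below is a minimal illustration under stated assumptions: the two toy layer chains, f x f stride-f pooling, and nearest-neighbour upsampling are hypothetical choices, not OUDefend's actual architecture.

```python
from fractions import Fraction

def receptive_field(layers):
    """Return (receptive field, jump) in input pixels after a chain of layers.

    Hypothetical layer vocabulary for this sketch:
      ("conv", k): stride-1 k x k convolution
      ("down", f): f x f pooling with stride f (undercomplete step)
      ("up", f):   f-fold nearest-neighbour upsampling (overcomplete step)
    """
    r, j = Fraction(1), Fraction(1)  # start from one input pixel, unit spacing
    for kind, v in layers:
        if kind == "conv":
            r += (v - 1) * j
        elif kind == "down":
            r += (v - 1) * j  # the pooling window itself widens the field
            j *= v
        elif kind == "up":
            j /= v  # each output copies one input position, so r is unchanged
        else:
            raise ValueError(f"unknown layer type: {kind}")
    return r, j

# Toy shrink-first (undercomplete) and expand-first (overcomplete) chains.
undercomplete = [("conv", 3), ("down", 2), ("conv", 3), ("down", 2), ("conv", 3)]
overcomplete = [("conv", 3), ("up", 2), ("conv", 3), ("up", 2), ("conv", 3)]

print(receptive_field(undercomplete))  # grows quickly with depth
print(receptive_field(overcomplete))   # stays close to the input scale
```

With these toy chains the undercomplete path ends with an 18-pixel receptive field, while the overcomplete path ends at 4.5 input pixels, which is why learning both kinds of representation, as the abstract describes, covers global context and local detail together.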

Yihao Huang, Qing Guo, Felix Juefei-Xu (2021)

High-level representation-guided pixel denoising and adversarial training are independent solutions for enhancing the robustness of CNNs against adversarial attacks, by pre-processing input data and by re-training models, respectively. Recently, adversarial training techniques have been widely studied and improved, while pixel denoising-based methods have attracted less attention. However, it remains an open question whether a more advanced pixel denoising-based method exists and whether combining the two solutions benefits both. To this end, we first comprehensively investigate two kinds of pixel denoising methods for adversarial robustness enhancement (i.e., the existing additive-based and the unexplored filtering-based methods) under image-level and semantic-level restoration losses, respectively, showing that pixel-wise filtering obtains much higher image quality (e.g., higher PSNR) as well as higher robustness (e.g., higher accuracy on adversarial examples) than the existing pixel-wise additive-based method. However, we also observe that the robustness of the filtering-based method depends on the perturbation amplitude of the adversarial examples used for training. To address this problem, we propose predictive perturbation-aware pixel-wise filtering, in which dual-perturbation filtering and an uncertainty-aware fusion module are designed to automatically perceive the perturbation amplitude during training and testing. The proposed method is termed AdvFilter. Moreover, we combine adversarial pixel denoising methods with three adversarial training-based methods, suggesting that considering data and models jointly can yield more robust CNNs. Experiments on the NeurIPS-2017DEV, SVHN, and CIFAR10 datasets show advantages in enhancing CNN robustness, along with high generalization to different models and noise levels.

Subjects: Computer Vision and Pattern Recognition; Machine Learning; Image and Video Processing
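
The two pixel-denoising families compared in the AdvFilter abstract differ in how a restored pixel is formed: an additive denoiser adds a predicted residual to the input, whereas a pixel-wise filter predicts a separate small kernel for every position and outputs a weighted sum of that pixel's neighbours. The 1-D sketch below hard-codes the residual and kernels that a network would normally predict; the function names and the replicate-padding choice are hypothetical, and AdvFilter's dual-perturbation filtering and uncertainty-aware fusion are not modelled.

```python
def additive_denoise(x, residual):
    """Additive-based restoration: x_hat[p] = x[p] + residual[p]."""
    return [xi + ri for xi, ri in zip(x, residual)]

def pixelwise_filter(x, kernels):
    """Filtering-based restoration: x_hat[p] = sum_k K_p[k] * x[p + k - r].

    kernels[p] is the odd-length kernel predicted for position p;
    out-of-range neighbours are handled by replicate padding.
    """
    n = len(x)
    out = []
    for p, kern in enumerate(kernels):
        r = len(kern) // 2
        acc = 0.0
        for k, w in enumerate(kern):
            q = min(max(p + k - r, 0), n - 1)  # clamp the index to [0, n - 1]
            acc += w * x[q]
        out.append(acc)
    return out

# A toy signal whose middle sample carries a spike-like perturbation.
x = [1.0, 4.0, 1.0]
identity = [0.0, 1.0, 0.0]
smooth = [0.25, 0.5, 0.25]

print(additive_denoise(x, [0.0, -1.5, 0.0]))              # [1.0, 2.5, 1.0]
print(pixelwise_filter(x, [identity, smooth, identity]))  # [1.0, 2.5, 1.0]
```

Both calls restore the same values here; the difference is structural. The filter can only recombine existing input values, while the additive path can emit arbitrary values through its residual.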

Defending Against Image Corruptions Through Adversarial Augmentations

Dan A. Calian, Florian Stimberg, Olivia Wiles (2021)

Modern neural networks excel at image classification, yet they remain vulnerable to common image corruptions such as blur, speckle noise, or fog. Recent methods that focus on this problem, such as AugMix and DeepAugment, introduce defenses that operate in expectation over a distribution of image corruptions. In contrast, the literature on $\ell_p$-norm bounded perturbations focuses on defenses against worst-case corruptions. In this work, we reconcile both approaches by proposing AdversarialAugment, a technique that optimizes the parameters of image-to-image models to generate adversarially corrupted augmented images. We theoretically motivate our method and give sufficient conditions for the consistency of its idealized version as well as that of DeepAugment. Our classifiers improve upon the state of the art on common image corruption benchmarks, evaluated in expectation on CIFAR-10-C, and improve worst-case performance against $\ell_p$-norm bounded perturbations on both CIFAR-10 and ImageNet.

Subjects: Computer Vision and Pattern Recognition; Machine Learning
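
The difference between defending in expectation and defending against the worst case can be made concrete with a toy objective. The sketch below assumes a hypothetical corruption model (a contrast/brightness map a*x + b) standing in for the learned image-to-image networks, a hypothetical threshold classifier with a hinge loss, and a small parameter grid searched exhaustively instead of by gradient-based optimization; it illustrates the two objectives, not AdversarialAugment itself.

```python
import itertools

def corrupt(image, params):
    """Hypothetical image-to-image corruption: contrast a, brightness b, clipped to [0, 1]."""
    a, b = params
    return [min(1.0, max(0.0, a * px + b)) for px in image]

def loss(image, label):
    """Hypothetical classifier: predicts class 1 when mean intensity exceeds 0.5.

    Returns a hinge loss on the signed margin of that decision.
    """
    score = sum(image) / len(image) - 0.5
    margin = score if label == 1 else -score
    return max(0.0, 1.0 - margin)

def expected_loss(image, label, param_grid):
    """Expectation-style objective: average loss over the corruption distribution."""
    losses = [loss(corrupt(image, p), label) for p in param_grid]
    return sum(losses) / len(losses)

def worst_case(image, label, param_grid):
    """Worst-case objective: the corruption parameters that maximize the loss."""
    return max(param_grid, key=lambda p: loss(corrupt(image, p), label))

grid = list(itertools.product([0.5, 1.0], [-0.2, 0.0]))  # (contrast, brightness)
image, label = [0.6, 0.8], 1

print(expected_loss(image, label, grid))  # average over all four corruptions
print(worst_case(image, label, grid))     # (0.5, -0.2): darkest, lowest-contrast version
```

An expectation-based defense trains on the average of these losses, while a worst-case defense trains on the single most damaging corruption; the abstract's method aims to reconcile the two.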

Scratch that! An Evolution-based Adversarial Attack against Neural Networks

Malhar Jere, Loris Rossi, Briland Hitaj (2019)

We study black-box adversarial attacks on image classifiers in a constrained threat model, where adversaries can only modify a small fraction of pixels in the form of scratches on an image. We show that it is possible for adversaries to generate localized adversarial scratches that cover less than 5% of the pixels in an image and achieve targeted success rates of 98.77% and 97.20% on ImageNet- and CIFAR-10-trained ResNet-50 models, respectively. We demonstrate that our scratches are effective under diverse shapes, such as straight lines or parabolic Bézier curves, with single or multiple colors. In an extreme condition, in which our scratches are a single color, we obtain a targeted attack success rate of 66% on CIFAR-10 with an order of magnitude fewer queries than comparable attacks. We successfully launch our attack against Microsoft's Cognitive Services Image Captioning API and propose various mitigation strategies.

Subjects: Neural and Evolutionary Computing; Machine Learning; Image and Video Processing
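
A scratch of the kind described in the abstract can be rendered by sampling a quadratic Bézier curve B(t) = (1 - t)^2 P0 + 2(1 - t)t P1 + t^2 P2 for t in [0, 1] and painting the pixels it passes through. This minimal sketch covers only the rendering step and the pixel-coverage check; the control points, sample count, and function names are hypothetical, and the paper's evolution-based search over scratch parameters with black-box queries is not shown.

```python
def quadratic_bezier(p0, p1, p2, t):
    """Point on the quadratic Bezier curve with control points p0, p1, p2 at parameter t."""
    u = 1.0 - t
    return tuple(u * u * a + 2 * u * t * b + t * t * c for a, b, c in zip(p0, p1, p2))

def draw_scratch(image, p0, p1, p2, color, samples=200):
    """Return a copy of `image` (rows of pixel values) with a one-pixel-wide scratch."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for i in range(samples + 1):
        x, y = quadratic_bezier(p0, p1, p2, i / samples)  # points are (x, y) = (col, row)
        r, c = round(y), round(x)
        if 0 <= r < h and 0 <= c < w:
            out[r][c] = color
    return out

def covered_fraction(before, after):
    """Fraction of pixels that differ between the clean and scratched images."""
    changed = sum(a != b for row_a, row_b in zip(before, after) for a, b in zip(row_a, row_b))
    return changed / (len(before) * len(before[0]))

clean = [[0] * 32 for _ in range(32)]
# Collinear control points give a straight scratch along the main diagonal.
scratched = draw_scratch(clean, (0, 0), (15, 15), (30, 30), color=255)
print(covered_fraction(clean, scratched))  # 31 of 1024 pixels, about 3%
```

Moving P1 off the line between P0 and P2 bends the scratch into a parabolic arc; the coverage check is how a budget such as "less than 5% of the pixels" could be enforced.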

Double Backpropagation for Training Autoencoders against Adversarial Attack

Chengjin Sun, Sizhe Chen (2020)

Deep learning is widely known to be vulnerable to adversarial samples. This paper focuses on adversarial attacks on autoencoders. The safety of autoencoders (AEs) is important because they are widely used as a compression scheme for data storage and transmission; however, current autoencoders are easily attacked, i.e., a slight modification to an input can yield totally different codes. The vulnerability is rooted in the sensitivity of autoencoders, and to enhance robustness we propose adopting double backpropagation (DBP) to secure autoencoders such as VAE and DRAW. We restrict the gradient of the reconstructed image with respect to the original one so that the autoencoder is not sensitive to the trivial perturbations produced by adversarial attacks. After smoothing the gradient by DBP, we further smooth the label with a Gaussian Mixture Model (GMM), aiming for accurate and robust classification. We demonstrate on MNIST, CelebA, and SVHN that our method leads to a robust autoencoder resistant to attack and, when combined with GMM, a robust classifier capable of image transition and immune to adversarial attack.

Subjects: Computer Vision and Pattern Recognition; Machine Learning
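
The sensitivity penalty behind double backpropagation can be illustrated with a linear autoencoder, where the Jacobian of the reconstruction with respect to the input is simply the matrix product V W, so the penalty has a closed form. In a nonlinear model such as a VAE or DRAW, this input gradient must itself be obtained by backpropagation, and training on it requires differentiating through that gradient a second time, which is where the name comes from. The linear model, the squared-Frobenius penalty, the weight `lam`, and all function names below are assumptions for illustration; the paper's exact loss and its GMM label smoothing are not reproduced.

```python
def matmul(a, b):
    """Product of matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def matvec(a, x):
    """Matrix-vector product."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def dbp_style_objective(W, V, x, lam):
    """Reconstruction error plus a penalty on the input sensitivity of x_hat.

    Hypothetical linear autoencoder: z = W x, x_hat = V z, so d x_hat / d x = V W.
    """
    x_hat = matvec(V, matvec(W, x))
    recon = sum((xi - hi) ** 2 for xi, hi in zip(x, x_hat))
    jac = matmul(V, W)
    sensitivity = sum(v * v for row in jac for v in row)  # squared Frobenius norm
    return recon + lam * sensitivity, recon, sensitivity

# 2-D input squeezed through a 1-D code.
W = [[1.0, 0.0]]    # encoder, 1 x 2
V = [[1.0], [0.0]]  # decoder, 2 x 1
total, recon, sens = dbp_style_objective(W, V, [2.0, 3.0], lam=0.5)
print(total, recon, sens)  # 9.5 9.0 1.0
```

Raising `lam` trades reconstruction accuracy for lower sensitivity, so small input perturbations move the reconstruction less.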

Occlusion resistant learning of intuitive physics from videos

Ronan Riochet, Josef Sivic, Ivan Laptev (2020)

To reach human performance on complex tasks, a key ability for artificial systems is to understand physical interactions between objects and predict future outcomes of a situation. This ability, often referred to as intuitive physics, has recently received attention, and several methods have been proposed to learn these physical rules from video sequences. Yet most of these methods are restricted to the case where no, or only limited, occlusions occur. In this work, we propose a probabilistic formulation of learning intuitive physics in 3D scenes with significant inter-object occlusions. In our formulation, object positions are modeled as latent variables, enabling the reconstruction of the scene. We then propose a series of approximations that make this problem tractable. Object proposals are linked across frames using a combination of a recurrent interaction network, modeling the physics in object space, and a compositional renderer, modeling the way in which objects project onto pixel space. We demonstrate significant improvements over the state of the art on the IntPhys intuitive physics benchmark. We apply our method to a second dataset with increasing levels of occlusion, showing that it realistically predicts segmentation masks up to 30 frames into the future. Finally, we also show results on predicting the motion of objects in real videos.

Subjects: Computer Vision and Pattern Recognition; Machine Learning; Image and Video Processing