We investigate how an adversary can optimally use its query budget for targeted evasion attacks against deep neural networks in a black-box setting. We formalize the problem setting and systematically evaluate what benefits the adversary can gain by using substitute models. We show that there is an exploration-exploitation tradeoff in that query efficiency comes at the cost of effectiveness. We present two new attack strategies for using substitute models and show that they are as effective as previous query-only techniques while requiring significantly fewer queries, by up to three orders of magnitude. We also show that an agile adversary capable of switching between different attack techniques can achieve Pareto-optimal efficiency. We demonstrate our attack against Google Cloud Vision, showing that black-box attacks against real-world prediction APIs are significantly easier than previously thought (requiring approximately 500 queries instead of the approximately 20,000 reported in previous work).
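To make the substitute-model strategy concrete, the following Python sketch illustrates one plausible reading of such a hybrid attack: candidates are first crafted locally on substitute models and tested against the target with a single query each, and a query-only attack is used as a fallback only when transfer fails. The helpers pgd_on_substitute and query_only_attack and the target_api callable are hypothetical placeholders, not the paper's own code.

    # Hedged sketch of a substitute-first targeted attack; all helper callables
    # (pgd_on_substitute, query_only_attack, target_api) are assumed, not from the paper.
    def hybrid_attack(x, target_class, target_api, substitutes,
                      pgd_on_substitute, query_only_attack):
        queries = 0
        cand = x
        for sub in substitutes:                      # local computation, no target queries
            cand = pgd_on_substitute(sub, x, target_class)
            queries += 1
            if target_api(cand) == target_class:     # one query to test transferability
                return cand, queries                 # exploitation succeeded cheaply
        # Transfer failed: spend the remaining budget on a query-only attack,
        # warm-started from the last substitute-crafted candidate.
        adv, used = query_only_attack(x, target_class, target_api, start=cand)
        return adv, queries + used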
Deep neural networks (DNNs) have demonstrated impressive performance on many challenging machine learning tasks. However, DNNs are vulnerable to adversarial inputs generated by adding maliciously crafted perturbations to benign inputs. As a growing number of attacks of varying sophistication have been reported, the defense-attack arms race has accelerated. In this paper, we present MODEF, a cross-layer model diversity ensemble framework. MODEF intelligently combines an unsupervised model-denoising ensemble with a supervised model-verification ensemble by quantifying model diversity, aiming to boost the robustness of the target model against adversarial examples. Evaluated using eleven representative attacks on popular benchmark datasets, we show that MODEF achieves remarkable defense success rates compared with existing defense methods, and provides a superior capability of repairing adversarial inputs and making correct predictions with high accuracy in the presence of black-box attacks.
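As a rough illustration of the cross-layer idea (not MODEF's actual implementation), the Python sketch below passes an input through a bank of denoisers and lets a set of diverse verification models vote on the repaired predictions; low agreement flags the input as adversarial. The denoisers and verifiers lists and the agreement threshold are assumptions for illustration only.

    # Hedged sketch of a denoise-then-verify ensemble; all models are placeholders.
    from collections import Counter

    def ensemble_defend(x, denoisers, verifiers, agree_thresh=0.6):
        repaired = [d(x) for d in denoisers]                   # unsupervised repair layer
        votes = [int(v(r).argmax()) for v in verifiers         # supervised verification layer
                 for r in repaired]
        label, count = Counter(votes).most_common(1)[0]
        if count / len(votes) >= agree_thresh:                 # diverse models agree
            return label                                       # repaired prediction
        return None                                            # reject / flag as adversarial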
Transfer learning is a useful machine learning framework that allows one to build task-specific models (student models) without incurring significant training costs, by reusing a single powerful model (teacher model) pre-trained on a large amount of data. The teacher model may contain private data, or interact with private inputs. We investigate whether one can leak or infer such private information without interacting with the teacher model directly. We describe such inference attacks in the context of face recognition, an application of transfer learning that is highly sensitive to personal privacy. Under black-box and realistic settings, we show that existing inference techniques are ineffective, as interacting with individual training instances through the student models does not reveal information about the teacher. We then propose novel strategies to infer from aggregate-level information. Consequently, membership inference attacks on the teacher model are shown to be possible even when the adversary has access only to the student models. We further demonstrate that sensitive attributes can be inferred even when the adversary has only limited auxiliary information. Finally, defensive strategies are discussed and evaluated. Our extensive study indicates that information leakage is a real privacy threat to the transfer learning framework widely used in real-life situations.
Deep neural networks (DNNs) are known for their vulnerability to adversarial examples. These are examples that have undergone small, carefully crafted perturbations and that can easily fool a DNN into making misclassifications at test time. Thus far, the field of adversarial research has mainly focused on image models, under either a white-box setting, where an adversary has full access to model parameters, or a black-box setting, where an adversary can only query the target model for probabilities or labels. Whilst several white-box attacks have been proposed for video models, black-box video attacks remain unexplored. To close this gap, we propose the first black-box video attack framework, called V-BAD. V-BAD utilizes tentative perturbations transferred from image models, and partition-based rectifications found by NES (natural evolution strategies) on partitions (patches) of the tentative perturbations, to obtain good adversarial gradient estimates with fewer queries to the target model. V-BAD is equivalent to estimating the projection of an adversarial gradient on a selected subspace. Using three benchmark video datasets, we demonstrate that V-BAD can craft both untargeted and targeted attacks to fool two state-of-the-art deep video recognition models. For the targeted attack, it achieves $>$93% success rate using only an average of $3.4\sim 8.4\times 10^4$ queries, a similar number of queries to state-of-the-art black-box image attacks. This is despite the fact that videos often have two orders of magnitude higher dimensionality than static images. We believe that V-BAD is a promising new tool to evaluate and improve the robustness of video recognition models to black-box adversarial attacks.
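The following Python sketch illustrates the partition-based gradient estimation idea in the spirit of V-BAD (it is not the authors' code): a tentative perturbation transferred from an image model is split into patches along the frame axis, and NES is used to estimate a weight for each partition from black-box loss queries. The query_loss callable, the frame-axis partitioning, and all hyperparameters are assumptions.

    # Hedged sketch: NES over per-partition weights of a tentative perturbation.
    import numpy as np

    def nes_partition_grad(video, tentative, query_loss, num_parts=8,
                           sigma=1e-3, samples=20):
        parts = np.array_split(np.arange(video.shape[0]), num_parts)  # frames per patch
        grad_w = np.zeros(num_parts)
        for _ in range(samples):
            u = np.random.randn(num_parts)            # noise on the partition weights
            for sgn in (+1.0, -1.0):                  # antithetic sampling
                pert = np.zeros_like(video, dtype=float)
                for k, idx in enumerate(parts):
                    pert[idx] = (1.0 + sgn * sigma * u[k]) * tentative[idx]
                grad_w += sgn * query_loss(video + pert) * u   # two queries per sample
        return grad_w / (2 * samples * sigma)         # estimated gradient in the subspace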
Face recognition has made remarkable progress in recent years due to the great improvement of deep convolutional neural networks (CNNs). However, deep CNNs are vulnerable to adversarial examples, which can have serious consequences in real-world face recognition applications with security-sensitive purposes. Adversarial attacks are widely studied because they can identify the vulnerabilities of models before they are deployed. In this paper, we evaluate the robustness of state-of-the-art face recognition models in the decision-based black-box attack setting, where the attackers have no access to the model parameters and gradients, but can only acquire hard-label predictions by sending queries to the target model. This attack setting is more practical for real-world face recognition systems. To improve the efficiency of previous methods, we propose an evolutionary attack algorithm, which models the local geometry of the search directions and reduces the dimension of the search space. Extensive experiments demonstrate the effectiveness of the proposed method, which induces a minimal perturbation to an input face image using fewer queries. We also apply the proposed method to successfully attack a real-world face recognition system.
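A hedged sketch of one decision-based step in this spirit appears below (it omits the covariance adaptation the evolutionary attack uses to model local geometry): a candidate is sampled in a reduced search space, nudged toward the original image, and kept only if the hard-label oracle still misclassifies it while the distance to the original decreases. The is_adversarial oracle, the nearest-neighbor upsampling, and the step sizes are assumptions; image height and width are assumed divisible by the downsampling factor.

    # Hedged sketch of a single decision-based evolutionary proposal/acceptance step.
    import numpy as np

    def evo_step(x_orig, x_adv, is_adversarial, sigma=0.01, mu=0.1, down=4):
        H, W, C = x_orig.shape
        z = np.random.randn(H // down, W // down, C) * sigma         # reduced search space
        noise = np.repeat(np.repeat(z, down, axis=0), down, axis=1)  # upsample to full size
        cand = x_adv + noise + mu * (x_orig - x_adv)                 # drift toward the original
        closer = np.linalg.norm(cand - x_orig) < np.linalg.norm(x_adv - x_orig)
        if closer and is_adversarial(cand):                          # one hard-label query
            return cand                                              # accept the candidate
        return x_adv                                                 # reject, keep current point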
We propose a simple and highly query-efficient black-box adversarial attack named SWITCH, which achieves state-of-the-art performance in the score-based setting. SWITCH features a highly efficient and effective utilization of the gradient of a surrogate model $\hat{\mathbf{g}}$ w.r.t. the input image, i.e., the transferable gradient. In each iteration, SWITCH first tries to update the current sample along the direction of $\hat{\mathbf{g}}$, but considers switching to its opposite direction $-\hat{\mathbf{g}}$ if our algorithm detects that it does not increase the value of the attack objective function. We justify the choice of switching to the opposite direction by a local approximate linearity assumption. In SWITCH, only one or two queries are needed per iteration, yet the attack remains effective due to the rich information provided by the transferable gradient, resulting in unprecedented query efficiency. To improve the robustness of SWITCH, we further propose SWITCH$_\text{RGF}$, in which the update follows the direction of a random gradient-free (RGF) estimate when neither $\hat{\mathbf{g}}$ nor its opposite direction can increase the objective, while maintaining the advantage of SWITCH in terms of query efficiency. Experimental results on CIFAR-10, CIFAR-100 and TinyImageNet show that, compared with other methods, SWITCH achieves a satisfactory attack success rate using much fewer queries, and SWITCH$_\text{RGF}$ achieves the state-of-the-art attack success rate with fewer queries overall. Our approach can serve as a strong baseline for future black-box attacks because of its simplicity. The PyTorch source code is released at https://github.com/machanic/SWITCH.
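Because the abstract spells out the update rule, a minimal PyTorch sketch of one SWITCH-style iteration is given below; it is an interpretation of the description, not the released implementation. The surrogate, query_scores, and loss_fn callables, the L_inf projection, and the step sizes are assumptions.

    # Hedged sketch of one SWITCH-style step (maximize the attack objective).
    import torch

    def switch_step(x_adv, x_orig, y, surrogate, query_scores, loss_fn,
                    step=2.0 / 255, eps=8.0 / 255):
        x = x_adv.clone().requires_grad_(True)
        g_hat, = torch.autograd.grad(loss_fn(surrogate(x), y), x)   # transferable gradient

        def project(z):                               # stay inside the L_inf ball and [0, 1]
            z = torch.max(torch.min(z, x_orig + eps), x_orig - eps)
            return torch.clamp(z, 0.0, 1.0)

        cur = loss_fn(query_scores(x_adv), y)                       # query 1: current objective
        cand = project(x_adv + step * g_hat.sign())
        if loss_fn(query_scores(cand), y) > cur:                    # query 2: try g_hat's direction
            return cand                                             # keep the surrogate direction
        return project(x_adv - step * g_hat.sign())                 # otherwise switch to -g_hat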