
Two Sides of the Same Coin: White-box and Black-box Attacks for Transfer Learning

Published by Yinghua Zhang
Publication date: 2020
Language: English





Transfer learning has become a common practice for training deep learning models with limited labeled data in a target domain. On the other hand, deep models are vulnerable to adversarial attacks. Though transfer learning has been widely applied, its effect on model robustness is unclear. To investigate this question, we conduct extensive empirical evaluations to show that fine-tuning effectively enhances model robustness under white-box FGSM attacks. We also propose a black-box attack method for transfer learning models, which attacks the target model with the adversarial examples produced by its source model. To systematically measure the effect of both white-box and black-box attacks, we propose a new metric to evaluate how transferable the adversarial examples produced by a source model are to a target model. Empirical results show that the adversarial examples are more transferable when fine-tuning is used than they are when the two networks are trained independently.
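The two attack settings named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes toy logistic-regression models in place of deep networks, and the names `w_source`, `w_target`, `fgsm`, and `accuracy` are hypothetical. The "target" weights are simply a small perturbation of the "source" weights, standing in for fine-tuning. The abstract does not define the proposed transferability metric, so the sketch reports plain accuracy only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    """One-step FGSM against a logistic-regression model (w, b).

    The gradient of the binary cross-entropy loss with respect to the
    input is (p - y) * w; FGSM moves each input by eps in the direction
    of the sign of that gradient.
    """
    p = sigmoid(x @ w + b)                    # predicted P(y = 1), shape (n,)
    grad_x = (p - y)[:, None] * w[None, :]    # d(loss)/dx, shape (n, d)
    return x + eps * np.sign(grad_x)

def accuracy(x, y, w, b):
    """Fraction of inputs classified correctly at threshold 0.5."""
    return float(np.mean((sigmoid(x @ w + b) > 0.5) == (y == 1)))

# Toy setup (all values are illustrative assumptions).
rng = np.random.default_rng(0)
w_source, b_source = np.array([1.0, -1.0]), 0.0
# Hypothetical stand-in for a fine-tuned model: weights close to the source.
w_target, b_target = w_source + np.array([0.1, 0.05]), 0.0

x = rng.normal(size=(200, 2))
y = (x @ w_source + b_source > 0).astype(float)

eps = 0.5
# White-box: the attacker uses the gradients of the model under attack.
x_adv_white = fgsm(x, y, w_target, b_target, eps)
# Black-box (transfer): examples are crafted on the source model only.
x_adv_transfer = fgsm(x, y, w_source, b_source, eps)

acc_clean_target = accuracy(x, y, w_target, b_target)
acc_white_target = accuracy(x_adv_white, y, w_target, b_target)
acc_transfer_target = accuracy(x_adv_transfer, y, w_target, b_target)
max_perturbation = float(np.max(np.abs(x_adv_transfer - x)))

print("clean accuracy on target:   ", acc_clean_target)
print("white-box accuracy on target:", acc_white_target)
print("transfer accuracy on target: ", acc_transfer_target)
```

In this toy setting the two models share almost the same decision boundary, so examples crafted on the source also reduce the target's accuracy. Whether and how strongly this holds for deep networks is what the paper evaluates empirically.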




Read also

Amir Nazemi, Paul Fieguth 2019
Deep convolutional neural networks can be highly vulnerable to small perturbations of their inputs, potentially a major issue or limitation on system robustness when using deep networks as classifiers. In this paper we propose a low-cost method to explore marginal sample data near trained classifier decision boundaries, thus identifying potential adversarial samples. By finding such adversarial samples it is possible to reduce the search space of adversarial attack algorithms while keeping a reasonable successful perturbation rate. In our developed strategy, the potential adversarial samples represent only 61% of the test data, but in fact cover more than 82% of the adversarial samples produced by iFGSM and 92% of the adversarial samples successfully perturbed by DeepFool on CIFAR10.
Transfer learning is a useful machine learning framework that allows one to build task-specific models (student models) without incurring significant training costs, using a single powerful model (teacher model) pre-trained with a large amount of data. The teacher model may contain private data, or interact with private inputs. We investigate if one can leak or infer such private information without interacting with the teacher model directly. We describe such inference attacks in the context of face recognition, an application of transfer learning that is highly sensitive to personal privacy. Under black-box and realistic settings, we show that existing inference techniques are ineffective, as interacting with individual training instances through the student models does not reveal information about the teacher. We then propose novel strategies to infer from aggregate-level information. Consequently, membership inference attacks on the teacher model are shown to be possible, even when the adversary has access only to the student models. We further demonstrate that sensitive attributes can be inferred, even in the case where the adversary has limited auxiliary information. Finally, defensive strategies are discussed and evaluated. Our extensive study indicates that information leakage is a real privacy threat to the transfer learning framework widely used in real-life situations.
Deep neural networks (DNNs) are playing key roles in various artificial intelligence applications such as image classification and object recognition. However, a growing number of studies have shown that there exist adversarial examples in DNNs, which are almost imperceptibly different from original samples, but can greatly change the network output. Existing white-box attack algorithms can generate powerful adversarial examples. Nevertheless, most of the algorithms concentrate on how to iteratively make the best use of gradients to improve adversarial performance. In contrast, in this paper, we focus on the properties of the widely used ReLU activation function, and discover that there exist two phenomena (i.e., wrong blocking and over transmission) that mislead the calculation of gradients in ReLU during backpropagation. Both issues enlarge the difference between the changes of the loss function predicted from the gradient and the corresponding actual changes, and mislead the gradients, which results in larger perturbations. Therefore, we propose a universal adversarial example generation method, called ADV-ReLU, to enhance the performance of gradient-based white-box attack algorithms. During the backpropagation of the network, our approach calculates the gradient of the loss function with respect to the network input, maps the values to scores, and selects a part of them to update the misleading gradients. Comprehensive experimental results on ImageNet demonstrate that our ADV-ReLU can be easily integrated into many state-of-the-art gradient-based white-box attack algorithms, as well as transferred to black-box attackers, to further decrease perturbations in the $\ell_2$-norm.
In general, adversarial perturbations superimposed on inputs are realistic threats for a deep neural network (DNN). In this paper, we propose a practical generation method of such adversarial perturbation to be applied to black-box attacks that demand access to an input-output relationship only. Thus, the attackers generate such perturbation without invoking inner functions and/or accessing the inner states of a DNN. Unlike the earlier studies, the algorithm to generate the perturbation presented in this study requires far fewer query trials. Moreover, to show the effectiveness of the extracted adversarial perturbation, we experiment with a DNN for semantic segmentation. The results show that the network is more easily deceived by the generated perturbation than by uniformly distributed random noise of the same magnitude.
Most graph convolutional neural networks (GCNs) perform poorly in graphs where neighbors typically have different features/classes (heterophily) and when stacking multiple layers (oversmoothing). These two seemingly unrelated problems have been studied independently, but there is recent empirical evidence that solving one problem may benefit the other. In this work, going beyond empirical observations, we aim to: (1) propose a new perspective to analyze the heterophily and oversmoothing problems under a unified theoretical framework, (2) identify the common causes of the two problems based on the proposed framework, and (3) propose simple yet effective strategies that address the common causes. Focusing on the node classification task, we use linear separability of node representations as an indicator to reflect the performance of GCNs and we propose to study the linear separability by analyzing the statistical change of the node representations in the graph convolution. We find that the relative degree of a node (compared to its neighbors) and the heterophily level of a node's neighborhood are the root causes that influence the separability of node representations. Our analysis suggests that: (1) Nodes with high heterophily always produce less separable representations after graph convolution; (2) Even with low heterophily, degree disparity between nodes can influence the network dynamics and result in a pseudo-heterophily situation, which helps to explain oversmoothing. Based on our insights, we propose simple modifications to the GCN architecture -- i.e., degree corrections and signed messages -- which alleviate the root causes of these issues, and also show this empirically on 9 real networks. Compared to other approaches, which tend to work well in one regime but fail in others, our modified GCN model consistently performs well across all settings.
