ترغب بنشر مسار تعليمي؟ اضغط هنا

Black-Box Dissector: Towards Erasing-based Hard-Label Model Stealing Attack

176   0   0.0 ( 0 )
 نشر من قبل Yixu Wang
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Previous studies have verified that the functionality of black-box models can be stolen with full probability outputs. However, under the more practical hard-label setting, we observe that existing methods suffer from catastrophic performance degradation. We argue this is due to the lack of rich information in the probability prediction and the overfitting caused by hard labels. To this end, we propose a novel hard-label model stealing method termed emph{black-box dissector}, which consists of two erasing-based modules. One is a CAM-driven erasing strategy that is designed to increase the information capacity hidden in hard labels from the victim model. The other is a random-erasing-based self-knowledge distillation module that utilizes soft labels from the substitute model to mitigate overfitting. Extensive experiments on four widely-used datasets consistently demonstrate that our method outperforms state-of-the-art methods, with an improvement of at most $8.27%$. We also validate the effectiveness and practical potential of our method on real-world APIs and defense methods. Furthermore, our method promotes other downstream tasks, emph{i.e.}, transfer adversarial attacks.

قيم البحث

اقرأ أيضاً

Nowadays, digital facial content manipulation has become ubiquitous and realistic with the success of generative adversarial networks (GANs), making face recognition (FR) systems suffer from unprecedented security concerns. In this paper, we investig ate and introduce a new type of adversarial attack to evade FR systems by manipulating facial content, called textbf{underline{a}dversarial underline{mor}phing underline{a}ttack} (a.k.a. Amora). In contrast to adversarial noise attack that perturbs pixel intensity values by adding human-imperceptible noise, our proposed adversarial morphing attack works at the semantic level that perturbs pixels spatially in a coherent manner. To tackle the black-box attack problem, we devise a simple yet effective joint dictionary learning pipeline to obtain a proprietary optical flow field for each attack. Our extensive evaluation on two popular FR systems demonstrates the effectiveness of our adversarial morphing attack at various levels of morphing intensity with smiling facial expression manipulations. Both open-set and closed-set experimental results indicate that a novel black-box adversarial attack based on local deformation is possible, and is vastly different from additive noise attacks. The findings of this work potentially pave a new research direction towards a more thorough understanding and investigation of image-based adversarial attacks and defenses.
107 - Huanqian Yan , Xingxing Wei , 2020
Black-box adversarial attacks on video recognition models have been explored. Considering the temporal interactions between frames, a few methods try to select some key frames, and then perform attacks on them. Unfortunately, their selecting strategy is independent with the attacking step, resulting in the limited performance. Instead, we argue the frame selection phase is closely relevant with the attacking phase. The key frames should be adjusted according to the attacking results. For that, we formulate the black-box video attacks into Reinforcement Learning (RL) framework. Specifically, the environment in RL is set as the threat model, and the agent in RL plays the role of frame selecting. By continuously querying the threat models and receiving the attacking feedback, the agent gradually adjusts its frame selection strategy and adversarial perturbations become smaller and smaller. A series of experiments demonstrate that our method can significantly reduce the adversarial perturbations with efficient query times.
Deep models have shown their vulnerability when processing adversarial samples. As for the black-box attack, without access to the architecture and weights of the attacked model, training a substitute model for adversarial attacks has attracted wide attention. Previous substitute training approaches focus on stealing the knowledge of the target model based on real training data or synthetic data, without exploring what kind of data can further improve the transferability between the substitute and target models. In this paper, we propose a novel perspective substitute training that focuses on designing the distribution of data used in the knowledge stealing process. More specifically, a diverse data generation module is proposed to synthesize large-scale data with wide distribution. And adversarial substitute training strategy is introduced to focus on the data distributed near the decision boundary. The combination of these two modules can further boost the consistency of the substitute model and target model, which greatly improves the effectiveness of adversarial attack. Extensive experiments demonstrate the efficacy of our method against state-of-the-art competitors under non-target and target attack settings. Detailed visualization and analysis are also provided to help understand the advantage of our method.
Machine-learning-as-a-service (MLaaS) has attracted millions of users to their outperforming sophisticated models. Although published as black-box APIs, the valuable models behind these services are still vulnerable to imitation attacks. Recently, a series of works have demonstrated that attackers manage to steal or extract the victim models. Nonetheless, none of the previous stolen models can outperform the original black-box APIs. In this work, we take the first step of showing that attackers could potentially surpass victims via unsupervised domain adaptation and multi-victim ensemble. Extensive experiments on benchmark datasets and real-world APIs validate that the imitators can succeed in outperforming the original black-box models. We consider this as a milestone in the research of imitation attack, especially on NLP APIs, as the superior performance could influence the defense or even publishing strategy of API providers.
We performed the first systematic study of a new attack on Ethereum that steals cryptocurrencies. The attack is due to the unprotected JSON-RPC endpoints existed in Ethereum nodes that could be exploited by attackers to transfer the Ether and ERC20 t okens to attackers-controlled accounts. This study aims to shed light on the attack, including malicious behaviors and profits of attackers. Specifically, we first designed and implemented a honeypot that could capture real attacks in the wild. We then deployed the honeypot and reported results of the collected data in a period of six months. In total, our system captured more than 308 million requests from 1,072 distinct IP addresses. We further grouped attackers into 36 groups with 59 distinct Ethereum accounts. Among them, attackers of 34 groups were stealing the Ether, while other 2 groups were targeting ERC20 tokens. The further behavior analysis showed that attackers were following a three-steps pattern to steal the Ether. Moreover, we observed an interesting type of transaction called zero gas transaction, which has been leveraged by attackers to steal ERC20 tokens. At last, we estimated the overall profits of attackers. To engage the whole community, the dataset of captured attacks is released on https://github.com/zjuicsr/eth-honey.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا