ترغب بنشر مسار تعليمي؟ اضغط هنا

Universal Decision-Based Black-Box Perturbations: Breaking Security-Through-Obscurity Defenses

219   0   0.0 ( 0 )
 نشر من قبل Bhavya Kailkhura
 تاريخ النشر 2018
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

We study the problem of finding a universal (image-agnostic) perturbation to fool machine learning (ML) classifiers (e.g., neural nets, decision tress) in the hard-label black-box setting. Recent work in adversarial ML in the white-box setting (model parameters are known) has shown that many state-of-the-art image classifiers are vulnerable to universal adversarial perturbations: a fixed human-imperceptible perturbation that, when added to any image, causes it to be misclassified with high probability Kurakin et al. [2016], Szegedy et al. [2013], Chen et al. [2017a], Carlini and Wagner [2017]. This paper considers a more practical and challenging problem of finding such universal perturbations in an obscure (or black-box) setting. More specifically, we use zeroth order optimization algorithms to find such a universal adversarial perturbation when no model information is revealed-except that the attacker can make queries to probe the classifier. We further relax the assumption that the output of a query is continuous valued confidence scores for all the classes and consider the case where the output is a hard-label decision. Surprisingly, we found that even in these extremely obscure regimes, state-of-the-art ML classifiers can be fooled with a very high probability just by adding a single human-imperceptible image perturbation to any natural image. The surprising existence of universal perturbations in a hard-label black-box setting raises serious security concerns with the existence of a universal noise vector that adversaries can possibly exploit to break a classifier on most natural images.


قيم البحث

اقرأ أيضاً

As machine learning systems grow in scale, so do their training data requirements, forcing practitioners to automate and outsource the curation of training data in order to achieve state-of-the-art performance. The absence of trustworthy human superv ision over the data collection process exposes organizations to security vulnerabilities; training data can be manipulated to control and degrade the downstream behaviors of learned models. The goal of this work is to systematically categorize and discuss a wide range of dataset vulnerabilities and exploits, approaches for defending against these threats, and an array of open problems in this space. In addition to describing various poisoning and backdoor threat models and the relationships among them, we develop their unified taxonomy.
Universal Adversarial Perturbations (UAPs) are input perturbations that can fool a neural network on large sets of data. They are a class of attacks that represents a significant threat as they facilitate realistic, practical, and low-cost attacks on neural networks. In this work, we derive upper bounds for the effectiveness of UAPs based on norms of data-dependent Jacobians. We empirically verify that Jacobian regularization greatly increases model robustness to UAPs by up to four times whilst maintaining clean performance. Our theoretical analysis also allows us to formulate a metric for the strength of shared adversarial perturbations between pairs of inputs. We apply this metric to benchmark datasets and show that it is highly correlated with the actual observed robustness. This suggests that realistic and practical universal attacks can be reliably mitigated without sacrificing clean accuracy, which shows promise for the robustness of machine learning systems.
The pervasive application of algorithmic decision-making is raising concerns on the risk of unintended bias in AI systems deployed in critical settings such as healthcare. The detection and mitigation of biased models is a very delicate task which sh ould be tackled with care and involving domain experts in the loop. In this paper we introduce FairLens, a methodology for discovering and explaining biases. We show how our tool can be used to audit a fictional commercial black-box model acting as a clinical decision support system. In this scenario, the healthcare facility experts can use FairLens on their own historical data to discover the models biases before incorporating it into the clinical decision flow. FairLens first stratifies the available patient data according to attributes such as age, ethnicity, gender and insurance; it then assesses the model performance on such subgroups of patients identifying those in need of expert evaluation. Finally, building on recent state-of-the-art XAI (eXplainable Artificial Intelligence) techniques, FairLens explains which elements in patients clinical history drive the model error in the selected subgroup. Therefore, FairLens allows experts to investigate whether to trust the model and to spotlight group-specific biases that might constitute potential fairness issues.
Deep neural networks are vulnerable to adversarial examples, even in the black-box setting, where the attacker is restricted solely to query access. Existing black-box approaches to generating adversarial examples typically require a significant numb er of queries, either for training a substitute network or performing gradient estimation. We introduce GenAttack, a gradient-free optimization technique that uses genetic algorithms for synthesizing adversarial examples in the black-box setting. Our experiments on different datasets (MNIST, CIFAR-10, and ImageNet) show that GenAttack can successfully generate visually imperceptible adversarial examples against state-of-the-art image recognition models with orders of magnitude fewer queries than previous approaches. Against MNIST and CIFAR-10 models, GenAttack required roughly 2,126 and 2,568 times fewer queries respectively, than ZOO, the prior state-of-the-art black-box attack. In order to scale up the attack to large-scale high-dimensional ImageNet models, we perform a series of optimizations that further improve the query efficiency of our attack leading to 237 times fewer queries against the Inception-v3 model than ZOO. Furthermore, we show that GenAttack can successfully attack some state-of-the-art ImageNet defenses, including ensemble adversarial training and non-differentiable or randomized input transformations. Our results suggest that evolutionary algorithms open up a promising area of research into effective black-box attacks.
Deep learning-based time series models are being extensively utilized in engineering and manufacturing industries for process control and optimization, asset monitoring, diagnostic and predictive maintenance. These models have shown great improvement in the prediction of the remaining useful life (RUL) of industrial equipment but suffer from inherent vulnerability to adversarial attacks. These attacks can be easily exploited and can lead to catastrophic failure of critical industrial equipment. In general, different adversarial perturbations are computed for each instance of the input data. This is, however, difficult for the attacker to achieve in real time due to higher computational requirement and lack of uninterrupted access to the input data. Hence, we present the concept of universal adversarial perturbation, a special imperceptible noise to fool regression based RUL prediction models. Attackers can easily utilize universal adversarial perturbations for real-time attack since continuous access to input data and repetitive computation of adversarial perturbations are not a prerequisite for the same. We evaluate the effect of universal adversarial attacks using NASA turbofan engine dataset. We show that addition of universal adversarial perturbation to any instance of the input data increases error in the output predicted by the model. To the best of our knowledge, we are the first to study the effect of the universal adversarial perturbation on time series regression models. We further demonstrate the effect of varying the strength of perturbations on RUL prediction models and found that model accuracy decreases with the increase in perturbation strength of the universal adversarial attack. We also showcase that universal adversarial perturbation can be transferred across different models.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا