ترغب بنشر مسار تعليمي؟ اضغط هنا

One-pixel Signature: Characterizing CNN Models for Backdoor Detection

62   0   0.0 ( 0 )
 نشر من قبل Shanjiaoyang Huang
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

We tackle the convolution neural networks (CNNs) backdoor detection problem by proposing a new representation called one-pixel signature. Our task is to detect/classify if a CNN model has been maliciously inserted with an unknown Trojan trigger or not. Here, each CNN model is associated with a signature that is created by generating, pixel-by-pixel, an adversarial value that is the result of the largest change to the class prediction. The one-pixel signature is agnostic to the design choice of CNN architectures, and how they were trained. It can be computed efficiently for a black-box CNN model without accessing the network parameters. Our proposed one-pixel signature demonstrates a substantial improvement (by around 30% in the absolute detection accuracy) over the existing competing methods for backdoored CNN detection/classification. One-pixel signature is a general representation that can be used to characterize CNN models beyond backdoor detection.



قيم البحث

اقرأ أيضاً

Computer vision and machine learning can be used to automate various tasks in cancer diagnostic and detection. If an attacker can manipulate the automated processing, the results can be devastating and in the worst case lead to wrong diagnosis and tr eatment. In this research, the goal is to demonstrate the use of one-pixel attacks in a real-life scenario with a real pathology dataset, TUPAC16, which consists of digitized whole-slide images. We attack against the IBM CODAITs MAX breast cancer detector using adversarial images. These adversarial examples are found using differential evolution to perform the one-pixel modification to the images in the dataset. The results indicate that a minor one-pixel modification of a whole slide image under analysis can affect the diagnosis by reversing the automatic diagnosis result. The attack poses a threat from the cyber security perspective: the one-pixel method can be used as an attack vector by a motivated attacker.
The phenomenon of Adversarial Examples is attracting increasing interest from the Machine Learning community, due to its significant impact to the security of Machine Learning systems. Adversarial examples are similar (from a perceptual notion of sim ilarity) to samples from the data distribution, that fool a machine learning classifier. For computer vision applications, these are images with carefully crafted but almost imperceptible changes, that are misclassified. In this work, we characterize this phenomenon under an existing taxonomy of threats to biometric systems, in particular identifying new attacks for Offline Handwritten Signature Verification systems. We conducted an extensive set of experiments on four widely used datasets: MCYT-75, CEDAR, GPDS-160 and the Brazilian PUC-PR, considering both a CNN-based system and a system using a handcrafted feature extractor (CLBP). We found that attacks that aim to get a genuine signature rejected are easy to generate, even in a limited knowledge scenario, where the attacker does not have access to the trained classifier nor the signatures used for training. Attacks that get a forgery to be accepted are harder to produce, and often require a higher level of noise - in most cases, no longer imperceptible as previous findings in object recognition. We also evaluated the impact of two countermeasures on the success rate of the attacks and the amount of noise required for generating successful attacks.
Last-generation GAN models allow to generate synthetic images which are visually indistinguishable from natural ones, raising the need to develop tools to distinguish fake and natural images thus contributing to preserve the trustworthiness of digita l images. While modern GAN models can generate very high-quality images with no visible spatial artifacts, reconstruction of consistent relationships among colour channels is expectedly more difficult. In this paper, we propose a method for distinguishing GAN-generated from natural images by exploiting inconsistencies among spectral bands, with specific focus on the generation of synthetic face images. Specifically, we use cross-band co-occurrence matrices, in addition to spatial co-occurrence matrices, as input to a CNN model, which is trained to distinguish between real and synthetic faces. The results of our experiments confirm the goodness of our approach which outperforms a similar detection technique based on intra-band spatial co-occurrences only. The performance gain is particularly significant with regard to robustness against post-processing, like geometric transformations, filtering and contrast manipulations.
319 - Alvin Chan , Yew-Soon Ong 2019
Deep learning models have recently shown to be vulnerable to backdoor poisoning, an insidious attack where the victim model predicts clean images correctly but classifies the same images as the target class when a trigger poison pattern is added. Thi s poison pattern can be embedded in the training dataset by the adversary. Existing defenses are effective under certain conditions such as a small size of the poison pattern, knowledge about the ratio of poisoned training samples or when a validated clean dataset is available. Since a defender may not have such prior knowledge or resources, we propose a defense against backdoor poisoning that is effective even when those prerequisites are not met. It is made up of several parts: one to extract a backdoor poison signal, detect poison target and base classes, and filter out poisoned from clean samples with proven guarantees. The final part of our defense involves retraining the poisoned model on a dataset augmented with the extracted poison signal and corrective relabeling of poisoned samples to neutralize the backdoor. Our approach has shown to be effective in defending against backdoor attacks that use both small and large-sized poison patterns on nine different target-base class pairs from the CIFAR10 dataset.
Most autonomous vehicles (AVs) rely on LiDAR and RGB camera sensors for perception. Using these point cloud and image data, perception models based on deep neural nets (DNNs) have achieved state-of-the-art performance in 3D detection. The vulnerabili ty of DNNs to adversarial attacks have been heavily investigated in the RGB image domain and more recently in the point cloud domain, but rarely in both domains simultaneously. Multi-modal perception systems used in AVs can be divided into two broad types: cascaded models which use each modality independently, and fusion models which learn from different modalities simultaneously. We propose a universal and physically realizable adversarial attack for each type, and study and contrast their respective vulnerabilities to attacks. We place a single adversarial object with specific shape and texture on top of a car with the objective of making this car evade detection. Evaluating on the popular KITTI benchmark, our adversarial object made the host vehicle escape detection by each model type nearly 50% of the time. The dense RGB input contributed more to the success of the adversarial attacks on both cascaded and fusion models. We found that the fusion model was relatively more robust to adversarial attacks than the cascaded model.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا