Training deep neural networks from scratch can be computationally expensive and requires a large amount of training data. Recent work has explored various watermarking techniques to protect pre-trained deep neural networks from potential copyright infringement. However, these techniques can be vulnerable to watermark removal attacks. In this work, we propose REFIT, a unified watermark removal framework based on fine-tuning, which does not rely on knowledge of the watermarks and is effective against a wide range of watermarking schemes. In particular, we conduct a comprehensive study of a realistic attack scenario in which the adversary has limited training data, a setting that has not been emphasized in prior work on attacks against watermarking schemes. To effectively remove the watermarks without compromising model functionality under this weak threat model, we incorporate two techniques into our fine-tuning framework: (1) an adaptation of the elastic weight consolidation (EWC) algorithm, originally proposed to mitigate catastrophic forgetting; and (2) unlabeled data augmentation (AU), which leverages auxiliary unlabeled data from other sources. Our extensive evaluation shows the effectiveness of REFIT against diverse watermark embedding schemes. In particular, both EWC and AU significantly decrease the amount of labeled training data needed for effective watermark removal, and the unlabeled samples used for AU need not be drawn from the same distribution as the benign data used for model evaluation. These results demonstrate that fine-tuning-based watermark removal attacks could pose real threats to the copyright of pre-trained models, and thus highlight the importance of further investigating the watermarking problem and designing watermark embedding schemes that are more robust against such attacks.
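To make the EWC component concrete, here is a minimal sketch of the standard EWC objective, L(θ) = L_ft(θ) + (λ/2) Σ_i F_i (θ_i − θ*_i)², where θ* are the pre-trained weights and F is a diagonal Fisher information estimate. How REFIT estimates F and chooses λ is not stated in this abstract, so the mean-squared per-sample-gradient estimate and the function names below are assumptions for illustration, working on flat NumPy parameter vectors rather than a real network.

```python
import numpy as np

def diagonal_fisher(per_sample_grads):
    """Assumed estimator: diagonal Fisher information as the mean squared
    per-sample gradient of the log-likelihood at the pre-trained weights."""
    grads = np.asarray(per_sample_grads, dtype=float)  # shape (n_samples, n_params)
    return np.mean(grads ** 2, axis=0)

def ewc_penalty(theta, theta_star, fisher, lam):
    """Quadratic EWC term: (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2."""
    diff = np.asarray(theta, dtype=float) - np.asarray(theta_star, dtype=float)
    return 0.5 * lam * float(np.sum(np.asarray(fisher, dtype=float) * diff ** 2))

def ewc_objective(task_loss, theta, theta_star, fisher, lam):
    """Fine-tuning loss on the attacker's limited data plus the EWC term,
    which discourages large moves in weights with high Fisher information."""
    return task_loss + ewc_penalty(theta, theta_star, fisher, lam)

# Moving a low-importance weight costs little; high-importance weights are anchored.
print(ewc_penalty([1.5, 2.0], [1.0, 2.0], [2.0, 10.0], lam=1.0))
```

The penalty lets fine-tuning change parameters that matter little for the original task (where watermark behavior may live) while keeping important parameters close to θ*, which is the intuition behind using EWC to preserve functionality with scarce labeled data.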
Recent research has demonstrated that adding imperceptible perturbations to original images can fool deep learning models. However, current adversarial perturbations usually take the form of noise and thus have no practical meaning. Image watermarking is a technique widely used for copyright protection. An image watermark can be regarded as a kind of meaningful noise: adding it to the original image does not affect people's understanding of the image content and does not arouse suspicion. It is therefore interesting to generate adversarial examples using watermarks. In this paper, we propose a novel watermark perturbation for adversarial examples (Adv-watermark), which combines image watermarking techniques with adversarial example algorithms: adding a meaningful watermark to clean images can attack DNN models. Specifically, we propose a novel optimization algorithm, Basin Hopping Evolution (BHE), to generate adversarial watermarks in the black-box attack setting. Thanks to BHE, Adv-watermark requires only a few queries to the target models to complete an attack. A series of experiments on the ImageNet and CASIA-WebFace datasets shows that the proposed method efficiently generates adversarial examples and outperforms state-of-the-art attack methods. Moreover, Adv-watermark is more robust against image-transformation defense methods.
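The sketch below illustrates the general shape of a query-based watermark attack: a logo is alpha-blended onto the image as x' = (1 − α)x + αw at some position, and a black-box search varies the position and transparency α until the model's label changes. It does not reproduce BHE; a plain random search stands in for it, and the transparency range [0.1, 0.5), the `predict` callable, and the function names are hypothetical.

```python
import numpy as np

def blend_watermark(image, logo, top, left, alpha):
    """Alpha-blend a logo onto an image region: x' = (1 - alpha) * x + alpha * w."""
    out = image.astype(float).copy()
    h, w = logo.shape[:2]
    region = out[top:top + h, left:left + w]
    out[top:top + h, left:left + w] = (1.0 - alpha) * region + alpha * logo
    return out

def random_search_attack(image, logo, true_label, predict, max_queries=100, seed=0):
    """Black-box search over watermark position and transparency (a random-search
    stand-in for BHE). Each call to `predict` counts as one query; the search
    stops at the first candidate the model misclassifies."""
    rng = np.random.default_rng(seed)
    H, W = image.shape[:2]
    h, w = logo.shape[:2]
    for q in range(1, max_queries + 1):
        top = int(rng.integers(0, H - h + 1))    # valid top-left row
        left = int(rng.integers(0, W - w + 1))   # valid top-left column
        alpha = float(rng.uniform(0.1, 0.5))     # assumed transparency range
        candidate = blend_watermark(image, logo, top, left, alpha)
        if predict(candidate) != true_label:
            return candidate, (top, left, alpha), q
    return None, None, max_queries
```

A smarter optimizer such as BHE aims to reach a successful (position, α) in fewer queries than this baseline; the search space itself is what the sketch is meant to show.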
Although cyberattacks on machine learning (ML) production systems can be destructive, many industry practitioners are ill-equipped, lacking the tactical and strategic tools that would allow them to analyze, detect, protect against, and respond to cyberattacks targeting their ML-based systems. In this paper, we take a significant step toward securing ML production systems by integrating these systems and their vulnerabilities into cybersecurity risk assessment frameworks. Specifically, we performed a comprehensive threat analysis of ML production systems and developed an extension to the MulVAL attack graph generation and analysis framework that incorporates cyberattacks on ML production systems. Using the proposed extension, security practitioners can apply attack graph analysis methods in environments that include ML components, giving security experts a practical tool for evaluating the impact and quantifying the risk of a cyberattack targeting an ML production system.
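Logical attack graph tools derive what an attacker can achieve by chaining facts about the environment through interaction rules. MulVAL itself expresses these rules in Datalog; the Python sketch below only mirrors the idea with ground (variable-free) rules and naive forward chaining. The predicates, including the ML-specific poisoning rule, are hypothetical examples and not the extension's actual rule set.

```python
def derive_attack_graph(facts, rules):
    """Forward-chain ground rules to a fixpoint.
    Returns all derivable facts and the attack-graph edges
    (premise, rule name, conclusion) recording each derivation."""
    known = set(facts)
    edges = []
    changed = True
    while changed:
        changed = False
        for name, premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                edges.extend((p, name, conclusion) for p in premises)
                changed = True
    return known, edges

# Hypothetical environment facts, including ML components.
FACTS = {
    "netAccess(attacker, webServer)",
    "vulnExists(webServer, remoteExploit)",
    "stores(webServer, trainingData)",
    "trainedOn(model, trainingData)",
}

# Hypothetical interaction rules: (rule name, premises, conclusion).
RULES = [
    ("remote_exploit",
     ["netAccess(attacker, webServer)", "vulnExists(webServer, remoteExploit)"],
     "execCode(attacker, webServer)"),
    ("data_tamper",
     ["execCode(attacker, webServer)", "stores(webServer, trainingData)"],
     "canModify(attacker, trainingData)"),
    ("poisoning",  # ML-specific step: tampered training data compromises the model
     ["canModify(attacker, trainingData)", "trainedOn(model, trainingData)"],
     "modelCompromised(model)"),
]

known, edges = derive_attack_graph(FACTS, RULES)
print("modelCompromised(model)" in known)
```

Removing a single fact (for example, patching the vulnerability) and re-running the derivation shows whether the ML-specific goal is still reachable, which is the kind of what-if analysis attack graphs support.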
Noisy data is prevalent in both the training and testing phases of machine learning systems and inevitably degrades model performance. Over the last decade, many works have focused on learning with in-distribution (IND) noisy labels, i.e., training samples assigned incorrect labels that do not correspond to their true classes. In real applications, however, it is also necessary to consider the influence of out-of-distribution (OOD) samples, i.e., samples that do not belong to any known class, which has not yet been sufficiently explored. To remedy this, we study a new problem setup, Learning with Open-world Noisy Data (LOND), whose goal is to simultaneously learn a classifier and an OOD detector from datasets with mixed IND and OOD noise. We propose a new graph-based framework, Noisy Graph Cleaning (NGC), which collects clean samples by leveraging the geometric structure of the data and the model's predictive confidence. Without any additional training effort, NGC can detect and reject OOD samples directly in the testing phase based on the learned class prototypes. Experiments on multiple benchmarks with different types of noise demonstrate the superior performance of our method over the state of the art.
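Prototype-based rejection at test time can be sketched as follows: each class prototype is the normalized mean of its (clean) sample embeddings, a test sample is assigned to the most similar prototype, and it is rejected as OOD when that best similarity is too low. The abstract does not specify NGC's exact score or threshold, so cosine similarity with a fixed `threshold` is an assumption, and the embeddings are toy 2-D vectors rather than learned features.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit L2 norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def class_prototypes(embeddings, labels, num_classes):
    """Per-class mean of L2-normalized embeddings, re-normalized."""
    z = l2_normalize(np.asarray(embeddings, dtype=float))
    labels = np.asarray(labels)
    protos = np.stack([z[labels == c].mean(axis=0) for c in range(num_classes)])
    return l2_normalize(protos)

def predict_or_reject(embeddings, prototypes, threshold):
    """Nearest-prototype class, or -1 (OOD) if the best cosine
    similarity falls below `threshold`."""
    z = l2_normalize(np.asarray(embeddings, dtype=float))
    sims = z @ prototypes.T
    preds = sims.argmax(axis=1)
    return np.where(sims.max(axis=1) >= threshold, preds, -1)

train = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
protos = class_prototypes(train, [0, 0, 1, 1], num_classes=2)
print(predict_or_reject([[1.0, 0.05], [0.05, 1.0], [1.0, 1.0]], protos, threshold=0.9))
```

The last test point lies between the two class clusters, so its best similarity (about 0.75) falls below the assumed threshold and it is rejected; no extra training is needed beyond computing the prototypes.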
We seek to remove foreground contaminants from 21cm intensity mapping observations. We demonstrate that a deep convolutional neural network (CNN) with a UNet architecture and three-dimensional convolutions, trained on simulated observations, can effectively separate the frequency and spatial patterns of the cosmic neutral hydrogen (HI) signal from foregrounds in the presence of noise. Cleaned maps recover cosmological clustering statistics within 10% at all relevant angular scales and frequencies. This amounts to a reduction in prediction variance of over an order of magnitude on small angular scales ($\ell > 300$) and improved accuracy for small radial scales ($k_{\parallel} > 0.17\,h\,\mathrm{Mpc}^{-1}$) compared to standard Principal Component Analysis (PCA) methods. We estimate posterior confidence intervals for the network's prediction by training an ensemble of UNets. Our approach demonstrates the feasibility of analyzing 21cm intensity maps directly, as opposed to derived summary statistics, for upcoming radio experiments, provided the simulated foreground model is sufficiently realistic. The code used for this analysis is available on GitHub at https://github.com/tlmakinen/deep21, together with a browser-based tutorial for the experiment and UNet model in the accompanying Colab notebook http://bit.ly/deep21-colab.
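For reference, the PCA baseline the UNet is compared against exploits the fact that foregrounds are spectrally smooth and dominate the frequency-frequency covariance, so removing its leading eigenmodes removes most of the foreground power. The sketch below is a generic version of that idea on a cube flattened to shape (n_freq, n_pix); the number of removed components and the per-frequency mean subtraction are assumptions, not the exact settings used in the paper.

```python
import numpy as np

def pca_foreground_clean(maps, n_components):
    """Remove the top `n_components` frequency eigenmodes from maps of
    shape (n_freq, n_pix) and return the residual (cleaned) maps."""
    x = np.asarray(maps, dtype=float)
    x = x - x.mean(axis=1, keepdims=True)          # remove per-frequency mean
    cov = x @ x.T / x.shape[1]                     # frequency-frequency covariance
    _, eigvecs = np.linalg.eigh(cov)               # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :n_components]       # largest-variance modes first
    return x - top @ (top.T @ x)                   # project those modes out

# Toy example: a perfectly smooth, rank-1 "foreground" is removed by one mode.
rng = np.random.default_rng(0)
freq_profile = np.array([1.0, 2.0, 3.0, 4.0])
maps = np.outer(freq_profile, rng.normal(size=100))
cleaned = pca_foreground_clean(maps, n_components=1)
```

Because the projection also removes any HI signal lying along the discarded modes, PCA cleaning trades foreground residuals against signal loss, which is one motivation for a learned approach.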
Deep learning techniques have made tremendous progress in a variety of challenging tasks, such as image recognition and machine translation, during the past decade. Training deep neural networks is computationally expensive and requires both human and intellectual resources. It is therefore necessary to protect the intellectual property of a model and to externally verify its ownership. However, previous studies either fail to defend against evasion attacks or do not explicitly deal with fraudulent claims of ownership by adversaries. Furthermore, they cannot establish a clear association between the model and the creator's identity. To fill these gaps, we propose a novel intellectual property protection (IPP) framework based on blind watermarks for watermarking deep neural networks that meets the requirements of security and feasibility. Our framework takes ordinary samples and an exclusive logo as inputs and outputs newly generated samples as watermarks, which are almost indistinguishable from the originals. These watermarks are infused into DNN models by assigning them specific labels, leaving a backdoor as the basis for our copyright claim. We evaluated our IPP framework on two benchmark datasets and 15 popular deep learning models. The results show that our framework successfully verifies the ownership of all the models without a noticeable impact on their primary task. Most importantly, we are the first to successfully design and implement a blind-watermark-based framework, which achieves state-of-the-art performance in undetectability against evasion attacks and unforgeability against fraudulent claims of ownership. Further, our framework shows remarkable robustness and establishes a clear association between the model and the author's identity.
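Backdoor-based ownership claims generally come down to a simple check: query the suspect model on the secret watermark samples and see whether it returns their assigned labels far more often than an unrelated model would. The sketch below shows that verification step only; the `predict` callable, the 0.9 acceptance threshold, and the function name are assumptions, and the abstract does not specify how the framework sets its decision rule.

```python
def verify_ownership(predict, watermark_samples, assigned_labels, threshold=0.9):
    """Claim ownership if the suspect model maps enough watermark samples
    to their assigned (backdoor) labels. Returns (decision, match_rate)."""
    if len(watermark_samples) == 0:
        raise ValueError("need at least one watermark sample")
    if len(watermark_samples) != len(assigned_labels):
        raise ValueError("each watermark sample needs an assigned label")
    hits = sum(int(predict(x) == y) for x, y in zip(watermark_samples, assigned_labels))
    rate = hits / len(watermark_samples)
    return rate >= threshold, rate

# Toy "model": predicts x mod 3; all four watermark labels match.
print(verify_ownership(lambda x: x % 3, [0, 1, 2, 3], [0, 1, 2, 0]))
```

A high threshold keeps the false-claim rate low, since a model that never learned the backdoor should match the assigned labels only by chance.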