Provable Robust Learning Based on Transformation-Specific Smoothing


Abstract in English

As machine learning (ML) systems become pervasive, safeguarding their security is critical. Recent work has demonstrated that motivated adversaries can add adversarial perturbations to test data to mislead ML systems. So far, most research has focused on providing provable robustness guarantees for ML models against perturbations bounded in a specific Lp norm. However, previous work has shown that, in practice, other types of realistic adversarial transformations that preserve semantic meaning have also been leveraged to attack ML systems. In this paper, we aim to provide a unified framework for certifying ML robustness against general adversarial transformations. First, we group semantic transformations into two categories: resolvable transformations (e.g., Gaussian blur and brightness) and differentially resolvable transformations (e.g., rotation and scaling). We then provide sufficient conditions and strategies for certifying robustness against each category. For instance, we propose a novel sampling-based interpolation approach with an estimated Lipschitz upper bound to certify robustness against differentially resolvable transformations. In addition, we theoretically optimize the smoothing strategies for certifying the robustness of ML models against different transformations. For instance, we show that smoothing by sampling from an exponential distribution provides a tighter robustness bound than smoothing with a Gaussian distribution. Extensive experiments on 7 semantic transformations show that our proposed unified framework significantly outperforms state-of-the-art certified robustness approaches on several datasets, including ImageNet.
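
As a rough illustration of what smoothing over a transformation parameter can look like, the following minimal sketch takes a majority vote of a base classifier over copies of an input whose brightness is shifted by Gaussian-sampled amounts. It is not the paper's implementation: the function names, the additive brightness model, the toy classifier, and all parameter values are assumptions made for this example, and the sketch covers only the prediction step, not the computation of a robustness certificate.

```python
import numpy as np


def brightness(x, b):
    # Hypothetical additive brightness model (an assumption for this sketch):
    # shifting by b1 and then by b2 equals a single shift by b1 + b2.
    return x + b


def smoothed_predict(base_classifier, x, sigma, n_samples, num_classes, rng):
    # Majority vote of the base classifier over randomly transformed copies
    # of x, with the brightness parameter sampled from N(0, sigma^2).
    counts = np.zeros(num_classes, dtype=int)
    for b in rng.normal(0.0, sigma, size=n_samples):
        counts[base_classifier(brightness(x, b))] += 1
    return int(np.argmax(counts)), counts


def toy_classifier(x):
    # Toy stand-in for a trained model: class 1 if the mean intensity
    # exceeds 0.5, otherwise class 0.
    return int(x.mean() > 0.5)


rng = np.random.default_rng(0)
x = np.full((4, 4), 0.8)  # a uniformly bright toy "image"
label, counts = smoothed_predict(
    toy_classifier, x, sigma=0.1, n_samples=200, num_classes=2, rng=rng
)
print(label, counts)
```

The vote counts returned alongside the label are the kind of empirical class frequencies that a certification procedure would typically turn into confidence bounds. Replacing the Gaussian sampler with another distribution changes only the line that draws `b`.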
