Refine
Year of publication
- 2022 (2) (remove)
Document Type
Conference Type
- Konferenzartikel (1)
Language
- English (2)
Has Fulltext
- no (2) (remove)
Is part of the Bibliography
- yes (2) (remove)
Keywords
- Adversarial examples (1)
- Machine Learning (1)
- adversarial (1)
- autoattack (1)
- detection (1)
- lid (1)
- mahalanobis (1)
- spectraldefense (1)
Institute
Open Access
- Bronze (2)
- Open Access (2)
Convolutional neural networks (CNN) define the state-of-the-art solution on many perceptual tasks. However, current CNN approaches largely remain vulnerable against adversarial perturbations of the input that have been crafted specifically to fool the system while being quasi-imperceptible to the human eye. In recent years, various approaches have been proposed to defend CNNs against such attacks, for example by model hardening or by adding explicit defence mechanisms. Thereby, a small “detector” is included in the network and trained on the binary classification task of distinguishing genuine data from data containing adversarial perturbations. In this work, we propose a simple and light-weight detector, which leverages recent findings on the relation between networks’ local intrinsic dimensionality (LID) and adversarial attacks. Based on a re-interpretation of the LID measure and several simple adaptations, we surpass the state-of-the-art on adversarial detection by a significant margin and reach almost perfect results in terms of F1-score for several networks and datasets. Sources available at: https://github.com/adverML/multiLID
Recently, RobustBench (Croce et al. 2020) has become a widely recognized benchmark for the adversarial robustness of image
classification networks. In it’s most commonly reported sub-task, RobustBench evaluates and ranks the adversarial robustness of trained neural networks on CIFAR10 under AutoAttack (Croce and Hein 2020b) with l∞ perturbations limited to ϵ = 8/255. With leading scores of the currently best performing models of around 60% of the baseline, it is fair to characterize this benchmark to be quite challenging. Despite it’s general acceptance in recent literature, we aim to foster discussion about the suitability of RobustBench as a key indicator for robustness which could be generalized to practical applications. Our line of argumentation against this is two-fold and supported by excessive experiments presented in this paper: We argue that I) the alternation of data by AutoAttack with l∞, ϵ = 8/255 is unrealistically strong, resulting in close to perfect detection rates of adversarial samples even by simple detection algorithms and human observers.
We also show that other attack methods are much harder to detect while achieving similar success rates. II) That results on low resolution data sets like CIFAR10 do not generalize well to higher resolution images as gradient based attacks appear to become even more detectable with increasing resolutions.