TUTCRIS - Tampere University of Technology


CIIDefence: Defeating Adversarial Attacks by Fusing Class-specific Image Inpainting and Image Denoising.

Research output: peer-reviewed

Details

Original language: English
Title: 2019 International Conference on Computer Vision, ICCV 2019
Publisher: IEEE
Pages: 6708-6717
ISBN (electronic): 9781728148038
DOI - permanent links
Status: Published - 2019
Ministry of Education publication type: A4 Article in conference proceedings
Event: IEEE/CVF International Conference on Computer Vision
Duration: 27 October 2019 - 2 November 2019

Publication series

Name: Proceedings of the IEEE International Conference on Computer Vision
ISSN (print): 1550-5499

Conference

Conference: IEEE/CVF International Conference on Computer Vision
Period: 27/10/19 - 2/11/19

Abstract

This paper presents a novel approach for protecting deep neural networks from adversarial attacks, i.e., methods that add well-crafted imperceptible modifications to the original inputs so that they are incorrectly classified with high confidence. The proposed defence mechanism is inspired by recent works that mitigate adversarial disturbances by means of image reconstruction and denoising. However, unlike previous works, we apply the reconstruction only to small, carefully selected image areas that are most influential to the current classification outcome. The selection process is guided by the class activation map responses obtained for multiple top-ranking class labels. The same regions are also the most prominent for adversarial perturbations and hence the most important to purify. The resulting inpainting task is substantially more tractable than full image reconstruction, while still being able to prevent adversarial attacks. Furthermore, we combine the selective image inpainting with wavelet-based image denoising to produce a non-differentiable layer that prevents an attacker from using gradient backpropagation. Moreover, the proposed non-linearity cannot be easily approximated with a simple differentiable alternative, as demonstrated in the experiments with the Backward Pass Differentiable Approximation (BPDA) attack. Finally, we experimentally show that the proposed Class-specific Image Inpainting Defence (CIIDefence) is able to withstand several powerful adversarial attacks, including BPDA. The obtained results are consistently better than those of other recent defence approaches.
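The abstract describes a three-step preprocessing pipeline: select the image regions most activated by the top-ranking class activation maps, inpaint those regions, then apply wavelet-based denoising. The sketch below illustrates that structure only; the quantile-based region selection, the mean-fill inpainting stand-in (the paper uses a learned inpainting model), and the one-level Haar soft-threshold denoiser are all illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def select_regions(cam_maps, keep_frac=0.1):
    """Union of the most activated pixels across the top-ranking class
    activation maps (keep_frac and the quantile rule are illustrative)."""
    mask = np.zeros(cam_maps[0].shape, dtype=bool)
    for cam in cam_maps:
        thr = np.quantile(cam, 1.0 - keep_frac)
        mask |= cam >= thr
    return mask

def inpaint_mean(img, mask):
    """Placeholder inpainting: fill selected pixels with the mean of the
    remaining pixels (a stand-in for the paper's inpainting network)."""
    out = img.copy()
    out[mask] = img[~mask].mean()
    return out

def haar_denoise(img, thresh=0.05):
    """One-level 2-D Haar transform with soft thresholding of the detail
    bands; image height and width are assumed even."""
    a = (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2]) / 4
    h = (img[0::2, 0::2] - img[1::2, 0::2] + img[0::2, 1::2] - img[1::2, 1::2]) / 4
    v = (img[0::2, 0::2] + img[1::2, 0::2] - img[0::2, 1::2] - img[1::2, 1::2]) / 4
    d = (img[0::2, 0::2] - img[1::2, 0::2] - img[0::2, 1::2] + img[1::2, 1::2]) / 4
    soft = lambda c: np.sign(c) * np.maximum(np.abs(c) - thresh, 0.0)
    h, v, d = soft(h), soft(v), soft(d)
    out = np.empty_like(img)  # inverse Haar transform
    out[0::2, 0::2] = a + h + v + d
    out[1::2, 0::2] = a - h + v - d
    out[0::2, 1::2] = a + h - v - d
    out[1::2, 1::2] = a - h - v + d
    return out

def ciidefence_preprocess(img, cam_maps):
    """Fused defence sketch: inpaint CAM-selected regions, then denoise.
    Neither step is differentiable end-to-end, mirroring the abstract's
    point about blocking gradient backpropagation."""
    mask = select_regions(cam_maps)
    return haar_denoise(inpaint_mean(img, mask))
```

Because both the region selection and the hard thresholding are piecewise-constant operations, an attacker cannot backpropagate gradients through this layer, which is the property the abstract stresses against BPDA-style attacks.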

Publication forum level