A Study of the Effects of Transfer Learning on Adversarial Robustness

Pratik Vaishnavi · Kevin Eykholt · Amir Rahmati



The security and robustness of AI systems are paramount in real-world applications. Previous research has focused on developing methods to train robust networks, assuming the availability of sufficient labeled training data. However, in deployment scenarios with limited training data, existing techniques for training robust networks become impractical. In such low-data scenarios, non-robust training methods often resort to transfer learning. This involves pre-training a network on a large, possibly labeled dataset and fine-tuning it for a new task with a limited set of training samples. The efficacy of transfer learning in enhancing adversarial robustness has not been comprehensively explored. Specifically, it remains uncertain whether transfer learning can improve adversarial performance in low-data scenarios. Furthermore, the potential benefits of transfer learning for certified robustness are unexplored. In this paper, we conduct an extensive analysis of the impact of transfer learning on both empirical and certified adversarial robustness. Employing supervised and self-supervised pre-training methods and fine-tuning across 12 downstream tasks representing diverse data availability scenarios, we identify the conditions conducive to training adversarially robust models through transfer learning. Our study reveals that the effectiveness of transfer learning in improving adversarial robustness is attributable to an increase in standard accuracy and not the direct "transfer" of robustness from the source to the target task, contrary to previous beliefs. Our findings provide valuable insights for practitioners aiming to deploy robust ML models in their applications.