Adversarial training is widely regarded as the most effective defense against adversarial attacks. However, recent research has shown that adversarially trained models exhibit a large discrepancy in class-wise robustness, which raises two potential issues: 1) a bucket effect, where the least robust class can be exploited as a serious loophole, and 2) ethical concerns, where some classes or groups receive markedly weaker protection. Such unfair robustness is therefore a critical issue that can severely hinder the practical application of adversarial training.
To address this challenge, a research team led by Qian Wang published their new research on 15 March 2025 in Frontiers of Computer Science, co-published by Higher Education Press and Springer Nature.
The team proposed a novel fair adversarial training algorithm (FairAT) that aims to improve the robustness of hard classes and can be viewed as an improvement over vanilla adversarial training. FairAT builds on the observation that a robust model's uncertainty about individual examples closely tracks class-wise robustness, and therefore serves as a more fine-grained indicator of robust fairness. By dynamically augmenting hard examples throughout training, FairAT outperforms state-of-the-art methods in terms of both overall robustness and fairness.
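The article does not include implementation details, but the core idea of using per-example uncertainty to identify and augment hard examples can be illustrated with a minimal sketch. The PyTorch-style snippet below is an assumption-laden illustration, not the authors' code: prediction entropy as the uncertainty measure, a fixed entropy threshold, random crop/flip as the extra augmentation, and a standard PGD attack for the adversarial training step are all placeholders for whatever FairAT actually uses.

```python
import torch
import torch.nn.functional as F
import torchvision.transforms as T


def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Standard PGD attack under an L-infinity budget, as in vanilla adversarial training."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()


# Illustrative extra augmentation for hard examples (assumes 32x32 images, e.g. CIFAR-10).
hard_augment = T.Compose([T.RandomCrop(32, padding=4), T.RandomHorizontalFlip()])


def fair_at_step(model, optimizer, x, y, entropy_threshold=1.0):
    """One training step: flag uncertain (hard) examples, augment them, then train on PGD examples."""
    # 1) Per-example uncertainty: entropy of the model's predictive distribution.
    model.eval()
    with torch.no_grad():
        probs = F.softmax(model(x), dim=1)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
    hard = entropy > entropy_threshold  # assumed fixed threshold; a ranking rule would also work

    # 2) Dynamically augment only the hard examples (per-example random transforms).
    if hard.any():
        x = x.clone()
        x[hard] = torch.stack([hard_augment(img) for img in x[hard]])

    # 3) Ordinary adversarial training step on the (partly augmented) batch.
    model.train()
    x_adv = pgd_attack(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In a full training loop, a step like this would run on every mini-batch; how hard examples are selected and how strongly they are augmented are precisely the design choices that distinguish FairAT from vanilla adversarial training.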
Future work can focus on designing more advanced methods to boost fairness, reducing the training cost of FairAT, and extending these techniques to other important tasks and modalities.
DOI: 10.1007/s11704-024-3587-1