Integrating Adversarial Training, Noise Injection, and Mixup into XAI: Pathways to Enhancing Data Efficiency and Generalizability

Park, Keun Hee

Rapid advancements in artificial intelligence (AI) have revolutionized various do- mains, enabling the development of sophisticated models capable of solving complex problems. However, as AI systems increasingly participate in critical decision-making processes, concerns about their interpretability, robustness, and reliability have…

Rapid advancements in artificial intelligence (AI) have revolutionized various do- mains, enabling the development of sophisticated models capable of solving complex problems. However, as AI systems increasingly participate in critical decision-making processes, concerns about their interpretability, robustness, and reliability have in- tensified. Interpretable AI models, such as the Concept-Centric Transformer (CCT), have emerged as promising solutions to enhance transparency in AI models. Yet, in- creasing model interpretability often requires enriching training data with concept ex- planations, escalating training costs. Therefore, intrinsically interpretable models like CCT must be designed to be data-efficient, generalizable—to accommodate smaller training sets—and robust against noise and adversarial attacks. Despite progress in interpretable AI, ensuring the robustness of these models remains a challenge.This thesis enhances the data efficiency and generalizability of the CCT model by integrating four techniques: Perturbation Random Masking (PRM), Attention Random Dropout (ARD), and the integration of manifold mixup and input mixup for memory broadcast. Comprehensive experiments on benchmark datasets such as CIFAR-100, CUB-200-2011, and ImageNet show that the enhanced CCT model achieves modest performance improvements over the original model when using a full training set. Furthermore, this performance gap increases as the training data volume decreases, particularly in few-shot learning scenarios. The enhanced CCT maintains high accuracy with limited data (even without explicitly training on ex- ample concept-level explanations), demonstrating its potential for real-world appli- cations where labeled data are scarce. These findings suggest that the enhancements enable more effective use of CCT in settings with data constraints. Ablation studies reveal that no single technique—PRM, ARD, or mixups—dominates in enhancing performance and data efficiency. Each contributes nearly equally, and their combined application yields the best results, indicating a synergistic effect that bolsters the model’s capabilities without any single method being predominant. The results of this research highlight the efficacy of the proposed enhancements in refining CCT models for greater performance, robustness, and data efficiency. By demonstrating improved performance and resilience, particularly in data-limited sce- narios, this thesis underscores the practical applicability of advanced AI systems in critical decision-making roles.

Copyright Statement