Learning Interpretable Action Models of Simulated Agents Through Agent Interrogation
Description
Understanding the limits and capabilities of an AI system is essential for using it safely and effectively. In the query-based AI assessment paradigm, a personalized assessment module queries a black-box AI system on behalf of a user and returns a user-interpretable model of the system’s capabilities. This thesis develops this paradigm to learn interpretable action models of simulator-based agents. Two types of agents are considered: the first uses high-level actions, where the user’s vocabulary captures the simulator state perfectly, and the second uses low-level actions, where the user’s vocabulary captures only an abstraction of the simulator state. Methods are developed to interface the assessment module with both types of agents. Empirical results show that these methods can learn interpretable models of agents operating in a range of domains.
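The idea of querying a black-box agent and returning an interpretable action model can be pictured with a toy sketch. The code below is only an illustration under strong assumptions and is not the thesis's algorithm: the names `ToyAgent`, `learn_action`, and `PROPS` are hypothetical, the user's vocabulary is taken to be three Boolean propositions that describe the state perfectly, and each action is assumed to have a purely conjunctive precondition and deterministic effects.

```python
from itertools import product

# Hypothetical user vocabulary: three Boolean propositions.
PROPS = ("door_open", "has_key", "inside")


class ToyAgent:
    """Hypothetical black box. The learner may only call query()."""

    def query(self, state, action):
        # Hidden rules that the learner cannot inspect directly.
        if action == "open_door":
            if state["has_key"] and not state["door_open"]:
                return {**state, "door_open": True}
        if action == "enter":
            if state["door_open"] and not state["inside"]:
                return {**state, "inside": True}
        return None  # the action is not executable in this state


def learn_action(agent, action):
    """Recover preconditions and effects of one action using queries only."""
    # Step 1: search for any state in which the action succeeds.
    for values in product([False, True], repeat=len(PROPS)):
        state = dict(zip(PROPS, values))
        result = agent.query(state, action)
        if result is not None:
            break
    else:
        return None  # the action never succeeded in any state

    # Step 2: flip one proposition at a time. If the action stops
    # working, that proposition's original value is a precondition
    # (valid only under the conjunctive-precondition assumption).
    pre = {}
    for p in PROPS:
        flipped = {**state, p: not state[p]}
        if agent.query(flipped, action) is None:
            pre[p] = state[p]

    # Step 3: effects are the propositions that changed in the successful
    # run. An effect that re-asserts an already-true value would be missed.
    eff = {p: result[p] for p in PROPS if result[p] != state[p]}
    return {"pre": pre, "eff": eff}


if __name__ == "__main__":
    agent = ToyAgent()
    for name in ("open_door", "enter"):
        print(name, learn_action(agent, name))
```

The returned dictionaries play the role of the user-interpretable model: each action is described only in terms of the user's propositions. The sketch corresponds loosely to the first agent type, where the vocabulary captures the state exactly. Handling low-level agents, where the vocabulary is only an abstraction of the simulator state, is beyond what this toy example shows.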