Vision-guided Policy Learning for Complex Tasks
Description
The field of computer vision has made tremendous progress in recent years, driven by innovations in deep learning and neural networks. These advances have enabled an intelligent agent to understand the world from its visual observations to an unprecedented degree, for example by recognizing an object, detecting the object's position, and estimating the distance to the object. This raises the question of how such visual understanding can support the agent's decisions about which actions to take to perform a task. This dissertation studies that question and presents several methods that address the challenges of learning a desirable action policy from the agent's visual inputs so that the agent performs the task well. Specifically, the dissertation first addresses learning an action policy from high-dimensional visual observations by improving sample efficiency. The improved sample efficiency is achieved through a denser reward function defined upon the visual understanding of the task and an efficient exploration strategy equipped with a hierarchical policy. The dissertation then studies the problem of learning generalizable action policies. Generalizability is achieved both for a fully observable task, whose local environment dynamics are captured by visual representations, and for a partially observable task, whose global environment dynamics are captured by a novel graph representation. Finally, the dissertation explores learning from human-provided priors, such as natural language instructions and demonstration videos, to improve generalization.
Date Created
2021
Agent
- Author (aut): Ye, Xin
- Thesis advisor (ths): Yang, Yezhou
- Committee member: Ren, Yi
- Committee member: Pavlic, Theodore
- Committee member: Fan, Deliang
- Committee member: Srivastava, Siddharth
- Publisher (pbl): Arizona State University