Learning Policies for Model-Based Reinforcement Learning Using Distributed Reward Formulation
Description
This work explores augmenting state-of-the-art \gls{mbrl} algorithms, which learn complex policies over large state spaces, with a distributional perspective on the reward signal. Distributional \gls{rl} models the full probability distribution of the return, as opposed to the classic \gls{rl} formulation, which estimates only its expectation. This probabilistic formulation allows the agent to choose risk-averse actions, which in turn makes learning more stable. To evaluate this idea, I experiment in simulation on complex, high-dimensional environments under different noise conditions.
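The contrast between the two formulations can be sketched with the classic and distributional Bellman equations (standard notation, not taken from this document: $Q$ is the expected return, $Z$ the random return, $\gamma$ the discount factor):

% Classic RL estimates the expected return:
\[
  Q(s,a) \;=\; \mathbb{E}\!\left[ R(s,a) + \gamma\, Q(S', A') \right]
\]
% Distributional RL instead models the full return distribution,
% with equality holding in distribution:
\[
  Z(s,a) \;\overset{D}{=}\; R(s,a) + \gamma\, Z(S', A')
\]

Because $Z(s,a)$ carries the whole distribution rather than a single scalar, risk-sensitive criteria (e.g. penalizing high-variance returns) can be applied when selecting actions.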