Continuous-Time Reinforcement Learning: New Design Algorithms with Theoretical Insights and Performance Guarantees

Description
This dissertation discusses continuous-time reinforcement learning (CT-RL) for control of affine nonlinear systems. Continuous-time nonlinear optimal control problems hold great promise in real-world applications. After decades of development, reinforcement learning (RL) has achieved some of its greatest successes as a general nonlinear control design method. Yet as RL control has developed, CT-RL results have greatly lagged their discrete-time RL (DT-RL) counterparts, especially with regard to real-world applications. Current CT-RL algorithms generally fall into two classes: adaptive dynamic programming (ADP) and actor-critic deep RL (DRL). The first school, ADP, features elegant theoretical results stemming from adaptive and optimal control, yet its methods have not been shown to effectively synthesize meaningful controllers. The second school, DRL, has produced impressive learning solutions, yet its theoretical guarantees remain to be developed. A substantive analysis uncovering the quantitative causes of the fundamental gap between CT-RL and DT-RL remains to be conducted. Thus, this work develops a first-of-its-kind quantitative evaluation framework to diagnose the performance limitations of the leading CT-RL methods. This dissertation also introduces a suite of new CT-RL algorithms offering both theoretical and synthesis guarantees. The proposed design approach relies on three important factors. First, for physical systems featuring physically motivated dynamical partitions into distinct loops, the proposed decentralization method breaks the optimal control problem into smaller subproblems. Second, the work introduces a new excitation framework to improve persistence of excitation (PE) and numerical conditioning via classical input/output insights. Third, the method scales the learning problem via design-motivated invertible transformations of the system state variables in order to modulate the algorithm's learning regression for further increases in numerical stability. This dissertation introduces a suite of (decentralized) excitable integral reinforcement learning (EIRL) algorithms implementing these paradigms. It rigorously proves convergence, optimality, and closed-loop stability guarantees of the proposed methods, which are demonstrated in comprehensive comparative studies with the leading ADP methods on a significant application problem: controlling an unstable, nonminimum-phase hypersonic vehicle (HSV). It also conducts comprehensive comparative studies with the leading DRL methods on three state-of-the-art (SOTA) environments, revealing new performance/design insights.
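To ground the integral reinforcement learning (IRL) machinery that the EIRL family builds on, the following is a minimal sketch of classical IRL policy iteration, specialized to a linear system with quadratic cost (Q = R = I) purely for illustration. The system matrices, gains, and helper names (`sym_features`, `simulate_segment`, `irl_policy_iteration`) are hypothetical, not taken from the dissertation, and EIRL's decentralization, excitation design, and state scaling are omitted.

```python
import numpy as np

def sym_features(x):
    """Features of V(x) = x' P x with symmetric P: upper-triangular
    entries of x x', with off-diagonal terms counted twice."""
    i, j = np.triu_indices(len(x))
    w = np.where(i == j, 1.0, 2.0)
    return w * np.outer(x, x)[i, j]

def simulate_segment(A, B, K, x0, T, dt=1e-3):
    """Integrate x_dot = (A - B K) x over [0, T] by forward Euler;
    return the end state and the accumulated cost with Q = R = I."""
    x, cost = x0.copy(), 0.0
    for _ in range(int(T / dt)):
        u = -K @ x
        cost += (x @ x + u @ u) * dt
        x = x + (A @ x + B @ u) * dt
    return x, cost

def irl_policy_iteration(A, B, K0, n_iters=8, T=0.05, n_samples=40):
    """Classical IRL policy iteration: fit the value matrix P from the
    IRL Bellman equation by least squares, then improve the policy."""
    n = A.shape[0]
    K = K0
    for _ in range(n_iters):
        Phi, y = [], []
        for _ in range(n_samples):
            x0 = np.random.randn(n)        # excitation via varied initial states
            x1, cost = simulate_segment(A, B, K, x0, T)
            # IRL Bellman equation: V(x0) - V(x1) = cost accrued over [0, T]
            Phi.append(sym_features(x0) - sym_features(x1))
            y.append(cost)
        p, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
        P = np.zeros((n, n))
        P[np.triu_indices(n)] = p
        P = P + np.triu(P, 1).T            # rebuild the symmetric matrix
        K = B.T @ P                        # policy improvement with R = I
    return P, K

# Usage on a toy unstable system with a known stabilizing initial gain.
A = np.array([[0.0, 1.0], [-1.0, 1.0]])
B = np.array([[0.0], [1.0]])
P, K = irl_policy_iteration(A, B, K0=np.array([[0.0, 3.0]]))
print("learned P:\n", P, "\nlearned gain K:", K)
```

Note that the conditioning of the least-squares regression above depends directly on how well the sampled trajectories excite the quadratic features; this is exactly the kind of PE/numerical-conditioning issue that the excitation and state-scaling mechanisms described in the abstract are designed to address.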
Date Created
2024

Multiagent Optimization Problems: Bridging Practicality and Predictability

Description
This dissertation is an examination of collective systems of computationally limited agents that require coordination to achieve complex ensemble behaviors or goals. The design of coordination strategies can be framed as multiagent optimization problems, which are addressed in this work from both theoretical and practical perspectives. The primary foci of this study are models where computation is distributed over the agents themselves, which are assumed to possess onboard computational capabilities. Distributed models admit many assumption variants, including fairness and concurrency properties. In general, there is a fundamental trade-off whereby weakening model assumptions increases the applicability of proposed solutions while also increasing the difficulty of proving theoretical guarantees. This dissertation aims to produce a deeper understanding of this trade-off with respect to multiagent optimization and scalability in distributed settings. This study considers four multiagent optimization problems. The model assumptions begin with fully centralized computation for the all-or-nothing multicommodity flow problem, then progress to synchronous distributed models through examination of the unmapped multivehicle routing problem and the distributed target localization problem. The final model is again distributed but assumes an unfair asynchronous adversary in the context of the energy distribution problem for programmable matter. For these problems, a variety of algorithms are presented, each of which is grounded in a theoretical foundation that permits formal guarantees regarding correctness, running time, and other critical properties. These guarantees are then validated with in silico simulations and (in some cases) physical experiments, demonstrating empirically that they may carry over to the real world. Hence, this dissertation bridges a portion of the predictability-practicality gap with respect to multiagent optimization problems.
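As one concrete flavor of the synchronous distributed models considered here, the sketch below shows a basic consensus-averaging round of the sort that underlies problems like distributed target localization: each agent repeatedly blends its local estimate with its neighbors'. The topology, weights, and function names are hypothetical illustrations under simple assumptions (symmetric, connected communication graph), not the dissertation's specific algorithms.

```python
import numpy as np

def consensus_round(estimates, neighbors, weight=0.5):
    """One synchronous round: each agent moves its estimate toward the
    mean of its neighbors' estimates. `estimates` maps agent id ->
    np.ndarray; `neighbors` maps agent id -> list of agent ids."""
    new = {}
    for a, x in estimates.items():
        if neighbors[a]:
            nbr_mean = np.mean([estimates[b] for b in neighbors[a]], axis=0)
            new[a] = (1 - weight) * x + weight * nbr_mean
        else:
            new[a] = x
    return new

# Usage: four agents on a ring, each holding a noisy reading of a 2-D target.
rng = np.random.default_rng(0)
target = np.array([3.0, -1.0])
estimates = {a: target + rng.normal(0.0, 0.5, 2) for a in range(4)}
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
for _ in range(20):
    estimates = consensus_round(estimates, neighbors)
print(estimates)  # on this regular ring, estimates converge toward the
                  # average of the initial readings
```

Weakening the model, e.g., replacing the synchronous rounds above with an unfair asynchronous scheduler as in the programmable-matter setting, is precisely what makes the correctness and running-time guarantees harder to establish.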
Date Created
2023