Details
We introduce a novel framework for analyzing reinforcement learning (RL) in continuous state-action spaces, and use it to prove fast rates of convergence in both off-line and on-line settings. Our analysis highlights two key stability properties, relating to how changes in value functions and/or policies affect the Bellman operator and occupation measures. We argue that these properties are satisfied in many continuous state-action Markov decision processes, and demonstrate how they arise naturally when using linear function approximation methods. Our analysis offers fresh perspectives on the roles of pessimism and optimism in off-line and on-line RL, and highlights the connection between off-line RL and transfer learning.
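For readers unfamiliar with the objects named in the abstract, a minimal illustrative sketch in standard notation (these definitions are conventional and are not drawn from the talk itself): the Bellman operator for a policy maps a value function to expected one-step returns, and linear function approximation restricts value functions to the span of a fixed feature map.

```latex
% Illustrative only: standard RL definitions, assuming a reward function r,
% transition kernel P, discount factor \gamma, and feature map \phi.
% Bellman operator for a policy \pi acting on a value function V:
\[
  (\mathcal{T}^{\pi} V)(s)
  = \mathbb{E}_{a \sim \pi(\cdot \mid s),\; s' \sim P(\cdot \mid s, a)}
    \bigl[ r(s, a) + \gamma \, V(s') \bigr].
\]
% Linear function approximation: value functions restricted to a linear span,
\[
  V_{\theta}(s) = \phi(s)^{\top} \theta,
  \qquad \theta \in \mathbb{R}^{d}.
\]
```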
Yaqi Duan is an Assistant Professor in the Department of Technology, Operations, and Statistics at the Stern School of Business, New York University. She received her Ph.D. in Operations Research and Financial Engineering from Princeton University in 2022. Duan’s research interests are at the intersection of statistics, machine learning, and operations research, focusing on developing new statistical methodologies and theories to tackle challenges in data-driven decision-making. Following her Ph.D. and before joining NYU Stern, Duan was a postdoctoral researcher at MIT.