We address the challenge of exploration in reinforcement learning (RL) when the agent operates in an unknown environment with sparse or no rewards. In this work, we study the maximum entropy exploration problem of two different types. The first type is visitation entropy maximization previously considered by Hazan et al. (2019) in the discounted setting. For this type of exploration, we propose a game-theoretic algorithm that has O(H^3S^2A/ε^2) sample complexity thus improving the ε-dependence upon existing results, where S is a number of states, A is a number of actions, H is an episode length, and ε is a desired accuracy. The second type of entropy we study is the trajectory entropy. This objective function is closely related to the entropy-regularized MDPs, and we propose a simple algorithm that has a sample complexity of order O(poly(S, A, H)/ε). Interestingly, it is the first theoretical result in RL literature that establishes the potential statistical advantage of regularized MDPs for exploration. Finally, we apply developed regularization techniques to reduce sample complexity of visitation entropy maximization to O(H^2SA/ε^2), yielding a statistical separation between maximum entropy exploration and reward-free exploration. The talk is based on the joint work with Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Pierre Perrault, Yunhao Tang, Michal Valko, Pierre Menard
Post Talk Link: ClickHere
Passcode: +$AtX0W?
Alexey Naumov is a professor and a head of the International laboratory of stochastic algorithms and high-dimensional inference of the National Research University Higher School of Economics. He graduated from the Lomonosov Moscow State University (MSU) in 2010, received a PhD degree in mathematics from Bielefeld University in 2013, candidate of science in mathematics and physics from MSU in 2013 and doctor of computer science degree in 2022. His research interests are in the areas of high-dimensional probability and mathematics of data science. They span a broad range of problems extending from fundamental, theoretical questions in probability to applications in various branches of data science. Alexey Naumov is an author of more than 30 papers in leading journals and conferences of the field including Probability Theory and Related Fields, Bernoulli, Statistics and Computing, COLT, ICML and NeurIPS.
Read More
Read More