Thompson-Sampling Based Reinforcement Learning for Networked Control of Unknown Linear Systems

Published in 2022 IEEE 61st Conference on Decision and Control (CDC), 2022

In recent years, there has been considerable interest in reinforcement learning for linear quadratic Gaussian (LQG) systems. In this paper, we consider a generalization of such systems where the controller and the plant are connected over an unreliable packet-drop channel. Packet drops cause the system dynamics to switch between controlled and uncontrolled modes. This switching phenomenon introduces new challenges in designing learning algorithms. We identify a sufficient condition under which the regret of the Thompson sampling-based reinforcement learning algorithm with dynamic episodes (TSDE) at horizon T is bounded by \tilde{O}(\sqrt{T}), where the \tilde{O}(\cdot) notation hides logarithmic factors in T. These are the first results to generalize regret bounds of LQG systems to packet-drop networked control models.
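The switching behavior described above can be illustrated with a minimal simulation sketch. The model, parameter values, and gain below are all illustrative assumptions, not taken from the paper: the plant evolves as x_{t+1} = A x_t + γ_t B u_t + w_t, where γ_t is a Bernoulli packet-delivery indicator, so a dropped packet leaves the system in its uncontrolled mode for that step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for illustration only (not from the paper).
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # open-loop plant dynamics
B = np.array([[0.0], [0.1]])             # input matrix
K = np.array([[-1.0, -1.5]])             # some stabilizing feedback gain
p_drop = 0.3                             # packet-drop probability
T = 200                                  # horizon

x = np.zeros((2, 1))
traj = []
for t in range(T):
    u = K @ x                            # controller's intended input
    delivered = rng.random() > p_drop    # unreliable channel: Bernoulli delivery
    w = 0.01 * rng.standard_normal((2, 1))
    # A drop switches the dynamics from the controlled mode (A x + B u)
    # to the uncontrolled mode (A x):
    x = A @ x + (B @ u if delivered else 0.0) + w
    traj.append(float(np.linalg.norm(x)))

print(len(traj))
```

A learning algorithm such as TSDE must estimate the unknown (A, B) from trajectories generated by this mixture of controlled and uncontrolled transitions, which is the source of the new analytical difficulty the paper addresses.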

Recommended citation: B. Sayedana, M. Afshari, P. E. Caines and A. Mahajan, "Thompson-Sampling Based Reinforcement Learning for Networked Control of Unknown Linear Systems," 2022 IEEE 61st Conference on Decision and Control (CDC), Cancun, Mexico, 2022, pp. 723-730, doi: 10.1109/CDC51059.2022.9992565. https://ieeexplore.ieee.org/document/9993169