A Kernel Perspective on Behavioural Metrics for Markov Decision Processes

Pablo Samuel Castro · Tyler Kastner · Prakash Panangaden · Mark Rowland




We present a novel perspective on behavioural metrics for Markov decision processes via the use of positive definite kernels. We define a new metric under this lens that is provably equivalent to the recently introduced MICo distance (Castro et al., 2021). The kernel perspective enables us to provide new theoretical results, including value-function bounds and low-distortion finite-dimensional Euclidean embeddings, which are crucial when using behavioural metrics for reinforcement learning representations. We complement our theory with strong empirical results that demonstrate the effectiveness of these methods in practice.
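
As background for the kernel viewpoint, the sketch below illustrates the standard way a positive definite kernel k induces a distance, via d(x, y)^2 = k(x, x) + k(y, y) - 2 k(x, y), which is the squared distance between the feature maps of x and y in the associated Hilbert space. It is only an illustration under stated assumptions: the Gaussian and linear kernels on plain feature vectors, and the helper names `gaussian_kernel`, `linear_kernel`, and `kernel_distance`, are hypothetical choices made here and are not the kernel on MDP states or the metric defined in the paper.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel, a standard positive definite kernel (illustrative choice)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * bandwidth ** 2)))


def linear_kernel(x, y):
    """Linear kernel; its induced distance is the ordinary Euclidean distance."""
    return float(np.dot(np.asarray(x, dtype=float), np.asarray(y, dtype=float)))


def kernel_distance(k, x, y):
    """Distance induced by a positive definite kernel k.

    Computes sqrt(k(x, x) + k(y, y) - 2 k(x, y)); the max with 0 only guards
    against tiny negative values caused by floating-point rounding.
    """
    squared = k(x, x) + k(y, y) - 2.0 * k(x, y)
    return float(np.sqrt(max(squared, 0.0)))
```

With the linear kernel this recovers the Euclidean distance, which is a convenient sanity check; other positive definite kernels yield distances that still admit an isometric embedding into a (possibly infinite-dimensional) Hilbert space.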