Unsupervised Reinforcement Learning with Contrastive Intrinsic Control
Neural Information Processing Systems (NeurIPS 2022)
Michael Laskin (1) Hao Liu (1) Xue Bin Peng (1) Denis Yarats (2,3) Aravind Rajeswaran (3) Pieter Abbeel (1,4)
(1) University of California, Berkeley (2) New York University (3) Meta AI (4) Covariant
|
Abstract
We introduce Contrastive Intrinsic Control (CIC), an unsupervised
reinforcement learning (RL) algorithm that maximizes the mutual
information between state transitions and latent skill vectors.
CIC utilizes contrastive learning between state transitions and
skill vectors to learn behaviour embeddings and maximizes the
entropy of these embeddings as an intrinsic reward to encourage
behavioural diversity. We evaluate our algorithm on the
Unsupervised RL Benchmark (URLB) in the asymptotic state-based
setting, which consists of a long reward-free pretraining phase
followed by a short adaptation phase to downstream tasks with
extrinsic rewards. We find that CIC improves over prior
exploration algorithms in terms of adaptation efficiency to
downstream tasks on state-based URLB.
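
To make the abstract concrete, the sketch below illustrates the two ingredients it describes: a contrastive (NCE-style) loss that ties state-transition embeddings to skill vectors, and a particle-based (k-nearest-neighbour) entropy estimate over transition embeddings used as the intrinsic reward. This is a minimal PyTorch illustration under our own assumptions; names such as CICSketch and knn_entropy_reward are hypothetical and do not reflect the authors' released code (see the GitHub link below for the actual implementation).

import torch
import torch.nn as nn
import torch.nn.functional as F

class CICSketch(nn.Module):
    def __init__(self, obs_dim, skill_dim, hidden_dim=256):
        super().__init__()
        # Encodes a transition (s, s') into the shared embedding space.
        self.transition_encoder = nn.Sequential(
            nn.Linear(2 * obs_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, skill_dim),
        )
        # Projects skill vectors z into the same embedding space.
        self.skill_encoder = nn.Sequential(
            nn.Linear(skill_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, skill_dim),
        )
        self.temperature = 0.5  # hypothetical value, not a reported setting

    def contrastive_loss(self, obs, next_obs, skills):
        # InfoNCE over a batch: each (transition, skill) pair is a positive;
        # the other skills in the batch serve as negatives.
        tau = self.transition_encoder(torch.cat([obs, next_obs], dim=-1))
        z = self.skill_encoder(skills)
        tau = F.normalize(tau, dim=-1)
        z = F.normalize(z, dim=-1)
        logits = tau @ z.t() / self.temperature          # (B, B) similarities
        labels = torch.arange(tau.size(0), device=tau.device)
        return F.cross_entropy(logits, labels)

def knn_entropy_reward(embeddings, k=12):
    # Particle-based entropy estimate: reward each embedding by the log of
    # its mean distance to its k nearest neighbours in the batch
    # (requires batch size > k).
    dists = torch.cdist(embeddings, embeddings)          # (B, B) pairwise
    knn_dists, _ = dists.topk(k + 1, largest=False)      # includes self (0)
    return torch.log(1.0 + knn_dists[:, 1:].mean(dim=-1))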
|
Paper: [PDF] Webpage: [Link] Code: [GitHub] Preprint: [arXiv]
|
Bibtex
@inproceedings{
CICLaskin2022,
author = {Laskin, Michael and Liu, Hao and Peng, Xue Bin and Yarats, Denis and Rajeswaran, Aravind and Abbeel, Pieter},
booktitle = {Advances in Neural Information Processing Systems},
editor = {S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh},
pages = {34478--34491},
publisher = {Curran Associates, Inc.},
title = {Unsupervised Reinforcement Learning with Contrastive Intrinsic Control},
url = {https://proceedings.neurips.cc/paper_files/paper/2022/file/debf482a7dbdc401f9052dbe15702837-Paper-Conference.pdf},
volume = {35},
year = {2022}
}