Offline Meta-Reinforcement Learning with Advantage Weighting

  • Mitchell, Eric A*; Rafailov, Rafael; Peng, Xue Bin; Levine, Sergey; Finn, Chelsea
  • Accepted abstract
  • Poster session: 15:00 to 16:00 EAT and 20:45 to 21:45 EAT


Meta-reinforcement learning algorithms promise rapid reinforcement learning of new skills by leveraging previous experience. However, while the adaptation phase of such methods is remarkably efficient, the meta-training phase of prior algorithms requires an impractical amount of experience to be collected by interacting with the environment during meta-training. To address this issue, we aim to enable batch offline meta-reinforcement learning: a class of algorithms that can meta-learn using only a static batch of multi-task data, without interacting with the environment, to prepare for fast learning of new, related tasks. To this end, we develop an optimization-based meta-learning algorithm that uses simple supervised regression objectives for both inner-loop adaptation and outer-loop meta-learning. The algorithm shows respectable performance on some common benchmarks and state-of-the-art performance when adapting to out-of-distribution test tasks.
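
To make the phrase "simple supervised regression objectives" more concrete, here is a minimal sketch of a generic advantage-weighted regression loss computed over a static batch. It is an illustration under stated assumptions, not the authors' objective: the 1-D Gaussian policy with fixed standard deviation, the temperature, the weight clipping, and all function names are hypothetical choices made for this example.

```python
import math


def advantage_weights(advantages, temperature=1.0, max_weight=20.0):
    """Exponentiated advantages, clipped at max_weight (an assumed safeguard)."""
    return [min(math.exp(a / temperature), max_weight) for a in advantages]


def gaussian_log_prob(action, mean, std=1.0):
    """Log-density of a 1-D Gaussian policy evaluated at `action`."""
    z = (action - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def weighted_regression_loss(actions, means, advantages, temperature=1.0):
    """Advantage-weighted negative log-likelihood, averaged over the batch.

    `actions` are logged actions from the static dataset, `means` are the
    policy's predicted action means for the same states, and `advantages`
    are per-sample advantage estimates (assumed to be given).
    """
    weights = advantage_weights(advantages, temperature)
    terms = [
        -w * gaussian_log_prob(a, m)
        for w, a, m in zip(weights, actions, means)
    ]
    return sum(terms) / len(terms)
```

Because the loss is an ordinary weighted regression on logged actions, it needs no environment interaction, and samples with higher estimated advantage contribute more to the policy update. In an optimization-based meta-learner, a loss of this general shape could plausibly serve in the inner loop, the outer loop, or both; how the paper actually instantiates each loop is not stated in this abstract.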
