Adaptive Exploration via Modulated Behaviour

  • Schaul, Tom*; Borsa, Diana; Ding, David; Szepesvari, David; Ostrovski, Georg; Dabney, Will; Osindero, Simon
  • Accepted abstract
  • Poster session from 15:00 to 16:00 EAT and from 20:45 to 21:45 EAT

Abstract

There are few ways to exploit, but there are many ways to explore. We propose a reinforcement learning (RL) framework designed to study how to produce, adapt and learn from diverse exploratory behaviours. The goal is to reduce the need for tuning exploration by instead embracing this diversity and adapting it to the task at hand. The central mechanism is simple: the policy of a single deep RL agent is modulated to produce many variants of its learned behaviour on the fly, and, closing the loop, all the experience generated by these variants is used for off-policy training. This is combined with a dynamic adaptation mechanism that attempts to generate more of the behaviours that are useful to the agent at its current phase of learning. We demonstrate on a suite of Atari 2600 games how this approach produces results comparable to per-task tuning at a fraction of the cost. In addition, we highlight a number of qualitative effects, such as emerging task-dependent schedules, as well as different properties for different types of modulation (e.g. stochasticity, consistency or optimism).
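The abstract describes the mechanism only in words; the sketch below is a minimal, hypothetical reading of the modulate-act-adapt loop. It assumes a discrete action space, softmax temperature as the single form of modulation, and a toy non-stationary bandit over recent episode returns as the adaptation mechanism. None of these choices are taken from the paper; they are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

TEMPERATURES = [0.1, 0.3, 1.0, 3.0]   # candidate modulations (stochasticity)
N_ACTIONS = 4


def modulated_policy(logits, temperature):
    """Softmax over the agent's action logits, modulated by a temperature
    that controls how stochastic the resulting behaviour is."""
    z = logits / temperature
    z = z - z.max()                    # numerical stability
    probs = np.exp(z)
    return probs / probs.sum()


def run_episode(temperature, steps=50):
    """Toy stand-in for acting with one modulated variant of the policy.
    Reward is 1 for action 0, so greedier (low-temperature) behaviour pays
    off once the fixed 'learned' logits already point to the right action."""
    logits = np.array([2.0, 0.5, 0.0, -1.0])   # pretend these were learned
    total = 0.0
    for _ in range(steps):
        probs = modulated_policy(logits, temperature)
        action = rng.choice(N_ACTIONS, p=probs)
        total += 1.0 if action == 0 else 0.0
    return total / steps               # average reward per step, in [0, 1]


# Adaptation: a simple non-stationary bandit over the modulations. Keep an
# exponential moving average of the return obtained with each temperature and
# sample the next episode's modulation from a softmax over those averages.
value = np.zeros(len(TEMPERATURES))
counts = np.zeros(len(TEMPERATURES), dtype=int)

for episode in range(200):
    prefs = np.exp(value - value.max())
    p = prefs / prefs.sum()
    k = rng.choice(len(TEMPERATURES), p=p)     # modulation used this episode
    ret = run_episode(TEMPERATURES[k])
    counts[k] += 1
    value[k] += 0.1 * (ret - value[k])         # EMA so preferences can drift

print("episodes per temperature:", dict(zip(TEMPERATURES, counts.tolist())))
```

In the framework described above, the modulations also cover properties such as consistency and optimism, and the experience generated by every variant is additionally replayed for off-policy training; this toy loop omits both and only illustrates how an adaptation mechanism can shift which behaviour variants get generated as learning progresses.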
