Estimating Q(s,s') with Deep Deterministic Dynamics Gradients

  • Edwards, Ashley D.*; Sahni, Himanshu; Liu, Rosanne; Hung, Jane; Jain, Ankit; Wang, Rui; Ecoffet, Adrien; Miconi, Thomas; Isbell, Charles; Yosinski, Jason
  • Accepted abstract
  • Poster session from 15:00 to 16:00 EAT and from 20:45 to 21:45 EAT

Abstract

In this paper, we introduce a novel form of value function, $Q(s, s')$, that expresses the utility of transitioning from a state $s$ to a neighboring state $s'$ and then acting optimally thereafter. To derive an optimal policy, we develop a novel forward dynamics model that learns to make next-state predictions that maximize $Q(s,s')$. This formulation decouples actions from values while still learning off-policy. We highlight the benefits of this approach in terms of value function transfer, learning within redundant action spaces, and learning off-policy from state observations generated by sub-optimal or completely random policies.
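
To make the definition concrete, the following is a minimal tabular sketch, not the authors' deep-learning method. It assumes deterministic dynamics and a reward that depends only on the transition, so that the value plausibly satisfies the recursion $Q(s, s') = r(s, s') + \gamma \max_{s''} Q(s', s'')$, where $s''$ ranges over the neighbors of $s'$. The 5-state chain, the `neighbors` and `reward` functions, and all constants are hypothetical choices made for illustration. An exhaustive argmax over neighbors stands in for the learned forward dynamics model described in the abstract.

```python
import random

# Hypothetical toy problem (not from the paper): a 5-state deterministic chain.
N_STATES = 5
GOAL = 4      # entering this state ends the episode
GAMMA = 0.9   # discount factor (assumed value)
ALPHA = 0.5   # learning rate (assumed value)


def neighbors(s):
    """States reachable from s in one step: stay, or move one step left or right."""
    return sorted({max(s - 1, 0), s, min(s + 1, N_STATES - 1)})


def reward(s, s_next):
    """Reward depends only on the transition: 1 for entering the goal, else 0."""
    return 1.0 if s_next == GOAL and s != GOAL else 0.0


def train(num_transitions=5000, seed=0):
    """Learn Q(s, s') from random (s, s') transitions; no action is ever recorded."""
    rng = random.Random(seed)
    q = {(s, n): 0.0 for s in range(N_STATES) for n in neighbors(s)}
    for _ in range(num_transitions):
        s = rng.randrange(N_STATES - 1)      # sample a non-goal state
        s_next = rng.choice(neighbors(s))    # transition produced by a random policy
        target = reward(s, s_next)
        if s_next != GOAL:
            # Bootstrap from the best neighboring transition out of s_next.
            target += GAMMA * max(q[(s_next, n)] for n in neighbors(s_next))
        q[(s, s_next)] += ALPHA * (target - q[(s, s_next)])
    return q


def greedy_next_state(q, s):
    """Tabular stand-in for the forward model: the neighbor that maximizes Q(s, s')."""
    return max(neighbors(s), key=lambda n: q[(s, n)])


q = train()
plan = [greedy_next_state(q, s) for s in range(GOAL)]
print(plan)
```

In this sketch the greedy next state from each non-goal state should be its right-hand neighbor, even though every training transition came from a random policy and the table never refers to actions. Executing such a plan in a real environment would still require some way to map a desired transition $(s, s')$ back to an action, which this toy example leaves out.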
