Invariant Transform Experience Replay
Deep Reinforcement Learning (RL) is a promising approach for adaptive robot control, but its application to robotics is currently hindered by high sample requirements. To alleviate this issue, we propose to exploit the symmetries present in robotic tasks. Intuitively, symmetries define transformations that leave the space of feasible RL trajectories invariant, and can therefore be applied to observed trajectories to generate new feasible trajectories for training. Based on this data-augmentation idea, we formulate a general framework, called Invariant Transform Experience Replay (ITER), which we instantiate with two techniques: (i) Kaleidoscope Experience Replay (KER), which exploits reflectional symmetries, and (ii) Goal-augmented Experience Replay (GER), which takes advantage of lax goal definitions. In the Fetch tasks from OpenAI Gym, our experimental results show significant increases in learning rates and success rates. In particular, we attain a 13-, 3-, and 5-fold speedup in the pushing, sliding, and pick-and-place tasks respectively in the multi-goal setting. Performance gains are also observed in similar tasks with obstacles, and we successfully deployed a trained policy on a real Baxter robot. Our work demonstrates that invariant transformations of RL trajectories are a promising methodology to speed up learning in deep RL.

arxiv | code | bib | youtube | youku | appendix
Our goal is to reduce the number of interactions with the real environment. We do this by augmenting replayed experience with reflectional symmetries (KER) and by exploiting lax goal definitions on both original and reflected trajectories (GER).

KER uses reflectional symmetry. Any vertical plane obtained by rotating the xoz plane about the z-axis defines an invariant symmetry for a robotic task, as shown below.
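Concretely, reflection across the vertical plane that contains the z-axis and makes an angle $\theta$ with the xoz plane is a standard Householder reflection in the xy-plane (our notation, not taken from the paper):

$$M(\theta) = \begin{pmatrix} \cos 2\theta & \sin 2\theta & 0 \\ \sin 2\theta & -\cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix},$$

so $\theta = 0$ recovers the xoz plane itself, which simply negates the y-coordinate.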

KER augments the original trajectory with a certain number of random reflections. To ensure the reflected trajectories remain valid, the plane angle is bounded by a maximum value that depends on the workspace.
The number of symmetric planes $n_{KER}$ also needs to be specified. We place the robot's shoulder base at the origin of the world coordinates, so the construction still applies when the arm is offset from the center of the workspace. When more than one plane of symmetry is used, the last reflection is always performed with the xoz plane. Each of the first $n_{KER}-1$ planes creates one reflection of the original trajectory, but the final reflection, the one with the xoz plane, is applied to all trajectories available in the system. In this way, we create $2n_{KER}-1$ reflected trajectories, as illustrated below (a code sketch of this procedure follows the figure).
Valid trajectory reflections generated from the original motion of the robot.
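A minimal Python sketch of this augmentation, assuming each trajectory is an array of xyz positions expressed in the shoulder-base frame (real trajectories also carry velocities, actions, and goals, which must be reflected the same way; all names here are ours):

```python
import numpy as np

def reflect(points, theta):
    """Reflect 3-D points across the vertical plane through the origin
    that contains the z-axis and makes angle theta with the xoz plane."""
    c, s = np.cos(2 * theta), np.sin(2 * theta)
    M = np.array([[c,   s,   0.0],
                  [s,  -c,   0.0],
                  [0.0, 0.0, 1.0]])
    return points @ M.T

def ker_augment(trajectory, n_ker, max_angle, rng=None):
    """Return the 2 * n_ker - 1 reflected copies of `trajectory`."""
    rng = rng or np.random.default_rng()
    trajs = [trajectory]
    # Each of the first n_ker - 1 planes, drawn within the
    # workspace-dependent angle bound, reflects the original trajectory.
    for _ in range(n_ker - 1):
        trajs.append(reflect(trajectory, rng.uniform(-max_angle, max_angle)))
    # The final reflection, across the xoz plane itself (theta = 0),
    # is applied to every trajectory accumulated so far.
    trajs += [reflect(t, 0.0) for t in trajs]
    return trajs[1:]
```

With `n_ker = 1` this yields a single reflection across the xoz plane; each extra plane adds one reflection of the original and doubles the set reflected at the last step, giving $2n_{KER}-1$ new trajectories.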
GER is a type of reward-preserving, decomposable symmetry operation. Because a goal only needs to be reached within a tolerance $\epsilon$ (the lax goal definition), GER augments transitions by replacing the original goal with a random goal sampled within the ball of radius $\epsilon$ around the achieved goal. We also use the "future" strategy introduced in the original HER paper. The figures below illustrate this.
In effect, GER can be seen as a generalization of HER (see the paper for details) applied at the level of mini-batches: for each application of GER, the mini-batch size grows linearly with the newly generated artificial transitions.
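A hedged sketch of the relabeling step, assuming Gym-style goal dictionaries and the sparse Fetch reward convention (0 on success, -1 otherwise); the helper names and fields are ours, not the paper's implementation:

```python
import numpy as np

def sample_in_ball(center, radius, rng):
    """Uniform sample inside the ball of given radius around `center`."""
    d = center.shape[-1]
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    return center + radius * rng.uniform() ** (1.0 / d) * direction

def ger_relabel(transition, epsilon, n_ger, rng=None):
    """Return n_ger copies of `transition`, each with a new goal drawn
    inside the epsilon-ball around the achieved goal. The achieved goal
    lies within epsilon of every sampled goal, so each copy counts as a
    success and the sparse reward is preserved."""
    rng = rng or np.random.default_rng()
    copies = []
    for _ in range(n_ger):
        new = dict(transition)
        new["desired_goal"] = sample_in_ball(transition["achieved_goal"],
                                             epsilon, rng)
        new["reward"] = 0.0  # success under the sparse Fetch convention
        copies.append(new)
    return copies
```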
We follow the experimental setup of HER: a 7-DoF Fetch arm trained with DDPG on the pushing, sliding, and pick-and-place tasks from OpenAI Gym. We report the success rate as the fraction of successful episodes in an epoch. Since our method learns faster than HER, we use fewer episodes per epoch: 1 epoch = 100 episodes, as opposed to 800.
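For reference, the three multi-goal tasks can be instantiated as follows (environment IDs follow the classic gym robotics suite and may differ across gym versions):

```python
import gym

for env_id in ["FetchPush-v1", "FetchSlide-v1", "FetchPickAndPlace-v1"]:
    env = gym.make(env_id)
    obs = env.reset()
    # Goal-conditioned observations are dicts with these keys; the
    # distance threshold is the tolerance epsilon exploited by GER.
    print(env_id, sorted(obs.keys()), env.unwrapped.distance_threshold)
```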
Results
Our experiments seek to answer 3 sets of questions:
- How does ITER (KER+GER) perform compared to HER on single and multi-goal tasks?
- How much does KER contribute to the performance of ITER? How many symmetric planes $n_{KER}$ should be used?
- What is the contribution of GER to the performance of ITER? What is the impact of $n_{GER}$?

In both the multi-goal and single-goal settings, we observe a dramatic increase in learning speed for the push and pick-and-place tasks. Unlike HER, we do not use any learning tricks in the pick-and-place task, and we learn it after about 180 epochs. For the slide task, the smaller impact might be explained by the fact that the outcome is determined by only a few contacts (generally one) between the gripper and the block.
How many symmetries should we use?
We observe a monotonic increase in performance with the number of random symmetries, though the gains diminish as the number of reflections increases.
Does GER improve performance?
In this experiment, we do not use KER and only vary the number of GER applications. As with KER, we observe an increase in performance with increasing $n_{GER}$ until a ceiling is reached.

Finally, the video below illustrates the contribution in more detail:
https://youtu.be/Ac3c_xs7pJ8