
dm_control: Software and tasks for continuous control

      Abstract

      The dm_control software package is a collection of Python libraries and task suites for reinforcement learning agents in an articulated-body simulation. Infrastructure includes a wrapper for the MuJoCo physics engine and libraries for procedural model manipulation and task authoring. Task suites include the Control Suite, a set of standardized tasks intended to serve as performance benchmarks, a locomotion framework and task families, and a set of manipulation tasks with a robot arm and snap-together bricks. An adjunct tech report and interactive tutorial are also provided.

      Keywords

      Code metadata
      Table 1
      Current code version: Rolling release
      Permanent link to code/repository used for this code version: https://github.com/SoftwareImpacts/SIMPAC-2020-26
      Permanent link to Reproducible Capsule:
      Legal Code License: Apache License, Version 2.0
      Code versioning system used: git
      Software code languages, tools, and services used: Python
      Compilation requirements, operating environments & dependencies: Python 3, MuJoCo
      If available, link to developer documentation/manual: arxiv.org/abs/2006.12983
      Support email for questions: [email protected]

      1. Introduction

      Reinforcement Learning (RL) casts sequential decision problems as interactions between an agent, which receives observations and outputs actions, and an environment, which receives actions and outputs both observations and a reward. The RL agent attempts to choose actions which will maximize future rewards.
      Recent years have seen rapid progress in the application of RL to difficult problem domains like Atari [1] and Go [2]. These successes were driven both by the power of novel neural-network-based agent architectures, and by the availability of standardized problem domains. For example, the Arcade Learning Environment [3] was and continues to be a vital facilitator of research. Similarly, control and robotics also require well-designed task suites as a standardized playing field, where different, novel approaches can be evaluated and compared. Enabling such progress is the goal of dm_control.
      Besides their importance as a fundamental challenge faced by all animals, physical tasks are different from many of the more abstract domains commonly employed for RL research by being continuous in time, state and action. Physical dynamics are subject to second-order equations of motion: the underlying state is composed of generalized positions and velocities, while actions take the form of forces (which vary like acceleration). Sensory signals (observations) carry meaningful physical units and vary over corresponding ranges and timescales. These properties make control problems a unique subset of general Markov Decision Processes. The most familiar physical control tasks have a fixed subset of degrees of freedom that are directly actuated (the body), while the rest are unactuated. Such embodied tasks are the focus of dm_control.
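      For concreteness, these second-order dynamics can be written in the standard manipulator form; the notation below is ours and is not taken from the package:

          M(q)\,\dot{v} + c(q, v) = B\,a + J(q)^{\mathsf{T}} f_{\text{ext}}, \qquad \dot{q} = v,

      where q and v are the generalized positions and velocities, M(q) is the inertia matrix, c(q, v) collects bias forces (Coriolis, centrifugal, gravity and other passive forces), B maps the action a onto the actuated degrees of freedom, and J(q)^T f_ext accounts for external contact forces.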
      Fig. 1 The Control Suite benchmarking domains. Top: acrobot, ball-in-cup, cart-pole, cheetah, finger, fish, hopper. Bottom: humanoid, manipulator, pendulum, point-mass, reacher, swimmer, walker. See the video overview.
      Fig. 2 Procedural domains built with the PyMJCF and Composer task-authoring libraries. Left: Multi-legged creatures from the interactive tutorial. Middle: The “run through corridor” example task. Right: The “stack 3 bricks” manipulation task.
      Fig. 3 Quadrupedal domains. Top: An abstract quadruped (video). Middle: A dog (video). Bottom: A rodent, figure reproduced from [4]; related videos of rodent tasks therein: “forage”, “gaps”, “escape” and “two-tap”.
      Fig. 4 Rendered scenes of multi-agent soccer. Left: 2-vs.-2 with walkers. Right: 3-vs.-3 with .
      Fig. 5 Diagram showing the life-cycle of Composer environments.

      2. Tech report and tutorial

      For a thorough description and documentation please refer to our accompanying tech report. We also provide an interactive tutorial as a Google Colaboratory (Jupyter) notebook.

      3. Software infrastructure

      The dm_control package was designed by DeepMind scientists and engineers to facilitate their own continuous control and robotics research needs. It is written in Python, exploiting the agile workflow of a dynamic language, while relying on the C-based MuJoCo physics library, a fast and accurate simulator [5] that was itself designed to facilitate research [6]. It is composed of the following modules:
      • The ctypes-based MuJoCo wrapper, which provides full access to the simulator and conveniently exposes quantities with named indexing. A Python-based interactive visualizer allows the user to examine and perturb scene elements with a mouse.
      • The PyMJCF library, which procedurally assembles model elements and allows the user to configure or randomize parameters and initial states. See Fig. 2 for examples of environments created with PyMJCF, and the short sketch after this list.
      • An RL environment API, based on DeepMind’s dm_env interface, that exposes actions, observations, rewards and terminations in a consistent yet flexible manner.
      • The high-level task-definition framework Composer, which combines the above functionality. Amongst other things, it provides a model-variation module for robustification, and an “observable” module for filtered, delayed, corrupted, and stacked sensor data. See Fig. 5 for a life-cycle diagram of a Composer environment.
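      To make the PyMJCF workflow concrete, the following is a minimal sketch of procedural model assembly; the element names, sizes and attachment layout are illustrative choices rather than anything prescribed by the library:

          from dm_control import mjcf

          # Procedurally build a model: a free-floating torso with a capsule "leg"
          # attached at a site. Names and dimensions are illustrative only.
          model = mjcf.RootElement(model='example')
          torso = model.worldbody.add('body', name='torso', pos=[0, 0, 0.5])
          torso.add('freejoint')
          torso.add('geom', name='torso_geom', type='sphere', size=[0.1])
          site = torso.add('site', name='leg_site', pos=[0, 0, -0.1])

          # A separately authored sub-model can be attached at the site, enabling reuse.
          leg = mjcf.RootElement(model='leg')
          leg.worldbody.add('geom', name='shin', type='capsule',
                            fromto=[0, 0, 0, 0, 0, -0.3], size=[0.03])
          site.attach(leg)

          # Compile the assembled model into a MuJoCo physics instance and step it.
          physics = mjcf.Physics.from_mjcf_model(model)
          physics.step()

      The same pattern scales to the procedural domains of Fig. 2, where many copies of a sub-model are attached and their parameters randomized.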

      4. Tasks

      dm_control includes three sets of control tasks.

      4.1 Control Suite

      The DeepMind Control Suite, first introduced in [7] and built directly with the MuJoCo wrapper, provides a set of standard benchmarks for continuous control problems. The unified reward structure offers interpretable learning curves and aggregated suite-wide performance measures. Furthermore, we emphasize high-quality, well-documented code using uniform design patterns, offering a readable, transparent and easily extensible codebase. Fig. 1 shows the “benchmarking” set of domains.
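      As a usage illustration, the canonical pattern for loading a Control Suite task and stepping it with random actions looks roughly as follows; the domain and task names are examples from the benchmarking set:

          import numpy as np
          from dm_control import suite

          # Load one of the benchmark tasks and inspect its action specification.
          env = suite.load(domain_name='cartpole', task_name='swingup')
          action_spec = env.action_spec()

          # Run a single episode with uniformly random actions via the dm_env interface.
          total_reward = 0.0
          time_step = env.reset()
          while not time_step.last():
              action = np.random.uniform(action_spec.minimum, action_spec.maximum,
                                         size=action_spec.shape)
              time_step = env.step(action)
              total_reward += time_step.reward
          print('Episode return:', total_reward)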

      4.2 Locomotion

      The Locomotion framework was inspired by our work in [8]. It is designed to facilitate the implementation of a wide range of locomotion tasks for RL algorithms by introducing self-contained, reusable components that compose into different task variants. The Locomotion framework has enabled a number of research efforts, including [9,10,11], and more recently has been employed to support multi-agent domains in [12,13] and [14]. Fig. 4 shows the “multi-agent soccer” environment described in [12]. Fig. 3 shows several quadrupedal domains.
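      As an illustration of how these components compose, the sketch below assembles a “run through corridor”-style task (cf. Fig. 2) from a walker, an arena and a task definition; the specific module paths and class names are our assumptions about the library layout and may differ between releases:

          from dm_control import composer
          from dm_control.locomotion.arenas import corridors as corridor_arenas
          from dm_control.locomotion.tasks import corridors as corridor_tasks
          from dm_control.locomotion.walkers import cmu_humanoid

          # Compose a reusable walker, a procedural arena and a task definition.
          walker = cmu_humanoid.CMUHumanoid()
          arena = corridor_arenas.EmptyCorridor()
          task = corridor_tasks.RunThroughCorridor(walker=walker, arena=arena)

          # Composer wraps the task as a standard dm_env-style RL environment.
          env = composer.Environment(task=task)
          time_step = env.reset()

      Swapping the walker, arena or task class yields the different task variants mentioned above without rewriting the rest of the environment.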

      4.3 Manipulation

      We also provide examples of constructing robotic manipulation tasks. These tasks involve grabbing and manipulating objects with a 3D robotic arm. The set of tasks includes examples of reaching, placing, stacking, throwing, assembly and disassembly. The tasks are designed to be solved using a simulated 6 degree-of-freedom robotic arm based on the Kinova Jaco [15], though their modular design permits the use of other arms with minimal changes. These tasks make use of reusable components, such as bricks that snap together, and provide examples of reward functions for manipulation. Tasks can be run using vision, low-level features, or combinations of both. See Fig. 6.
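      A hedged sketch of how these manipulation tasks are typically loaded is given below; the particular task-name string is an assumption, and the manipulation.ALL listing should be consulted for the names available in a given release:

          import numpy as np
          from dm_control import manipulation

          # Print the names of the bundled manipulation tasks, then load one of them.
          print('\n'.join(manipulation.ALL))
          env = manipulation.load('stack_2_bricks_vision', seed=0)

          # The returned environment follows the same dm_env interface as the suite.
          action_spec = env.action_spec()
          time_step = env.reset()
          action = np.random.uniform(action_spec.minimum, action_spec.maximum,
                                     size=action_spec.shape)
          time_step = env.step(action)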
      Fig. 6 Left to right: Initial configuration for the task. Initial configuration for the task, note the three translucent bricks representing the goal configuration. The corresponding 84 × 84 pixel visual observation returned by . Most of the included manipulation tasks make use of snap-together bricks, modelled on Lego Duplo®.

      5. Impact

      5.1 Related software

      The OpenAI Gym [16] includes a set of domains that have become a popular benchmark in continuous RL [17,18]. More recent task suites, such as Meta-World [19], SURREAL [20], RLBench [21] and IKEA [22], have been published to satisfy the demand for task suites that facilitate the study of multi-scale control, multi-task transfer, and meta-learning.

      5.2 Contribution

      In the last three years, dm_control has been used extensively at DeepMind, serving as a fundamental component of continuous control research. For example, see this video for a montage of clips from the following selected publications: [4,7,8,10,12,23–30]. Note that this list is non-exhaustive. Although this software has been available and regularly updated in an open-source manner since 2018, until now it has mostly been used internally, owing to the lack of comprehensive documentation and limited public exposure. We hope that this release will serve the public as it has served us: as a solid foundation for research.

      Declaration of Competing Interest

      The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

      Appendix A. Supplementary data

      The following is the Supplementary material related to this article.

      References

        [1] Mnih V., Kavukcuoglu K., Silver D., Rusu A.A., Veness J., Bellemare M.G., Graves A., Riedmiller M., Fidjeland A.K., Ostrovski G., Petersen S., Beattie C., Sadik A., Antonoglou I., King H., Kumaran D., Wierstra D., Legg S., Hassabis D. Human-level control through deep reinforcement learning. Nature. 2015; 518: 529–533. https://doi.org/10.1038/nature14236
        [2] Silver D., Huang A., Maddison C.J., Guez A., Sifre L., van den Driessche G., Schrittwieser J., Antonoglou I., Panneershelvam V., Lanctot M., Dieleman S., Grewe D., Nham J., Kalchbrenner N., Sutskever I., Lillicrap T., Leach M., Kavukcuoglu K., Graepel T., Hassabis D. Mastering the game of Go with deep neural networks and tree search. Nature. 2016; 529: 484–503. https://doi.org/10.1038/nature16961
        [3] Bellemare M.G., Naddaf Y., Veness J., Bowling M. The Arcade Learning Environment: An evaluation platform for general agents. J. Artificial Intelligence Res. 2012. https://doi.org/10.1613/jair.3912
        [4] Merel J., Aldarondo D., Marshall J., Tassa Y., Wayne G., Ölveczky B. Deep neuroethology of a virtual rodent. International Conference on Learning Representations. 2020. https://openreview.net/forum?id=SyxrxR4KPS
        [5] Erez T., Tassa Y., Todorov E. Simulation tools for model-based robotics: Comparison of Bullet, Havok, MuJoCo, ODE and PhysX. IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2015: 4397–4404. https://doi.org/10.1109/ICRA.2015.7139807
        [6] Todorov E., Erez T., Tassa Y. MuJoCo: A physics engine for model-based control. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2012: 5026–5033. https://doi.org/10.1109/IROS.2012.6386109
        [7] Tassa Y., Doron Y., Muldal A., Erez T., Li Y., de Las Casas D., Budden D., Abdolmaleki A., Merel J., Lefrancq A., Lillicrap T., Riedmiller M. DeepMind Control Suite. Tech. rep., DeepMind, 2018. https://arxiv.org/abs/1801.00690
        [8] Heess N., TB D., Sriram S., Lemmon J., Merel J., Wayne G., Tassa Y., Erez T., Wang Z., Eslami S.M.A., Riedmiller M., Silver D. Emergence of locomotion behaviours in rich environments. 2017. arXiv:1707.02286
        [9] Merel J., Tassa Y., Dhruva T., Srinivasan S., Lemmon J., Wang Z., Wayne G., Heess N. Learning human behaviors from motion capture by adversarial imitation. 2017. arXiv:1707.02201
        [10] Merel J., Hasenclever L., Galashov A., Ahuja A., Pham V., Wayne G., Teh Y.W., Heess N. Neural probabilistic motor primitives for humanoid control. International Conference on Learning Representations. 2019. https://openreview.net/forum?id=BJl6TjRcY7
        [11] Merel J., Ahuja A., Pham V., Tunyasuvunakool S., Liu S., Tirumala D., Heess N., Wayne G. Hierarchical visuomotor control of humanoids. International Conference on Learning Representations. 2019. https://openreview.net/forum?id=BJfYvo09Y7
        [12] Liu S., Lever G., Merel J., Tunyasuvunakool S., Heess N., Graepel T. Emergent coordination through competition. International Conference on Learning Representations. 2019. https://openreview.net/forum?id=BkG8sjR5Km
        [13] Sunehag P., Lever G., Liu S., Merel J., Heess N., Leibo J.Z., Hughes E., Eccles T., Graepel T. Reinforcement learning agents acquire flocking and symbiotic behaviour in simulated ecosystems. Proceedings of the Artificial Life Conference. 2019: 103–110. https://doi.org/10.1162/isal_a_00148
        [14] Banarse D., Bachrach Y., Liu S., Lever G., Heess N., Fernando C., Kohli P., Graepel T. The body is not a given: Joint agent policy learning and morphology evolution. Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems. 2019: 1134–1142. https://doi.org/10.5555/3306127.3331813
        [15] Campeau-Lecours A., Lamontagne H., Latour S., Fauteux P., Maheu V., Boucher F., Deguire C., L’Ecuyer L.-J.C. Kinova modular robot arms for service robotics applications. Int. J. Robot. Appl. Technol. 2017; 5: 49–71. https://doi.org/10.4018/IJRAT.2017070104
        [16] Brockman G., Cheung V., Pettersson L., Schneider J., Schulman J., Tang J., Zaremba W. OpenAI Gym. 2016. arXiv:1606.01540
        [17] Duan Y., Chen X., Houthooft R., Schulman J., Abbeel P. Benchmarking deep reinforcement learning for continuous control. Proceedings of the 33rd International Conference on Machine Learning. 2016: 1329–1338. https://doi.org/10.5555/3045390.3045531
        [18] Henderson P., Islam R., Bachman P., Pineau J., Precup D., Meger D. Deep reinforcement learning that matters. AAAI Conference on Artificial Intelligence. 2017. https://aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16669
        [19] Yu T., Quillen D., He Z., Julian R., Hausman K., Finn C., Levine S. Meta-World: A benchmark and evaluation for multi-task and meta reinforcement learning. Proceedings of the Conference on Robot Learning. 2019. http://proceedings.mlr.press/v100/yu20a.html
        [20] Fan L., Zhu Y., Zhu J., Liu Z., Zeng O., Gupta A., Creus-Costa J., Savarese S., Fei-Fei L. SURREAL: Open-source reinforcement learning framework and robot manipulation benchmark. Proceedings of the Conference on Robot Learning. 2018. http://proceedings.mlr.press/v87/fan18a.html
        [21] James S., Ma Z., Rovick Arrojo D., Davison A.J. RLBench: The robot learning benchmark & learning environment. IEEE Robotics and Automation Letters. 2019. https://doi.org/10.1109/LRA.2020.2974707
        [22] Lee Y., Hu E.S., Yang Z., Yin A., Lim J.J. IKEA furniture assembly environment for long-horizon complex manipulation tasks. 2019. arXiv:1911.07246. https://clvrai.com/furniture
        [23] Zhu Y., Wang Z., Merel J., Rusu A., Erez T., Cabi S., Tunyasuvunakool S., Kramár J., Hadsell R., de Freitas N., Heess N. Reinforcement and imitation learning for diverse visuomotor skills. International Conference on Learning Representations. 2018. https://openreview.net/forum?id=HJWGdbbCW
        [24] Riedmiller M., Hafner R., Lampe T., Neunert M., Degrave J., van de Wiele T., Mnih V., Heess N., Springenberg J.T. Learning by playing - solving sparse reward tasks from scratch. Proceedings of the 35th International Conference on Machine Learning. PMLR, Stockholm, Sweden, 2018: 4344–4353. http://proceedings.mlr.press/v80/riedmiller18a.html
        [25] Amos B., Dinh L., Cabi S., Rothörl T., Muldal A., Erez T., Tassa Y., de Freitas N., Denil M. Learning awareness models. International Conference on Learning Representations. 2018. https://openreview.net/forum?id=r1HhRfWRZ
        [26] Paine T.L., Colmenarejo S.G., Wang Z., Reed S., Aytar Y., Pfaff T., Hoffman M.W., Barth-Maron G., Cabi S., Budden D., de Freitas N. One-shot high-fidelity imitation: Training large-scale deep nets with RL. International Conference on Learning Representations. 2019. https://openreview.net/forum?id=HJMjW3RqtX
        [27] Bohez S., Abdolmaleki A., Neunert M., Buchli J., Heess N., Hadsell R. Success at any cost: Value constrained model-free continuous control. International Conference on Learning Representations. 2019. https://openreview.net/forum?id=rJlJ-2CqtX
        [28] Schwab D., Springenberg J.T., Martins M.F., Neunert M., Lampe T., Abdolmaleki A., Hertweck T., Hafner R., Nori F., Riedmiller M.A. Simultaneously learning vision and feature-based control policies for real-world ball-in-a-cup. Robotics: Science and Systems XV, Freiburg im Breisgau, Germany, 2019. https://doi.org/10.15607/RSS.2019.XV.027
        [29] Abdolmaleki A., Huang S.H., Hasenclever L., Neunert M., Song H.F., Zambelli M., Martins M.F., Heess N., Hadsell R., Riedmiller M. A distributional view on multi-objective policy optimization. Proceedings of the International Conference on Machine Learning. 2020. https://proceedings.icml.cc/book/2020/hash/02ed812220b0705fabb868ddbf17ea20
        [30] Merel J., Tunyasuvunakool S., Ahuja A., Tassa Y., Hasenclever L., Pham V., Erez T., Wayne G., Heess N. Catch & Carry: Reusable neural controllers for vision-guided whole-body tasks. ACM Trans. Graph. 2020; 39. https://doi.org/10.1145/3386569.3392474