The dm_control software package is a collection of Python libraries and task suites for reinforcement learning agents in an articulated-body simulation. Infrastructure includes a wrapper for the MuJoCo physics engine and libraries for procedural model manipulation and task authoring. Task suites include the Control Suite, a set of standardized tasks intended to serve as performance benchmarks, a locomotion framework and task families, and a set of manipulation tasks with a robot arm and snap-together bricks. An adjunct tech report and interactive tutorial are also provided.
Reinforcement Learning (RL) casts sequential decision problems as interactions between an agent, which receives observations and outputs actions, and an environment, which receives actions and outputs both observations and a reward. The RL agent attempts to choose actions which will maximize future rewards.
Recent years have seen rapid progress in the application of RL to difficult problem domains like Atari [ ]. These successes were driven both by the power of novel neural-network-based agent architectures and by the availability of standardized problem domains. For example, the Arcade Learning Environment [ ] was, and continues to be, a vital facilitator of research. Similarly, control and robotics require well-designed task suites as a standardized playing field on which different, novel approaches can be evaluated and compared. Enabling such progress is the goal of dm_control.
Besides their importance as a fundamental challenge faced by all animals, physical tasks are different from many of the more abstract domains commonly employed for RL research by being continuous in time, state and action. Physical dynamics are subject to second-order equations of motion: the underlying state is composed of generalized positions and velocities, while actions take the form of forces (which vary like acceleration). Sensory signals (observations) carry meaningful physical units and vary over corresponding ranges and timescales. These properties make control problems a unique subset of general Markov Decision Processes. The most familiar physical control tasks have a fixed subset of degrees of freedom that are directly actuated (the body), while the rest are unactuated. Such embodied tasks are the focus of dm_control.
Fig. 1 The Control Suite benchmarking domains. Top: acrobot, ball-in-cup, cart-pole, cheetah, finger, fish, hopper. Bottom: humanoid, manipulator, pendulum, point-mass, reacher, swimmer, walker. See the video overview.
Fig. 2 Procedural domains built with the PyMJCF and Composer task-authoring libraries. Left: Multi-legged creatures from the interactive tutorial. Middle: The “run through corridor” example task. Right: The “stack 3 bricks” manipulation task.
Fig. 3 Quadrupedal domains. Top: An abstract quadruped (video). Middle: A dog (video). Bottom: A rodent, figure reproduced from [4]; related videos of rodent tasks therein: “forage”, “gaps”, “escape” and “two-tap”.
For a thorough description and documentation please refer to our accompanying tech report. We also provide an interactive tutorial as a Google Colaboratory (Jupyter) notebook.
3. Software infrastructure
The dm_control package was designed by DeepMind scientists and engineers to facilitate their own continuous control and robotics research needs. It is written in Python, exploiting the agile workflow of a dynamic language, while relying on the C-based MuJoCo physics library, a fast and accurate simulator [ ]. The infrastructure includes:
• The ctypes-based MuJoCo wrapper, which provides full access to the simulator and conveniently exposes quantities with named indexing. A Python-based interactive visualizer allows the user to examine and perturb scene elements with a mouse.
• The PyMJCF library, which can procedurally assemble model elements and allows the user to configure or randomize parameters and initial states. See Fig. 2 for examples of environments created with PyMJCF, and the code sketch following this list.
• An RL environment API, based on DeepMind’s dm_env interface, that exposes actions, observations, rewards and terminations in a consistent yet flexible manner.
• The high-level task-definition framework Composer, which combines the above functionality. Amongst other things, it provides a model-variation module for robustification, and an “observable” module for filtered, delayed, corrupted, and stacked sensor data. See Fig. 5 for a lifecycle diagram of a Composer environment.
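To make the first two components concrete, the following minimal sketch (ours, not reproduced from the paper) assembles a one-joint pendulum with PyMJCF, compiles it into a Physics instance via the MuJoCo wrapper, and inspects it through named indexing. Element names such as 'pole' and 'hinge' are illustrative choices, not part of dm_control.

    from dm_control import mjcf

    # Procedurally assemble a one-joint pendulum with PyMJCF.
    model = mjcf.RootElement(model='pendulum')
    pole = model.worldbody.add('body', name='pole', pos=[0, 0, 1])
    hinge = pole.add('joint', name='hinge', type='hinge', axis=[0, 1, 0])
    pole.add('geom', name='mass', type='capsule',
             fromto=[0, 0, 0, 0, 0, -0.5], size=[0.02])
    model.actuator.add('motor', name='torque', joint=hinge)

    # Compile into a Physics instance (the MuJoCo wrapper) and interact with it.
    physics = mjcf.Physics.from_mjcf_model(model)
    with physics.reset_context():
        physics.bind(hinge).qpos = 0.5     # set the initial hinge angle
    for _ in range(100):
        physics.step()                     # advance the simulation
    print(physics.named.data.qpos)         # named view of generalized positions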
4.1 The Control Suite
The Control Suite [ ], built directly with the MuJoCo wrapper, provides a set of standard benchmarks for continuous control problems. The unified reward structure offers interpretable learning curves and aggregated suite-wide performance measures. Furthermore, we emphasize high-quality, well-documented code using uniform design patterns, offering a readable, transparent and easily extensible codebase. Fig. 1 shows the “benchmarking” set of domains.
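As a hedged usage sketch (ours, not prescribed by the paper), the snippet below loads the cartpole “swingup” task from the Control Suite and runs one episode with uniformly random actions, illustrating the dm_env-style reset/step loop, action specs and rewards.

    import numpy as np
    from dm_control import suite

    # Load a Control Suite task by domain and task name.
    env = suite.load(domain_name='cartpole', task_name='swingup')
    action_spec = env.action_spec()

    time_step = env.reset()
    episode_return = 0.0
    while not time_step.last():
        # Sample a random action within the bounds of the action spec.
        action = np.random.uniform(action_spec.minimum, action_spec.maximum,
                                   size=action_spec.shape)
        time_step = env.step(action)
        episode_return += time_step.reward
    print('Episode return:', episode_return)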
4.2 Locomotion
The Locomotion framework was inspired by our work in [ ]. It is designed to facilitate the implementation of a wide range of locomotion tasks for RL algorithms by introducing self-contained, reusable components which compose into different task variants. The Locomotion framework has enabled a number of research efforts, including [ ].
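As an illustrative sketch, a prebuilt Locomotion example environment can be instantiated and inspected as follows; the module path dm_control.locomotion.examples.basic_cmu_2019 and the helper cmu_humanoid_run_walls are assumptions based on recent dm_control releases and may differ in older versions.

    from dm_control.locomotion.examples import basic_cmu_2019

    # Build an example task: a CMU humanoid running through a wall-obstacle corridor.
    env = basic_cmu_2019.cmu_humanoid_run_walls()
    print(env.action_spec().shape)               # humanoid action dimensionality
    print(list(env.observation_spec().keys()))   # names of the exposed observables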
4.3 Manipulation
We also provide examples of constructing robotic manipulation tasks. These tasks involve grabbing and manipulating objects with a 3D robotic arm. The set of tasks includes examples of reaching, placing, stacking, throwing, assembly and disassembly. The tasks are designed to be solved using a simulated 6 degree-of-freedom robotic arm based on the Kinova Jaco [ ], though their modular design permits the use of other arms with minimal changes. These tasks make use of reusable components, such as bricks that snap together, and provide examples of reward functions for manipulation. Tasks can be run using vision, low-level features, or combinations of both. See Fig. 6.
Fig. 6 Left to right: Initial configuration for the … task. Initial configuration for the … task; note the three translucent bricks, representing the goal configuration. The corresponding 84 × 84 pixel visual observation returned by … . Most of the included manipulation tasks make use of snap-together bricks, modelled on Lego Duplo®.
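A hedged sketch of loading a manipulation task follows; rather than hard-coding a task name, it discovers one from manipulation.ALL at runtime, since the exact list of names may vary between releases.

    import numpy as np
    from dm_control import manipulation

    print('\n'.join(manipulation.ALL))          # enumerate the available task names

    # Load a stacking task; any name from manipulation.ALL works here.
    task_name = next(name for name in manipulation.ALL if 'stack' in name)
    env = manipulation.load(task_name, seed=0)

    spec = env.action_spec()
    time_step = env.reset()
    for _ in range(200):                        # a short random-action rollout
        if time_step.last():
            break
        action = np.random.uniform(spec.minimum, spec.maximum, size=spec.shape)
        time_step = env.step(action)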
Several related task suites, e.g. [ ], have been published to satisfy the demand for task suites that facilitate the study of multi-scale control, multi-task transfer, and meta-learning.
5.2 Contribution
In the last three years, dm_control has been used extensively within DeepMind, serving as a fundamental component of continuous control research. For example, see this video for a montage of clips from a selection of publications [ ].
Note that this list is non-exhaustive. Although this software has been open-sourced and regularly updated since 2018, until now it has mostly been used internally, owing to a lack of comprehensive documentation and limited public exposure. We hope that this release will serve the public as it has served us: as a solid foundation for research.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References
The body is not a given: Joint agent policy learning and morphology evolution. In: Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS). International Foundation for Autonomous Agents and Multiagent Systems; 2019, p. 1134–1142. https://doi.org/10.5555/3306127.3331813
Benchmarking deep reinforcement learning for continuous control. In: Proceedings of the 33rd International Conference on Machine Learning (ICML), Vol. 48; 2016, p. 1329–1338. https://doi.org/10.5555/3045390.3045531
Learning by playing - solving sparse reward tasks from scratch. In: Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research. PMLR, Stockholmsmässan, Stockholm, Sweden; 2018, p. 4344–4353. URL http://proceedings.mlr.press/v80/riedmiller18a.html
A distributional view on multi-objective policy optimization. In: Proceedings of the International Conference on Machine Learning (ICML); 2020. URL https://proceedings.icml.cc/book/2020/hash/02ed812220b0705fabb868ddbf17ea20