A toolkit for developing and comparing reinforcement learning algorithms.

Overview

Status: Maintenance (expect bug fixes and minor updates)

OpenAI Gym

OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms. This is the gym open-source library, which gives you access to a standardized set of environments.

See What's New section below

gym makes no assumptions about the structure of your agent, and is compatible with any numerical computation library, such as TensorFlow or Theano. You can use it from Python code, and soon from other languages.

If you're not sure where to start, we recommend beginning with the docs on our site. See also the FAQ.

A whitepaper for OpenAI Gym is available at http://arxiv.org/abs/1606.01540, and here's a BibTeX entry that you can use to cite it in a publication:

@misc{1606.01540,
  Author = {Greg Brockman and Vicki Cheung and Ludwig Pettersson and Jonas Schneider and John Schulman and Jie Tang and Wojciech Zaremba},
  Title = {OpenAI Gym},
  Year = {2016},
  Eprint = {arXiv:1606.01540},
}

Basics

There are two basic concepts in reinforcement learning: the environment (namely, the outside world) and the agent (namely, the algorithm you are writing). The agent sends actions to the environment, and the environment replies with observations and rewards (that is, a score).

The core gym interface is Env, which is the unified environment interface. There is no interface for agents; that part is left to you. The following are the Env methods you should know:

  • reset(self): Reset the environment's state. Returns observation.
  • step(self, action): Step the environment by one timestep. Returns observation, reward, done, info.
  • render(self, mode='human'): Render one frame of the environment. The default mode will do something human friendly, such as pop up a window.
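As a rough illustration of how these three methods fit together, here is a minimal sketch of an agent-environment loop. To stay runnable without any installation, it uses a toy stand-in class (CountdownEnv, a hypothetical name that is not part of gym) that mimics the reset/step/render signatures listed above. The helper names run_episode and random_policy are also made up for this example. With gym installed, env would come from gym.make(...) instead.

```python
import random


class CountdownEnv:
    """Toy stand-in (not part of gym) that mimics the Env methods listed above."""

    def __init__(self, start=5):
        self.start = start
        self.state = start

    def reset(self):
        # Reset the environment's state and return the first observation.
        self.state = self.start
        return self.state

    def step(self, action):
        # Action 1 decrements the counter; action 0 leaves it unchanged.
        if action == 1:
            self.state -= 1
        reward = 1.0 if action == 1 else 0.0
        done = self.state <= 0
        info = {}
        return self.state, reward, done, info

    def render(self, mode='human'):
        # A text "frame"; real environments may open a window instead.
        print('counter:', self.state)


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the total reward collected."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


def random_policy(observation):
    # The "agent": gym imposes no structure on it, so a plain function works.
    return random.choice([0, 1])


total = run_episode(CountdownEnv(start=5), random_policy)
```

The loop only relies on reset and step; render is optional and is typically called inside the loop when you want to watch the episode.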

Supported systems

We currently support Linux and OS X running Python 3.5 -- 3.8. Windows support is experimental: algorithmic, toy_text, classic_control and atari should work on Windows (see next section for installation instructions); nevertheless, proceed at your own risk.

Installation

You can perform a minimal install of gym with:

git clone https://github.com/openai/gym.git
cd gym
pip install -e .

If you prefer, you can do a minimal install of the packaged version directly from PyPI:

pip install gym

You'll be able to run a few environments right away:

  • algorithmic
  • toy_text
  • classic_control (you'll need pyglet to render though)

We recommend playing with those environments at first, and then later installing the dependencies for the remaining environments.

You can also run gym on gitpod.io to play with the examples online. In the preview window you can click on the mp4 file you want to view. If you want to view another mp4 file, just press the back button and click on another mp4 file.

Installing everything

To install the full set of environments, you'll need to have some system packages installed. We'll build out the list here over time; please let us know what you end up installing on your platform. Also, take a look at the docker files (py.Dockerfile) to see the composition of our CI-tested images.

On Ubuntu 16.04 and 18.04:

apt-get install -y libglu1-mesa-dev libgl1-mesa-dev libosmesa6-dev xvfb ffmpeg curl patchelf libglfw3 libglfw3-dev cmake zlib1g zlib1g-dev swig

MuJoCo has a proprietary dependency we can't set up for you. Follow the instructions in the mujoco-py package for help. Note that we currently do not support MuJoCo 2.0 and above, so you will need to install a version of mujoco-py which is built for a lower version of MuJoCo like MuJoCo 1.5 (example - mujoco-py-1.50.1.0). As an alternative to mujoco-py, consider PyBullet which uses the open source Bullet physics engine and has no license requirement.

Once you're ready to install everything, run pip install -e '.[all]' (or pip install 'gym[all]').

Pip version

To run pip install -e '.[all]', you'll need a semi-recent pip. Please make sure your pip is at least at version 1.5.0. You can upgrade using the following: pip install --ignore-installed pip. Alternatively, you can open setup.py and install the dependencies by hand.

Rendering on a server

If you're trying to render video on a server, you'll need to connect a fake display. The easiest way to do this is by running under xvfb-run (on Ubuntu, install the xvfb package):

xvfb-run -s "-screen 0 1400x900x24" bash

Installing dependencies for specific environments

If you'd like to install the dependencies for only specific environments, see setup.py. We maintain the lists of dependencies on a per-environment group basis.

Environments

See List of Environments and the gym site.

For information on creating your own environments, see Creating your own Environments.

Examples

See the examples directory.

Testing

We are using pytest for tests. You can run them via:

pytest

Resources

What's new

  • 2020-12-18 (v 0.18.0)
    • Add python 3.9 support
    • Remove python 3.5 support (thanks @justinkterry on both!)
    • TimeAwareObservationWrapper (thanks @zuoxingdong!)
    • Space-related fixes and tests (thanks @wmmc88!)
  • 2020-09-29 (v 0.17.3)
    • Allow custom spaces in VectorEnv (thanks @tristandeleu!)
    • CarRacing performance improvements (thanks @leocus!)
    • Dict spaces are now iterable (thanks @NotNANtoN!)
  • 2020-05-08 (v 0.17.2)
    • remove unnecessary precision warning when creating Box with scalar bounds - thanks @johannespitz!
    • remove six from the dependencies
    • FetchEnv sample goal range can be specified through kwargs - thanks @YangRui2015!
  • 2020-03-05 (v 0.17.1)
    • update cloudpickle dependency to be >=1.2.0,<1.4.0
  • 2020-02-21 (v 0.17.0)
    • Drop python 2 support
    • Add python 3.8 build
  • 2020-02-09 (v 0.16.0)
    • EnvSpec API change - remove tags field (retro-active version bump, the changes are actually already in the codebase since 0.15.5 - thanks @wookayin for keeping us in check!)
  • 2020-02-03 (v0.15.6)
    • pyglet 1.4 compatibility (this time for real :))
    • Fixed the bug in BipedalWalker and BipedalWalkerHardcore, bumped version to 3 (thanks @chozabu!)
  • 2020-01-24 (v0.15.5)
    • pyglet 1.4 compatibility
    • remove python-opencv from the requirements
  • 2019-11-08 (v0.15.4)
    • Added multiple env wrappers (thanks @zuoxingdong and @hartikainen!)
    • Removed mujoco >= 2.0 support due to lack of tests
  • 2019-10-09 (v0.15.3)
    • VectorEnv modifications - unified the VectorEnv api (added reset_async, reset_wait, step_async, step_wait methods to SyncVectorEnv); more flexibility in AsyncVectorEnv workers
  • 2019-08-23 (v0.15.2)
    • More Wrappers - AtariPreprocessing, FrameStack, GrayScaleObservation, FilterObservation, FlattenDictObservationsWrapper, PixelObservationWrapper, TransformReward (thanks @zuoxingdong, @hartikainen)
    • Remove rgb_rendering_tracking logic from mujoco environments (default behavior stays the same for the -v3 environments, rgb rendering returns a view from tracking camera)
    • Velocity goal constraint for MountainCar (thanks @abhinavsagar)
    • Taxi-v2 -> Taxi-v3 (add missing wall in the map to replicate env as described in the original paper, thanks @kobotics)
  • 2019-07-26 (v0.14.0)
    • Wrapper cleanup
    • Spec-related bug fixes
    • VectorEnv fixes
  • 2019-06-21 (v0.13.1)
    • Bug fix for ALE 0.6 difficulty modes
    • Use narrow range for pyglet versions
  • 2019-06-21 (v0.13.0)
    • Upgrade to ALE 0.6 (atari-py 0.2.0) (thanks @JesseFarebro!)
  • 2019-06-21 (v0.12.6)
    • Added vectorized environments (thanks @tristandeleu!). Vectorized environment runs multiple copies of an environment in parallel. To create a vectorized version of an environment, use gym.vector.make(env_id, num_envs, **kwargs), for instance, gym.vector.make('Pong-v4',16).
  • 2019-05-28 (v0.12.5)
    • fixed Fetch-slide environment to be solvable.
  • 2019-05-24 (v0.12.4)
    • remove pyopengl dependency and use more narrow atari-py and box2d-py versions
  • 2019-03-25 (v0.12.1)
    • rgb rendering in MuJoCo locomotion -v3 environments now comes from tracking camera (so that agent does not run away from the field of view). The old behaviour can be restored by passing rgb_rendering_tracking=False kwarg. Also, a potentially breaking change!!! Wrapper class now forwards methods and attributes to wrapped env.
  • 2019-02-26 (v0.12.0)
    • release mujoco environments v3 with support for gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc
  • 2019-02-06 (v0.11.0)
    • remove gym.spaces.np_random common PRNG; use per-instance PRNG instead.
    • support for kwargs in gym.make
    • lots of bugfixes
  • 2018-02-28: Release of a set of new robotics environments.

  • 2018-01-25: Made some aesthetic improvements and removed unmaintained parts of gym. This may seem like a downgrade in functionality, but it is actually a long-needed cleanup in preparation for some great new things that will be released in the next month.

    • Now your Env and Wrapper subclasses should define step, reset, render, close, seed rather than underscored method names.
    • Removed the board_game, debugging, safety, parameter_tuning environments since they're not being maintained by us at OpenAI. We encourage authors and users to create new repositories for these environments.
    • Changed MultiDiscrete action space to range from [0, ..., n-1] rather than [a, ..., b-1].
    • No more render(close=True), use env-specific methods to close the rendering.
    • Removed scoreboard directory, since site doesn't exist anymore.
    • Moved gym/monitoring to gym/wrappers/monitoring
    • Add dtype to Space.
    • No longer using Python's built-in logging module; using gym.logger instead.
  • 2018-01-24: All continuous control environments now use mujoco_py >= 1.50. Versions have been updated accordingly to -v2, e.g. HalfCheetah-v2. Performance should be similar (see https://github.com/openai/gym/pull/834) but there are likely some differences due to changes in MuJoCo.

  • 2017-06-16: Make env.spec into a property to fix a bug that occurs when you try to print out an unregistered Env.

  • 2017-05-13: BACKWARDS INCOMPATIBILITY: The Atari environments are now at v4. To keep using the old v3 environments, keep gym <= 0.8.2 and atari-py <= 0.0.21. Note that the v4 environments will not give identical results to existing v3 results, although differences are minor. The v4 environments incorporate the latest Arcade Learning Environment (ALE), including several ROM fixes, and now handle loading and saving of the emulator state. While seeds still ensure determinism, the effect of any given seed is not preserved across this upgrade because the random number generator in ALE has changed. The *NoFrameSkip-v4 environments should be considered the canonical Atari environments from now on.

  • 2017-03-05: BACKWARDS INCOMPATIBILITY: The configure method has been removed from Env. configure was not used by gym, but was used by some dependent libraries including universe. These libraries will migrate away from the configure method by using wrappers instead. This change is on master and will be released with 0.8.0.

  • 2016-12-27: BACKWARDS INCOMPATIBILITY: The gym monitor is now a wrapper. Rather than starting monitoring as env.monitor.start(directory), envs are now wrapped as follows: env = wrappers.Monitor(env, directory). This change is on master and will be released with 0.7.0.

  • 2016-11-01: Several experimental changes to how a running monitor interacts with environments. The monitor will now raise an error if reset() is called when the env has not returned done=True. The monitor will only record complete episodes where done=True. Finally, the monitor no longer calls seed() on the underlying env, nor does it record or upload seed information.

  • 2016-10-31: We're experimentally expanding the environment ID format to include an optional username.

  • 2016-09-21: Switch the Gym automated logger setup to configure the root logger rather than just the 'gym' logger.

  • 2016-08-17: Calling close on an env will also close the monitor and any rendering windows.

  • 2016-08-17: The monitor will no longer write manifest files in real-time, unless write_upon_reset=True is passed.

  • 2016-05-28: For controlled reproducibility, envs now support seeding (cf #91 and #135). The monitor records which seeds are used. We will soon add seed information to the display on the scoreboard.

Comments
  • Seeding update

    See Issue #1663

    This is a bit of a ride. The base change is getting rid of the custom seeding utils of gym, and instead using np.random.default_rng() as is recommended with modern versions of NumPy. I kept the gym.utils.seeding.np_random interface and changed it to basically being a synonym for default_rng (with some API difference, consistent with the old np_random)

    Because the API (then RandomState, now Generator) changed a bit, np_random.randint calls were replaced with np_random.integers, np_random.rand -> np_random.random, np_random.randn -> np_random.standard_normal. This is all in accordance with the recommended NumPy conversion.
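The renames above can be seen side by side in a short sketch. It uses plain NumPy only (no gym), and the variable names are made up for illustration. The two APIs use different underlying bit generators, so they are not expected to produce the same numbers for the same seed.

```python
import numpy as np

seed = 0

# Legacy API: RandomState, as used before the change described above.
legacy = np.random.RandomState(seed)
legacy_int = legacy.randint(0, 10)      # integer in [0, 10)
legacy_uniform = legacy.rand(3)         # 3 floats in [0, 1)
legacy_normal = legacy.randn(3)         # 3 standard-normal samples

# Current API: Generator, created with default_rng.
rng = np.random.default_rng(seed)
new_int = rng.integers(0, 10)           # replaces randint
new_uniform = rng.random(3)             # replaces rand
new_normal = rng.standard_normal(3)     # replaces randn

# A second Generator built from the same seed repeats the same first draw.
repeat_int = np.random.default_rng(seed).integers(0, 10)
```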

    My doubts in order of (subjective) importance

    Doubt number 1:

    In gym/utils/seeding.py#L18 I'm accessing a "protected" variable seed_seq. This is how I retrieve the random seed that is automatically generated under the hood when the passed seed is None (it also gives the correct value if the passed seed is an integer). An alternative solution would be restoring the whole create_seed machinery, which generates a random initial seed from os.urandom. I was unable to find another way to get the initial seed of a default Generator (default_rng(None)) instance.
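One possible way around the protected attribute, sketched here as an assumption and not as what this PR does, is to build the np.random.SeedSequence explicitly and read its public entropy attribute before handing it to the generator. The helper name make_rng is hypothetical.

```python
import numpy as np


def make_rng(seed=None):
    """Hypothetical helper: return (Generator, seed actually used)."""
    # SeedSequence draws fresh OS entropy when seed is None and exposes
    # the value it used through the public `entropy` attribute.
    seed_seq = np.random.SeedSequence(seed)
    used_seed = seed_seq.entropy
    rng = np.random.Generator(np.random.PCG64(seed_seq))
    return rng, used_seed


rng, used_seed = make_rng(None)

# Re-creating a generator from the reported seed replays the same stream.
replay, _ = make_rng(used_seed)
original_draw = rng.integers(0, 1000, size=5).tolist()
replayed_draw = replay.integers(0, 1000, size=5).tolist()
```

The trade-off is that the generator is constructed by hand from PCG64 instead of through default_rng, so this sketch pins the bit generator explicitly.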

    Doubt number 2:

    In gym/spaces/multi_discrete.py#L64. Turns out that a Generator doesn't directly support get_state and set_state. The same functionality seems to be achievable by accessing the bit generator and modifying its state directly (without getters/setters).
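A minimal sketch of that idea with plain NumPy (the variable names are illustrative only): the state is read from and assigned to the bit_generator.state property, which stands in for the old get_state/set_state pair.

```python
import numpy as np

rng = np.random.default_rng(123)

# Snapshot the underlying bit generator's state (a plain dict).
saved_state = rng.bit_generator.state

first_draw = rng.integers(0, 100, size=4)

# Restore the snapshot by assigning to the same property.
rng.bit_generator.state = saved_state
second_draw = rng.integers(0, 100, size=4)
```

After the state is restored, the second draw repeats the first one.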

    Doubt number 3:

    As mentioned earlier, this version maintains the gym.utils.seeding file with just a single function. Functionally, I don't think that's a problem, but it might feel a bit redundant from a stylistic point of view. This could be replaced by changing something like 17 calls of this function that occur in the codebase, but at the moment I think it'd be a bad move. The reason is that the function passes through the seed that's generated if the passed seed is None (see Doubt 1), which has to be obtained through mildly sketchy means, so it's better to keep it contained within the function. I don't think passing the seed is strictly necessary, but dropping it would change the actual external API fairly significantly, and I tend to be hesitant about changes like this if there's no good reason. Overall I think it's good to keep the seeding.np_random function to keep some consistency with previous versions. The alternative is just completely removing the concept of "gym seeding" and using NumPy directly (right now "gym seeding" is basically an interface for NumPy seeding).

    Doubt number 4:

    Pinging @araffin as there's a possibility this will (again) break some old pickled spaces in certain cases, and I know this was an issue with SB3 and the model zoo. Specifically, if you create a Space with the current master branch, run space.sample() at least once, and then pickle it, it will be pickled with a RandomState instance, which is now considered a legacy generator in NumPy. If you unpickle it using new gym code (i.e. this PR), space.np_random will still point to a RandomState, but the rest of the code expects space.np_random to be a Generator instance, which has a few API changes (see the beginning of this post).

    Overall I don't know how important it is for the internal details of gym objects to remain the same - which is more or less necessary for old objects to be unpicklable in new gym versions. There's probably a way of a custom unpickling protocol as a compatibility layer - I'm not sufficiently familiar with this to do it, but I imagine it should be doable on the user side? (i.e. not in gym)
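To make the mismatch concrete, here is a small sketch that uses a plain dict as a stand-in for a pickled Space (an assumption made so the example needs only NumPy and the standard library). The user-side swap shown at the end is one conceivable workaround, not something this PR implements, and it does not continue the old random stream.

```python
import pickle

import numpy as np

# Stand-in for an object pickled while it still held a legacy RandomState.
payload = pickle.dumps({'np_random': np.random.RandomState(0)})

restored = pickle.loads(payload)
is_legacy = isinstance(restored['np_random'], np.random.RandomState)

if is_legacy:
    # New-style code would call .integers(), which RandomState lacks,
    # so swap in a fresh Generator on the user side.
    restored['np_random'] = np.random.default_rng()

value = restored['np_random'].integers(0, 10)
```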

    Doubt number 2137: (very low importance)

    This doesn't technically solve #2210. IMHO this is absolutely acceptable, because np.random.seed is part of the legacy seeding mechanism, and is officially discouraged by NumPy. Proper usage yields expected results:

    import numpy as np
    from gym.utils import seeding
    
    user_given_seed = None
    np_random, seed = seeding.np_random(user_given_seed)
    
    # since in some places np.random.randn or similar is used we want to seed the global numpy random generator as well
    rng = np.random.default_rng(seed)
    

    tl;dr

    This definitely needs another set of eyes on it because I'm not confident enough about the nitty-gritty details of NumPy RNG. There are a few things I'm not 100% happy with from a stylistic point of view, but as far as my understanding goes, the functionality is what it should be. There's also the question of supporting old pickled objects, which I think is a whole different topic that needs discussing now that gym is maintained again.

    opened by RedTachyon 64
  • Box2d won't find some RAND_LIMIT_swigconstant

    Hello!

    It's probably some silly mistake on my side, but I wasn't able to fix it by random lever pulling, as usual.

    Installing Box2d as in the instructions (using pip install -e .[all]) throws an error when trying to use some of the Box2D examples.

    Code that reproduces the issue:

    import gym
    atari = gym.make('LunarLander-v0')
    atari.reset()
    
    [2016-05-16 02:14:25,430] Making new env: LunarLander-v0
    
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-1-f89e78f4410b> in <module>()
          1 import gym
    ----> 2 atari = gym.make('LunarLander-v0')
          3 atari.reset()
          4 #plt.imshow(atari.render('rgb_array'))
    
    /home/jheuristic/yozhik/gym/gym/envs/registration.pyc in make(self, id)
         77         logger.info('Making new env: %s', id)
         78         spec = self.spec(id)
    ---> 79         return spec.make()
         80 
         81     def all(self):
    
    /home/jheuristic/yozhik/gym/gym/envs/registration.pyc in make(self)
         52             raise error.Error('Attempting to make deprecated env {}. (HINT: is there a newer registered version of this env?)'.format(self.id))
         53 
    ---> 54         cls = load(self._entry_point)
         55         env = cls(**self._kwargs)
         56 
    
    /home/jheuristic/yozhik/gym/gym/envs/registration.pyc in load(name)
         11 def load(name):
         12     entry_point = pkg_resources.EntryPoint.parse('x={}'.format(name))
    ---> 13     result = entry_point.load(False)
         14     return result
         15 
    
    /home/jheuristic/thenv/local/lib/python2.7/site-packages/pkg_resources/__init__.pyc in load(self, require, *args, **kwargs)
       2378         if require:
       2379             self.require(*args, **kwargs)
    -> 2380         return self.resolve()
       2381 
       2382     def resolve(self):
    
    /home/jheuristic/thenv/local/lib/python2.7/site-packages/pkg_resources/__init__.pyc in resolve(self)
       2384         Resolve the entry point from its module and attrs.
       2385         """
    -> 2386         module = __import__(self.module_name, fromlist=['__name__'], level=0)
       2387         try:
       2388             return functools.reduce(getattr, self.attrs, module)
    
    /home/jheuristic/yozhik/gym/gym/envs/box2d/__init__.py in <module>()
    ----> 1 from gym.envs.box2d.lunar_lander import LunarLander
          2 from gym.envs.box2d.bipedal_walker import BipedalWalker, BipedalWalkerHardcore
    
    /home/jheuristic/yozhik/gym/gym/envs/box2d/lunar_lander.py in <module>()
          3 from six.moves import xrange
          4 
    ----> 5 import Box2D
          6 from Box2D.b2 import (edgeShape, circleShape, fixtureDef, polygonShape, revoluteJointDef, contactListener)
          7 
    
    /home/jheuristic/thenv/local/lib/python2.7/site-packages/Box2D/__init__.py in <module>()
         18 # 3. This notice may not be removed or altered from any source distribution.
         19 #
    ---> 20 from .Box2D import *
         21 __author__ = '$Date$'
         22 __version__ = '2.3.1'
    
    /home/jheuristic/thenv/local/lib/python2.7/site-packages/Box2D/Box2D.py in <module>()
        433     return _Box2D.b2CheckPolygon(shape, additional_checks)
        434 
    --> 435 _Box2D.RAND_LIMIT_swigconstant(_Box2D)
        436 RAND_LIMIT = _Box2D.RAND_LIMIT
        437 
    
    AttributeError: 'module' object has no attribute 'RAND_LIMIT_swigconstant'
    
    

    What didn't help:

    pip uninstall gym
    apt-get install -y python-numpy python-dev cmake zlib1g-dev libjpeg-dev xvfb libav-tools xorg-dev python-opengl
    git clone https://github.com/openai/gym
    cd gym
    pip install -e .[all] --upgrade
    

    The OS is Ubuntu 14.04 Server x64. It may be a clue that I am running the thing from inside a python2 virtualenv (with all numpys, etc. installed).

    opened by justheuristic 52
  • New Step API with terminated, truncated bools instead of done

    Description

    step method is changed to return five items instead of four.

    Old API - done=True if episode ends in any way.

    New API - terminated=True if the environment terminates (e.g. due to task completion or failure); truncated=True if the episode is cut short by a time limit or by another reason that is not defined as part of the task MDP.

    Link to docs - https://github.com/Farama-Foundation/gym-docs/pull/115 (To be updated with latest changes)

    Changes

    1. All existing environment implementations are changed to new API without direct support for old API. However gym.make for any environment will default to old API through a compatibility wrapper.
    2. Vector env implementations are changed to new API, with backward compatibility for old API, defaulting to old API. New API can be set by a newly added argument new_step_api=True in the constructor.
    3. All wrapper implementations are changed to new API, and have backward compatibility and default to old API (can be switched to new API with new_step_api=True).
    4. Some changes in phrasing - terminal_reward, terminal_observation etc. is replaced with final_reward, final_observation etc. The intention is to reserve the 'termination' word for only if terminated=True. (for some motivation, Sutton and Barto uses terminal states to specifically refer to special states whose values are 0, states at the end of the MDP. This is not true for a truncation where the value of the final state need not be 0. So the current usage of terminal_obs etc. would be incorrect if we adopt this definition)
    5. All tests continue to be run against the old API (since the default is old API for now). The single exception is when the test env is unwrapped, so the compatibility wrapper doesn't apply. Special tests are also added just for testing the new API.
    6. The new_step_api argument is used in different places. Its meaning is taken to be "whether this function / class should output step values in new API or not". E.g. self.new_step_api in a wrapper signifies whether the wrapper's step method outputs items in new API (the wrapper itself might have been written in new or old API, but through compatibility code it will output according to self.new_step_api).
    7. play.py alone is retained in old API due to the difficulty in having it be compatible for both APIs simultaneously, and being slightly lower priority.

    StepAPICompatibility Wrapper

    1. This wrapper is added to support conversion from old to new API and vice versa.
    2. Takes new_step_api argument in __init__. False (old API) by default.
    3. Wrapper applied at make with new_step_api=False by default. It can be changed during make like gym.make("CartPole-v1", new_step_api=True). The order of wrappers applied at make is as follows - core env -> PassiveEnvChecker -> StepAPICompatibility -> other wrappers

    step_api_compatibility function

    This function is similar to the wrapper. It is used for backward compatibility in wrappers and vector envs, at the interfaces between env / wrapper / vector / outside code. Example usage:

    # wrapper's step method
    def step(self, action):
    
        # here self.env.step is made to return in new API, since the wrapper is written in new API
        obs, rew, terminated, truncated, info = step_api_compatibility(self.env.step(action), new_step_api=True) 
    
        if terminated or truncated:
            print("Episode end")
        ### more wrapper code
    
        # here the wrapper is made to return in API specified by self.new_step_api, that is set to False by default, and can be changed according to the situation
        return step_api_compatibility((obs, rew, terminated, truncated, info), new_step_api=self.new_step_api) 
    

    TimeLimit

    1. In the current implementation of the timelimit wrapper, existence of 'TimeLimit.truncated' key in info means that truncation has occurred. The boolean value it is set to refers to whether the core environment has already ended. So, info['TimeLimit.truncated']=False, means the core environment has already terminated. We can infer terminated=True, truncated=True from this case.
    2. To change old API to new, the compatibility function first checks info. If there is nothing in info, it returns terminated=done and truncated=False as there is no better information available. If TimeLimit info is available, it accordingly sets the two bools.
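The two rules above can be written down as a small, dependency-free sketch. The function names to_new_step_api and to_old_step_api are hypothetical, and this is only an illustration of the rule as described here, not the code in this PR.

```python
def to_new_step_api(step_return):
    """Hypothetical sketch: convert (obs, reward, done, info) to five items."""
    obs, reward, done, info = step_return
    if 'TimeLimit.truncated' not in info:
        # No time-limit information: fall back to terminated=done.
        return obs, reward, done, False, info
    # Key present: a truncation occurred. Per the description above, a False
    # value means the core environment had also terminated on this step.
    core_terminated = not info['TimeLimit.truncated']
    return obs, reward, core_terminated, True, info


def to_old_step_api(step_return):
    """Hypothetical sketch: collapse terminated/truncated into a single done."""
    obs, reward, terminated, truncated, info = step_return
    return obs, reward, terminated or truncated, info


# Episode cut off by the time limit while the task itself was still running.
converted = to_new_step_api(('obs', 0.0, True, {'TimeLimit.truncated': True}))
```

Going from new to old loses information, since done=True no longer says whether the episode terminated or was truncated.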

    Backward Compatibility

    The PR attempts to achieve almost complete backward compatibility. However, there are cases which haven't been included. Environments directly imported eg. from gym.envs.classic_control import CartPoleEnv would not be backward compatible as these are rewritten in new API. StepAPICompatibility wrapper would need to be used manually in this case. Envs made through gym.make all default to old API. Vector and wrappers also default to old API. These should all continue to work without problems. But due to the scale of the change, bugs are expected.

    Warning Details

    Warnings are raised at the following locations:

    1. gym.Wrapper constructor - warning raised if self.new_step_api==False. This means any wrapper that does not explicitly pass new_step_api=True into super() will raise the warning since self.new_step_api=False by default. This is taken care of by wrappers written inside gym. Third party wrappers will face a problem in a specific situation - if the wrapper is not impacted by step API. eg. a wrapper subclassing ActionWrapper. This would work without any change for both APIs, however to avoid the warning, they still need to pass new_step_api=True into super(). The thinking is - "If your wrapper supports new step API, you need to pass new_step_api=True to avoid the warning".
    2. PassiveEnvChecker, passive_env_step_check function - if step return has 4 items a warning is raised. This happens only once since this function is only run once after env initialization. Since PassiveEnvChecker is wrapped first before step compatibility in make, this will raise a warning based on the core env implementation's API.
    3. gym.VectorEnv constructor - warning raised if self.new_step_api==False.
    4. StepAPICompatibility wrapper constructor - the wrapper that is applied by default at make. If new_step_api=False, a warning is raised. This is independent of whether the core env is implemented in new or old api and only depends on the new_step_api argument.
    • [x] Breaking change (fix or feature that would cause existing functionality to not work as expected)
    • [x] This change requires a documentation update

    Checklist:

    • [x] I have run the pre-commit checks with pre-commit run --all-files
    • [x] I have commented my code, particularly in hard-to-understand areas
    • [ ] I have made corresponding changes to the documentation (need to update with latest changes)
    • [x] My changes generate no new warnings (only intentional warnings)
    • [ ] I have added tests that prove my fix is effective or that my feature works (added two tests but maybe needs more to be comprehensive)
    • [x] New and existing unit tests pass locally with my changes
    • [x] Locally runs with atari / pybullet envs
    opened by arjun-kg 45
  • Use mujoco bindings instead of mujoco_py

    Changes made:

    • Create Viewer() class to render window in "human" mode with dm_viewer and glfw
    • Modified the default viewer_setup() method for all mujoco_environments (only for v3 envs)
    opened by rodrigodelazcano 45
  • Render API

    New render API

    Following this discussion: #2540

    This PR extends the render() method, allowing the user to specify render_mode when the environment is instantiated. The default value of render_mode is None; in that case, the user can call render with the preferred mode as before, so these changes are backwards compatible. If render_mode != None, the mode argument of .render() is ignored. If render_mode = "human", the rendering is handled by the environment without needing to call .render(). With other render modes, .render() returns the results as before. We also introduce "_list" modes that return a List with all the renders since the last .reset() or .render(). For example, with render_mode = "rgb_array_list", .render() returns a List of np.ndarray, while with render_mode = "ansi_list" it returns a List[str].
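The list-mode bookkeeping can be pictured with a toy class. FrameCollector is a hypothetical stand-in written for this sketch, not gym code: frames accumulate on reset and step, and render hands them over and starts a fresh list.

```python
class FrameCollector:
    """Toy stand-in (not gym code) for an env created with a "_list" render mode."""

    def __init__(self):
        self._frames = []
        self._t = 0

    def _draw(self):
        # A real environment would produce an image or a string here.
        return 'frame at t={}'.format(self._t)

    def reset(self):
        self._t = 0
        self._frames = [self._draw()]   # reset discards older frames

    def step(self, action):
        self._t += 1
        self._frames.append(self._draw())

    def render(self):
        # Return everything collected so far and start a fresh list.
        frames, self._frames = self._frames, []
        return frames


env = FrameCollector()
env.reset()
for _ in range(10):
    env.step(action=None)
frames = env.render()
```

One reset plus ten steps gives eleven frames, and an immediate second render call returns an empty list.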

    TODO

    • [x] Add deprecation warnings to mode arg in .render() and VideoRecorder

    Examples

    import gym
    
    env = gym.make('FrozenLake-v1', render_mode="human")
    env.reset()
    for _ in range(100):
        env.step(env.action_space.sample())
        # env renders automatically, no need to call .render()
        
    env.render()
    > None
    
    import gym
    
    env = gym.make('CartPole-v1', render_mode="rgb_array_list")
    env.reset()
    
    for _ in range(10):
        env.step(env.action_space.sample())
    
    frames = env.render()
    type(frames)
    > <class 'list'>
    len(frames)
    > 11
    len(env.render()) # expect 0 because frames are popped by previous .render() call
    > 0
    
    env.reset()
    len(env.render())
    > 1
    

    Example of backward compatibility:

    import gym
    
    env = gym.make('FrozenLake-v1')  # default render_mode=None
    env.reset()
    for _ in range(100):
        # no rendering handled by the environment since render_mode = None
        env.step(env.action_space.sample()) 
        env.render()  # render with human mode (default)
       
    
    opened by younik 43
  • ImportError: sys.meta_path is None, Python is likely shutting down

    I'm using macOS. When the Python script finishes, it prints the errors shown below.

    This script triggers the problem:

    import gym
    env = gym.make('SpaceInvaders-v0')
    env.reset()
    env.render()
    

    And after executing it, the error occurs:

    ~/G/qlearning $ python atari.py
    Exception ignored in: <bound method SimpleImageViewer.__del__ of <gym.envs.classic_control.rendering.SimpleImageViewer object at 0x1059ab400>>
    Traceback (most recent call last):
      File "/Users/louchenyao/anaconda3/lib/python3.6/site-packages/gym/envs/classic_control/rendering.py", line 347, in __del__
      File "/Users/louchenyao/anaconda3/lib/python3.6/site-packages/gym/envs/classic_control/rendering.py", line 343, in close
      File "/Users/louchenyao/anaconda3/lib/python3.6/site-packages/pyglet/window/cocoa/__init__.py", line 281, in close
      File "/Users/louchenyao/anaconda3/lib/python3.6/site-packages/pyglet/window/__init__.py", line 770, in close
    ImportError: sys.meta_path is None, Python is likely shutting down
    

    It doesn't affect the running environment. It's just a little annoying.

    opened by louchenyao 38
  • Environment working, but not render()

    Configuration:

    Dell XPS15, Anaconda 3.6, Python 3.5, NVIDIA GTX 1050

    I installed OpenAI Gym through pip. When I run the code below, I can execute steps in the environment, which return all the information for the specific environment, but the render() method just gives me a blank screen. When I exit Python, the blank screen closes normally.

    Code:

    import gym
    env = gym.make('CartPole-v0')
    env.reset()
    env.render()
    for i in range(1000):
        env.step(env.action_space.sample())
    

    After hours of google searching I think the issue might have something to do with pyglet, the package used for rendering, and possibly a conflict with my nvidia graphics card? All help is welcome. Thanks!

    opened by cpatyn 38
  • Game window hangs up

    Hi,

    I am a beginner with gym. After I render CartPole with

    env = gym.make('CartPole-v0')
    env.reset()
    env.render()

    a window is launched from the Jupyter notebook, but it hangs immediately. Then the notebook dies. I am using Python 3.5.4 on OS X 10.11.6. What could be the problem here?

    Thanks.

    opened by vishrathi 36
  • Support MuJoCo 1.5

    In order to use MuJoCo on a recent mac, you need to be using MuJoCo 1.5: https://github.com/openai/mujoco-py/issues/36 Otherwise you get:

    >>> import mujoco_py
    ERROR: Could not open disk
    

    This is because MuJoCo before 1.5 doesn't support NVMe disks.

    Gym depends on MuJoCo via mujoco-py, which just released support for MuJoCo 1.5.

    It looks like maybe this is something you're already working on? Or would it be useful for me to look into fixing it?

    opened by jeffkaufman 36
  • Update on Plans for the MuJoCo, Robotics and Box2d Environments and the Status of Brax and Hardware Accelerated Environments in Gym

    Given DeepMind's acquisition of MuJoCo and past discussions about replacing MuJoCo environments in Gym, I would like to clarify plans going forward after meeting with the Brax/PyBullet/TDS team at Google and the MuJoCo team at DeepMind.

    1. We are going to replace the documented MuJoCo environments of the "MuJoCo" class with Brax-based environments in the "Phys3D" class, add a deprecation warning to the "MuJoCo" environments, and move them to a separate deprecated repo some months later. This raises several questions:

    -"Why do the MuJoCo environments have to be replaced?" Despite MuJoCo now being free, the Gym environments have numerous bugs in their simulation configuration, and their code is in a state that we are not able to maintain. Moreover, they all depend on MuJoCo-Py, which is now fully deprecated and cannot be reasonably maintained. Given this, to use the environments with the newer free versions of MuJoCo, to fix bugs, and to be able to continue doing basic maintenance such as supporting new Python versions, the environments would have to be very nearly rewritten from scratch. This means that a serious discussion of a change of simulator is appropriate.

    -"Of all the simulators available, why Brax?" First, let's list the widely used options: PyBullet, MuJoCo, TDS and Brax. PyBullet, which was originally the obvious choice, is no longer seriously maintained in favor of TDS and Brax. Each simulator has pros and cons. TDS has full differentiability, Brax has accelerator support (the environments run on GPUs or TPUs, allowing training to go orders of magnitude faster, e.g. full training in minutes), and PyBullet and MuJoCo are more physically accurate. For the "MuJoCo" environment class, this high level of physical accuracy is not necessary. Accordingly, picking a newer simulator with extra features of use to researchers (differentiability or hardware acceleration support) is likely the preferable option. I personally believe that hardware accelerator support is more important, hence choosing Brax.

    -"How long will this take?" We hope to have a release with the Brax-based Phys3D environments within the next 5 weeks, and a lot of progress has already been made, but a definite date is difficult to give. For the most recent updates, see https://github.com/google/brax/issues/49

    2. The "Robotics" environments are being moved out of Gym. This in turn raises several questions:

    -"Why can't they be maintained as is?" These environments have the same problems of being unmaintainable and having serious bugs as the others in the "MuJoCo" class (hopper and so on).

    -"Why can't these be rewritten in Brax like the others?" Brax is not physically accurate enough to support such complex simulations well, and while the Brax team hopes to support this in the future, it will take a very long time.

    -"I use the Robotics environments, where are they going?" ~Into a repo maintained by @Rohan138 , unless someone who is capable of maintaining them to a higher level and wants to reaches out to me. They will still be maintained as best as is reasonably possible in their state, be installable, and be listed as third party environments in Gym.~ https://github.com/Farama-Foundation/gym-robotics

    -"Shouldn't Gym have robotics environments like this though? Why not rewrite them in a manner that's suitable?" Because I don't think Gym inherently should have them and because we can't. My goal is to make all the environments inside Gym itself good general-purpose benchmarks, such that someone new to the field of reinforcement learning can look at them and say "okay, these are the things everyone uses that I should play with and understand." The robotics environments after many years have never filled this role and have become niche environments specifically for HER and similar research. While I cannot speak personally to this matter, the robotics researchers I've spoken to say that these environments are no longer widely used in this space and that forks of them are used instead, which further means these should not live in Gym proper. Regarding why we can't: these would literally have to be rewritten in the new version of MuJoCo (as PyBullet is no longer extensively maintained) and its upcoming Python bindings (which will not be released publicly for many months, likely with Python bindings following later), and that's not something anyone I'm aware of is willing to do due to the utterly extraordinary amount of work required, including the MuJoCo team at DeepMind.

    -"When will this happen?" Whenever the next release of Gym comes out.

    3. The Box2D environments will be rewritten in Brax in a new Phys2D environment class, and the Box2D environments will be deprecated and then removed, similar to the MuJoCo environments. In this process, the duplicate versions of lunar lander and bipedal walker will be consolidated into one environment, with the different modes as arguments on creation. To answer the natural questions about this here as well:

    -"Why do they need to be rewritten?" This is discussed in https://github.com/openai/gym/issues/2358, but in a nutshell the physics engine they're using (Box2D) has Python bindings that have not been maintained for years, meaning that they'll stop supporting new Python versions, architectures, and other basic maintenance things. After many discussions over months, I cannot get these bindings maintained by basically anyone. Additionally, using pyglet for rendering has been a source of continual problems for Gym, and it does not reasonably support headless rendering (an essential feature).

    -"Why Brax?" Originally I was planning to use the other major 2D physics library (chipmunk, which has well maintained Python bindings), but Brax is orders of magnitude faster as it can run on accelerators, and the Brax team is kind enough to be willing to do the replacements for us.

    -"When will this happen?" Probably a month after the Phys3D environments are merged at the current rate, but that's not a timeline people have committed to or anything.

    4. General questions:

    -"These Brax environments can still run on my CPU too, right?" Yep!

    -"Can Brax environments run on AMD GPUs?" With some effort, yes. Brax uses Jax, which uses XLA, which has solid beta support for most AMD GPUs.

    -"Why are you having Gym so heavily depend on Brax?" Because I think that it's the best option for environments that already need to be rewritten, and because I think that letting the benchmark environments run orders of magnitude faster via accelerators is of profound value to the community and to beginners in the field.

    -"Is Brax going to be maintained for the long term?" As long as we can realistically expect, yes. All software stands at risk of deprecation, e.g. PyBullet, the Box2D Python bindings (and arguably Box2D itself), PIL (what came before pillow), and so on. Given what I've seen that Google is using it for internally, I'm very confident it will be maintained for at least 5 years or so if not longer, which I think is the best we can reasonably plan for.

    -"Are you going to make other environments hardware accelerated so they can similarly run orders of magnitude faster?" Hopefully! This could be done with the toy text environments and the classic control environments pretty easily through Jax. I have no concrete plans or timeline for this.

    Please let me know if anyone has additional questions about the transition here.

    opened by jkterry1 34
  • AttributeError: module 'gym' has no attribute 'make'

    >>> import gym
    >>> env = gym.make('Copy-v0')
    Traceback (most recent call last):
      File "<pyshell#5>", line 1, in <module>
        env = gym.make('Copy-v0')
    AttributeError: module 'gym' has no attribute 'make'
    >>> 
    

    I want to know why. @jonasschneider

    opened by thomastao0215 34
  • atari Environment error

    An error appears when I try to run Riverraid in Atari. The other Atari games also fail to run, with the same error.

    My code:

    import gym
    env = gym.make("Riverraid-v4", render_mode='human')
    env.reset()
    env.render()

    Error:

    File c:\Users\Jang min seock\AppData\Local\Programs\Python\Python38\lib\site-packages\gym\envs\registration.py:640, in make(id, max_episode_steps, autoreset, apply_api_compatibility, disable_env_checker, **kwargs)
        637     render_mode = None
        639 try:
    --> 640     env = env_creator(**_kwargs)
        641 except TypeError as e:
        642     if (
        643         str(e).find("got an unexpected keyword argument 'render_mode'") >= 0
        644         and apply_human_rendering
        645     ):

    File c:\Users\Jang min seock\AppData\Local\Programs\Python\Python38\lib\site-packages\ale_py\env\gym.py:155, in AtariEnv.__init__(self, game, mode, difficulty, obs_type, frameskip, repeat_action_probability, full_action_space, max_num_frames_per_episode, render_mode)
        152 self.ale.setBool("sound", True)
        154 # Seed + Load
    --> 155 self.seed()
    ...
    ---> 13 import libtorrent as lt
         15 from typing import Dict
         16 from collections import namedtuple

    ImportError: DLL load failed while importing libtorrent: The specified module was not found.

    my gym version is 0.26.2

    opened by alstjr510 1
  • [Bug Report] Use of numpy bool8 is deprecated in newer versions of numpy

    Describe the bug When using a newer version of numpy, this DeprecationWarning is shown:

    DeprecationWarning: `np.bool8` is a deprecated alias for `np.bool_`.  (Deprecated NumPy 1.24)
    

    Code example Using numpy 1.24:

    import gym
    
    env = gym.make('CartPole-v1')
    env.reset()
    env.step(0)
    

    System Info Describe the characteristic of your environment:

    • gym installed via pip install
    • OS: MacOS 12.6
    • Python: 3.10.8

    Checklist

    • [X] I have checked that there is no similar issue in the repo (required)
    opened by TheCleric 1
  • [Bug Report] Atari Environments do not expose render_mode

    Describe the bug Atari environments return None when accessing the render_mode attribute of an environment object. This only happens when

    Code example

    import gym
    env = gym.make('ALE/Pong-v5', render_mode='rgb_array')
    assert env.render_mode == 'rgb_array'  # this fails because env.render_mode returns None
    
    env = gym.make('CartPole-v0', render_mode='rgb_array')
    assert env.render_mode == 'rgb_array'  # passes assertion
    

    System Info M1 Mac, Environment created via conda (no Rosetta emulation, native arm64 wheels)

    conda create -n gym python=3.10 -y
    conda activate gym
    pip install 'gym[all]'
    

    Checklist

    • [x] I have checked that there is no similar issue in the repo (required)
    opened by benjaminbeilharz 1
  • Question about "gym.vector.SyncVectorEnv"

    Hello,

    I have a question regarding "gym.vector.SyncVectorEnv". The documentation gives this explanation: "where the different copies of the environment are executed sequentially".

    What do you mean by sequentially? Do you mean that we execute one env per episode?

    To clarify my question, let me explain what I am trying to do:

    I am trying to use DDPG (Stable Baselines3) to solve a problem.

    I would like to know how we can change the values the env samples with every episode, in a way that is reproducible.

    For example, assume we have an env where we harvest energy, and that the harvested energy is normally distributed. In every episode, I will sample DIFFERENT values of my harvested energy. I would like to emphasize again that these different values should be reproducible, so that I can compare the RL method to other methods.

    PS: the custom env is already created, and in it I can change the sampled values with every episode (using a seed that takes the episode "i" as input). My problem now is how to fit it into the code using gym and Stable Baselines.

    I think "gym.vector.SyncVectorEnv" is my solution, but I am not sure.
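One way to picture both points, as a minimal sketch rather than the gym or Stable Baselines3 implementation: "sequentially" is usually read as stepping every copy of the environment one after another inside a single process, and per-episode reproducibility can come from re-seeding the environment's own random generator on reset. The class `HarvestEnv`, the helper names, and the `reset(seed=...)` signature (as in newer Gym releases) are assumptions made for illustration.

```python
import random


class HarvestEnv:
    """Hypothetical toy env: each observation is a normally distributed energy value."""

    def __init__(self, mean=5.0, std=1.0, horizon=3):
        self.mean, self.std, self.horizon = mean, std, horizon
        self.rng = random.Random()
        self.t = 0

    def reset(self, seed=None):
        # Re-seeding here makes the whole episode a function of `seed`.
        if seed is not None:
            self.rng = random.Random(seed)
        self.t = 0
        return self._sample(), {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.horizon
        return self._sample(), 0.0, terminated, False, {}

    def _sample(self):
        return self.rng.gauss(self.mean, self.std)


def run_episode(env, episode_index, base_seed=1234):
    """Episode i always uses seed base_seed + i, so its samples can be replayed."""
    obs, _ = env.reset(seed=base_seed + episode_index)
    trace = [obs]
    done = False
    while not done:
        obs, _, terminated, truncated, _ = env.step(None)
        trace.append(obs)
        done = terminated or truncated
    return trace


def step_sequentially(envs, actions):
    """Step each copy in turn within one call (no parallel workers)."""
    return [env.step(a) for env, a in zip(envs, actions)]
```

With this layout, episode 0 yields the same harvested values on every run, while episode 1 yields different ones, which is what a comparison against other methods needs.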

    Thank you Best regards

    opened by Missourl 4
  • [Question] [Bug] Difference between env.action_space.seed(seed) vs env.action_space.np_random.seed(seed)

    Question

    My gym version is 0.17.0. I create a Hopper-v2 environment. When I set a seed of 10 using env.action_space.seed(seed), I get the following output:

    env.action_space.seed(10)
    [10]
    env.action_space.sample()
    array([ 0.18566807,  0.43384928, -0.9764525 ], dtype=float32)
    env.action_space.sample()
    array([0.5614053 , 0.0720133 , 0.56162614], dtype=float32)
    env.action_space.sample()
    array([-0.02573838,  0.19284548, -0.83452713], dtype=float32)
    

    Similarly, when I set seed of 10 using env.action_space.np_random.seed(seed), I get the following output:

    env.action_space.np_random.seed(10)
    env.action_space.sample()
    array([ 0.5426413 , -0.9584961 ,  0.26729646], dtype=float32)
    env.action_space.sample()
    array([ 0.49760777, -0.00298598, -0.5504067 ], dtype=float32)
    env.action_space.sample()
    array([-0.60387427,  0.5210614 , -0.66177833], dtype=float32)
    

    Why is there a difference between the action samples?
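A plausible explanation, assuming that in gym 0.17 `Space.seed` goes through the seeding utilities (`hash_seed`, `create_seed`) that hash the integer before handing it to NumPy, is that the two calls end up seeding the generator with different values. The sketch below only illustrates that effect with the standard library; `hashed_seed` is a hypothetical stand-in and not gym's exact hashing scheme.

```python
import hashlib
import random


def hashed_seed(seed: int) -> int:
    """Derive a different integer from `seed` (illustrative, not gym's exact scheme)."""
    digest = hashlib.sha512(str(seed).encode("utf8")).digest()
    return int.from_bytes(digest[:8], "big")


def first_draws(seed: int, n: int = 3):
    """Return the first n draws of a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


# Seeding the generator directly, analogous to np_random.seed(10).
direct = first_draws(10)

# Seeding through a hashing layer, analogous to action_space.seed(10).
via_hash = first_draws(hashed_seed(10))
```

Both paths are individually reproducible; they simply start from different internal states, so their sample streams do not match each other.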

    opened by kbkartik 1
Releases(0.26.2)
  • 0.26.2(Oct 4, 2022)

    Release notes

    This is another very minor bug fix release.

    Bug Fixes

    • Because reset now returns (obs, info), the final step's info was overwritten in the vector environments. Now, the final observation and info are contained within the info as "final_observation" and "final_info" @pseudo-rnd-thoughts
    • Adds warnings when trying to render without specifying the render_mode @younik
    • Updates Atari Preprocessing such that the wrapper can be pickled @vermouth1992
    • GitHub CI was hardened such that the CI only has read permissions @sashashura
    • Clarify and fix typo in GraphInstance @ekalosak
  • 0.26.1(Sep 16, 2022)

    Release Notes

    This is a very minor bug fix release for 0.26.0

    Bug Fixes

    • #3072 - Previously mujoco was a necessary module even if only mujoco-py was used. This has been fixed to allow only mujoco-py to be installed and used. @YouJiacheng
    • #3076 - PixelObservationWrapper raises an exception if the env.render_mode is not specified. @vmoens
    • #3080 - Fixed bug in CarRacing where the colour of the wheels was not correct @foxik
    • #3083 - Fixed BipedalWalker where if the agent moved backwards then the rendered arrays would be a different size. @younik

    Spelling

    • Fixed truncation typo in readme API example @rdnfn
    • Updated pendulum observation space from angle to theta to make more consistent @ikamensh
  • 0.26.0(Sep 6, 2022)

    Release notes for v0.26.0

    This release is intended to be the last of the major API changes to the core API. All of the previously "turned off" changes of the base API (step termination / truncation, reset info, no seed function, render mode determined by initialization) are now expected by default. We still plan to make breaking changes to Gym itself, but only to things that are very easy to upgrade (environments and wrappers) and things that aren't super commonly used (the vector API). Once those aspects are stabilized, we'll do a proper 1.0 release and follow semantic versioning. Additionally, unless something goes terribly wrong with this release and we have to release a patched version, this will be the last release of Gym for a while.

    If you've been waiting for a "stable" release of Gym to upgrade your project given all the changes that have been going on, this is the one.

    We also just wanted to say that we tremendously appreciate the community's patience with us as we've gone on this journey of taking over the maintenance of Gym and making all of these huge changes to the core API. We appreciate your patience and support, and hopefully all the changes from here on out will be much more minor.

    Breaking backward compatibility

    These changes apply to all of gym's internal wrappers and environments. For environments that have not been updated, we provide the EnvCompatibility wrapper so users can convert old gym v21 / 22 environments to the new core API. This wrapper can be applied in gym.make and gym.register through the apply_api_compatibility parameter.

    • Step Termination / truncation - The Env.step function now returns 5 values instead of 4 (observations, reward, termination, truncation, info). A blog with more details will be released soon to explain this decision. @arjun-kg
    • Reset info - The Env.reset function returns two values (obs and info), with no return_info parameter, for gym wrappers and environments. This is important for some environments that provide action masking information for each action, which was not previously possible on reset. @balisujohn
    • No Seed function - While Env.seed was a helpful function, it was almost solely used at the beginning of an episode, and seeding is now done through Env.reset(seed=...). In addition, for several environments like Atari that utilise external random number generators, it was not possible to set the seed at any time other than reset. Therefore, seed is no longer expected to function within gym environments and is removed from all gym environments @balisujohn
    • Rendering - It is normal to only use a single render mode, and to help open and close the rendering window, we have changed Env.render to not take any arguments, so all render arguments are passed to the environment's constructor, i.e., gym.make("CartPole-v1", render_mode="human"). For more detail on the new API, see blog post @younik
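The breaking changes listed above can be sketched together with a toy environment. This is a minimal illustration of the call shapes only: `CoinFlipEnv` and `rollout` are hypothetical names, and the class does not inherit from gym.Env so that the example stays self-contained.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment that follows the shape of the 0.26 core API."""

    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self.rng = random.Random()
        self.steps = 0

    def reset(self, seed=None):
        # Seeding happens at reset; there is no separate seed() method.
        if seed is not None:
            self.rng = random.Random(seed)
        self.steps = 0
        return 0, {}  # (observation, info)

    def step(self, action):
        self.steps += 1
        obs = self.rng.randint(0, 1)
        reward = 1.0 if obs == action else 0.0
        terminated = reward == 0.0                # the task itself ended the episode
        truncated = self.steps >= self.max_steps  # a step limit cut the episode off
        return obs, reward, terminated, truncated, {}


def rollout(env, seed):
    """Run one episode, always guessing 1, and return (total reward, terminated, truncated)."""
    obs, info = env.reset(seed=seed)
    total = 0.0
    while True:
        obs, reward, terminated, truncated, info = env.step(1)
        total += reward
        if terminated or truncated:
            return total, terminated, truncated
```

Code written against the old API typically needs two changes: unpack two values from reset and five from step, and treat `terminated or truncated` as the end of an episode.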

    Major changes

    • Render modes - In v25, the meaning of render modes changed, i.e. "rgb_array" returned a list of rendered frames while "single_rgb_array" returned a single frame. This has been reverted in this release: "rgb_array" has its previous meaning of returning a single frame, and a new mode "rgb_array_list" returns a list of RGB arrays. The capability to return a list of rendering observations is achieved through a wrapper applied during gym.make. #3040 @pseudo-rnd-thoughts @younik
    • Added save_video that uses moviepy to render a list of RGB frames and updated RecordVideo to use this function. This removes support for recording ansi outputs. #3016 @younik
    • RandomNumberGenerator functions: rand, randn, randint, get_state, set_state, hash_seed, create_seed, _bigint_from_bytes and _int_list_from_bigint have been removed. @balisujohn
    • Bump ale-py to 0.8.0, which is compatible with the new core API
    • Added EnvAPICompatibility wrapper @RedTachyon
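As a rough sketch of how a wrapper can provide the "rgb_array_list" behaviour described under Render modes, the example below stores one frame per reset and step and hands the list back on render. `FrameSource` and `CollectFrames` are hypothetical names for illustration and are not the wrapper gym applies in gym.make; strings stand in for image arrays.

```python
class FrameSource:
    """Hypothetical env stub whose render() returns a single frame, as with "rgb_array"."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        return self.t, 0.0, False, False, {}

    def render(self):
        return f"frame-{self.t}"  # a real environment would return an np.ndarray


class CollectFrames:
    """Sketch of a wrapper that stores a frame per reset/step and pops them on render()."""

    def __init__(self, env):
        self.env = env
        self.frames = []

    def reset(self):
        result = self.env.reset()
        self.frames = [self.env.render()]  # start a fresh list on reset
        return result

    def step(self, action):
        result = self.env.step(action)
        self.frames.append(self.env.render())
        return result

    def render(self):
        frames, self.frames = self.frames, []  # hand the list back and clear it
        return frames
```

Ten steps after a reset give eleven frames, a second render call gives an empty list, and a new reset gives one frame, matching the counts in the pull request example for render_mode="rgb_array_list".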

    Minor changes

    • Added improved Sequence, Graph and Text sample masking @pseudo-rnd-thoughts
    • Improved the gym make and register type hinting with entry_point being a necessary parameter of register. #3041 @pseudo-rnd-thoughts
    • Changed all URL to the new gym website https://www.gymlibrary.dev/ @FieteO
    • Fixed mujoco offscreen rendering with width and height value > 500 #3044 @YouJiacheng
    • Allowed toy_text environment to render on headless machines #3037 @RedTachyon
    • Renamed the motors in the mujoco swimmer envs #3036 @lin826
  • 0.25.2(Aug 18, 2022)

    Release notes for v0.25.2

    This is a fairly minor bug fix release.

    Bug Fixes

    • Removes requirements for _TimeLimit.truncated in info for step compatibility functions. This makes the step compatible with Envpool @arjun-kg
    • As the ordering of Dict spaces matters when flattening spaces, updated the __eq__ to account for the .keys() ordering. @XuehaiPan
    • Allows CarRacing environment to be pickled. Updated all gym environments to be correctly pickled. @RedTachyon
    • Seeding Dict and Tuple spaces with integers can cause lower-specification computers to hang due to requiring 8 GB of memory. Updated the seeding with integers to not require unique subseeds (subseed collisions are rare). For users that require unique subseeds for all subspaces, we recommend using a dictionary or tuple with the subseeds. @olipinski
    • Fixed the metaclass implementation for the new render api to allow custom environments to use metaclasses as well. @YouJiacheng

    Updates

    • Simplifies the step compatibility functions to make them easier to debug. Time limit wrapper with the old step API favours terminated over truncated if both are true. This is as the old done step API can only encode 3 states (cannot encode terminated=True and truncated=True) therefore we must encode to only terminated=True or truncated=True. @pseudo-rnd-thoughts
    • Add Swig as a dependency @kir0ul
    • Add type annotation for render_mode and metadata @bkrl
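The conversion between the two step APIs mentioned in the first update above can be sketched as a pair of functions. This is an illustration under stated assumptions, not gym's step compatibility code: the function names are hypothetical, and the info key "TimeLimit.truncated" is assumed to be the marker used to carry truncation through the old API.

```python
def to_done_api(obs, reward, terminated, truncated, info):
    """Collapse (terminated, truncated) into a single `done` flag."""
    info = dict(info)
    done = terminated or truncated
    if truncated:
        # Mark truncation only if the episode did not also terminate,
        # so terminated wins when both flags are set.
        info["TimeLimit.truncated"] = not terminated
    return obs, reward, done, info


def to_terminated_truncated_api(obs, reward, done, info):
    """Recover (terminated, truncated) from `done` plus the info marker."""
    info = dict(info)
    truncated = bool(info.pop("TimeLimit.truncated", False))
    terminated = done and not truncated
    return obs, reward, terminated, truncated, info
```

Because `done` plus one marker cannot express "terminated and truncated at once", a step with both flags set comes back from the round trip as terminated only.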
  • 0.25.1(Jul 26, 2022)

    Release notes

    • Added rendering for CliffWalking environment @younik
    • PixelObservationWrapper only supports the new render API due to difficulty in supporting both old and new APIs. A warning is raised if the user is using the old API @vmoens

    Bug fix

    • Revert an incorrect edit on wrapper.FrameStack @ZhiqingXiao
    • Fix reset bounds for mountain car @psc-g
    • Removed skipped tests causing bugs not to be caught @pseudo-rnd-thoughts
    • Added backward compatibility for environments without metadata @pseudo-rnd-thoughts
    • Fixed BipedalWalker rendering for RGB arrays @1b15
    • Fixed bug in PixelObsWrapper for using the new rendering @younik

    Typos

    • Rephrase observations' definition in Lunar Lander Environment @EvanMath
    • Top-docstring in gym/spaces/dict.py @Ice1187
    • Several typos in humanoidstandup_v4.py, mujoco_env.py, and vector_list_info.py @timgates42
    • Typos in passive environment checker @pseudo-rnd-thoughts
    • Typos in Swimmer rotations @lin826
  • 0.25.0(Jul 13, 2022)

    Release notes

    This release finally introduces all new API changes that have been planned for the past year or more, all of which will be turned on by default in a subsequent release. After this point, Gym development should get massively smoother. This release also fixes large bugs present in 0.24.0 and 0.24.1, and we highly discourage using those releases.

    API Changes

    • Step - A majority of deep reinforcement learning algorithm implementations are incorrect due to an important difference between theory and practice, as done is not equivalent to termination. As a result, we have modified the step function to return five values: obs, reward, termination, truncation, info. The full theoretical and practical reasons (along with example code changes) for these changes will be explained in a soon-to-be-released blog post. The aim is for the change to be backward compatible (for now); for issues, please report them on GitHub or the Discord. @arjun-kg
    • Render - The render API is changed such that the mode has to be specified during gym.make with the keyword render_mode, after which the render mode is fixed. For further details see https://younis.dev/blog/2022/render-api/ and https://github.com/openai/gym/pull/2671. This has the following additional changes:
      • with render_mode="human" you don't need to call .render(), rendering will happen automatically on env.step()
      • with render_mode="rgb_array", .render() pops the list of frames rendered since the last .reset()
      • with render_mode="single_rgb_array", .render() returns a single frame, like before.
    • Space.sample(mask=...) allows a mask when sampling actions to enable/disable certain actions from being randomly sampled. We recommend developers add this to the info parameter returned by reset(return_info=True) and step. See https://github.com/openai/gym/pull/2906 for example implementations of the masks or the individual spaces. We have added an example version of this in the taxi environment. @pseudo-rnd-thoughts
    • Add Graph for environments that use graph style observation or action spaces. Currently, the node and edge spaces can only be Box or Discrete spaces. @jjshoots
    • Add Text space for Reinforcement Learning that involves communication between agents and have dynamic length messages (otherwise MultiDiscrete can be used). @ryanrudes @pseudo-rnd-thoughts
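To make the idea of masked sampling in the Space.sample(mask=...) item concrete, here is a minimal sketch for a discrete action set. `sample_discrete` is a hypothetical helper, not gym's implementation, and raising an error when every action is masked out is a choice made for this sketch.

```python
import random


def sample_discrete(n, mask=None, rng=random):
    """Sample an action from range(n); a truthy mask[i] marks action i as allowed."""
    if mask is None:
        return rng.randrange(n)
    if len(mask) != n:
        raise ValueError("mask must have one entry per action")
    valid = [action for action, allowed in enumerate(mask) if allowed]
    if not valid:
        raise ValueError("mask disables every action")
    return rng.choice(valid)
```

An environment could place such a mask in the info dictionary it returns, and the agent could pass it along when it samples a random action.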

    Bug fixes

    • Fixed car racing termination where if the agent finishes the final lap, then the environment ends through truncation not termination. This added a version bump to Car racing to v2 and removed Car racing discrete in favour of gym.make("CarRacing-v2", continuous=False) @araffin
    • In v0.24.0, opencv-python was an accidental requirement for the project. This has been reverted. @KexianShen @pseudo-rnd-thoughts
    • Updated utils.play such that if the environment specifies keys_to_action, the function will automatically use that data. @Markus28
    • When rendering the blackjack environment, fixed bug where rendering would change the dealer's top card. @balisujohn
    • Updated mujoco docstring to reflect changes that were accidentally overwritten. @Markus28

    Misc

    • The whole project is partially type hinted using pyright (none of the project files is ignored by the type hinter). @RedTachyon @pseudo-rnd-thoughts (Future work will add strict type hinting to the core API)
    • Action masking added to the taxi environment (no version bump due to being backwards compatible) @pseudo-rnd-thoughts
    • The Box space shape inference now allows scalar high and low values to be automatically set to shape (1,). Minor changes to identifying scalars. @pseudo-rnd-thoughts
    • Added option support in classic control environment to modify the bounds on the initial random state of the environment @psc-g
    • The RecordVideo wrapper is becoming deprecated with no support for TextEncoder with the new render API. The plan is to replace RecordVideo with a single function that will receive a list of frames from an environment and automatically render them as a video using MoviePy. @johnMinelli
    • The gym py.Dockerfile is reduced from 2 GB to 1.5 GB through a number of optimisations @TheDen
  • 0.24.1(Jun 7, 2022)

    This is a bug fix release for version 0.24.0

    Bugs fixed:

    • Replaced the environment checker introduced in V24, such that the environment checker will not call step and reset during make. This new version is a wrapper that will observe the data that step and reset returns on their first call and check the data against the environment checker. @pseudo-rnd-thoughts
    • Fixed MuJoCo v4 arguments key callback, closing the environment in renderer and the mujoco_rendering close method. @rodrigodelazcano
    • Removed redundant warning in registration @RedTachyon
    • Removed maths operations from MuJoCo xml files @quagla
    • Added support for unpickling legacy spaces.Box @pseudo-rnd-thoughts
    • Fixed mujoco environment action and observation space docstring tables @pseudo-rnd-thoughts
    • Disable wrappers from accessing _np_random property and np_random is now forwarded to environments @pseudo-rnd-thoughts
    • Rewrite setup.py to add a "testing" meta dependency group @pseudo-rnd-thoughts
    • Fixed docstring in rescale_action wrapper @gianlucadecola
  • 0.24.0(May 25, 2022)

    Major changes

    • Added v4 mujoco environments that use the new deepmind mujoco 2.2.0 module. This can be installed through pip install gym[mujoco] with the old bindings still being available using the v3 environments and pip install gym[mujoco-py]. These new v4 environments should have the same training curves as v3. For the Ant, we found that there was a contact parameter that was not applied in v3; it can be enabled in v4, but it was found to produce significantly worse performance (see comment for more details). @Rodrigodelazcano
    • The vector environment step info API has been changed to allow hardware acceleration in the future. See this PR for the modified info style that now uses dictionaries instead of a list of environment info. If you still wish to use the list info style, then use the VectorListInfo wrapper. @gianlucadecola
    • On gym.make, the gym env_checker is run, which includes calling the environment reset and step to check if the environment is compliant with the gym API. To disable this feature, run gym.make(..., disable_env_checker=True). @RedTachyon
    • Re-added gym.make("MODULE:ENV") import style that was accidentally removed in v0.22 @arjun-kg
    • Env.render is now order enforced such that Env.reset is required before Env.render is called. If you need to render before resetting, set disable_render_order_enforcing=True on the OrderEnforcer wrapper. @pseudo-rnd-thoughts
    • Added wind and turbulence to the Lunar Lander environment. This is turned off by default; use the wind_power and turbulence parameters. @virgilt
    • Improved the play function to allow multiple keyboard letters to be passed instead of ascii values @Markus28
    • Added google style pydoc strings for most of the repository @pseudo-rnd-thoughts @Markus28
    • Added discrete car racing environment version through gym.make("CarRacing-v1", continuous=False)
    • Pygame is now an optional module for box2d and classic control environments that is only necessary for rendering. Therefore, install pygame using pip install gym[box2d] or pip install gym[classic_control] @gianlucadecola @RedTachyon
    • Fixed bug in batch spaces (used in VectorEnv) such that the original space's seed was ignored @pseudo-rnd-thoughts
    • Added AutoResetWrapper that automatically calls Env.reset when Env.step done is True @balisujohn
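
    The auto-reset behaviour in the last item can be illustrated without Gym at all. The following is a minimal sketch of the pattern, not Gym's AutoResetWrapper: CountdownEnv, AutoReset and the final_observation info key are hypothetical names chosen for this example, and step returns the observation, reward, done, info tuple described in the Basics section.

```python
class CountdownEnv:
    """Toy environment (hypothetical): an episode ends after `length` steps."""

    def __init__(self, length=3):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.length
        return self.t, 1.0, done, {}


class AutoReset:
    """Sketch of the auto-reset pattern: reset the wrapped env as soon as done is True."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if done:
            # Keep the last observation of the finished episode
            # (the key name is an assumption of this sketch), then
            # return the first observation of the next episode.
            info = dict(info, final_observation=obs)
            obs = self.env.reset()
        return obs, reward, done, info


env = AutoReset(CountdownEnv(length=3))
env.reset()
trace = [env.step(0) for _ in range(4)]
for obs, reward, done, info in trace:
    print(obs, reward, done, info)
```

    With this pattern a training loop never has to call reset itself after the first episode; the step that reports done=True already carries the first observation of the next episode.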

    Minor changes

    • BipedalWalker and LunarLander's observation spaces have non-infinite upper and lower bounds. @jjshoots
    • Bumped the ALE-py version to 0.7.5
    • Improved the performance of car racing through not rendering polygons off screen @andrewtanJS
    • Fixed turn indicators that were black not red/white in Car racing @jjshoots
    • Bug fixes for VecEnvWrapper to forward method calls to the environment @arjun-kg
    • Removed unnecessary try except on Box2d such that if Box2d is not installed correctly then a more helpful error is shown @pseudo-rnd-thoughts
    • Simplified the gym.registry backend @RedTachyon
    • Re-added python 3.6 support through backports of python 3.7+ modules. This is not tested or compatible with the mujoco environments. @pseudo-rnd-thoughts
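
    The change from list-style to dictionary-style vector info listed under Major changes can be pictured with plain Python data. This is only a sketch under stated assumptions: it uses lists rather than arrays, the helper names list_info_to_dict and dict_info_to_list are hypothetical, and the convention of storing a boolean mask under "_" + key is an assumption about the layout that should be checked against the PR referenced above.

```python
def list_info_to_dict(infos):
    """Merge per-environment info dicts into one dict of lists plus boolean masks."""
    n = len(infos)
    merged = {}
    for i, info in enumerate(infos):
        for key, value in info.items():
            if key not in merged:
                merged[key] = [None] * n          # one slot per sub-environment
                merged["_" + key] = [False] * n   # mask: which slots are filled
            merged[key][i] = value
            merged["_" + key][i] = True
    return merged


def dict_info_to_list(merged, n):
    """Inverse of list_info_to_dict: rebuild one info dict per sub-environment."""
    infos = [{} for _ in range(n)]
    for key, values in merged.items():
        if key.startswith("_"):
            continue  # mask entries are consumed below, not copied
        mask = merged["_" + key]
        for i in range(n):
            if mask[i]:
                infos[i][key] = values[i]
    return infos


list_style = [{}, {"lives": 2}, {"lives": 0, "TimeLimit.truncated": True}]
dict_style = list_info_to_dict(list_style)
round_trip = dict_info_to_list(dict_style, len(list_style))
print(dict_style)
print(round_trip)
```

    The sketch assumes that no original info key starts with an underscore; a real implementation would need to handle that case.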
  • 0.23.1(Mar 11, 2022)

    This release contains a few small bug fixes and no breaking changes.

    • Make VideoRecorder backward-compatible to gym<0.23 by @vwxyzjn in https://github.com/openai/gym/pull/2678
    • Fix issues with pygame event handling (which should fix support on windows and in jupyter notebooks) by @andrewtanJS in https://github.com/openai/gym/pull/2684
    • Add py.typed to package_data by @micimize in https://github.com/openai/gym/p
    • Fixes around 1500 warnings in CI @pseudo-rnd-thoughts
    • Deprecation warnings correctly display now @vwxyzjn
    • Fix removing striker and thrower @RushivArora
    • Fix small dependency warning error @ZhiqingXiao
  • 0.23.0(Mar 4, 2022)

    This release contains many bug fixes and a few small changes.

    Breaking changes:

    • Standardized render metadata variables ahead of render breaking change @trigaten
    • Removed deprecated monitor wrapper and associated dead code @gianlucadecola
    • Unused striker and thrower MuJoCo envs moved to https://github.com/RushivArora/Gym-Mujoco-Archive @RushivArora

    Many minor bug fixes (@vwxyzjn, @RedTachyon, @rusu24edward, @Markus28, @dsctt, @andrewtanJS, @tristandeleu, @duburcqa)

  • 0.22.0(Feb 17, 2022)

    This release is the largest set of changes ever made to Gym, and is a huge step towards the plans for 1.0 outlined here: https://github.com/openai/gym/issues/2524

    Gym now has a new comprehensive documentation site: https://www.gymlibrary.ml/ !

    API changes:

    • env.reset now accepts three new arguments:
      • options: Usable for things like controlling curriculum learning without reinitializing the environment, which can be expensive (@RedTachyon)
      • seed: Environment seeds can be passed to this reset argument in the future. The old .seed() method is being deprecated in favor of this, though it will continue to function as before until the 1.0 release for backwards compatibility purposes (@RedTachyon)
      • infos: When set to True, reset will return obs, info. This currently defaults to False, but will become the default behavior in Gym 1.0 (@RedTachyon)
    • Environment names no longer require a version during registration and will suggest intelligent similar names (@kir0ul, @JesseFarebro)
    • Vector environments now support terminal_observation in info and support batch action spaces (@vwxyzjn, @tristandeleu)

    Environment changes:

    • The blackjack and frozen lake toy_text environments now have nice graphical rendering using PyGame (@1b15)
    • Moved robotics environments to gym-robotics package (@seungjaeryanlee, @Rohan138, @vwxyzjn) (per discussion in https://github.com/openai/gym/issues/2456#issue-1032765998)
    • The bipedal walker and lunar lander environments were consolidated into one class (@andrewtanJS)
    • Atari environments now use standard seeding API (@JesseFarebro)
    • Fixed large bugs in car_racing box2d environment, bumped version (@carlosluis, @araffin)
    • Refactored all box2d and classic_control environments to use PyGame instead of Pyglet, as issues with pyglet have been one of the most frequent sources of GitHub issues over the life of the gym project (@andrewtanJS)

    Other changes:

    • Removed DiscreteEnv class, built in environments no longer use it (@carlosluis)
    • Large numbers of type hints added (@ikamensh, @RedTachyon)
    • Python 3.10 support
    • Tons of additional code refactoring, cleanup, error message improvements and small bug fixes (@vwxyzjn, @Markus28, @RushivArora, @jjshoots, @XuehaiPan, @Rohan138, @JesseFarebro, @Ericonaldo, @AdilZouitine, @RedTachyon)
    • All environment files now have dramatically improved readmes at the top (that the documentation website automatically pulls from)
    • As part of the seeding changes, Gym's RNG has been modified to use the np.random.Generator as the RandomState API has been deprecated. The methods randint, rand, randn are replaced by integers, random and standard_normal respectively. As a consequence, the random number generator has changed from MT19937 to PCG64.
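
    The method renames in the seeding change map one-to-one between NumPy's two random APIs. The following sketch uses NumPy directly (assuming NumPy 1.17 or later is installed) rather than Gym's own seeding helpers, so it shows only the underlying NumPy calls, not how Gym constructs its generator.

```python
import numpy as np

# Legacy API: RandomState, backed by the MT19937 bit generator.
legacy = np.random.RandomState(0)
old_ints = legacy.randint(0, 10, size=3)   # integers in [0, 10)
old_unif = legacy.rand(3)                  # uniform floats in [0, 1)
old_norm = legacy.randn(3)                 # standard normal samples

# Current API: Generator, backed by PCG64 when created via default_rng.
rng = np.random.default_rng(0)
new_ints = rng.integers(0, 10, size=3)     # replaces randint
new_unif = rng.random(3)                   # replaces rand
new_norm = rng.standard_normal(3)          # replaces randn

bit_generator_name = type(rng.bit_generator).__name__
print(bit_generator_name)
```

    Because the bit generator differs, the same seed yields different number streams under the two APIs, so results seeded under the old API should not be expected to reproduce exactly.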

  • v0.21.0(Oct 2, 2021)

    • The old Atari entry point that was broken with the last release and the upgrade to ALE-Py is fixed (@JesseFarebro)
    • Atari environments now give much clearer error messages and warnings (@JesseFarebro)
    • A new plugin system to enable an easier inclusion of third party environments has been added (@JesseFarebro)
    • Atari environments now use the new plugin system to prevent clobbered names and other issues (@JesseFarebro)
    • pip install gym[atari] no longer distributes Atari ROMs that the ALE (the Atari emulator used) needs to run the various games. The easiest way to install ROMs into the ALE has been to use AutoROM. Gym now has a hook to AutoROM for easier CI automation so that using pip install gym[accept-rom-license] calls AutoROM to add ROMs to the ALE. You can install the entire suite with the shorthand gym[atari, accept-rom-license]. Note that, as described in the name, by installing gym[accept-rom-license] you are confirming that you have the relevant license to install the ROMs. (@JesseFarebro)
    • An accidental breaking change when loading saved policies trained on old versions of Gym with environments using the box action space has been fixed. (@RedTachyon)
    • Pendulum has had a minor fix made to its physics logic and the version has been bumped to v1 (@RedTachyon)
    • Tests have been refactored into an orderly manner (@RedTachyon)
    • Dict spaces now have standard dict helper methods (@Rohan138)
    • Environment properties are now forwarded to the wrapper (@tristandeleu)
    • Gym now properly enforces calling reset before stepping for the first time (@ahmedo42)
    • Proper piping of error messages to stderr (@XuehaiPan)
    • Fix video saving issues (@zlig)

    Also, Gym is compiling a list of third party environments to include in the new documentation website we're working on. Please submit PRs for ones that are missing: https://github.com/openai/gym/blob/master/docs/third_party_environments.md
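
    The reset-before-step enforcement mentioned in this release can be sketched as a small wrapper. This is an illustration of the idea, not Gym's own implementation: OrderEnforcing, ResetNeeded and EchoEnv are hypothetical names, and the wrapped object only needs reset and step methods.

```python
class ResetNeeded(RuntimeError):
    """Raised when step() is called before reset() (hypothetical error type)."""


class OrderEnforcing:
    """Sketch: refuse to step the wrapped environment until it has been reset."""

    def __init__(self, env):
        self.env = env
        self._has_reset = False

    def reset(self):
        self._has_reset = True
        return self.env.reset()

    def step(self, action):
        if not self._has_reset:
            raise ResetNeeded("Cannot call step() before calling reset()")
        return self.env.step(action)


class EchoEnv:
    """Toy environment (hypothetical): the observation is the last action."""

    def reset(self):
        return 0

    def step(self, action):
        return action, 0.0, False, {}


env = OrderEnforcing(EchoEnv())
try:
    env.step(1)          # stepping first is rejected
    raised = False
except ResetNeeded:
    raised = True

env.reset()
result = env.step(1)     # allowed once reset has been called
print(raised, result)
```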

  • v0.20.0(Sep 14, 2021)

    Major Change:

    • Replaced Atari-Py dependency with ALE-Py and bumped all versions. This is a massive upgrade with many changes; please see the full explainer (@JesseFarebro)
    • Note that ALE-Py does not include ROMs. You can install ROMs in two lines of bash with AutoROM though (pip3 install autorom and then autorom), see https://github.com/PettingZoo-Team/AutoROM. This is the recommended approach for CI, etc.

    Breaking changes and new features:

    • Add RecordVideo wrapper, deprecate monitor wrapper in favor of it and RecordEpisodeStatistics wrapper (@vwxyzjn)
    • Dependencies used outside of environments (e.g. for wrappers) are now in the 'other' extra (@jkterry1)
    • Moved algorithmic and unused toytext envs (guessing game, hotter colder, nchain, roulette, kellycoinflip) to third party repos (@jkterry1, @Rohan138)
    • Fixed flatten utility and flatdim in MultiDiscrete space (@tristandeleu)
    • Add __setitem__ to dict space (@jfpettit)
    • Large fixes to .contains method for box space (@FirefoxMetzger)
    • Made blackjack environment properly comply with Barto and Sutton book standard, bumped to v1 (@RedTachyon)
    • Added NormalizeObservation and NormalizeReward wrappers (@vwxyzjn)
    • Add __getitem__ and __len__ to MultiDiscrete space (@XuehaiPan)
    • Changed .shape to be a property of box space to prevent unexpected behaviors (@RedTachyon)

    Bug fixes and upgrades:

    • Video recorder gracefully handles closing (@XuehaiPan)
    • Remaining unnecessary dependencies in setup.py are resolved (@jkterry1)
    • Minor acrobot performance improvements (@TuckerBMorgan)
    • Pendulum properly renders when 0 force is sent (@Olimoyo)
    • Make observation dtypes consistent with observation space dtypes for all classic control envs and bipedalwalker (@RedTachyon)
    • Removed unused and long deprecated features in registration (@Rohan138)
    • Framestack wrapper now inherits from obswrapper (@jfpettit)
    • Seed methods for spaces.Tuple and spaces.Dict now function properly, are fully stochastic, are fully featured and behave in the expected manner (@XuehaiPan, @RaghuSpaceRajan)
    • Replace time() with perf_counter() for better measurements of short duration (@zuoxingdong)
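
    The reason for preferring perf_counter() when timing short operations can be seen with the standard library alone. The sketch below is not taken from Gym's code; average_seconds is a hypothetical helper that times a cheap function over many repetitions.

```python
import time


def average_seconds(fn, repeats=1000):
    """Return the mean wall-clock duration of fn() over `repeats` calls."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


# perf_counter is a monotonic, high-resolution clock intended for intervals,
# whereas time.time() reports wall-clock time that the system may adjust.
clock = time.get_clock_info("perf_counter")
avg = average_seconds(lambda: sum(range(100)))

print(f"monotonic={clock.monotonic}, resolution={clock.resolution:.1e} s")
print(f"average call took {avg * 1e6:.2f} microseconds")
```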
  • 0.19.0(Aug 13, 2021)

    Gym 0.19.0 is a large maintenance release, and the first since @jkterry1 became the maintainer. There should be no breaking changes in this release.

    New features:

    • Added custom datatype argument to multidiscrete space (@m-orsini)
    • API compliance test added based on SB3 and PettingZoo tests (@amtamasi)
    • RecordEpisodeStatistics works with VectorEnv (@vwxyzjn)

    Bug fixes:

    • Removed unused dependencies, removed unnecessary dependency version requirements that caused installation issues on newer machines, added full requirements.txt and moved general dependencies to extras. Notably, "toy_text" is not a used extra. atari-py is now pegged to a precise working version pending the switch to ale-py (@jkterry1)
    • Bug fixes to rewards in FrozenLake and FrozenLake8x8; versions bumped to v1 (@ZhiqingXiao)
    • Removed remaining numpy deprecation warnings (@super-pirata)
    • Fixes to video recording (@mahiuchun, @zlig)
    • EZ pickle argument fixes (@zzyunzhi, @Indoril007)
    • Other very minor (nonbreaking) fixes

    Other:

    • Removed small bits of dead code (@jkterry1)
    • Numerous typo, CI and documentation fixes (mostly @cclauss)
    • New readme and updated third party env list (@jkterry1)
    • Code is now all flake8 compliant through black (@cclauss)
  • v0.9.6(Feb 1, 2018)

    • Now your Env and Wrapper subclasses should define step, reset, render, close, seed rather than underscored method names.
    • Removed the board_game, debugging, safety, parameter_tuning environments since they're not being maintained by us at OpenAI. We encourage authors and users to create new repositories for these environments.
    • Changed MultiDiscrete action space to range from [0, ..., n-1] rather than [a, ..., b-1].
    • No more render(close=True), use env-specific methods to close the rendering.
    • Removed scoreboard directory, since the site doesn't exist anymore.
    • Moved gym/monitoring to gym/wrappers/monitoring
    • Add dtype to Space.
    • Not using Python's built-in logging module anymore; using gym.logger instead
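
    The MultiDiscrete range change in this release means each component i of an action lies in [0, n_i - 1]. The sketch below illustrates that convention with the standard library only; it is not Gym's MultiDiscrete class, and sample, contains and nvec are names chosen for this example.

```python
import random


def sample(nvec, rng):
    """Draw one action: component i is an integer in [0, nvec[i] - 1]."""
    return [rng.randrange(n) for n in nvec]


def contains(nvec, action):
    """Check that an action has the right length and every component is in range."""
    return len(action) == len(nvec) and all(
        isinstance(a, int) and 0 <= a < n for a, n in zip(action, nvec)
    )


nvec = [5, 2, 2]          # three sub-actions with 5, 2 and 2 choices
rng = random.Random(0)    # seeded for repeatable sampling
samples = [sample(nvec, rng) for _ in range(100)]

print(samples[:3])
print(contains(nvec, [4, 1, 1]), contains(nvec, [5, 0, 0]))
```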