Deep Reinforcement Learning for Keras.

Overview

What is it?

keras-rl implements some state-of-the-art deep reinforcement learning algorithms in Python and integrates seamlessly with the deep learning library Keras.

Furthermore, keras-rl works with OpenAI Gym out of the box. This means that evaluating and playing around with different algorithms is easy.

You can also extend keras-rl to suit your own needs. You can use built-in Keras callbacks and metrics or define your own. It is also easy to implement your own environments, and even your own algorithms, by extending a few simple abstract classes. Documentation is available online.
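To make the idea of a custom environment concrete, here is a minimal sketch of a toy environment that follows the classic Gym-style interface (reset() returns an observation, step(action) returns observation, reward, done, info). The class name CorridorEnv and its reward values are hypothetical and chosen only for illustration. In real use you would presumably subclass the abstract environment class provided by the library (or gym.Env) rather than a plain Python class.

```python
class CorridorEnv:
    """Toy environment (hypothetical): walk right along a corridor to reach the goal."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        # Start every episode in the leftmost cell and return the observation.
        self.position = 0
        return self.position

    def step(self, action):
        # action 1 moves right, any other action moves left.
        move = 1 if action == 1 else -1
        # Clamp the position so the agent cannot leave the corridor.
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        # Small step penalty, positive reward on reaching the goal.
        reward = 1.0 if done else -0.1
        return self.position, reward, done, {}


# Roll out one episode with a fixed "always go right" policy.
env = CorridorEnv(length=5)
observation = env.reset()
done = False
total_reward = 0.0
while not done:
    observation, reward, done, info = env.step(1)
    total_reward += reward
print(observation, done)
```

An agent only needs these two methods to interact with the environment, which is why wrapping an existing game class is often enough.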

What is included?

As of today, the following algorithms have been implemented:

  • Deep Q Learning (DQN) [1], [2]
  • Double DQN [3]
  • Deep Deterministic Policy Gradient (DDPG) [4]
  • Continuous DQN (CDQN or NAF) [6]
  • Cross-Entropy Method (CEM) [7], [8]
  • Dueling network DQN (Dueling DQN) [9]
  • Deep SARSA [10]
  • Asynchronous Advantage Actor-Critic (A3C) [5] (planned, not yet implemented)
  • Proximal Policy Optimization Algorithms (PPO) [11] (planned, not yet implemented)

You can find more information on each agent in the documentation.

Installation

  • Install Keras-RL from PyPI (recommended):
pip install keras-rl
  • Install from GitHub source:
git clone https://github.com/keras-rl/keras-rl.git
cd keras-rl
python setup.py install

Examples

If you want to run the examples, you'll also have to install:

For the Atari example you will also need:

  • Pillow: pip install Pillow
  • gym[atari]: Atari module for gym. Use pip install gym[atari]

Once you have installed everything, you can try out a simple example:

python examples/dqn_cartpole.py

This is a very simple example and it should converge relatively quickly, so it's a great way to get started! It also visualizes the game during training, so you can watch it learn. How cool is that?

Some sample weights are available on keras-rl-weights.

If you have questions or problems, please file an issue or, even better, fix the problem yourself and submit a pull request!

External Projects

Are you using Keras-RL in a project? Open a PR and share it!

Visualizing Training Metrics

To see graphs of your training progress and compare across runs, run pip install wandb and add the WandbLogger callback to your agent's fit() call:

from rl.callbacks import WandbLogger

...

agent.fit(env, nb_steps=50000, callbacks=[WandbLogger()])

For more info and options, see the W&B docs.

Citing

If you use keras-rl in your research, you can cite it as follows:

@misc{plappert2016kerasrl,
    author = {Matthias Plappert},
    title = {keras-rl},
    year = {2016},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/keras-rl/keras-rl}},
}

References

  1. Playing Atari with Deep Reinforcement Learning, Mnih et al., 2013
  2. Human-level control through deep reinforcement learning, Mnih et al., 2015
  3. Deep Reinforcement Learning with Double Q-learning, van Hasselt et al., 2015
  4. Continuous control with deep reinforcement learning, Lillicrap et al., 2015
  5. Asynchronous Methods for Deep Reinforcement Learning, Mnih et al., 2016
  6. Continuous Deep Q-Learning with Model-based Acceleration, Gu et al., 2016
  7. Learning Tetris Using the Noisy Cross-Entropy Method, Szita et al., 2006
  8. Deep Reinforcement Learning (MLSS lecture notes), Schulman, 2016
  9. Dueling Network Architectures for Deep Reinforcement Learning, Wang et al., 2016
  10. Reinforcement learning: An introduction, Sutton and Barto, 2011
  11. Proximal Policy Optimization Algorithms, Schulman et al., 2017
Comments
  • Resurrecting Keras-RL

    #118 is keras-rl dead
    @ViktorM I think Keras-RL is one of the best Keras libraries around and is brilliantly structured. Most of the code I've read is stand-alone, i.e. the researcher implements one model and the code is specific to that model only. Keras-RL provides structure and is quite extensible.

    I understand the author @matthiasplappert is working for OpenAI (congrats!) and would be quite busy. It'd be nice if we all could get together and maintain this repo. I'd be happy and eager to contribute but to be honest I don't have much experience with maintaining large open source codes.

    I'm currently trying to fix the author's DRQN implementation from Dec 2016 to work with the current version of Keras-RL/Keras. I've got a decent understanding of how the library has been written. If anyone else has already done this, please let me know; I could really use the help. Thanks

    opened by arsenious 22
  • CDQN nan actions

    I just ported the CDQN pendulum agent to an environment of mine. When I run the model, the first few steps contain valid values but the rest are nan. I am not sure what is up here. Let me know what I can provide to help debug.

    > python .\dqn.py -d C:\Users\Ryan\Dropbox\cmu-sf\deepsf-data2 --visualize
    Using Theano backend.
    ____________________________________________________________________________________________________
    Layer (type)                     Output Shape          Param #     Connected to
    ====================================================================================================
    flatten_1 (Flatten)              (None, 60)            0           flatten_input_1[0][0]
    ____________________________________________________________________________________________________
    dense_1 (Dense)                  (None, 16)            976         flatten_1[0][0]
    ____________________________________________________________________________________________________
    activation_1 (Activation)        (None, 16)            0           dense_1[0][0]
    ____________________________________________________________________________________________________
    dense_2 (Dense)                  (None, 16)            272         activation_1[0][0]
    ____________________________________________________________________________________________________
    activation_2 (Activation)        (None, 16)            0           dense_2[0][0]
    ____________________________________________________________________________________________________
    dense_3 (Dense)                  (None, 16)            272         activation_2[0][0]
    ____________________________________________________________________________________________________
    activation_3 (Activation)        (None, 16)            0           dense_3[0][0]
    ____________________________________________________________________________________________________
    dense_4 (Dense)                  (None, 1)             17          activation_3[0][0]
    ____________________________________________________________________________________________________
    activation_4 (Activation)        (None, 1)             0           dense_4[0][0]
    ====================================================================================================
    Total params: 1537
    ____________________________________________________________________________________________________
    None
    ____________________________________________________________________________________________________
    Layer (type)                     Output Shape          Param #     Connected to
    ====================================================================================================
    flatten_2 (Flatten)              (None, 60)            0           flatten_input_2[0][0]
    ____________________________________________________________________________________________________
    dense_5 (Dense)                  (None, 16)            976         flatten_2[0][0]
    ____________________________________________________________________________________________________
    activation_5 (Activation)        (None, 16)            0           dense_5[0][0]
    ____________________________________________________________________________________________________
    dense_6 (Dense)                  (None, 16)            272         activation_5[0][0]
    ____________________________________________________________________________________________________
    activation_6 (Activation)        (None, 16)            0           dense_6[0][0]
    ____________________________________________________________________________________________________
    dense_7 (Dense)                  (None, 16)            272         activation_6[0][0]
    ____________________________________________________________________________________________________
    activation_7 (Activation)        (None, 16)            0           dense_7[0][0]
    ____________________________________________________________________________________________________
    dense_8 (Dense)                  (None, 2L)            34          activation_7[0][0]
    ____________________________________________________________________________________________________
    activation_8 (Activation)        (None, 2L)            0           dense_8[0][0]
    ====================================================================================================
    Total params: 1554
    ____________________________________________________________________________________________________
    None
    ____________________________________________________________________________________________________
    Layer (type)                     Output Shape          Param #     Connected to
    ====================================================================================================
    observation_input (InputLayer)   (None, 1, 60L)        0
    ____________________________________________________________________________________________________
    action_input (InputLayer)        (None, 2L)            0
    ____________________________________________________________________________________________________
    flatten_3 (Flatten)              (None, 60)            0           observation_input[0][0]
    ____________________________________________________________________________________________________
    merge_1 (Merge)                  (None, 62)            0           action_input[0][0]
                                                                       flatten_3[0][0]
    ____________________________________________________________________________________________________
    dense_9 (Dense)                  (None, 32)            2016        merge_1[0][0]
    ____________________________________________________________________________________________________
    activation_9 (Activation)        (None, 32)            0           dense_9[0][0]
    ____________________________________________________________________________________________________
    dense_10 (Dense)                 (None, 32)            1056        activation_9[0][0]
    ____________________________________________________________________________________________________
    activation_10 (Activation)       (None, 32)            0           dense_10[0][0]
    ____________________________________________________________________________________________________
    dense_11 (Dense)                 (None, 32)            1056        activation_10[0][0]
    ____________________________________________________________________________________________________
    activation_11 (Activation)       (None, 32)            0           dense_11[0][0]
    ____________________________________________________________________________________________________
    dense_12 (Dense)                 (None, 3L)            99          activation_11[0][0]
    ____________________________________________________________________________________________________
    activation_12 (Activation)       (None, 3L)            0           dense_12[0][0]
    ====================================================================================================
    Total params: 4227
    ____________________________________________________________________________________________________
    None
    Training for 21820000 steps ...
    [-29.43209839  41.64512253]
    [-26.13952446  42.74395752]
    [-29.95537758  54.30570602]
    [-28.84783554  35.84109497]
    [-26.03454971  31.98110199]
    [ nan  nan]
    [ nan  nan]
    [ nan  nan]
    [ nan  nan]
    [ nan  nan]
    [ nan  nan]
    [ nan  nan]
    [ nan  nan]
    [ nan  nan]
    [ nan  nan]
    [ nan  nan]
    [ nan  nan]
    
    wontfix 
    opened by RyanHope 22
  • Not working with PongDeterministic-v4

    I tried running the code for PongDeterministic-v4 for 1.75M steps, but there was no improvement in score. It started with an average of -21, and even after 1.75M steps the average score was around -20.4. In the final testing phase, too, the score was -21.0 throughout.

    Any help with this issue would be really appreciated.

    Thanks

    wontfix 
    opened by damnOblivious 21
  • LSTM input layer possible?

    Is there a theoretical reason why LSTM couldn't be used as an input layer to a DQN model or is it just a limitation of the implementation? I have been working with an environment where the input could be treated as a sequence of states but up until now I have just been flattening this input. I have a supervised learning keras model with an LSTM input layer that is able to interact with my environment, but when I try using the same model in the DQN agent I get an error about my input dimensions.

    wontfix 
    opened by RyanHope 20
  • len is not well defined for symbolic tensors *AND* using a `tf.Tensor` as a Python `bool` is not allowed in Graph execution

    There is an error in the error handling of the ddpg.py and dqn.py agents in keras-rl/rl/agents when using TensorFlow 2.0, Keras 2.3.1 and Keras-rl 0.4.2 (dqn.py line 108; ddpg.py lines 29 and 31). It comes about by calling len(model.output).

    Error:

    Traceback (most recent call last):
      File "foo.py", line x, in <module>
        agent = DDPGAgent(...)
      **File "foo\ddpg.py", line 29, in __init__
        if hasattr(actor.output, '__len__') and len(actor.output) > 1:**
      File "foo\ops.py", line 741, in __len__
        "shape information.".format(self.name))
    TypeError: len is not well defined for symbolic Tensors. (dense_5/BiasAdd:0) Please call `x.shape` rather than `len(x)` for shape information.
    

    Possible solution: what I've been using as a fix is summing over the shape of the tensor:

    sum(x is not None for x in model.output.shape)

    Implementation example:

    if hasattr(model.output, '__len__') and sum(x is not None for x in model.output.shape) > 1:
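As an alternative to inspecting the shape, the multiple-output check can be sketched without ever calling len() on a symbolic tensor. The helper name has_multiple_outputs and the FakeSymbolicTensor stand-in are hypothetical and exist only so the example runs without TensorFlow; the assumption is that a multi-output Keras model exposes its outputs as a list or tuple, while a single-output model exposes one tensor object.

```python
def has_multiple_outputs(output):
    """Return True if `output` is a list/tuple holding more than one tensor."""
    # Only containers are measured with len(); a single symbolic tensor is
    # never passed to len(), so its __len__ cannot raise.
    return isinstance(output, (list, tuple)) and len(output) > 1


class FakeSymbolicTensor:
    """Stand-in (hypothetical) that mimics a tensor whose __len__ raises."""

    shape = (None, 2)

    def __len__(self):
        raise TypeError("len is not well defined for symbolic Tensors.")


single = FakeSymbolicTensor()
print(has_multiple_outputs(single))            # single tensor
print(has_multiple_outputs([single, single]))  # list of two tensors
```

Unlike counting the non-None dimensions of the shape, this check looks at the number of outputs rather than the rank of one output, which seems closer to what the validation intends.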

    There is then another error in ddpg.py at line 130:

    if i == self.critic_action_input:

    Error:

    tensorflow.python.framework.errors_impl.OperatorNotAllowedInGraphError: using a `tf.Tensor` as a Python `bool` is not allowed in Graph execution. Use Eager execution or decorate this function with @tf.function.

    Including

    import tensorflow as tf
    tf.compat.v1.enable_eager_execution()
    

    Does not seem to help. I have also tried creating a session and using tf.equal(i, self.critic_action_input).eval(session=sess), but I'm having issues. So far I've tried:

    import tensorflow as tf
    
    with tf.compat.v1.Session(graph=self.critic_action_input.graph) as sess:
                for i in self.critic.
                    if tf.equal(i, self.critic_action_input).eval(session=sess): #if i == self.critic_action_input:
                        combined_inputs.append([])
                    else:
                        combined_inputs.append(i)
                        critic_inputs.append(i)
    

    But I cannot get it to work

    Thank you

    wontfix 
    opened by EduardDurech 16
  • Prioritized Experience Replay

    An implementation of Prioritized Experience Replay, Schaul et al., 2016.

    Key features/changes:

    dqn.py: Adds logic to determine if the memory is Prioritized, and adjusts the algorithm accordingly. I've run a few tests and things seem to be working, but I'd appreciate some help looking over the math.

    memory.py: Adds PrioritizedMemory, which samples experiences proportional to their TD error. Uses similar conventions to SequentialMemory.

    util.py: Adds segment tree data structure used by PrioritizedMemory, which is taken almost directly from the OpenAI baseline version.

    dqn_atari.py: Modifies the dqn example to include PrioritizedMemory (Sequential is still there but commented out). Also changes the keras model to the functional api, as well as updates the Convolution2D syntax (no more deprecation warnings).

    docs/dqn.md: Adds the PER paper to the references.
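For readers unfamiliar with proportional prioritization, the sampling rule can be sketched in plain Python. This is a simplified illustration, not the code from this PR: it uses a linear scan instead of a segment tree, normalizes importance weights by the largest weight in the batch, and the names per_probabilities, sample_with_weights, alpha, beta and eps are assumptions chosen for the example.

```python
import random


def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Convert TD errors into sampling probabilities P(i) = p_i^alpha / sum_k p_k^alpha."""
    # eps keeps transitions with zero TD error from never being sampled.
    priorities = [(abs(e) + eps) ** alpha for e in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_with_weights(td_errors, batch_size, alpha=0.6, beta=0.4, rng=random):
    """Sample indexes proportionally to priority and return importance weights."""
    probs = per_probabilities(td_errors, alpha)
    n = len(td_errors)
    idxs = rng.choices(range(n), weights=probs, k=batch_size)
    # Importance-sampling correction w_i = (N * P(i))^(-beta).
    weights = [(n * probs[i]) ** (-beta) for i in idxs]
    # Scale so the largest weight in the batch is 1 (a simplification).
    max_w = max(weights)
    return idxs, [w / max_w for w in weights]


td_errors = [0.05, 0.5, 2.0, 0.1]
idxs, weights = sample_with_weights(td_errors, batch_size=4)
print(idxs)
print(weights)
```

Transitions with larger TD error are drawn more often, and the weights shrink their contribution to the loss to compensate. Annealing beta toward 1 over training, as mentioned in the description, strengthens that correction.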

    I've done a handful of 8 million + time step training runs as tests (on Breakout v0 and v4). Learning went smoothly. Unfortunately, there was a bug that prevented the memory's beta value from annealing properly, so I can't say for sure that it's working. That's been fixed and I'm starting a new 10 mil run. As of 2.6m everything is still going well so I think this is good enough to put up for review; I'll comment the results when it's complete.

    opened by jakegrigsby 16
  • Why custom ringbuffer used instead of deque?

    https://github.com/matthiasplappert/keras-rl/blob/d9e3b64a20f056030c02bfe217085b7e54098e48/rl/memory.py#L42 states: Do not use deque to implement the memory. This data structure may seem convenient but it is way too slow on random access. Instead, we use our own ring buffer implementation.

    Not sure why this is stated. I ran this quick test:

    import time
    from collections import deque

    from rl.memory import RingBuffer

    ring = RingBuffer(maxlen = 1000)

    # Time appends to the ring buffer.
    start_time = time.time()
    for i in range(100000):
        ring.append(i)
    print("--- %s seconds ---" % (time.time() - start_time))

    # Time appends to the deque.
    d = deque(maxlen=1000)
    start_time = time.time()
    for i in range(100000):
        d.append(i)
    print("--- %s seconds ---" % (time.time() - start_time))

    # Time indexed reads from the ring buffer.
    start_time = time.time()
    for i in range(100000):
        l = ring[i%1000-1]
    print("--- %s seconds ---" % (time.time() - start_time))

    # Time indexed reads from the deque.
    start_time = time.time()
    for i in range(100000):
        l = d[i%1000]
    print("--- %s seconds ---" % (time.time() - start_time))

    and got this result:

    --- 0.0608789920807 seconds ---
    --- 0.00712609291077 seconds ---
    --- 0.0430929660797 seconds ---
    --- 0.00617289543152 seconds ---

    You can see that the custom ringbuffer is significantly slower than the deque at adding or removing. Can someone explain to me what the comment is referring to exactly and why the ringbuffer is used?
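Some context on the random-access remark: Python's documentation notes that deque indexing is O(1) at both ends but slows to O(n) in the middle, so a difference may only show up with buffers much larger than the 1000-element one benchmarked here. A list-backed ring buffer keeps indexing at constant time regardless of size. The class below is a minimal illustrative sketch, not keras-rl's actual RingBuffer, and the name SimpleRingBuffer is hypothetical.

```python
class SimpleRingBuffer:
    """Fixed-size buffer with O(1) append and O(1) indexed reads (illustrative)."""

    def __init__(self, maxlen):
        self.maxlen = maxlen
        self.start = 0          # physical index of the oldest element
        self.length = 0         # number of stored elements
        self.data = [None] * maxlen

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        if idx < 0 or idx >= self.length:
            raise IndexError(idx)
        # Translate the logical index into a physical slot.
        return self.data[(self.start + idx) % self.maxlen]

    def append(self, value):
        if self.length < self.maxlen:
            self.length += 1
        else:
            # Buffer is full: drop the oldest element by advancing the start.
            self.start = (self.start + 1) % self.maxlen
        self.data[(self.start + self.length - 1) % self.maxlen] = value


buf = SimpleRingBuffer(maxlen=3)
for i in range(5):
    buf.append(i)
print([buf[i] for i in range(len(buf))])  # [2, 3, 4]
```

Replay memories are sampled at uniformly random positions, so the cost of reads in the middle of the buffer is what matters, more than the cost of appends.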

    wontfix 
    opened by kirkscheper 16
  • How to train agent to play my own (2048) game?

    Hello,

    I am working on an AI for the game 2048 (https://gabrielecirulli.github.io/2048/). This library looks fantastic, so I want to ask what I need to do in order to train an agent (a discrete one, probably DQN). I have implemented 2048 using NumPy (https://github.com/gorgitko/pygames/tree/master/2048), so should I implement some API for your library? I see in your examples that you are training on the OpenAI Gym games, so do I just need to implement the same API in my 2048 game class?

    Thank you!

    opened by gorgitko 16
  • How to use continuously?

    I understand how to use the keras-rl framework in a limited train / test workflow as demonstrated in some of the samples.

    But how would one use keras-rl in a scenario where it is deployed into a real environment, where training starts from scratch and improves over time? Should I call "fit", let it run for an episode, persist the weights and then start over again? Or what would the flow look like?

    wontfix 
    opened by olavt 14
  • Unable to learn simple catch game

    I've made a custom environment in which a fruit is falling and you control a paddle to catch it: https://github.com/hmate9/gym-catch/blob/master/gym_catch/envs/catch_env.py

    I've tried to use keras-rl to reimplement this: https://gist.github.com/EderSantana/c7222daa328f0e885093

    The same game, catching a fruit, and their implementation finds a good model in a couple of minutes which catches nearly 100% of the time.

    Here is the code for learning with keras-rl that I wrote: https://gist.github.com/hmate9/49758ee1117ae55616f45d72186834a5

    The code with keras-rl does not converge, and does not even seem to be better than random even after running for hours.

    Does anyone know why this is? Did I write the environment wrong or am I using keras-rl wrong?

    Your answer is greatly appreciated, I have not been able to solve this for a day now.

    opened by hmate9 13
  • Please help me, I have a problem with DQNAgent.

    TypeError Traceback (most recent call last) in ()
          1 policy = EpsGreedyQPolicy()
          2 memory = SequentialMemory(limit=50000, window_length=1)
    ----> 3 dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory, nb_steps_warmup=10,target_model_update=1e-2, policy=policy)
          4 dqn.compile(Adam(lr=1e-3), metrics=['mae'])
          5

    1 frames
    /usr/local/lib/python3.6/dist-packages/rl/agents/dqn.py in __init__(self, model, policy, test_policy, enable_double_dqn, enable_dueling_network, dueling_type, *args, **kwargs)
        106
        107         # Validate (important) input.
    --> 108         if hasattr(model.output, '__len__') and len(model.output) > 1:
        109             raise ValueError('Model "{}" has more than one output. DQN expects a model that has a single output.'.format(model))
        110         if model.output._keras_shape != (None, self.nb_actions):

    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/keras_tensor.py in __len__(self)
        238
        239   def __len__(self):
    --> 240     raise TypeError('Keras symbolic inputs/outputs do not '
        241                     'implement __len__. You may be '
        242                     'trying to pass Keras symbolic inputs/outputs '

    TypeError: Keras symbolic inputs/outputs do not implement __len__. You may be trying to pass Keras symbolic inputs/outputs to a TF API that does not register dispatching, preventing Keras from automatically converting the API call to a lambda layer in the Functional Model. This error will also get raised if you try asserting a symbolic input/output directly.

    wontfix 
    opened by hemsatrakol 12
  • Value error when running DQN.fit

    I tried teaching an AI to play Breakout, but my code crashes when I try to train the DQN model.

    import gym
    import numpy as np
    import tensorflow as tf
    from rl.agents.dqn import DQNAgent
    from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy
    from rl.memory import SequentialMemory
    from keras.layers import Dense, Flatten, Convolution2D

    env = gym.make('ALE/Breakout-v5', render_mode='rgb_array')
    height, width, channels = env.observation_space.shape
    actions = env.action_space.n

    episodes = 10
    for episode in range(1, episodes + 1):
        env.reset()
        done = False
        score = 0

    def buildModel(height, width, channels, actions):
        model = tf.keras.Sequential()
        model.add(Convolution2D(32, (8, 8), strides=(4, 4), activation='relu', input_shape=(3,height, width, channels)))
        model.add(Convolution2D(64, (4, 4), strides=(2, 2), activation='relu'))
        model.add(Convolution2D(64, (3, 3), activation='relu'))
        model.add(Flatten())
        model.add(Dense(512, activation='relu'))
        model.add(Dense(256, activation='relu'))
        model.add(Dense(actions, activation='linear'))
        return model

    def buildAgent(model, actions):
        policy = LinearAnnealedPolicy(EpsGreedyQPolicy(), attr='eps', value_max=1., value_min=.1, value_test=.2, nb_steps=10000)
        memory = SequentialMemory(limit=1000, window_length=3)
        dqn = DQNAgent(model=model, memory=memory, policy=policy, enable_dueling_network=True, dueling_type='avg', nb_actions=actions, nb_steps_warmup=1000)
        return dqn

    model = buildModel(height, width, channels, actions)

    DQN = buildAgent(model, actions)
    DQN.compile(tf.keras.optimizers.Adam(learning_rate=1e-4), metrics=['mae'])
    DQN.fit(env, nb_steps=1000000, visualize=True, verbose=1)

    scores = DQN.test(env, nb_episodes=1000, visualize=True)
    print(np.mean(scores.history['episode_reward']))

    Error: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

    opened by GravermanDev 2
  • Multiple Actions in DQN (binary action vector)

    Hello,

    I need to output multiple actions at once for one of my RL use cases.

    The output would be an action vector such as (1 1 1 1 0 1 1), where each entry is either 0 or 1.

    How can I implement multiple actions as the output using DQN? Is it possible? If so, please provide as much detail and explanation as possible, as I am just a beginner.

    Thank You

    opened by 2019hc04089 0
  • How to set the "nb_steps_warmup" and "nb_steps" properly?

    I got the following warning while using keras-rl to train a DQN agent. The batch_size argument is set to a small value, i.e. 50.

    According to the warning below, it seems that there is not enough experience to sample from. I am not sure how to balance the nb_steps_warmup and nb_steps arguments.

    Moreover, is nb_steps_warmup a part of nb_steps? That is, if nb_steps_warmup=10 and nb_steps=50, do 50-10=40 steps remain for training?

    /lib/python3.7/site-packages/rl/memory.py:37: UserWarning: Not enough entries to sample without replacement. Consider increasing your warm-up phase to avoid oversampling!
      warnings.warn('Not enough entries to sample without replacement. Consider increasing your warm-up phase to avoid oversampling!')
    Traceback (most recent call last):
      File "main.py", line 124, in <module>
        history = agent.fit(env, nb_steps=args.nb_steps_train, visualize=False, verbose=args.verbose)
      File "/lib/python3.7/site-packages/rl/core.py", line 193, in fit
        metrics = self.backward(reward, terminal=done)
      File "/lib/python3.7/site-packages/rl/agents/dqn.py", line 250, in backward
        experiences = self.memory.sample(self.batch_size)
      File "/lib/python3.7/site-packages/rl/memory.py", line 263, in sample
        batch_idxs = sample_batch_indexes(0, self.nb_entries, size=batch_size)
      File "/lib/python3.7/site-packages/rl/memory.py", line 38, in sample_batch_indexes
        batch_idxs = np.random.random_integers(low, high - 1, size=size)
      File "mtrand.pyx", line 1328, in numpy.random.mtrand.RandomState.random_integers
      File "mtrand.pyx", line 746, in numpy.random.mtrand.RandomState.randint
      File "_bounded_integers.pyx", line 1254, in numpy.random._bounded_integers._rand_int64
    ValueError: low >= high
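The warning and the final ValueError suggest that sampling was attempted while the memory held fewer entries than batch_size, and that the index range was empty at the moment of the crash. The behaviour can be sketched in plain Python as follows. This is a hedged reconstruction from the traceback, not the library's code: the function mirrors the name sample_batch_indexes seen in the traceback, but its body and the rng parameter are assumptions.

```python
import random
import warnings


def sample_batch_indexes(low, high, size, rng=random):
    """Pick `size` indexes from the half-open range [low, high)."""
    if high <= low:
        # Nothing has been stored yet, so there is nothing to sample.
        raise ValueError("low >= high: the memory has no entries to sample from")
    if high - low >= size:
        # Enough entries: sample without replacement, so no duplicates.
        return rng.sample(range(low, high), size)
    # Too few entries: fall back to sampling with replacement.
    warnings.warn("Not enough entries to sample without replacement.")
    return [rng.randrange(low, high) for _ in range(size)]


# With 1000 stored entries, a batch of 50 can be drawn without duplicates.
batch = sample_batch_indexes(0, 1000, 50)
print(len(batch), len(set(batch)))
```

Under this reading, choosing nb_steps_warmup to be at least as large as batch_size would give the memory enough entries before the first training update.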
    
    opened by Beliefuture 1
  • Issue importing keras-rl on tensorflow-macos

    The tensorflow-macos branch is currently archived, so I was hoping someone could help me find a fix for this import error. I'm trying to import keras-rl packages and it's throwing errors because TensorFlow does not have a function that Keras uses during import. I don't know if there's a previous version of keras-rl that would fix this or if I'm stuck until TensorFlow has proper support for the M1 chip. Logs and the package versions listed by pip are below (run in a Jupyter notebook inside a miniforge/conda venv).

    jupyter-client 7.0.2
    jupyter-core 4.7.1
    jupyterlab-pygments 0.1.2
    tensorflow-addons-macos 0.1a3
    tensorflow-estimator 2.5.0
    tensorflow-macos 0.1a3
    keras 2.6.0
    keras-rl 0.4.2

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, Flatten
    from tensorflow.keras.optimizers import Adam

    from rl.agents.dqn import DQNAgent
    from rl.policy import BoltzmannQPolicy
    from rl.memory import SequentialMemory

    ImportError Traceback (most recent call last)
    /var/folders/mt/kt4lp3rx2p37sfswrnyhlkjr0000gn/T/ipykernel_2558/438834739.py in
    ----> 1 from rl.agents.dqn import DQNAgent
          2 from rl.policy import BoltzmannQPolicy
          3 from rl.memory import SequentialMemory

    ~/miniforge3/envs/tensorflow/lib/python3.8/site-packages/rl/agents/__init__.py in
          1 from __future__ import absolute_import
    ----> 2 from .dqn import DQNAgent, NAFAgent, ContinuousDQNAgent
          3 from .ddpg import DDPGAgent
          4 from .cem import CEMAgent
          5 from .sarsa import SarsaAgent, SARSAAgent

    ~/miniforge3/envs/tensorflow/lib/python3.8/site-packages/rl/agents/dqn.py in
          2 import warnings
          3
    ----> 4 import keras.backend as K
          5 from keras.models import Model
          6 from keras.layers import Lambda, Input, Layer, Dense

    ~/miniforge3/envs/tensorflow/lib/python3.8/site-packages/keras/__init__.py in
         23
         24 # See b/110718070#comment18 for more details about this import.
    ---> 25 from keras import models
         26
         27 from keras.engine.input_layer import Input

    ~/miniforge3/envs/tensorflow/lib/python3.8/site-packages/keras/models.py in
         17
         18 import tensorflow.compat.v2 as tf
    ---> 19 from keras import backend
         20 from keras import metrics as metrics_module
         21 from keras import optimizer_v1

    ~/miniforge3/envs/tensorflow/lib/python3.8/site-packages/keras/backend.py in
         34 from tensorflow.core.protobuf import config_pb2
         35 from tensorflow.python.eager import context
    ---> 36 from tensorflow.python.eager.context import get_config
         37 from tensorflow.python.framework import config
         38 from keras import backend_config

    ImportError: cannot import name 'get_config' from 'tensorflow.python.eager.context' (/Users/sebastian/miniforge3/envs/tensorflow/lib/python3.8/site-packages/tensorflow/python/eager/context.py)

    opened by Sebewe 1
Releases(v0.4.2)
  • v0.4.2(May 1, 2018)

  • v0.4.1(Apr 10, 2018)

  • v0.4.0(Dec 4, 2017)

    • Removes legacy code and now requires Keras >= 2.0.7 (https://github.com/matthiasplappert/keras-rl/pull/157)
    • Adds two new policies (https://github.com/matthiasplappert/keras-rl/pull/156, https://github.com/matthiasplappert/keras-rl/pull/122)
    • Fixes a few problems with replay memory sampling being incorrect for certain edge cases (https://github.com/matthiasplappert/keras-rl/pull/138)
    • Fixes a problem with dropout in actor model for DDPG (https://github.com/matthiasplappert/keras-rl/pull/150)
  • v0.3.1(Sep 20, 2017)

  • v0.3.0(Mar 15, 2017)

    • Full compatibility with the recently released Keras 2
    • New SarsaAgent implementation
    • New train_policy on DQNAgent and SarsaAgent to allow for different policies at training and test time
    • Minor clean-ups
  • v0.2.2(Feb 15, 2017)

  • v0.2.1(Feb 15, 2017)

  • v0.2.0rc1(Oct 17, 2016)


Reinforcement Learning Working Group 1.6k Jan 09, 2023
ChainerRL is a deep reinforcement learning library built on top of Chainer.

ChainerRL ChainerRL is a deep reinforcement learning library that implements various state-of-the-art deep reinforcement algorithms in Python using Ch

Chainer 1.1k Dec 26, 2022
A platform for Reasoning systems (Reinforcement Learning, Contextual Bandits, etc.)

Applied Reinforcement Learning @ Facebook Overview ReAgent is an open source end-to-end platform for applied reinforcement learning (RL) developed and

Facebook Research 3.3k Jan 05, 2023
A customisable 3D platform for agent-based AI research

DeepMind Lab is a 3D learning environment based on id Software's Quake III Arena via ioquake3 and other open source software. DeepMind Lab provides a

DeepMind 6.8k Jan 05, 2023
Open world survival environment for reinforcement learning

Crafter Open world survival environment for reinforcement learning. Highlights Crafter is a procedurally generated 2D world, where the agent finds foo

Danijar Hafner 213 Jan 05, 2023
Deep Reinforcement Learning for Keras.

Deep Reinforcement Learning for Keras What is it? keras-rl implements some state-of-the art deep reinforcement learning algorithms in Python and seaml

Keras-RL 5.4k Jan 04, 2023
An open source robotics benchmark for meta- and multi-task reinforcement learning

Meta-World Meta-World is an open-source simulated benchmark for meta-reinforcement learning and multi-task learning consisting of 50 distinct robotic

Reinforcement Learning Working Group 823 Jan 06, 2023
Reinforcement Learning Coach by Intel AI Lab enables easy experimentation with state of the art Reinforcement Learning algorithms

Coach Coach is a python reinforcement learning framework containing implementation of many state-of-the-art algorithms. It exposes a set of easy-to-us

Intel Labs 2.2k Jan 05, 2023
TensorFlow Reinforcement Learning

TRFL TRFL (pronounced "truffle") is a library built on top of TensorFlow that exposes several useful building blocks for implementing Reinforcement Le

DeepMind 3.1k Dec 29, 2022