A collection of various RL algorithms like policy gradients, DQN and PPO. The goal of this repo is to become a go-to resource for learning about RL: how to visualize, debug and solve RL problems. I've additionally included playground.py for learning more about OpenAI gym, etc.

Overview

Reinforcement Learning (PyTorch) 🤖 + 🍰 = ❤️

This repo will contain PyTorch implementations of various fundamental RL algorithms.
It's aimed at making it easy to start playing and learning about RL.

The problem I came across while investigating other DQN projects is that they either:

  • Don't have any evidence that they've actually achieved the published results
  • Don't have a "smart" replay buffer (i.e. they allocate (1M, 4, 84, 84) ~ 28 GB! instead of (1M, 84, 84) ~ 7 GB)
  • Lack visualizations and debugging utils

This repo will aim to solve these problems.
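To make the replay buffer numbers above concrete, here is a rough back-of-the-envelope sketch. It assumes frames are stored as 84x84 uint8 arrays (1 byte per pixel); the helper name `buffer_size_gb` is hypothetical and not taken from this repo's code.

```python
def buffer_size_gb(num_entries, frames_per_entry, height=84, width=84, bytes_per_pixel=1):
    """Memory needed to hold the buffer, in GB (10^9 bytes)."""
    return num_entries * frames_per_entry * height * width * bytes_per_pixel / 1e9

# Naive buffer: every entry stores the full 4-frame state
naive_gb = buffer_size_gb(1_000_000, frames_per_entry=4)

# "Smart" buffer: every frame is stored once, states are rebuilt when sampling
smart_gb = buffer_size_gb(1_000_000, frames_per_entry=1)

print(f"naive: {naive_gb:.1f} GB, smart: {smart_gb:.1f} GB")
```

The 4x saving comes purely from not duplicating frames that are shared between consecutive states.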

RL agents

DQN

This was the project that started the revolution in the RL world - deep Q-network ( 🔗 Mnih et al.),
aka "Human-level control through deep RL".

The DQN model learned to play 29 Atari games (out of the 49 it was tested on) at a super-human/comparable-to-humans level. Here is the schematic of its CNN architecture:

The fascinating part is that it learned only from "high-dimensional" (84x84) images and (usually sparse) rewards. The same architecture was used for all of the 49 games - although the model has to be retrained, from scratch, every single time.
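As a quick sanity check of the architecture, here is a small sketch that tracks how the spatial size shrinks through the conv layers. The layer settings (32 filters of 8x8 with stride 4, 64 of 4x4 with stride 2, 64 of 3x3 with stride 1, no padding) are the ones described in the Nature DQN paper; I'm assuming them here rather than reading them from this repo's code, and `conv_out` is just a hypothetical helper.

```python
def conv_out(size, kernel, stride):
    """Output side length of a square convolution without padding."""
    return (size - kernel) // stride + 1

# (out_channels, kernel, stride) for the 3 conv layers from the paper
CONV_LAYERS = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]

size, channels = 84, 4  # input: 4 stacked 84x84 frames
for out_channels, kernel, stride in CONV_LAYERS:
    size = conv_out(size, kernel, stride)
    channels = out_channels
    print(f"conv {kernel}x{kernel}, stride {stride}: {channels} x {size} x {size}")

# This is the input size of the first fully connected layer
flat_features = channels * size * size
print("flattened features:", flat_features)
```

In the paper these flattened features feed a 512-unit fully connected layer, followed by one output per action.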

DQN current results

Since it takes lots of compute and time to train all 49 models, I'll consider the DQN project completed once I succeed in achieving the published results on:

  • Breakout
  • Pong

Having said that, the experiments are still in progress, so feel free to contribute!

  • For some reason the models aren't learning very well, so if you find a bug, open up a PR! ❤️
  • I'm also experiencing slowdowns - so any PRs that would improve/explain the perf are welcome!
  • If you decide to train the DQN using this repo on some other Atari game, I'll gladly check in your model!

Important note: please follow the coding guidelines of this repo before you submit a PR so that we can minimize the back-and-forth. I'm a decently busy guy as I assume you are.

Current results - Breakout

As you can see, the model did learn something, although it's far from being really good.

Current results - Pong

todo

Setup

Let's get this thing running! Follow the next steps:

  1. git clone https://github.com/gordicaleksa/pytorch-learn-reinforcement-learning
  2. Open Anaconda console and navigate into the project directory: cd path_to_repo
  3. Run conda env create from the project directory (this will create a brand new conda environment).
  4. Run activate pytorch-rl-env (for running scripts from your console; alternatively, set up the interpreter in your IDE)

If you're on Windows you'll additionally need to run this to install gym's Atari dependencies: pip install -f https://github.com/Kojoley/atari-py/releases atari_py

Otherwise this should do it: pip install 'gym[atari]'. If it's not working, check out this and this.

That's it! It should work out-of-the-box, since conda env create uses the environment.yml file, which deals with dependencies.


The PyTorch pip package comes bundled with some version of CUDA/cuDNN, but it is highly recommended that you install a system-wide CUDA beforehand, mostly because of the GPU drivers. I also recommend using the Miniconda installer as a way to get conda on your system. Follow through points 1 and 2 of this setup and use the most up-to-date versions of Miniconda and CUDA/cuDNN for your system.

Usage

Option 1: Jupyter Notebook

Coming soon.

Option 2: Use your IDE of choice

You just need to link the Python environment you created in the setup section.

Training DQN

To train with the default settings, just run python train_DQN_script.py.

Settings you'll want to experiment with:

  • --seed - it may just so happen that I've chosen a bad one (RL is very sensitive)
  • --learning_rate - DQN originally used RMSProp, but I saw that Adam with 1e-4 worked for Stable Baselines3
  • --grad_clipping_value - there was a lot of noise in the gradients so I used this to control it
  • Try using RMSProp (I haven't yet). Adam was an improvement over RMSProp so I doubt it's causing the issues

Less important settings for getting DQN to work:

  • --env_id - depending on which game you want to train on (I'd focus on the easiest one for now - Breakout)
  • --replay_buffer_size - hopefully you can train DQN with 1M, as in the original paper; if not, make it smaller
  • --dont_crash_if_no_mem - add this flag if you want to run with 1M replay buffer even if you don't have enough RAM
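To show how the flags listed above fit together, here is a minimal argparse sketch. It is not the repo's actual parser: the flag names come from the lists above, but the types and default values (the seed, the clipping value, the environment id) are placeholders I've assumed for illustration.

```python
import argparse

def build_parser():
    """Hypothetical stand-in for the training script's argument parser."""
    parser = argparse.ArgumentParser(description="DQN training flags (illustrative sketch)")
    parser.add_argument("--seed", type=int, default=0)                      # placeholder default
    parser.add_argument("--learning_rate", type=float, default=1e-4)        # Adam value mentioned above
    parser.add_argument("--grad_clipping_value", type=float, default=5.0)   # placeholder default
    parser.add_argument("--env_id", type=str, default="BreakoutNoFrameskip-v4")  # assumed gym id
    parser.add_argument("--replay_buffer_size", type=int, default=1_000_000)     # 1M as in the paper
    parser.add_argument("--dont_crash_if_no_mem", action="store_true")
    return parser

# Equivalent to passing these flags on the command line
args = build_parser().parse_args(
    ["--seed", "23", "--replay_buffer_size", "500000", "--dont_crash_if_no_mem"]
)
print(args)
```

Check the training script itself for the real defaults before relying on any of these values.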

The training script will:

  • Dump checkpoint *.pth models into models/checkpoints/
  • Dump the best (highest reward) *.pth model into models/binaries/ <- TODO
  • Periodically write some training metadata to the console
  • Save TensorBoard metrics into runs/ (to use them, check out the visualization section)

Visualization and debugging tools

You can visualize the metrics during training by running tensorboard --logdir=runs from your console and pasting the http://localhost:6006/ URL into your browser.

I'm currently visualizing the Huber loss (and you can see there is something weird going on):
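For reference, the Huber loss is quadratic for small errors and linear for large ones, which limits how much a single large TD error can dominate the gradient. Here is a minimal scalar sketch; the threshold `delta=1.0` is a common choice that I'm assuming, not something confirmed from this repo's code.

```python
def huber_loss(td_error, delta=1.0):
    """Quadratic inside [-delta, delta], linear outside."""
    abs_error = abs(td_error)
    if abs_error <= delta:
        return 0.5 * td_error ** 2
    return delta * (abs_error - 0.5 * delta)

for error in (0.5, 1.0, 3.0):
    print(f"td_error={error}: loss={huber_loss(error)}")
```

The two branches meet at |td_error| = delta, so the loss and its slope are both continuous there.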

Rewards and steps taken per episode (there is a fair bit of correlation between these 2):

And gradient L2 norms of weights and biases of every CNN/FC layer as well as the complete grad vector:
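Here is a small sketch of how per-parameter gradient norms relate to the norm of the complete gradient vector, using plain Python lists instead of tensors. The parameter names and gradient values are made up for illustration. The clipping helper shows clipping by total norm, which is one common option; whether this repo clips by norm or by value isn't stated here, so treat that part as an assumption.

```python
import math

def l2_norm(values):
    return math.sqrt(sum(v * v for v in values))

# Hypothetical gradients, keyed by parameter name
grads = {
    "conv1.weight": [3.0, 4.0],
    "conv1.bias": [0.0, 12.0],
}

per_param = {name: l2_norm(grad) for name, grad in grads.items()}

# The norm of the per-parameter norms equals the norm of all grads concatenated
total_norm = l2_norm(per_param.values())

def clip_by_total_norm(grads, max_norm):
    """Rescale all gradients so that their total L2 norm is at most max_norm."""
    total = l2_norm(l2_norm(grad) for grad in grads.values())
    scale = min(1.0, max_norm / total) if total > 0 else 1.0
    return {name: [g * scale for g in grad] for name, grad in grads.items()}

clipped = clip_by_total_norm(grads, max_norm=6.5)
print(per_param, total_norm)
print(clipped)
```

Logging the per-layer norms next to the total one makes it easier to spot which layer is responsible for a spike.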

As well as epsilon (from the epsilon-greedy algorithm) but that plot is not that informative so I'll omit it here.
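In case epsilon-greedy is new to you, here is a minimal sketch of it with a linearly decaying epsilon. The schedule (1.0 down to 0.1 over the first 1M steps) follows the original DQN paper; I'm assuming it here, and this repo's defaults may differ. The function names are hypothetical.

```python
import random

def linear_epsilon(step, start=1.0, end=0.1, decay_steps=1_000_000):
    """Linearly anneal epsilon from start to end, then keep it constant."""
    fraction = min(step / decay_steps, 1.0)
    return start + fraction * (end - start)

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda action: q_values[action])

q_values = [0.1, 0.9, 0.3]  # made-up Q-values for 3 actions
for step in (0, 500_000, 2_000_000):
    epsilon = linear_epsilon(step)
    print(step, round(epsilon, 3), epsilon_greedy(q_values, epsilon))
```

Early in training almost every action is random (exploration), while later the agent mostly follows its own Q-values.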

As you can see, the plots are super noisy, as I expected! But the progress just stagnates from a certain point onwards, and that's what I'm trying to debug atm.


To enter the debug mode add the --debug flag to your console or IDE's list of script arguments.

It'll visualize the current state that's being fed into the RL agent. Sometimes the state will have some black frames prepended since there aren't enough frames experienced in the current episode:

But mostly all of the 4 frames will be in there:
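The padding behavior described above can be sketched like this: keep the last 4 frames of the current episode and prepend black (all-zero) frames while fewer than 4 are available. This is an illustrative stand-in with a hypothetical `FrameStack` class and tiny 2x2 frames, not the repo's actual implementation.

```python
from collections import deque

class FrameStack:
    """Keeps the last `num_frames` frames of the current episode."""

    def __init__(self, num_frames=4):
        self.num_frames = num_frames
        self.frames = deque(maxlen=num_frames)

    def reset(self, first_frame):
        """Start a new episode: drop frames left over from the previous one."""
        self.frames.clear()
        self.frames.append(first_frame)

    def push(self, frame):
        self.frames.append(frame)  # the oldest frame falls out automatically

    def state(self):
        """Return exactly `num_frames` frames, black frames first."""
        height, width = len(self.frames[0]), len(self.frames[0][0])
        black = [[0] * width for _ in range(height)]
        missing = self.num_frames - len(self.frames)
        return [black] * missing + list(self.frames)

# Tiny 2x2 "frames" instead of 84x84 to keep the output readable
stack = FrameStack()
stack.reset([[1, 1], [1, 1]])
early_state = stack.state()  # 3 black frames + the first real frame
for value in (2, 3, 4):
    stack.push([[value, value], [value, value]])
full_state = stack.state()   # 4 real frames, no padding

print(len(early_state), early_state[0], early_state[-1])
print([frame[0][0] for frame in full_state])
```

Clearing the stack on reset matters: otherwise the first state of a new episode would contain frames from the previous one.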

And it will start rendering the game frames (Pong and Breakout shown here from left to right):

Hardware requirements

You'll need some decent hardware to train the DQN in reasonable time so that you can iterate fast:

  1. 16+ GB of RAM (the replay buffer takes ~7 GB of RAM).
  2. The faster your GPU is - the better! 😅 Having said that, VRAM is not the bottleneck; you'll need 2+ GB of VRAM.

With 16 GB RAM and RTX 2080 it takes ~5 days to train DQN on my machine - I'm experiencing some slowdowns which I haven't debugged yet. Here is the FPS (frames-per-second) metric I'm logging:

The shorter, green one is the current experiment I'm running; the red one took over 5 days to train.
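If you want to track a similar FPS metric in your own experiments, here is a minimal sketch of a meter that reports frames per second since the previous measurement. The `FPSMeter` class is hypothetical and not taken from this repo; timestamps can be passed in explicitly, which makes it easy to test.

```python
import time

class FPSMeter:
    """Reports frames per second between consecutive update() calls."""

    def __init__(self, now=None):
        self.last_time = time.perf_counter() if now is None else now
        self.last_frame = 0

    def update(self, frame_count, now=None):
        """frame_count is the total number of frames processed so far."""
        now = time.perf_counter() if now is None else now
        elapsed = now - self.last_time
        fps = (frame_count - self.last_frame) / elapsed if elapsed > 0 else 0.0
        self.last_time, self.last_frame = now, frame_count
        return fps

# Explicit timestamps (in seconds) instead of the real clock
meter = FPSMeter(now=0.0)
first_fps = meter.update(frame_count=1000, now=10.0)
second_fps = meter.update(frame_count=1500, now=12.0)
print(first_fps, second_fps)
```

Measuring over windows instead of from the start of training is what makes gradual slowdowns visible in the plot.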

Future todos

  1. Debug DQN and achieve the published results
  2. Add Vanilla PG
  3. Add PPO

Learning material

Here are some videos I made on RL which may help you to better understand how DQN and other RL algorithms work:

DQN paper explained

And some other ones:

And in this one I tried to film the development process while the project was not nearly as polished as it is now:

I'll soon create a blog on how to get started with RL - so stay tuned for that!

Acknowledgements

I found these resources useful while developing this project, sorted (approximately) by usefulness:

Citation

If you find this code useful, please cite the following:

@misc{Gordić2021PyTorchLearnReinforcementLearning,
  author = {Gordić, Aleksa},
  title = {pytorch-learn-reinforcement-learning},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/gordicaleksa/pytorch-learn-reinforcement-learning}},
}

License

License: MIT
