PyTorch implementations of deep reinforcement learning algorithms and environments

Overview

Deep Reinforcement Learning Algorithms with PyTorch

Travis CI contributions welcome

RL PyTorch

This repository contains PyTorch implementations of deep reinforcement learning algorithms and environments.

(To help you remember things you learn about machine learning in general write them in Save All and try out the public deck there about Fast AI's machine learning textbook.)

Algorithms Implemented

  1. Deep Q Learning (DQN) (Mnih et al. 2013)
  2. DQN with Fixed Q Targets (Mnih et al. 2013)
  3. Double DQN (DDQN) (Hado van Hasselt et al. 2015)
  4. DDQN with Prioritised Experience Replay (Schaul et al. 2016)
  5. Dueling DDQN (Wang et al. 2016)
  6. REINFORCE (Williams et al. 1992)
  7. Deep Deterministic Policy Gradients (DDPG) (Lillicrap et al. 2016 )
  8. Twin Delayed Deep Deterministic Policy Gradients (TD3) (Fujimoto et al. 2018)
  9. Soft Actor-Critic (SAC) (Haarnoja et al. 2018)
  10. Soft Actor-Critic for Discrete Actions (SAC-Discrete) (Christodoulou 2019)
  11. Asynchronous Advantage Actor Critic (A3C) (Mnih et al. 2016)
  12. Syncrhonous Advantage Actor Critic (A2C)
  13. Proximal Policy Optimisation (PPO) (Schulman et al. 2017)
  14. DQN with Hindsight Experience Replay (DQN-HER) (Andrychowicz et al. 2018)
  15. DDPG with Hindsight Experience Replay (DDPG-HER) (Andrychowicz et al. 2018 )
  16. Hierarchical-DQN (h-DQN) (Kulkarni et al. 2016)
  17. Stochastic NNs for Hierarchical Reinforcement Learning (SNN-HRL) (Florensa et al. 2017)
  18. Diversity Is All You Need (DIAYN) (Eyensbach et al. 2018)

All implementations are able to quickly solve Cart Pole (discrete actions), Mountain Car Continuous (continuous actions), Bit Flipping (discrete actions with dynamic goals) or Fetch Reach (continuous actions with dynamic goals). I plan to add more hierarchical RL algorithms soon.

Environments Implemented

  1. Bit Flipping Game (as described in Andrychowicz et al. 2018)
  2. Four Rooms Game (as described in Sutton et al. 1998)
  3. Long Corridor Game (as described in Kulkarni et al. 2016)
  4. Ant-{Maze, Push, Fall} (as desribed in Nachum et al. 2018 and their accompanying code)

Results

1. Cart Pole and Mountain Car

Below shows various RL algorithms successfully learning discrete action game Cart Pole or continuous action game Mountain Car. The mean result from running the algorithms with 3 random seeds is shown with the shaded area representing plus and minus 1 standard deviation. Hyperparameters used can be found in files results/Cart_Pole.py and results/Mountain_Car.py.

Cart Pole and Mountain Car Results

2. Hindsight Experience Replay (HER) Experiements

Below shows the performance of DQN and DDPG with and without Hindsight Experience Replay (HER) in the Bit Flipping (14 bits) and Fetch Reach environments described in the papers Hindsight Experience Replay 2018 and Multi-Goal Reinforcement Learning 2018. The results replicate the results found in the papers and show how adding HER can allow an agent to solve problems that it otherwise would not be able to solve at all. Note that the same hyperparameters were used within each pair of agents and so the only difference between them was whether hindsight was used or not.

HER Experiment Results

3. Hierarchical Reinforcement Learning Experiments

The results on the left below show the performance of DQN and the algorithm hierarchical-DQN from Kulkarni et al. 2016 on the Long Corridor environment also explained in Kulkarni et al. 2016. The environment requires the agent to go to the end of a corridor before coming back in order to receive a larger reward. This delayed gratification and the aliasing of states makes it a somewhat impossible game for DQN to learn but if we introduce a meta-controller (as in h-DQN) which directs a lower-level controller how to behave we are able to make more progress. This aligns with the results found in the paper.

The results on the right show the performance of DDQN and algorithm Stochastic NNs for Hierarchical Reinforcement Learning (SNN-HRL) from Florensa et al. 2017. DDQN is used as the comparison because the implementation of SSN-HRL uses 2 DDQN algorithms within it. Note that the first 300 episodes of training for SNN-HRL were used for pre-training which is why there is no reward for those episodes.

Long Corridor and Four Rooms

Usage

The repository's high-level structure is:

├── agents                    
    ├── actor_critic_agents   
    ├── DQN_agents         
    ├── policy_gradient_agents
    └── stochastic_policy_search_agents 
├── environments   
├── results             
    └── data_and_graphs        
├── tests
├── utilities             
    └── data structures            

i) To watch the agents learn the above games

To watch all the different agents learn Cart Pole follow these steps:

git clone https://github.com/p-christ/Deep_RL_Implementations.git
cd Deep_RL_Implementations

conda create --name myenvname
y
conda activate myenvname

pip3 install -r requirements.txt

python results/Cart_Pole.py

For other games change the last line to one of the other files in the Results folder.

ii) To train the agents on another game

Most Open AI gym environments should work. All you would need to do is change the config.environment field (look at Results/Cart_Pole.py for an example of this).

You can also play with your own custom game if you create a separate class that inherits from gym.Env. See Environments/Four_Rooms_Environment.py for an example of a custom environment and then see the script Results/Four_Rooms.py to see how to have agents play the environment.

Owner
Petros Christodoulou
Petros Christodoulou
A simple, fast, and efficient object detector without FPN

You Only Look One-level Feature (YOLOF), CVPR2021 A simple, fast, and efficient object detector without FPN. This repo provides an implementation for

789 Jan 09, 2023
This is 2nd term discrete maths project done by UCU students that uses backtracking to solve various problems.

Backtracking Project Sponsors This is a project made by UCU students: Olha Liuba - crossword solver implementation Hanna Yershova - sudoku solver impl

Dasha 4 Oct 17, 2021
YouRefIt: Embodied Reference Understanding with Language and Gesture

YouRefIt: Embodied Reference Understanding with Language and Gesture YouRefIt: Embodied Reference Understanding with Language and Gesture by Yixin Che

16 Jul 11, 2022
Deploy pytorch classification model using Flask and Streamlit

Deploy pytorch classification model using Flask and Streamlit

Ben Seo 1 Nov 17, 2021
A fast and easy to use, moddable, Python based Minecraft server!

PyMine PyMine - The fastest, easiest to use, Python-based Minecraft Server! Features Note: This list is not always up to date, and doesn't contain all

PyMine 144 Dec 30, 2022
PyTorch implementations for our SIGGRAPH 2021 paper: Editable Free-viewpoint Video Using a Layered Neural Representation.

st-nerf We provide PyTorch implementations for our paper: Editable Free-viewpoint Video Using a Layered Neural Representation SIGGRAPH 2021 Jiakai Zha

Diplodocus 258 Jan 02, 2023
GAN-based 3D human pose estimation model for 3DV'17 paper

Tensorflow implementation for 3DV 2017 conference paper "Adversarially Parameterized Optimization for 3D Human Pose Estimation". @inproceedings{jack20

Dominic Jack 15 Feb 27, 2021
BackgroundRemover lets you Remove Background from images and video with a simple command line interface

BackgroundRemover BackgroundRemover is a command line tool to remove background from video and image, made by nadermx to power https://BackgroundRemov

Johnathan Nader 1.7k Dec 30, 2022
NLU Dataset Diagnostics

NLU Dataset Diagnostics This repository contains data and scripts to reproduce the results from our paper: Aarne Talman, Marianna Apidianaki, Stergios

Language Technology at the University of Helsinki 1 Jul 20, 2022
CasualHealthcare's Pneumonia detection with Artificial Intelligence (Convolutional Neural Network)

CasualHealthcare's Pneumonia detection with Artificial Intelligence (Convolutional Neural Network) This is PneumoniaDiagnose, an artificially intellig

Azhaan 2 Jan 03, 2022
In this project we predict the forest cover type using the cartographic variables in the training/test datasets.

Kaggle Competition: Forest Cover Type Prediction In this project we predict the forest cover type (the predominant kind of tree cover) using the carto

Marianne Joy Leano 1 Mar 15, 2022
Face Recognition and Emotion Detector Device

Face Recognition and Emotion Detector Device Orange PI 1 Python 3.10.0 + Django 3.2.9 Project's file explanation Django manage.py Django commands hand

BootyAss 2 Dec 21, 2021
This is the replication package for paper submission: Towards Training Reproducible Deep Learning Models.

This is the replication package for paper submission: Towards Training Reproducible Deep Learning Models.

0 Feb 02, 2022
Attention mechanism with MNIST dataset

[TensorFlow] Attention mechanism with MNIST dataset Usage $ python run.py Result Training Loss graph. Test Each figure shows input digit, attention ma

YeongHyeon Park 12 Jun 10, 2022
TeST: Temporal-Stable Thresholding for Semi-supervised Learning

TeST: Temporal-Stable Thresholding for Semi-supervised Learning TeST Illustration Semi-supervised learning (SSL) offers an effective method for large-

Xiong Weiyu 1 Jul 14, 2022
Music library streaming app written in Flask & VueJS

djtaytay This is a little toy app made to explore Vue, brush up on my Python, and make a remote music collection accessable through a web interface. I

Ryan Tasson 6 May 27, 2022
Liecasadi - liecasadi implements Lie groups operation written in CasADi

liecasadi liecasadi implements Lie groups operation written in CasADi, mainly di

Artificial and Mechanical Intelligence 14 Nov 05, 2022
It helps user to learn Pick-up lines and share if he has a better one

Pick-up-Lines-Generator(Open Source) It helps user to learn Pick-up lines Share and Add one or many to the DataBase Unique SQLite DataBase AI Undercon

knock_nott 0 May 04, 2022
General purpose GPU compute framework for cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends)

General purpose GPU compute framework for cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous and optimized for advanced GPU data processing usec

The Kompute Project 1k Jan 06, 2023
Metadata-Extractor - Metadata Extractor Script can be used to read in exif metadata

Metadata Extractor The exifextract script can be used to read in exif metadata f

1 Feb 16, 2022