Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods

Overview

Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods

This repository is the official implementation of

  • Seohong Park, Jaekyeom Kim, Gunhee Kim. Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods. In NeurIPS, 2021.

It contains the implementations for SAR, FiGAR-C and base policy gradient algorithms (PPO, TRPO and A2C).

The code is based on Stable Baselines3 (SB3) for PPO and A2C, and Stable Baselines (SB) for TRPO.

Requirements

Run examples

PPO and A2C (based on Stable Baselines3)

To install requirements:

cd sb3
pip install -r requirements.txt
pip install -e .

Train SAR-PPO on InvertedPendulum-v2 with δ = 0.01:

python repeat/main.py --env=InvertedPendulum-v2 --algo=ppo_rg --frame_skip=1 --dt=0.01 --max_t=0.05 --max_d=0.5

Train FiGAR-C-PPO on InvertedPendulum-v2 with δ = 0.01:

python repeat/main.py --env=InvertedPendulum-v2 --algo=ppo_rg --frame_skip=1 --dt=0.01 --max_t=0.05

Train PPO on InvertedPendulum-v2 with the original δ:

python repeat/main.py --env=InvertedPendulum-v2 --algo=ppo_rg

Train SAR-A2C on InvertedPendulum-v2 with δ = 0.01:

python repeat/main.py --env=InvertedPendulum-v2 --algo=a2c_rg --frame_skip=1 --dt=0.01 --max_t=0.05 --max_d=0.5

Train SAR-PPO on InvertedPendulum-v2 with δ = 0.002 and the "Action Noise" setting:

python repeat/main.py --env=InvertedPendulum-v2 --algo=ppo_rg --frame_skip=1 --dt=0.002 --max_t=0.05 --max_d=0.5 --anoise_type=action --anoise_prob=0.05 --anoise_std=3

Train SAR-PPO on InvertedPendulum-v2 with δ = 0.002 and the "External Force" setting:

python repeat/main.py --env=InvertedPendulum-v2 --algo=ppo_rg --frame_skip=1 --dt=0.002 --max_t=0.05 --max_d=0.5 --anoise_type=ext_f --anoise_prob=0.05 --anoise_std=300

Train SAR-PPO on InvertedPendulum-v2 with δ = 0.002 and the "Strong External Force (Perceptible)" setting:

python repeat/main.py --env=InvertedPendulum-v2 --algo=ppo_rg --frame_skip=1 --dt=0.002 --max_t=0.05 --max_d=0.5 --anoise_type=ext_fpc --anoise_prob=0.05 --anoise_std=1000

TRPO (based on Stable Baselines)

To install requirements:

cd sb
pip install -r requirements.txt
pip install -e .

Train SAR-TRPO on InvertedPendulum-v2 with δ = 0.01:

python repeat/main.py --env=InvertedPendulum-v2 --frame_skip=1 --dt=0.01 --max_t=0.05 --max_d=0.5

License

This codebase is licensed under the MIT License. See also sb3/LICENSE_SB3 and sb/LICENSE_SB.

Owner
Seohong Park
Seohong Park
Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods

Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods This repository is the official implementation of Seohong Park, Jaeky

Seohong Park 6 Aug 02, 2022
Rouge Spammers with a mission to disrupt the peace of the valley ? Fear not we will STOMP the Spammers

Rouge Spammers with a mission to disrupt the peace of the valley ? Fear not we will STOMP the Spammers New Update : adding 'on-review' tag on an issue

A N U S H 13 Sep 19, 2021
This is a Cryptographied Password Manager, a tool for storing Passwords in a Secure way

Cryptographied Password Manager This is a Cryptographied Password Manager, a tool for storing Passwords in a Secure way without using external Service

Francesco 3 Nov 23, 2022
A passive-recon tool that parses through found assets and interacts with the Hackerone API

Hackerone Passive Recon Tool A passive-recon tool that parses through found assets and interacts with the Hackerone API. Setup Simply run setup.sh to

elbee 4 Jan 13, 2022
This repository contains wordlists for each versions of common web applications and content management systems (CMS). Each version contains a wordlist of all the files directories for this version.

webapp-wordlists This repository contains wordlists for each versions of common web applications and content management systems (CMS). Each version co

Podalirius 396 Jan 08, 2023
This project is for finding a solution to use Security Onion Elastic data with Jupyter Notebooks.

This project is for finding a solution to use Security Onion Elastic data with Jupyter Notebooks. The goal is to successfully use this notebook project below with Security Onion for beacon detection

4 Jun 08, 2022
Directory Traversal in Afterlogic webmail aurora and pro

CVE-2021-26294 Exploit Directory Traversal in Afterlogic webmail aurora and pro . Description: AfterLogic Aurora and WebMail Pro products with 7.7.9 a

Ashish Kunwar 8 Nov 09, 2022
带回显版本的漏洞利用脚本

CVE-2021-21978 带回显版本的漏洞利用脚本,更简单的方式 0. 漏洞信息 VMware View Planner Web管理界面存在一个上传日志功能文件的入口,没有进行认证且写入的日志文件路径用户可控,通过覆盖上传日志功能文件log_upload_wsgi.py,即可实现RCE 漏洞代码

3ky7in4 24 Nov 09, 2022
Log4Shell RCE Exploit - fully independent exploit does not require any 3rd party binaries.

Log4Shell RCE Exploit fully independent exploit does not require any 3rd party binaries. The exploit spraying the payload to all possible logged HTTP

258 Jan 02, 2023
NEW FACEBOOK CLONER WITH NEW PASSWORD, TERMUX FB CLONE, FB CLONING COMMAND. M

NEW FACEBOOK CLONER WITH NEW PASSWORD, TERMUX FB CLONE, FB CLONING COMMAND. M

Mr. Error 81 Jan 08, 2023
Rapidly enumerate subdomains and domains using rapiddns.io.

Description Simple python module (unofficial) allowing you to access data from rapiddns.io. You can also use it as a module. As mentioned on the rapid

27 Dec 31, 2022
Python Library For Ethical Hacker

Python Library For Ethical Hacker

11 Nov 03, 2022
Gmail Accounts Hacking

gmail-hack Gmail Accounts Hacking Gemail-Hack python script for Hack gmail account brute force What is brute force attack? In brute force attack,scrip

Aryan 25 Nov 10, 2022
Description Basic Recon tool for beginners. Especially those who faces issue on how to recon or what all tools to use

Description Basic Recon tool for beginners. Especially those who faces issue on how to recon or what all tools to use. Will try to add atleast 10 more tools currently use 7 sources to gather domains.

Harinder Singh 7 Jan 03, 2022
Quickstart resources for the WiFi Nugget, a cat themed WiFi Security platform for beginners.

Quickstart resources for the WiFi Nugget, a cat themed WiFi Security platform for beginners.

HakCat 62 Jan 08, 2023
This Repository is an up-to-date version of Harvard nlp's Legacy code and a Refactoring of the jupyter notebook version as a shell script version.

This Repository is an up-to-date version of Harvard nlp's Legacy code and a Refactoring of the jupyter notebook version as a shell script version.

신재욱 17 Sep 25, 2022
Flutter Reverse Engineering Framework

This framework helps reverse engineer Flutter apps using patched version of Flutter library which is already compiled and ready for app repacking. There are changes made to snapshot deserialization p

PT SWARM 910 Jan 01, 2023
Having a weak password is not good for a system that demands high confidentiality and security of user credentials

Having a weak password is not good for a system that demands high confidentiality and security of user credentials. It turns out that people find it difficult to make up a strong password that is str

PyLaboratory 0 Feb 07, 2022
This repository is one of a few malware collections on the GitHub.

This repository is one of a few malware collections on the GitHub.

Andrew 1.7k Dec 28, 2022
This is a Crypto asset tracker that I built to aid my personal journey in cryptocurrencies.

Wallet Tracker This is a Crypto asset tracker that I built to aid my personal journey in cryptocurrencies. build docker build -t wallet-tracker . run

2 Mar 21, 2022