stable-baselines Quick Start
2022-07-25 13:20:00 【Hurry up and don't let it rot】
0 Brief Introduction
baselines is a set of reinforcement learning algorithm implementations released by OpenAI; stable-baselines is an improved fork of it, used to quickly configure reinforcement learning algorithms. It is beginner-friendly.
1 Installation
pip install stable-baselines
2 Parameter Introduction
Base RL Class
The common interface for all the RL algorithms:
class stable_baselines.common.base_class.BaseRLModel(policy, env, verbose=0, *, requires_vec_env, policy_base, policy_kwargs=None, seed=None, n_cpu_tf_sess=None)
The base RL model.
Parameters:
policy – (BasePolicy) the policy object. It selects the policy model, establishing the mapping from states (or state-action pairs) to actions; the underlying network is a multilayer perceptron or a convolutional network.
env – (Gym environment) the environment to learn from (if registered in Gym, it can be a str; it can be None when loading a trained model).
An environment must provide:
Required methods: step(action), reset(), render()
Required attributes: action_space, observation_space
step(action): advances the simulation by one step after receiving an action.
reset(): resets the environment to its initial state.
render(): displays the environment.
action_space: continuous or discrete. A discrete example: moving east, west, south, or north. A continuous example: a number drawn from an interval, used as one motion step.
observation_space works the same way. For example, a robot with 6 joints can express its state with 6 positions and 6 velocities, each bounded above and below; those ranges form its observation_space.
An environment that satisfies these five elements can be passed to stable-baselines for training.
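As a concrete illustration, here is a minimal sketch of a custom environment exposing all five elements, assuming the classic Gym API (gym < 0.26, the generation stable-baselines targets); the class name and the toy dynamics are made up for this example:

```python
import gym
import numpy as np
from gym import spaces

class MinimalEnv(gym.Env):
    """A toy environment providing the five required elements."""

    def __init__(self):
        # discrete action_space: two actions (e.g. move left / move right)
        self.action_space = spaces.Discrete(2)
        # continuous observation_space: one value bounded in [-1, 1]
        self.observation_space = spaces.Box(
            low=-1.0, high=1.0, shape=(1,), dtype=np.float32)
        self.state = np.zeros(1, dtype=np.float32)

    def step(self, action):
        # one simulation step: shift the state left or right by 0.1
        self.state += 0.1 if action == 1 else -0.1
        self.state = np.clip(self.state, -1.0, 1.0)
        reward = 1.0 - abs(float(self.state[0]))  # reward staying near the center
        done = bool(abs(self.state[0]) >= 1.0)    # episode ends at the boundary
        return self.state.copy(), reward, done, {}

    def reset(self):
        # back to the initial state
        self.state = np.zeros(1, dtype=np.float32)
        return self.state.copy()

    def render(self, mode='human'):
        # minimal display: print the current state
        print("state = {:.2f}".format(float(self.state[0])))
```

An instance of this class can be passed directly as the env argument of a stable-baselines model constructor.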
Application
Create a DQN agent with stable_baselines, then train and run it on the inverted pendulum (CartPole-v0):
from stable_baselines import DQN
from stable_baselines.common.evaluation import evaluate_policy
import gym
import time

env = gym.make('CartPole-v0')  # the inverted-pendulum environment

TRAIN = 0
if TRAIN:  # training part
    # MlpPolicy: a multilayer-perceptron policy, suitable for this discrete task
    # env: the environment created above
    # the remaining parameters are documented in detail in the library source
    model = DQN('MlpPolicy', env, learning_rate=1e-3, prioritized_replay=True, verbose=1)
    model.learn(total_timesteps=int(1e5))  # start training; learn() also accepts further parameters
    model.save("dqn_cartpole")  # save the trained model
    del model  # the in-memory model is no longer needed
else:  # demonstration part
    model = DQN.load("dqn_cartpole", env)  # load the trained model
    mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
    obs = env.reset()  # reset the state
    for i in range(1000):
        action, _states = model.predict(obs)  # predict an action from the current state
        obs, rewards, done, info = env.step(action)  # step and get the new state
        env.render()  # display
        if done:
            obs = env.reset()  # start a new episode when the current one ends
    time.sleep(2)  # keep the render window visible after the run