stable_baselines Quick Start
2022-07-25 13:20:00 【Hurry up and don't let it rot】
0 Brief introduction
baselines is a set of reinforcement learning algorithm implementations released by OpenAI for quickly setting up reinforcement learning experiments; stable-baselines is a community fork of it with a cleaner, more beginner-friendly interface.
1 Installation
pip install stable-baselines
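A quick sanity check after installing (a minimal sketch; note that stable-baselines is built on TensorFlow 1.x, so a compatible TensorFlow version must also be installed):
import stable_baselines
print(stable_baselines.__version__)  # prints the installed version if the import works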
2 Parameter Introduction
Base RL Class
Common interface for all the RL algorithms:
class stable_baselines.common.base_class.BaseRLModel(policy, env, verbose=0, *, requires_vec_env, policy_base, policy_kwargs=None, seed=None, n_cpu_tf_sess=None)
The base RL model.
Parameters:
policy – (BasePolicy) Policy object
policy: selects the policy model, which maps states / state-action pairs to actions; under the hood it is a multilayer perceptron or a convolutional network.
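In practice the policy is chosen by name when constructing a model. A minimal sketch (the environment here is only an illustration):
import gym
from stable_baselines import DQN

env = gym.make('CartPole-v0')
# 'MlpPolicy' builds a multilayer perceptron, suited to low-dimensional vector observations;
# 'CnnPolicy' would build a convolutional network, suited to image observations
model = DQN('MlpPolicy', env, verbose=0)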
env – (Gym environment) The environment to learn from (if registered in Gym, can be str; can be None for loading trained models)
env:
Required methods: step(action), reset(), render()
Required attributes: action_space, observation_space
step(action): advances the simulation by one step, i.e. takes an action and runs one step of the environment
reset(): resets the environment to an initial state
render(): displays / visualizes the environment
action_space: continuous or discrete. Discrete example: move east, west, south or north. Continuous example: a number drawn from an interval, used as the size of a motion step.
observation_space: likewise. For example, a robot with 6 joints can describe its state by 6 joint positions and 6 joint velocities; each of these has an upper and lower bound, and those bounds define the observation_space.
Once an environment provides these 5 elements (3 methods and 2 spaces), it can be passed to stable-baselines for training; a minimal sketch of such an environment is given below.
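As an illustration only (the class, task and reward below are hypothetical, not part of stable-baselines), here is a minimal custom environment exposing the 5 required elements:

import gym
import numpy as np
from gym import spaces

class Simple1DEnv(gym.Env):
    """Hypothetical example: move a point toward the origin on a 1-D line."""
    def __init__(self):
        super(Simple1DEnv, self).__init__()
        self.action_space = spaces.Discrete(2)  # 0 = step left, 1 = step right
        self.observation_space = spaces.Box(low=-10.0, high=10.0, shape=(1,), dtype=np.float32)  # position bounds
        self.pos = 0.0

    def reset(self):
        self.pos = float(np.random.uniform(-5.0, 5.0))  # start at a random position
        return np.array([self.pos], dtype=np.float32)

    def step(self, action):
        self.pos += 0.1 if action == 1 else -0.1
        reward = -abs(self.pos)                          # closer to the origin is better
        done = abs(self.pos) < 0.05 or abs(self.pos) > 10.0
        return np.array([self.pos], dtype=np.float32), reward, done, {}

    def render(self, mode='human'):
        print('position:', self.pos)

An instance of such a class can then be passed as env to any stable-baselines algorithm that supports discrete actions, e.g. DQN('MlpPolicy', Simple1DEnv()).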
3 Application
Use stable_baselines to build a DQN agent, then train it on and run it in the inverted-pendulum environment (CartPole-v0).
from stable_baselines import DQN
from stable_baselines.common.evaluation import evaluate_policy
import gym
import time

env = gym.make('CartPole-v0')  # create the inverted-pendulum (cart-pole) environment

TRAIN = 0
if TRAIN:  # training part
    # MlpPolicy: a multilayer-perceptron policy, suitable for this discrete-action task
    # env: the environment to train in
    # the remaining parameters are documented in the library itself; each has a detailed docstring
    model = DQN('MlpPolicy', env, learning_rate=1e-3, prioritized_replay=True, verbose=1)
    model.learn(total_timesteps=int(1e5))  # start training; model.learn accepts further parameters as well
    model.save("dqn_cartpole")             # save the trained model to disk
    del model                              # the in-memory model is no longer needed after saving
else:  # demonstration part
    model = DQN.load("dqn_cartpole", env)  # load the trained model (network weights) from disk
    mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
    print("mean reward: %.2f +/- %.2f" % (mean_reward, std_reward))
    obs = env.reset()  # reset the environment and get the initial observation
    for i in range(1000):
        action, _states = model.predict(obs)          # predict an action from the current observation
        obs, rewards, done, info = env.step(action)   # step the environment and receive the new observation
        env.render()                                  # display the simulation
        if done:
            obs = env.reset()
    time.sleep(2)  # keep the render window open briefly before the script exits
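Run the script once with TRAIN = 1 to train and save the model, then run it again with TRAIN = 0 to load the saved model, print its mean episode reward over 10 evaluation episodes, and watch the trained agent balance the pole.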