【HIRO: Hierarchical Reinforcement Learning】Data-Efficient Hierarchical Reinforcement Learning
2022-08-01 01:36:00 【little handsome acridine】
Paper Title: Data-Efficient Hierarchical Reinforcement Learning
Authors: Ofir Nachum, Shixiang (Shane) Gu, Honglak Lee, Sergey Levine
Published at: NeurIPS 2018
Summary
Hierarchical reinforcement learning (HRL) is a promising approach for extending traditional reinforcement learning (RL) methods to solve more complex tasks. Most current HRL methods require careful task-specific design and on-policy training, which makes them difficult to apply to real-world scenarios. In this paper, we study how to develop HRL algorithms that are general, in that they do not make onerous additional assumptions beyond standard RL algorithms, and efficient, in that they can be used with a modest number of interaction samples, making them suitable for real-world problems such as robotic control. For generality, we develop a scheme in which the lower-level controller is supervised with goals that are learned and proposed automatically by the higher-level controller. To improve efficiency, we propose to use off-policy experience for both higher- and lower-level training. This poses a considerable challenge, since changes in the lower-level behavior change the action space for the higher-level policy, and we introduce an off-policy correction to address this challenge. This allows us to take advantage of recent advances in off-policy model-free RL to learn both higher- and lower-level policies using substantially fewer environment interactions than on-policy algorithms. We call the resulting HRL agent HIRO and find that it is generally applicable and sample-efficient. Our experiments show that HIRO can be used to learn highly complex behaviors for simulated robots, such as pushing objects and utilizing them to reach target locations, learning from only a few million samples, equivalent to a few days of real-time interaction. In comparisons with a number of prior HRL methods, we find that our approach outperforms the previous state-of-the-art by a large margin.
Algorithmic Framework
Intrinsic Rewards
The high-level policy produces a goal g_t indicating a desired relative change in the state observation. That is, at step t, the high-level policy produces a goal g_t indicating that it wants the lower-level agent to take actions that yield an observation s_{t+c} close to s_t + g_t. Although some state dimensions are more natural as goal subspaces, we choose this more general goal representation so that the approach has broad applicability without the need to manually design a goal space, primitives, or controllable dimensions. This makes the method general and applicable to new problem settings.
In order for the goal to keep referring to the same absolute target position as the state changes, the goal transition model h is defined as

h(s_t, g_t, s_{t+1}) = s_t + g_t − s_{t+1}
The intrinsic reward is defined as a parameterized reward function based on the distance between the current observation and the goal observation:

r(s_t, g_t, a_t, s_{t+1}) = −‖s_t + g_t − s_{t+1}‖_2

This reward function rewards the low-level policy for taking actions that yield observations close to the desired value s_t + g_t.
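Both quantities are simple to compute; a minimal NumPy sketch (function and variable names are my own, not taken from the official HIRO code) might look like:

```python
import numpy as np

def goal_transition(s_t, g_t, s_next):
    """Fixed goal transition h: keep the goal pointing at the same
    absolute target position s_t + g_t as the state changes."""
    return s_t + g_t - s_next

def intrinsic_reward(s_t, g_t, s_next):
    """Negative L2 distance between the reached state and the desired
    state s_t + g_t; this is the low-level policy's reward."""
    return -np.linalg.norm(s_t + g_t - s_next)
```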
The lower-level policy can be trained with standard methods by simply incorporating g_t as an additional input into the value and policy models. For example, in DDPG the low-level critic is trained by minimizing the TD error

(Q_θ(s_t, g_t, a_t) − r(s_t, g_t, a_t, s_{t+1}) − γ Q_θ(s_{t+1}, g_{t+1}, μ^lo(s_{t+1}, g_{t+1})))²,  where g_{t+1} = h(s_t, g_t, s_{t+1}).

The policy (actor) μ^lo_φ is updated, as in DDPG, by taking gradient steps that maximize Q_θ(s_t, g_t, μ^lo_φ(s_t, g_t)).
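As a rough illustration, here is a PyTorch sketch of these goal-conditioned DDPG updates. It assumes ordinary MLP modules that take the goal as an extra input; the names and structure are mine, not from the linked repository.

```python
import torch
import torch.nn.functional as F

# critic(s, g, a) -> Q value; actor(s, g) -> action. Both are plain MLPs.
def update_low_level(critic, critic_target, actor, actor_target,
                     critic_opt, actor_opt, batch, gamma=0.99):
    # g_next = h(s, g, s_next); r is the intrinsic reward defined above
    s, g, a, r, s_next, g_next = batch

    # Critic: minimize the TD error of the goal-conditioned Q function
    with torch.no_grad():
        target_q = r + gamma * critic_target(s_next, g_next,
                                             actor_target(s_next, g_next))
    critic_loss = F.mse_loss(critic(s, g, a), target_q)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: ascend Q, i.e. minimize -Q(s, g, actor(s, g))
    actor_loss = -critic(s, g, actor(s, g)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```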
Off-Policy Corrections for Higher-Level Training
What is the non-stationarity problem?
Because the low-level policy keeps changing during training, an old high-level transition (s_t, g_t, Σr, s_{t+c}) no longer describes what would happen if the same goal were issued now, so it cannot be reused off-policy as-is. Here my understanding is that the correction relabels the goal in the stored transition: g_t is replaced with a goal under which the current low-level policy would most likely have produced the observed low-level actions, so that when the same state is encountered, the result passed to the upper layer stays consistent with the result that was passed to the upper layer when this state was encountered before.
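A minimal sketch of this relabeling step (NumPy pseudocode under my own naming, following the paper's description of sampling candidate goals around s_{t+c} − s_t and keeping the one that best explains the stored low-level actions; the candidate standard deviation is an assumption here):

```python
import numpy as np

def relabel_goal(states, actions, orig_goal, low_policy, goal_transition,
                 num_candidates=8, std_scale=0.5):
    """Off-policy correction: pick the goal that maximizes the likelihood
    of the stored low-level actions under the *current* low-level policy."""
    s0, s_c = states[0], states[-1]
    # Candidate goals: the original goal, the observed change s_{t+c} - s_t,
    # and Gaussian samples centered on that observed change.
    candidates = [orig_goal, s_c - s0]
    candidates += [np.random.normal(s_c - s0, std_scale * np.abs(s_c - s0) + 1e-6)
                   for _ in range(num_candidates)]

    def log_prob(goal):
        # For a deterministic low-level policy, the (unnormalized) log-probability
        # of the stored actions is the negative squared error to the policy's output.
        g, total = goal, 0.0
        for s, a, s_next in zip(states[:-1], actions, states[1:]):
            total -= np.sum((a - low_policy(s, g)) ** 2)
            g = goal_transition(s, g, s_next)  # h(s, g, s') = s + g - s'
        return total

    return max(candidates, key=log_prob)
```

The relabeled goal is then stored in place of g_t in the high-level transition, so the high-level replay buffer remains consistent with the current low-level behavior.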
Reference:
https://zhuanlan.zhihu.com/p/86602304
HIRO's pytorch code: https://github.com/watakandai/hiro_pytorch