KAUST: Deyao Zhu | Value Memory Graph: A Graph-Structured World Model for Offline Reinforcement Learning

2022-06-12 23:52:00 【Zhiyuan community】

【Title】Value Memory Graph: A Graph-Structured World Model for Offline Reinforcement Learning

【Authors】Deyao Zhu, Li Erran Li, Mohamed Elhoseiny

【Publication date】2022.6.9

【Paper link】https://arxiv.org/pdf/2206.04384.pdf

【Why we recommend it】World models in model-based reinforcement learning often produce unrealistic long-horizon predictions, because prediction errors accumulate over time steps and compound. Recent work on graph-structured world models improves long-horizon reasoning by building a graph that represents the environment, but these models are designed for goal-conditioned settings: in a conventional reinforcement learning environment, where no goal state is given externally, they cannot guide the agent to maximize episode return. To address this, the paper designs a graph-structured world model for offline reinforcement learning by building a directed-graph-based Markov decision process (MDP), with a reward assigned to each directed edge, as an abstraction of the original continuous environment. Because this world model has a small, finite state and action space compared with the original environment, value iteration can easily be applied to estimate state values on the graph and find the best future. The world model is called the Value Memory Graph (VMG), and it can supply high-value goals on its own. VMG can be used to guide a low-level goal-conditioned policy, trained via supervised learning, to maximize episode return. Experiments on the D4RL benchmark show that VMG outperforms state-of-the-art methods on several tasks where long-horizon reasoning is crucial.
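To make the value-iteration step concrete, here is a minimal sketch of value iteration on a small directed graph with a reward on each edge. It is not the paper's implementation: the toy graph, the node names, the reward values, the discount factor `gamma`, and the helper names `value_iteration` and `best_next_node` are all assumptions chosen for illustration, and how VMG actually builds its graph from offline data is not covered here.

```python
from typing import Dict, Hashable, Optional

# Hypothetical representation: graph[node][next_node] = reward on that directed edge.
# Every node that appears as an edge target must also appear as a key.
Graph = Dict[Hashable, Dict[Hashable, float]]


def value_iteration(graph: Graph, gamma: float = 0.9,
                    tol: float = 1e-8, max_iters: int = 10_000) -> Dict[Hashable, float]:
    """Estimate a value for each node of a finite, deterministic graph MDP."""
    values = {node: 0.0 for node in graph}
    for _ in range(max_iters):
        delta = 0.0
        for node, edges in graph.items():
            if not edges:
                continue  # no outgoing edges: treated as terminal, value stays 0
            # Bellman optimality backup over outgoing edges
            new_value = max(r + gamma * values[nxt] for nxt, r in edges.items())
            delta = max(delta, abs(new_value - values[node]))
            values[node] = new_value
        if delta < tol:
            break
    return values


def best_next_node(graph: Graph, values: Dict[Hashable, float],
                   node: Hashable, gamma: float = 0.9) -> Optional[Hashable]:
    """Pick the successor with the highest reward-plus-discounted-value."""
    edges = graph[node]
    if not edges:
        return None
    return max(edges, key=lambda nxt: edges[nxt] + gamma * values[nxt])


# Toy graph (illustrative only): A->C pays more immediately,
# but A->B leads to the larger long-term return.
toy_graph: Graph = {
    "A": {"B": 0.0, "C": 1.0},
    "B": {"D": 10.0},
    "C": {"D": 0.0},
    "D": {},
}

values = value_iteration(toy_graph, gamma=0.9)
goal = best_next_node(toy_graph, values, "A", gamma=0.9)
print(values, goal)
```

In this toy setup the greedy choice from `A` is `B`, even though the edge to `C` has the larger immediate reward. In the same spirit, a node chosen this way could be handed to a low-level goal-conditioned policy as its next goal.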