
Robot reinforcement learning: synergies between pushing and grasping with self-supervised DRL (2018)

2022-07-04 23:31:00 Qianyu QY

Paper: https://ieeexplore.ieee.org/document/8593986

1 Introduction

Model-free reinforcement learning, Q-learning.

Method: train two networks that predict pixel-wise Q-values, one for push and one for grasp; the push or grasp with the highest Q-value is executed.

At each pixel, push is defined as a 10 cm push from left to right; grasp is defined as a horizontal grasp centered on that pixel with a 10 cm grasp width.

At inference time, the image is rotated 16 times and each rotation is fed to the network separately, so pushes and grasps can be executed at 16 different orientations (see the sketch below).
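
To make the action selection concrete, here is a minimal NumPy sketch (the `rotate` helper from SciPy, the network callables, and the heightmap shapes are illustrative assumptions, not the paper's actual code): the input is rotated 16 times, each rotation is scored by both pixel-wise Q-networks, and the single (primitive, rotation, pixel) with the highest Q-value is returned for execution.

```python
import numpy as np
from scipy.ndimage import rotate

def select_action(rgbd_heightmap, grasp_net, push_net, num_rotations=16):
    """Pick the best (Q-value, primitive, angle, pixel) over all rotations.
    grasp_net / push_net are assumed callables mapping an (H, W, C)
    heightmap to an (H, W) Q-map."""
    best = None
    for r in range(num_rotations):
        angle = r * (360.0 / num_rotations)
        # Rotating the input lets a single fixed push direction / gripper
        # orientation in network coordinates cover 16 workspace orientations.
        rotated = rotate(rgbd_heightmap, angle, axes=(0, 1), reshape=False)
        for primitive, net in (("push", push_net), ("grasp", grasp_net)):
            q_map = net(rotated)                              # (H, W) Q-values
            pixel = np.unravel_index(np.argmax(q_map), q_map.shape)
            if best is None or q_map[pixel] > best[0]:
                best = (float(q_map[pixel]), primitive, angle, pixel)
    return best
```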

This paper uses high-dimensional actions, i.e. grasp poses and pushes; methods such as QT-Opt use lower-dimensional actions, i.e. end-effector offsets.

High-dimensional actions are feasible in fully actuated systems. Fully actuated means the motion of the object is completely controlled by the manipulator, as in the grasps of this paper;

Low-dimensional actions are better suited to underactuated systems, where the action must be adjusted in real time based on system feedback until the goal state is reached. Underactuated means the motion of the object is determined jointly by the environment and the manipulator, e.g. pre-grasp manipulation or pushing an object along a track.

2 Method


State: RGB-D images.

Action: as described in Section 1.

Grasp reward: reward = 1 for a successful grasp. A grasp is judged successful if the gripper's opening width after closing is greater than a threshold.

Push reward: reward = 0.5 if the difference between scene images before and after the push exceeds a threshold. This reward encourages pushes that change the scene, but it does not explicitly reward making future grasps easier.
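
As a hedged illustration of how these two rewards could be computed (the threshold values, tolerances, and variable names below are assumptions made for the sketch, not the paper's exact numbers):

```python
import numpy as np

# Assumed thresholds; the actual values are implementation details
# not stated in this summary.
GRIPPER_WIDTH_THRESHOLD = 0.01   # meters: gripper opening after closing
SCENE_CHANGE_THRESHOLD = 300     # number of sufficiently changed pixels

def grasp_reward(gripper_width_after_close):
    # A grasp counts as successful if the gripper did not close fully,
    # i.e. it is still holding something wider than the threshold.
    return 1.0 if gripper_width_after_close > GRIPPER_WIDTH_THRESHOLD else 0.0

def push_reward(depth_before, depth_after):
    # A push earns 0.5 if it changed the scene noticeably: count pixels
    # whose depth changed by more than a small tolerance.
    changed = np.sum(np.abs(depth_after - depth_before) > 0.01)
    return 0.5 if changed > SCENE_CHANGE_THRESHOLD else 0.0
```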

Q-network structure: the two networks have the same architecture. The RGB image and the depth image are fed into two parallel DenseNet towers, the features are merged, and the pixel-wise Q-values are produced by convolution followed by interpolation-based (bilinear) upsampling.
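
A rough PyTorch sketch of this architecture (the layer widths, the use of torchvision's DenseNet-121 trunks, and repeating the depth channel are assumptions; the point is the two parallel towers, feature concatenation, 1x1 convolutions, and bilinear upsampling back to a pixel-wise Q-map):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class PixelwiseQNet(nn.Module):
    """Sketch of one Q-network; the push and grasp nets share this structure."""
    def __init__(self):
        super().__init__()
        self.color_trunk = models.densenet121().features   # -> 1024 channels
        self.depth_trunk = models.densenet121().features   # -> 1024 channels
        self.head = nn.Sequential(
            nn.Conv2d(2048, 64, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, kernel_size=1),    # one Q-value per (coarse) pixel
        )

    def forward(self, color, depth):
        # depth is a 1-channel map; repeat it to 3 channels for the DenseNet trunk.
        d = depth.repeat(1, 3, 1, 1) if depth.shape[1] == 1 else depth
        feat = torch.cat([self.color_trunk(color), self.depth_trunk(d)], dim=1)
        q = self.head(feat)
        # Upsample the coarse Q-map back to the input resolution.
        return F.interpolate(q, size=color.shape[-2:], mode="bilinear",
                             align_corners=False)
```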

1、How is the push reward defined?

Answer: reward = 0.5 is given when the difference between scene images before and after the push exceeds a threshold. This encourages the push to change the scene, but does not explicitly reward making future grasps easier.

2、How is the pixel-wise prediction network trained?

Answer: the gradient is backpropagated only through the pixel p where the action was actually executed; the loss at all other pixels is set to 0.
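
A minimal sketch of this masked training signal, assuming a PyTorch setup; `target_q` would be the Q-learning target r + γ·max Q(s′), and the variable names here are illustrative:

```python
import torch
import torch.nn.functional as F

def pixelwise_q_loss(q_map, target_q, pixel):
    """q_map: (1, 1, H, W) predicted Q-values; pixel: (row, col) of the
    executed action. Only that pixel receives a gradient."""
    target = q_map.detach().clone()
    target[0, 0, pixel[0], pixel[1]] = target_q   # label only the executed pixel
    weight = torch.zeros_like(q_map)
    weight[0, 0, pixel[0], pixel[1]] = 1.0        # mask: gradient only at p
    # Smooth L1 (Huber) loss, weighted so all non-executed pixels contribute 0.
    loss = F.smooth_l1_loss(q_map, target, reduction="none")
    return (loss * weight).sum()
```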

3 Thoughts

1、In essence, the method of this paper is close to supervised learning: the confidence labels for grasp/push are simply replaced by rewards, which amounts to much the same thing.
