Robot reinforcement learning: synergies between pushing and grasping with self-supervised DRL (2018)
2022-07-04 23:31:00 【Qianyu QY】
Paper link: https://ieeexplore.ieee.org/document/8593986
1 Introduction
The approach is model-free reinforcement learning based on Q-learning.
Method: two networks are trained, one predicting pixel-wise Q-values for pushing and the other pixel-wise Q-values for grasping. The push or grasp with the highest Q-value is executed.
At each pixel, a push is defined as pushing 10 cm from left to right; a grasp is defined as a horizontal grasp centered on that pixel with a gripper width of 10 cm.
At test time, the image is rotated 16 times and each rotated copy is fed to the network, so pushes and grasps can be executed at 16 different angles.
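The action selection described above can be sketched as follows. This is only an illustrative sketch, not the authors' code: the function names, the nested-list layout `[rotation][row][col]`, and the assumption that the 16 rotations evenly cover 360 degrees are all hypothetical choices made for the example.

```python
NUM_ROTATIONS = 16  # from the text: the image is rotated 16 times


def select_action(push_q, grasp_q):
    """Return (primitive, rotation_index, row, col, q_value) with the highest Q.

    push_q and grasp_q are nested lists indexed as [rotation][row][col],
    i.e. one pixel-wise Q map per rotated copy of the input image.
    (This layout is an assumption made for the sketch.)
    """
    best = None
    for primitive, q_maps in (("push", push_q), ("grasp", grasp_q)):
        for k, q_map in enumerate(q_maps):
            for row, q_row in enumerate(q_map):
                for col, q in enumerate(q_row):
                    if best is None or q > best[4]:
                        best = (primitive, k, row, col, q)
    return best


def rotation_angle_deg(k, num_rotations=NUM_ROTATIONS):
    """Angle of the k-th rotation, assuming rotations evenly cover 360 degrees."""
    return k * 360.0 / num_rotations
```

The chosen pixel gives the position of the primitive, and the rotation index gives its direction, so a single argmax over both stacks of Q maps decides whether to push or grasp, where, and at which angle.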
This paper uses high-dimensional actions, namely grasp poses and pushes; methods such as QT-Opt use lower-dimensional actions, namely end-effector offsets.
High-dimensional actions are feasible in fully actuated systems. Fully actuated means that the object's motion is completely controlled by the manipulator, as in the grasping in this paper.
Low-dimensional actions are better suited to underactuated systems, where the action must be adjusted in real time according to system feedback until the goal state is reached. Underactuated means that the object's motion is determined jointly by the environment and the manipulator, as in pre-grasp manipulation or pushing an object along a trajectory.
2 Method
state: RGB-D images
action: as described in Section 1
grasp reward: reward = 1 for a successful grasp. A grasp is considered successful if the opening width of the manipulator's gripper is greater than a threshold.
push reward: reward = 0.5 if the difference between the scene images is greater than a threshold. This reward encourages pushes that change the scene, but it does not explicitly make future grasps easier.
Q-network structure: the two networks have the same structure. The RGB image and the depth image are first fed into DenseNet in parallel, the features are then merged, and the pixel-wise Q-values are output via convolution and interpolation upsampling.
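The two reward definitions above can be written down as a small function. This is a minimal sketch under stated assumptions: the scene-change metric (mean absolute per-pixel difference), the threshold value `0.1`, the zero reward for failed actions, and all function and parameter names are hypothetical; only the reward values 1 and 0.5 come from the notes.

```python
GRASP_REWARD = 1.0  # from the notes: successful grasp
PUSH_REWARD = 0.5   # from the notes: push that changes the scene


def scene_change(before, after):
    """Mean absolute per-pixel difference between two equally sized images.

    Hypothetical metric; the notes only say the difference is thresholded.
    Images are nested lists indexed as [row][col].
    """
    total = 0.0
    count = 0
    for row_before, row_after in zip(before, after):
        for b, a in zip(row_before, row_after):
            total += abs(a - b)
            count += 1
    return total / count


def compute_reward(primitive, grasp_succeeded=False,
                   before=None, after=None, change_threshold=0.1):
    """Reward for one executed primitive; change_threshold is an assumed value."""
    if primitive == "grasp":
        return GRASP_REWARD if grasp_succeeded else 0.0
    if primitive == "push":
        changed = scene_change(before, after) > change_threshold
        return PUSH_REWARD if changed else 0.0
    raise ValueError(f"unknown primitive: {primitive!r}")
```

Because both rewards can be computed from sensor readings alone (gripper opening and image difference), no human labels are needed, which is what makes the training self-supervised.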
1. How is the reward for pushing set?
Answer: reward = 0.5 if the difference between the scene images is greater than a threshold. This reward encourages pushes that change the scene, but it does not explicitly make future grasps easier.
2. How is the pixel-wise prediction network trained?
Answer: the gradient is computed only for the pixel p where the action was executed; it is 0 for all other pixels.
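The single-pixel gradient in the second answer can be illustrated with a toy example. This sketch assumes a plain squared-error loss for simplicity (the notes do not say which loss is used), and the function name and nested-list representation of the Q map are hypothetical.

```python
def masked_squared_error_grad(q_map, pixel, target):
    """Loss and gradient for one training step on a pixel-wise Q map.

    q_map  : nested list [row][col] of predicted Q-values
    pixel  : (row, col) of the executed action p
    target : scalar training target for that pixel
    Squared error is an assumption made for this sketch.
    """
    row, col = pixel
    error = q_map[row][col] - target
    loss = 0.5 * error ** 2
    # Gradient w.r.t. the predicted map: zero everywhere except at p.
    grad = [[0.0 for _ in q_row] for q_row in q_map]
    grad[row][col] = error
    return loss, grad


q_map = [[0.2, 0.4], [0.6, 0.8]]
loss, grad = masked_squared_error_grad(q_map, pixel=(1, 0), target=1.0)
print(loss)
print(grad)
```

Only the entry at the executed pixel is non-zero in `grad`, so predictions at pixels where no action was tried receive no training signal from this step.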
3 Thoughts
1. In essence, the method in this paper is supervised learning: the confidence label for grasp/push is simply replaced by the reward, and the two are essentially the same.