Robot reinforcement learning: synergies between pushing and grasping with self-supervised DRL (2018)
2022-07-04 23:31:00 【Qianyu QY】
Paper link: https://ieeexplore.ieee.org/document/8593986
1 Introduction
Model-free reinforcement learning, Q-learning.
Method: train two networks, one predicting a pixel-wise Q-value for push and the other a pixel-wise Q-value for grasp; the push or grasp with the highest Q-value is executed.
A push at a pixel is defined as a 10 cm push from left to right; a grasp at a pixel is defined as a horizontal grasp centered on that point with a 10 cm grasp width.
At test time, the image is rotated 16 times and each rotated copy is fed to the networks, so pushes and grasps can be executed at 16 angles.
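The selection rule above (highest Q-value over both primitives, all 16 rotations, and all pixels) can be sketched as follows. This is a minimal illustration, not the paper's code: the function name `select_action`, the nested-list layout `[rotation][row][col]`, and the assumption that the 16 rotations are evenly spaced over 360 degrees are all hypothetical, and the toy Q-maps stand in for real network outputs.

```python
NUM_ROTATIONS = 16  # from the text: the image is rotated 16 times


def select_action(push_q, grasp_q):
    """Return (primitive, angle_deg, (row, col)) for the highest Q-value.

    push_q and grasp_q are nested lists indexed [rotation][row][col].
    In a real system they would be produced by the two networks.
    """
    best = None
    for primitive, q_maps in (("push", push_q), ("grasp", grasp_q)):
        for k, q_map in enumerate(q_maps):
            # Assumption: rotations are evenly spaced over a full turn.
            angle_deg = k * 360.0 / len(q_maps)
            for r, row in enumerate(q_map):
                for c, q in enumerate(row):
                    if best is None or q > best[0]:
                        best = (q, primitive, angle_deg, (r, c))
    _, primitive, angle_deg, pixel = best
    return primitive, angle_deg, pixel


# Toy 2x2 Q-maps: all zeros except one push entry and one grasp entry.
push_q = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(NUM_ROTATIONS)]
grasp_q = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(NUM_ROTATIONS)]
push_q[4][0][1] = 0.3
grasp_q[2][1][0] = 0.9

choice = select_action(push_q, grasp_q)
print(choice)  # ('grasp', 45.0, (1, 0))
```

Because the primitive itself is fixed (left-to-right push, horizontal grasp), rotating the input is what supplies the angle: the index of the rotated copy that yields the best pixel determines the execution direction.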
This paper uses high-dimensional actions, i.e., grasp poses and pushes; QT-Opt and similar methods use lower-dimensional actions, i.e., end-effector offsets.
High-dimensional actions are feasible in fully actuated systems. Fully actuated means that the motion of the object is completely controlled by the manipulator, as with the grasps in this paper.
Low-dimensional actions are better suited to underactuated systems, where the action has to be adjusted in real time according to system feedback until the goal state is finally reached. Underactuated means that the motion of the object is determined by both the environment and the manipulator, as in pre-grasp manipulation or pushing an object along a trajectory.
2 Method
state: RGB-D image
action: as described in Section 1
grasp reward: reward=1 for a successful grasp. A grasp counts as successful if the opening width of the manipulator's gripper is greater than a threshold (the fingers did not close fully, so an object is held between them).
push reward: reward=0.5 if the difference between the scene images is greater than a threshold. This reward encourages pushes that change the scene, but it does not explicitly make future grasps easier.
Q-network structure: the two networks share the same structure. The RGB image and the depth (D) image are first fed into DenseNet in parallel; the features are then merged, and convolution followed by interpolation-based upsampling outputs the predicted Q-values.
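The grasp and push rewards defined above can be sketched as two small functions. This is an illustration under stated assumptions: both threshold values are hypothetical placeholders, and the mean absolute pixel difference used to measure scene change is an assumed metric, since the notes only say the difference must exceed a threshold.

```python
GRASP_OPENING_THRESHOLD_M = 0.01  # hypothetical threshold, in meters
SCENE_CHANGE_THRESHOLD = 0.1      # hypothetical threshold


def grasp_reward(gripper_opening_m):
    """1.0 if the gripper opening exceeds the threshold, else 0.0."""
    return 1.0 if gripper_opening_m > GRASP_OPENING_THRESHOLD_M else 0.0


def scene_change(before, after):
    """Mean absolute difference of two equally sized images (nested lists).

    Assumed metric; any measure of scene difference could be substituted.
    """
    total = 0.0
    count = 0
    for row_b, row_a in zip(before, after):
        for b, a in zip(row_b, row_a):
            total += abs(a - b)
            count += 1
    return total / count


def push_reward(before, after):
    """0.5 if the push changed the scene by more than the threshold."""
    changed = scene_change(before, after) > SCENE_CHANGE_THRESHOLD
    return 0.5 if changed else 0.0


before = [[0.0, 0.0], [0.0, 0.0]]
after = [[0.0, 0.0], [0.0, 1.0]]
print(grasp_reward(0.03), push_reward(before, after), push_reward(before, before))
```

Neither function needs a human label: both rewards are computed from the robot's own sensor readings, which is what makes the training self-supervised.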
1. How is the reward for push set?
Answer: reward=0.5 if the difference between the scene images is greater than a threshold. This reward encourages pushes that change the scene, but it does not explicitly make future grasps easier.
2. How is the pixel-wise prediction network trained?
Answer: the gradient is computed only for the pixel p at which the action was executed; for all other pixels it is 0.
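The masking in the second answer can be sketched as below. This is a minimal illustration, not the paper's training code: it assumes a squared-error loss 0.5 * (Q[p] - y)^2, uses the standard Q-learning target y = r + gamma * max Q(s', a'), and the discount value and function names are hypothetical.

```python
def td_target(reward, next_q_max, gamma):
    """Standard Q-learning target: r + gamma * max over a' of Q(s', a')."""
    return reward + gamma * next_q_max


def masked_gradient(q_map, pixel, target):
    """Gradient of 0.5 * (Q[p] - target)^2 with respect to every pixel.

    Only the executed pixel p receives a non-zero gradient; all other
    pixels stay at 0, so they do not contribute to the update.
    """
    r, c = pixel
    grad = [[0.0 for _ in row] for row in q_map]
    grad[r][c] = q_map[r][c] - target
    return grad


q_map = [[0.25, 0.5], [0.0, 0.75]]  # toy predicted Q-values
executed_pixel = (1, 1)             # pixel p where the action was executed
y = td_target(reward=1.0, next_q_max=0.5, gamma=0.5)  # gamma is a placeholder
grad = masked_gradient(q_map, executed_pixel, y)
print(y, grad)  # 1.25 [[0.0, 0.0], [0.0, -0.5]]
```

Only one pixel per step is supervised because the robot observes the outcome of a single executed action; the Q-values at every other pixel have no observed reward to compare against.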
3 Thoughts
1. In essence, the method in this paper is supervised learning: the confidence label for grasp/push is simply replaced by the reward, and everything else is essentially the same.