Robot reinforcement learning: Learning Synergies between Pushing and Grasping with Self-supervised DRL (2018)
2022-07-04 23:31:00 【Qianyu QY】
Paper link: https://ieeexplore.ieee.org/document/8593986
1 Introduction
Model-free reinforcement learning, Q-learning.
Method: train two networks, one predicting a pixel-wise Q-value for push and one predicting a pixel-wise Q-value for grasp; the push or grasp with the highest Q-value is executed.
A push at a pixel is defined as pushing 10 cm from left to right; a grasp is defined as a horizontal grasp centered on that pixel with a grasp width of 10 cm.
At test time, the image is rotated 16 times and each rotated copy is fed to the networks separately, so push and grasp can be executed at 16 different angles.
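The selection rule above (take the highest Q-value over both primitives, all 16 rotations, and all pixels) can be sketched as follows. This is a minimal illustration, not the paper's code: the function name `select_action` and the nested-list layout `[rotation][row][col]` are assumptions, and the mapping from a pixel in the rotated image back to workspace coordinates is left out.

```python
NUM_ROTATIONS = 16  # from the text: the image is rotated 16 times


def select_action(push_q, grasp_q):
    """Return the primitive, angle and pixel with the highest Q-value.

    push_q and grasp_q are hypothetical nested lists indexed as
    [rotation][row][col], one Q-map per rotated input image.
    """
    best = None
    for primitive, q_maps in (("push", push_q), ("grasp", grasp_q)):
        for k, q_map in enumerate(q_maps):
            for row, values in enumerate(q_map):
                for col, q in enumerate(values):
                    # Strict comparison: on ties the first candidate is kept.
                    if best is None or q > best[0]:
                        best = (q, primitive, k, row, col)
    q, primitive, k, row, col = best
    return {
        "primitive": primitive,
        "angle_deg": k * 360.0 / NUM_ROTATIONS,  # 22.5 degree steps
        "pixel": (row, col),  # coordinates in the rotated image
        "q": q,
    }


# Toy example: 2x2 Q-maps that are zero everywhere except one grasp pixel.
push_q = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(NUM_ROTATIONS)]
grasp_q = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(NUM_ROTATIONS)]
grasp_q[4][1][0] = 0.9
action = select_action(push_q, grasp_q)
```

In the toy example the only non-zero entry sits in the grasp maps at rotation index 4, so the selected action is a grasp at that pixel and angle.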
This paper uses high-dimensional actions, i.e. a full grasp pose or push; QT-Opt and similar methods use lower-dimensional actions, i.e. end-effector offsets.
High-dimensional actions are feasible in fully actuated systems. Fully actuated means the object's motion is controlled entirely by the manipulator, as with the grasping in this paper.
Low-dimensional actions are better suited to underactuated systems, where the action has to be adjusted in real time according to system feedback until the goal state is reached. Underactuated means the object's motion is determined jointly by the environment and the manipulator, as in pre-grasp manipulation or pushing an object along a trajectory.
2 Method

state: RGB-D image
action: as described in Section 1
grasp reward: reward=1 for a successful grasp. A grasp counts as successful if the opening width of the gripper is greater than a threshold.
push reward: reward=0.5 if the difference between the scene images before and after the push is greater than a threshold. This reward encourages push actions that change the scene, but it does not explicitly encourage pushes that make future grasps easier.
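The two reward rules can be sketched as below. The reward values 1 and 0.5 come from the text; everything else is an assumption made for illustration: the function names, the threshold values, counting changed pixels as the measure of "difference between scene images", and returning 0 when the condition is not met.

```python
GRASP_REWARD = 1.0  # from the text
PUSH_REWARD = 0.5   # from the text


def grasp_succeeded(gripper_opening, min_opening=0.01):
    """Success check from the text: gripper opening above a threshold.

    min_opening (in metres) is a hypothetical value.
    """
    return gripper_opening > min_opening


def scene_changed(before, after, pixel_tol=0.01, min_changed=2):
    """Return True if more than min_changed pixels differ by over pixel_tol.

    before/after are images given as nested lists of numbers; both
    thresholds are hypothetical values.
    """
    changed = sum(
        1
        for row_b, row_a in zip(before, after)
        for b, a in zip(row_b, row_a)
        if abs(b - a) > pixel_tol
    )
    return changed > min_changed


def compute_reward(primitive, success):
    """Map an executed primitive and its success flag to a reward.

    Returning 0.0 on failure is an assumption; the text only states the
    positive rewards.
    """
    if primitive == "grasp":
        return GRASP_REWARD if success else 0.0
    if primitive == "push":
        return PUSH_REWARD if success else 0.0
    raise ValueError(f"unknown primitive: {primitive}")


# Toy example: a push that moves three pixels of a 4x4 image.
before = [[0.0] * 4 for _ in range(4)]
after = [row[:] for row in before]
after[0][0] = after[1][1] = after[2][2] = 0.05
push_r = compute_reward("push", scene_changed(before, after))
grasp_r = compute_reward("grasp", grasp_succeeded(0.03))
no_change_r = compute_reward("push", scene_changed(before, before))
```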
Q network structure: the two networks have the same structure. The RGB image and the depth image are first fed through DenseNet in parallel, then the features are merged, and the predicted Q-values are produced by convolution and interpolation-based upsampling.
1. How is the reward for push set?
Answer: reward=0.5 if the difference between the scene images is greater than a threshold. This reward encourages push actions that change the scene, but it does not explicitly encourage pushes that make future grasps easier.
2. How is the pixel-wise prediction network trained?
Answer: the gradient is computed only for the pixel p at which the action was executed; it is 0 for all other pixels.
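The training rule in the second answer can be sketched as a gradient map that is zero everywhere except at the executed pixel p. This is only an illustration under stated assumptions: the squared-error loss, the discount factor `gamma=0.5`, and both function names are hypothetical choices, while the target is the standard Q-learning target (reward plus discounted maximum Q-value of the next state).

```python
def q_learning_target(reward, next_q_max, gamma=0.5):
    """Standard Q-learning target; gamma is a hypothetical value."""
    return reward + gamma * next_q_max


def masked_q_gradient(q_map, pixel, target):
    """Gradient of 0.5 * (Q[p] - target)^2 with respect to every pixel.

    Only the executed pixel p receives a non-zero gradient; all other
    pixels get 0, as described in the text. The squared-error loss is an
    assumption made for this sketch.
    """
    rows, cols = len(q_map), len(q_map[0])
    grad = [[0.0] * cols for _ in range(rows)]
    r, c = pixel
    grad[r][c] = q_map[r][c] - target
    return grad


# Toy example: a 2x2 Q-map where the action was executed at pixel (0, 1).
q_map = [[0.2, 0.4], [0.1, 0.3]]
target = q_learning_target(reward=1.0, next_q_max=0.6)
grad = masked_q_gradient(q_map, pixel=(0, 1), target=target)
```

Because the label exists only for the pixel that was actually executed, every other entry of the predicted map contributes nothing to the update.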
3 Thoughts
1. In essence, the method in this paper is supervised learning: the confidence label for grasp/push is simply replaced by the reward, so the two are fundamentally the same.