Robot reinforcement learning: synergies between pushing and grasping with self-supervised DRL (2018)
2022-07-04 23:31:00 【Qianyu QY】
Paper: https://ieeexplore.ieee.org/document/8593986
1 Brief Introduction
Model-free reinforcement learning, Q-learning.
Method: train two networks that predict pixel-wise Q-values for push and for grasp; the push or grasp with the highest Q-value is then executed.
For each pixel, a push is defined as pushing 10 cm from left to right starting at that point; a grasp is defined as a horizontal grasp centered on that point with a 10 cm grasp width.
At test time, the image is rotated 16 times and each rotation is fed to the network, so pushes and grasps in 16 different orientations can be realized.
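A minimal sketch of this action-selection step (an illustration, not the authors' code): `grasp_net` and `push_net` are hypothetical stand-ins for the paper's two networks, each assumed to map an RGB image plus a depth image to a (1, 1, H, W) pixel-wise Q-map.

```python
import torch
import torchvision.transforms.functional as TF

def select_action(grasp_net, push_net, rgb, depth, num_rotations=16):
    """Rotate the observation into 16 orientations, predict pixel-wise
    Q-maps for push and grasp in each, and pick the overall argmax."""
    best = None  # (q_value, primitive, rotation_index, y, x)
    for k in range(num_rotations):
        angle = 360.0 * k / num_rotations
        rgb_r = TF.rotate(rgb, angle)      # (1, 3, H, W)
        depth_r = TF.rotate(depth, angle)  # (1, 1, H, W)
        with torch.no_grad():
            for primitive, net in (("grasp", grasp_net), ("push", push_net)):
                q_map = net(rgb_r, depth_r)[0, 0]          # (H, W)
                q, flat_idx = q_map.flatten().max(dim=0)
                y, x = divmod(flat_idx.item(), q_map.shape[1])
                if best is None or q.item() > best[0]:
                    best = (q.item(), primitive, k, y, x)
    return best  # primitive, orientation index k, and pixel (y, x) to execute
```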
This paper uses high-dimensional actions, i.e. grasp poses and pushes; methods such as QT-Opt use lower-dimensional actions, i.e. end-effector offsets.
High-dimensional actions are feasible in fully actuated systems, where the motion of the object is entirely controlled by the manipulator, as in the grasping in this paper;
low-dimensional actions are better suited to underactuated systems, where the action must be adjusted in real time according to system feedback until the goal state is reached. Underactuated means that the motion of the object is determined jointly by the environment and the manipulator, e.g. pre-grasp manipulation or pushing an object along a track.
2 Method

State: RGB-D image.
Action: as described in Section 1.
Grasp reward: reward = 1 for a successful grasp. A grasp is counted as successful if the opening width of the gripper after the grasp is greater than a threshold.
Push reward: reward = 0.5 if the difference between scene images exceeds a threshold. This reward encourages pushes that change the scene, but it does not explicitly make future grasps easier.
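A small sketch of these two reward rules; the gripper-width threshold and the scene-change threshold below are illustrative placeholders, not values taken from the paper.

```python
import numpy as np

def grasp_reward(gripper_opening_width, width_threshold=0.01):
    """Reward = 1 if the gripper is still open wider than a threshold
    after closing and lifting, i.e. it is presumably holding an object."""
    return 1.0 if gripper_opening_width > width_threshold else 0.0

def push_reward(depth_before, depth_after, change_threshold=300):
    """Reward = 0.5 if the push changed the scene: count pixels whose
    depth changed noticeably between the two observations."""
    changed = np.abs(depth_after - depth_before) > 0.01  # per-pixel change mask
    return 0.5 if changed.sum() > change_threshold else 0.0
```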
Q-network structure: the two networks share the same architecture. The RGB image and the depth image are first fed into two parallel DenseNet trunks, the features are then merged, and the pixel-wise Q-values are output through convolutions and bilinear upsampling.
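A rough PyTorch sketch of such an architecture, assuming torchvision's DenseNet-121 as the trunk and the single depth channel cloned to three channels so it fits the trunk; these details are assumptions for illustration, not the exact published model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import densenet121

class PixelwiseQNet(nn.Module):
    """One Q-network (the push and grasp networks share this structure):
    parallel DenseNet trunks for color and depth, feature concatenation,
    1x1 convolutions, and bilinear upsampling back to input resolution."""
    def __init__(self):
        super().__init__()
        self.color_trunk = densenet121().features  # 1024 output channels
        self.depth_trunk = densenet121().features
        self.head = nn.Sequential(
            nn.Conv2d(2048, 64, kernel_size=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.Conv2d(64, 1, kernel_size=1),    # one Q-value per (coarse) pixel
        )

    def forward(self, rgb, depth):
        # clone the single depth channel to 3 channels so the trunk accepts it
        d3 = depth.repeat(1, 3, 1, 1)
        feat = torch.cat([self.color_trunk(rgb), self.depth_trunk(d3)], dim=1)
        q = self.head(feat)
        # upsample the coarse Q-map back to the input resolution
        return F.interpolate(q, size=rgb.shape[-2:], mode="bilinear",
                             align_corners=False)
```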
1. How is the push reward defined?
Answer: reward = 0.5 if the difference between scene images exceeds a threshold. This reward encourages pushes that change the scene, but it does not explicitly make future grasps easier.
2. How is the pixel-wise prediction network trained?
Answer: the gradient is computed only at the pixel p where the action was executed; all other pixels receive zero gradient.
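A sketch of that training step under the same assumed network interface as above; the smooth-L1 (Huber-style) loss and the `target_q` label (e.g. the observed reward) are illustrative choices. Because the loss depends only on the output at the executed pixel, all other output pixels automatically receive zero gradient.

```python
import torch
import torch.nn.functional as F

def train_step(q_net, optimizer, rgb, depth, pixel_yx, target_q):
    """One update: only the pixel where the action was executed contributes
    to the loss; gradients still flow into the shared network weights."""
    y, x = pixel_yx
    q_map = q_net(rgb, depth)                # (1, 1, H, W) pixel-wise Q-values
    pred = q_map[0, 0, y, x]                 # prediction at the executed pixel
    target = torch.tensor(float(target_q), device=pred.device)
    loss = F.smooth_l1_loss(pred, target)    # loss on a single pixel
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```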
3 Thoughts
1. In essence, the method of this paper is supervised learning: the grasp/push confidence labels are simply replaced by rewards, which amounts to the same thing.