Robot reinforcement learning: synergies between pushing and grasping with self-supervised DRL (2018)
Paper: https://ieeexplore.ieee.org/document/8593986
1 Introduction
Model-free reinforcement learning, Q-learning.
Method: train two networks that predict pixel-wise Q-values for push and for grasp; the push or grasp with the highest Q-value is executed.
For each pixel, push is defined as pushing 10 cm from left to right; grasp is defined as a horizontal grasp centered on that pixel with a 10 cm grasp width.
At test time, the image is rotated 16 times and each rotation is fed through the network, so pushes and grasps at 16 orientations can be realized.
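A minimal sketch of this rotate-and-argmax action selection, assuming a `model` that maps a rotated RGB-D heightmap to one pixel-wise Q-map per primitive (the function and variable names here are illustrative, not the authors' code):

```python
import torch
import torchvision.transforms.functional as TF

def select_action(model, image, num_rotations=16):
    """Return the (q, primitive, angle, pixel) with the highest Q-value.

    image: (C, H, W) tensor, e.g. an RGB-D heightmap.
    model is assumed to return (q_push, q_grasp), each of shape (1, 1, H, W).
    """
    best = None
    for k in range(num_rotations):
        angle = 360.0 * k / num_rotations
        # Rotate the input instead of the network: each pass scores
        # pushes/grasps at one of the 16 orientations.
        rotated = TF.rotate(image.unsqueeze(0), angle)
        with torch.no_grad():
            q_push, q_grasp = model(rotated)
        for primitive, q_map in (("push", q_push), ("grasp", q_grasp)):
            q, idx = q_map.flatten().max(dim=0)
            if best is None or q.item() > best[0]:
                h, w = divmod(idx.item(), q_map.shape[-1])
                best = (q.item(), primitive, angle, (h, w))
    return best
```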
This paper uses high-dimensional actions (grasp pose and push), whereas methods such as QT-Opt use lower-dimensional actions (end-effector offsets).
High-dimensional actions are feasible in fully actuated systems, where the motion of the object is completely controlled by the manipulator, as with the grasps in this paper.
Low-dimensional actions are better suited to underactuated systems, which must adjust the action in real time based on system feedback to reach the goal state. Underactuation means the motion of the object is determined jointly by the environment and the manipulator, e.g., pre-grasp manipulation or pushing an object along a track.
2 Method

state: RGB-D image
action: as described in Section 1
grasp reward: reward = 1 for a successful grasp. A grasp is considered successful if the opening width of the gripper after closing exceeds a threshold.
push reward: reward = 0.5 if the difference between scene images before and after the push exceeds a threshold. This reward encourages pushes that change the scene, but it does not explicitly make future grasps easier.
Q-network structure: the two networks share the same architecture. The RGB image and the depth image are first fed in parallel through DenseNet trunks; the features are then merged, and the pixel-wise Q-values are predicted by convolutions followed by interpolation-based (bilinear) upsampling.
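A rough PyTorch sketch of this two-trunk design (channel counts and the exact head layers are my assumptions, not the paper's exact configuration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class PixelQNet(nn.Module):
    """Pixel-wise Q-map: parallel DenseNet trunks for color and depth,
    feature concatenation, then convolutions + bilinear upsampling."""

    def __init__(self):
        super().__init__()
        self.color_trunk = models.densenet121().features  # 1024-ch output
        self.depth_trunk = models.densenet121().features  # depth tiled to 3 channels
        self.head = nn.Sequential(
            nn.Conv2d(2048, 64, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, kernel_size=1),  # one Q-value per (coarse) pixel
        )

    def forward(self, color, depth):
        h, w = color.shape[-2:]
        feats = torch.cat(
            [self.color_trunk(color),
             self.depth_trunk(depth.repeat(1, 3, 1, 1))], dim=1)
        q = self.head(feats)
        # Upsample the coarse Q-map back to the input resolution.
        return F.interpolate(q, size=(h, w), mode="bilinear",
                             align_corners=False)
```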
1. How is the push reward set?
Answer: reward = 0.5 if the difference between scene images exceeds a threshold. This reward encourages pushes that change the scene, but it does not explicitly make future grasps easier.
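A minimal sketch of the two reward rules above; the threshold values and helper arguments are illustrative placeholders, not the paper's numbers:

```python
import numpy as np

def compute_reward(primitive, gripper_width, prev_depth, curr_depth,
                   width_threshold=0.01, change_threshold=300):
    """gripper_width: gripper opening after closing (m);
    prev_depth/curr_depth: depth heightmaps before/after the action."""
    if primitive == "grasp":
        # Grasp succeeded if the closed gripper is still held open by an object.
        return 1.0 if gripper_width > width_threshold else 0.0
    # Push is rewarded if it changed enough of the scene.
    changed = np.count_nonzero(np.abs(curr_depth - prev_depth) > 0.01)
    return 0.5 if changed > change_threshold else 0.0
```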
2. How is the pixel-wise prediction network trained?
Answer: the gradient is computed only at the pixel p where the action was executed; the gradient is zero at all other pixels.
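A minimal sketch of this single-pixel backpropagation, assuming a Huber loss and a precomputed scalar TD target `y` (the names are mine, not the paper's):

```python
import torch
import torch.nn.functional as F

def single_pixel_loss(q_map, pixel, y):
    """q_map: (1, 1, H, W) Q-values for the executed rotation/primitive.
    pixel: (h, w) where the action was executed.
    y: scalar TD target, e.g. reward + discounted max future Q.
    """
    h, w = pixel
    q = q_map[0, 0, h, w]
    # Only the executed pixel contributes to the loss, so the gradient
    # is zero everywhere else in the Q-map.
    return F.smooth_l1_loss(q, torch.as_tensor(y, dtype=q.dtype))
```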
3 Thoughts
1. In essence, the method of this paper is supervised learning: the grasp/push confidence labels are simply replaced by rewards, which amounts to the same thing.