Robot Reinforcement Learning: Learning Synergies between Pushing and Grasping with Self-supervised DRL (2018)
2022-07-04 23:31:00 【Qianyu QY】
Paper link: https://ieeexplore.ieee.org/document/8593986
1 brief introduction
Model-free reinforcement learning (Q-learning).
Method: train two networks that predict a pixel-wise Q-value for push and a pixel-wise Q-value for grasp; the push or grasp with the highest Q-value is executed.
For each pixel, push is defined as a 10 cm push to the right starting at that point; grasp is defined as a horizontal grasp centered on that point with a 10 cm grasp width.
At test time the image is rotated 16 times and each rotation is fed through the network, so pushes and grasps can be executed at 16 different orientations.
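The selection step above can be sketched as follows. This is a minimal illustration, not the authors' code: `q_push` and `q_grasp` are hypothetical pixel-wise Q-maps, one per image rotation.

```python
import numpy as np

def select_action(q_push, q_grasp):
    """Pick the primitive, rotation, and pixel with the highest Q-value.

    q_push, q_grasp: arrays of shape (16, H, W) -- one pixel-wise Q-map
    per image rotation (16 orientations, 22.5 degrees apart).
    """
    q_all = np.stack([q_push, q_grasp])          # shape (2, 16, H, W)
    primitive, rot, y, x = np.unravel_index(np.argmax(q_all), q_all.shape)
    angle_deg = rot * (360.0 / 16)               # orientation of the primitive
    name = "push" if primitive == 0 else "grasp"
    return name, float(angle_deg), (int(y), int(x))

# toy example: make one grasp Q-value the maximum
q_push = np.zeros((16, 4, 4))
q_grasp = np.zeros((16, 4, 4))
q_grasp[3, 2, 1] = 1.0
print(select_action(q_push, q_grasp))  # -> ('grasp', 67.5, (2, 1))
```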
This paper uses high-dimensional actions, i.e., grasp poses and pushes; QT-Opt and similar methods use lower-dimensional actions, i.e., end-effector offsets.
High-dimensional actions are feasible in fully actuated systems. Full actuation means the motion of the object is completely controlled by the manipulator, as with the grasps in this paper.
Low-dimensional actions are better suited to underactuated systems, which must adjust the action in real time according to system feedback to finally reach the goal state. Underactuation means the motion of the object is determined jointly by the environment and the manipulator, e.g., pre-grasp manipulation or pushing an object along a track.
2 Method
state: RGB-D image
action: described in Section 1
grasp reward: reward = 1 for a successful grasp. A grasp is judged successful if the opening of the gripper after closing is greater than a threshold.
push reward: reward = 0.5 if the difference between the scene images before and after the push is greater than a threshold. This reward encourages push actions that change the scene, but it does not explicitly make future grasps easier.
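A minimal sketch of the two reward signals. The threshold values, function names, and the use of depth images for the scene difference are assumptions for illustration, not the paper's exact implementation:

```python
import numpy as np

# Hypothetical thresholds; the paper's exact values are not reproduced here.
GRIPPER_OPEN_THRESHOLD = 0.01   # meters: fingers stopped by an object
SCENE_CHANGE_THRESHOLD = 300    # number of changed pixels

def grasp_reward(gripper_opening):
    """reward = 1 if the gripper opening after closing exceeds a threshold,
    i.e. the fingers were stopped by a held object."""
    return 1.0 if gripper_opening > GRIPPER_OPEN_THRESHOLD else 0.0

def push_reward(depth_before, depth_after):
    """reward = 0.5 if the push detectably changed the scene,
    measured here as the number of pixels whose depth changed."""
    changed = np.sum(np.abs(depth_after - depth_before) > 0.005)
    return 0.5 if changed > SCENE_CHANGE_THRESHOLD else 0.0
```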
Q network structure: the two networks share the same architecture. The RGB image and the depth image are first fed in parallel through DenseNets, the features are then merged, and the pixel-wise Q-values are produced by convolutions and bilinear-interpolation upsampling.
1. How is the push reward defined?
Answer: reward = 0.5 if the difference between the scene images before and after the push is greater than a threshold. This reward encourages push actions that change the scene, but it does not explicitly make future grasps easier.
2. How is the pixel-wise prediction network trained?
Answer: the gradient is computed only at the pixel p where the action was executed; the gradient at all other pixels is set to 0.
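Point 2 can be illustrated with a masked loss: the loss term is nonzero only at the executed pixel, so all other pixels receive zero gradient. The function name and the squared-error choice are a sketch, not the authors' code:

```python
import numpy as np

def masked_td_loss(q_pred, executed_pixel, td_target):
    """Squared error at the executed pixel only.

    All other pixels contribute a zero loss term, so backpropagation
    would produce zero gradient everywhere except executed_pixel.
    """
    mask = np.zeros_like(q_pred)
    mask[executed_pixel] = 1.0
    target = np.zeros_like(q_pred)
    target[executed_pixel] = td_target
    return np.sum(mask * (q_pred - target) ** 2)

q = np.zeros((4, 4))
q[1, 2] = 0.3
# only pixel (1, 2) contributes: (0.3 - 1.0)^2 = 0.49
loss = masked_td_loss(q, (1, 2), 1.0)
```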
3 Thoughts
1. Essentially, the method in this paper is supervised learning: the confidence label for grasp/push is simply replaced by the reward, and otherwise it is much the same.