ICML 2022: UFRGS | Optimistic Linear Support and Successor Features as a Basis for Optimal Policy Transfer
2022-06-27 23:31:00 【Zhiyuan community】
【Title】Optimistic Linear Support and Successor Features as a Basis for Optimal Policy Transfer
【Authors】Lucas N. Alegre, Ana L. C. Bazzan, Bruno C. da Silva
【Publication date】2022.6.22
【Paper link】https://arxiv.org/pdf/2206.11326.pdf
【Why we recommend it】In many real-world applications, a reinforcement learning (RL) agent may have to solve multiple tasks, each typically modeled by a reward function. If the reward functions are expressed linearly and the agent has already learned a set of policies for different tasks, successor features (SFs) can be used to combine these policies and find reasonable solutions to new problems. However, the solutions found this way are not guaranteed to be optimal. This paper introduces a new algorithm that addresses this limitation: it lets an RL agent combine existing policies and directly identify the optimal policy for any new problem, without any further interaction with the environment. The paper first proves that the transfer learning problem addressed by SFs is equivalent to the problem of learning to optimize multiple objectives in RL. It then introduces an SF-based extension of the Optimistic Linear Support algorithm for learning a set of policies whose SFs form a convex coverage set. Experiments show that the method outperforms state-of-the-art competing algorithms in both discrete and continuous domains.
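To make the idea of combining existing policies through SFs concrete, here is a minimal sketch. It assumes the usual SF setup in which the action value of policy i on a task with reward weights w is the dot product of its SF vector with w, and it combines policies by picking the action whose best value across all policies is highest (commonly called generalized policy improvement, a name that does not appear in the summary above). The function names, the toy SF values, and the weight vectors are all hypothetical; this is not the authors' implementation, and it does not include the Optimistic Linear Support step.

```python
from typing import Sequence


def q_value(psi: Sequence[float], w: Sequence[float]) -> float:
    """Action value under linear rewards: dot product of an SF vector and task weights."""
    return sum(p * wi for p, wi in zip(psi, w))


def gpi_action(psis: Sequence[Sequence[Sequence[float]]], w: Sequence[float]) -> int:
    """Pick an action for the current state by combining several policies.

    psis[i][a] is the (hypothetical) SF vector of policy i for action a
    in the current state; w is the weight vector of the new task.
    """
    n_actions = len(psis[0])
    # For each action, take the best value any known policy assigns to it.
    best_per_action = [
        max(q_value(policy[a], w) for policy in psis) for a in range(n_actions)
    ]
    # Act greedily with respect to those per-action maxima.
    return max(range(n_actions), key=best_per_action.__getitem__)


# Toy data (made up for illustration): two policies, two actions, two features.
psis = [
    [[1.0, 0.0], [0.2, 0.1]],  # policy 0: SF vectors for actions 0 and 1
    [[0.0, 0.3], [0.1, 0.9]],  # policy 1: SF vectors for actions 0 and 1
]

action_task_a = gpi_action(psis, [1.0, 0.0])  # task that rewards only feature 0
action_task_b = gpi_action(psis, [0.2, 0.8])  # task that mostly rewards feature 1
```

Note that switching tasks only changes `w`; the stored SF vectors are reused as-is, which is why no new environment interaction is needed at transfer time in this sketch.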