ICML 2022: UFRGS | Optimistic Linear Support and Successor Features as a Basis for Optimal Policy Transfer
2022-06-27 23:31:00 【Zhiyuan community】
【Title】Optimistic Linear Support and Successor Features as a Basis for Optimal Policy Transfer
【Authors】Lucas N. Alegre, Ana L. C. Bazzan, Bruno C. da Silva
【Publication date】2022.6.22
【Paper link】https://arxiv.org/pdf/2206.11326.pdf
【Why we recommend it】In many real-world applications, a reinforcement learning (RL) agent may have to solve multiple tasks, each typically modeled by a reward function. If the reward functions are linear and the agent has already learned a set of policies for different tasks, successor features (SFs) can be used to combine these policies and find reasonable solutions to new problems. However, the solutions found this way are not guaranteed to be optimal. This paper introduces a new algorithm that addresses this limitation. It allows an RL agent to combine existing policies and directly identify the optimal policy for any new problem, without any further interaction with the environment. The paper first proves that the transfer learning problem addressed by SFs is equivalent to the problem of learning to optimize multiple objectives in RL. It then introduces an SF-based extension of the Optimistic Linear Support algorithm that learns a set of policies whose SFs form a convex coverage set. Experiments show that the method outperforms state-of-the-art competing algorithms in both discrete and continuous domains.
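To make the idea of combining policies through SFs concrete, here is a minimal sketch. It assumes the standard SF setup, which the summary above does not spell out: rewards are linear in features, r = φ · w, so a policy's action value is Q(s, a) = ψ(s, a) · w, and stored policies are combined by generalized policy improvement (GPI), which picks the action maximizing the best stored value. The function names and the toy numbers are hypothetical, and this is not the paper's Optimistic Linear Support procedure, only the combination step it builds on.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def q_value(psi_sa: Vector, w: Vector) -> float:
    """Assumed linear-reward identity: Q(s, a) = psi(s, a) . w."""
    return dot(psi_sa, w)


def gpi_action(psis: List[List[Vector]], w: Vector) -> int:
    """Generalized policy improvement for one state.

    psis[i][a] is the successor-feature vector of stored policy i
    for action a in the current state (hypothetical layout).
    Returns argmax_a max_i psi_i(s, a) . w.
    """
    n_actions = len(psis[0])
    best_per_action = [
        max(q_value(policy[a], w) for policy in psis)
        for a in range(n_actions)
    ]
    return max(range(n_actions), key=lambda a: best_per_action[a])


# Toy, made-up numbers: 2 stored policies, 2 actions, 2 reward features.
psis = [
    [[1.0, 0.0], [0.2, 0.1]],  # policy 0: SFs for actions 0 and 1
    [[0.0, 1.0], [0.6, 0.6]],  # policy 1: SFs for actions 0 and 1
]

# A new task is just a new weight vector; no environment interaction
# is needed to evaluate the stored policies on it.
w_new = [0.5, 0.5]
action = gpi_action(psis, w_new)
print(action)
```

Changing only `w_new` re-evaluates every stored policy on a different task, which is why a well-chosen policy set (in the paper, one whose SFs form a convex coverage set) matters for the quality of the combined behavior.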