ICML 2022 | Utility theory of sequential decision making
2022-06-30 21:02:00 【Zhiyuan community】

Paper link: https://arxiv.org/pdf/2206.12562.pdf
Large Transformer-based models have shown superior performance on a range of natural language processing and computer vision tasks. However, these models contain a very large number of parameters, which limits their deployment in real-world applications. To reduce model size, researchers prune these models according to importance scores assigned to the weights. These scores, however, are usually estimated on mini-batches during training, and mini-batch sampling together with complicated training dynamics makes the estimates highly variable and uncertain. Because of this uncertainty, commonly used pruning methods can remove weights that are in fact crucial, which makes training unstable and hurts generalization. To address this, the authors propose PLATON, an algorithm that captures the uncertainty of importance scores through an upper confidence bound (UCB) on the importance estimate. In particular, for weights with a low importance score but high uncertainty, PLATON tends to retain them and explore their capacity. Extensive experiments with several Transformer-based models on natural language understanding, question answering, and image classification validate the effectiveness of PLATON: it yields notable improvements over the compared pruning methods at different sparsity levels.
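
The idea of scoring weights by an upper confidence bound rather than by a point estimate can be illustrated with a small sketch. This is not the paper's implementation: the function names `ucb_scores` and `keep_mask`, the exponential moving averages, the additive form "smoothed importance + c × smoothed deviation", and the parameters `beta` and `c` are all assumptions chosen for illustration, and PLATON's exact formula may differ. The per-batch importance values are taken as given (for example, something like |weight × gradient| computed on each mini-batch).

```python
def ucb_scores(batch_importances, beta=0.5, c=1.0):
    """Return one UCB-style score per weight (illustrative, not PLATON's exact rule).

    batch_importances: list of per-mini-batch importance lists, one inner
    list per batch and one entry per weight.
    beta: smoothing factor of the moving averages (assumed hyperparameter).
    c: how strongly uncertainty is rewarded (assumed hyperparameter).
    """
    n = len(batch_importances[0])
    mean = list(batch_importances[0])  # smoothed importance, seeded with the first batch
    dev = [0.0] * n                    # smoothed absolute deviation, used as uncertainty
    for scores in batch_importances[1:]:
        for i, s in enumerate(scores):
            mean[i] = beta * mean[i] + (1 - beta) * s
            dev[i] = beta * dev[i] + (1 - beta) * abs(s - mean[i])
    # Upper confidence bound: estimate plus a bonus for uncertainty.
    return [m + c * d for m, d in zip(mean, dev)]


def keep_mask(scores, sparsity):
    """Prune the lowest-scoring `sparsity` fraction; True means the weight is kept."""
    n_prune = int(len(scores) * sparsity)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    pruned = set(order[:n_prune])
    return [i not in pruned for i in range(len(scores))]


if __name__ == "__main__":
    # Weight 0 has a steady importance; weight 1 has a lower but noisy one.
    history = [[0.3, 0.0], [0.3, 0.4], [0.3, 0.0], [0.3, 0.4]]
    print(keep_mask(ucb_scores(history, c=0.0), sparsity=0.5))  # point estimate only
    print(keep_mask(ucb_scores(history, c=1.0), sparsity=0.5))  # with uncertainty bonus
```

With `c=0.0` the score reduces to the smoothed point estimate, so the noisy weight is the one pruned. With the uncertainty bonus switched on, the noisy weight outranks the steady one and is kept, which mirrors the behaviour described above for weights with low importance but high uncertainty.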
