[NLP] A Brief Overview of Recent Work on Sparse Neural Networks
2022-07-03 04:06:00 【Muasci】
Preface
The ICLR 2019 best paper 《THE LOTTERY TICKET HYPOTHESIS: FINDING SPARSE, TRAINABLE NEURAL NETWORKS》 proposed the lottery ticket hypothesis: "dense, randomly-initialized, feed-forward networks contain subnetworks (winning tickets) that—when trained in isolation—reach test accuracy comparable to the original network in a similar number of iterations."
I also (rather roughly) took notes on this line of work in [Literature Reading] Sparsity in Deep Learning: Pruning and growth for efficient inference and training in NN.
This post gives a brief overview of the most recent work in this area. Classified by "when to sparsify", the work falls into three categories: sparsify after training, sparsify during training, and sparse training. I am more interested in the latter two (since they are end-to-end), so this post will (probably) focus on those two subcategories.
《THE LOTTERY TICKET HYPOTHESIS: FINDING SPARSE, TRAINABLE NEURAL NETWORKS》ICLR 19
Steps:
- Initialize a dense, fully-connected neural network θ and fix a pruning rate p
- Train for a certain number of steps to obtain θ1
- Based on weight magnitudes in θ1, prune the fraction p of weights with the smallest magnitudes, and reset the remaining weights to their original initialization values
- Continue training
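To make the procedure concrete, here is a minimal PyTorch sketch of one prune-and-rewind round, assuming simple layer-wise magnitude pruning; `train_fn` is a stand-in for an ordinary training loop, and the linked implementations below differ in details (e.g. global and iterative pruning):

```python
import copy
import torch

def magnitude_masks(model, prune_rate):
    """Build {name: 0/1 mask} keeping the largest-magnitude weights,
    pruning a fraction `prune_rate` per weight matrix (layer-wise)."""
    masks = {}
    for name, param in model.named_parameters():
        if param.dim() < 2:                      # skip biases / norm params
            continue
        k = max(1, int(param.numel() * prune_rate))
        threshold = param.abs().view(-1).kthvalue(k).values
        masks[name] = (param.abs() > threshold).float()
    return masks

def lottery_ticket_round(model, train_fn, prune_rate):
    init_state = copy.deepcopy(model.state_dict())  # remember theta_0
    train_fn(model)                                 # train to get theta_1
    masks = magnitude_masks(model, prune_rate)      # prune smallest magnitudes
    model.load_state_dict(init_state)               # rewind survivors to init
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in masks:
                param.mul_(masks[name])
    return masks  # continue training with masks re-applied after each step
```

Note that in the paper the cycle is applied iteratively: to reach a final pruning rate p, each of n rounds prunes p^(1/n) of the surviving weights, rather than pruning p in one shot.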
Code:
- tf:https://github.com/google-research/lottery-ticket-hypothesis
- pt:https://github.com/rahulvigneswaran/Lottery-Ticket-Hypothesis-in-Pytorch
《Rigging the Lottery: Making All Tickets Winners》ICML 20
Steps:
- Initialize the network and prune it before training. Candidate pre-pruning schemes:
- uniform: the same sparsity for every layer;
- other schemes: the more parameters a layer has, the more it is sparsified, so that the remaining parameter counts are roughly balanced across layers;
- During training, every ΔT steps, update which parameters are active, via two operations, drop and grow (see the sketch after this list):
- drop: prune a fraction of the active weights with the smallest magnitudes
- grow: from the pruned weights, reactivate the same fraction, picking those with the largest gradient magnitudes
- Schedule for the drop/grow fraction: cosine annealing, f_decay(t) = (α/2)(1 + cos(tπ/T_end)), where α is the initial update fraction, typically set to 0.3.
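As a rough PyTorch sketch of a single drop/grow update on one weight tensor (the function name and signature are my own; it assumes the caller keeps a 0/1 mask per tensor and evaluates dense gradients at update time, as RigL does every ΔT steps):

```python
import torch

def rigl_update(param, dense_grad, mask, update_frac):
    """One drop/grow connectivity update for a single weight tensor.
    `dense_grad` is the gradient w.r.t. the *dense* weight tensor."""
    n_update = int(update_frac * mask.sum().item())
    if n_update == 0:
        return mask
    # drop: deactivate the active weights with the smallest magnitudes
    drop_scores = param.abs().masked_fill(mask == 0, float('inf'))
    drop_idx = torch.topk(drop_scores.view(-1), n_update, largest=False).indices
    mask.view(-1)[drop_idx] = 0
    # grow: activate the inactive weights with the largest gradient magnitudes
    # (for simplicity, this sketch allows just-dropped weights to be regrown)
    grow_scores = dense_grad.abs().masked_fill(mask == 1, float('-inf'))
    grow_idx = torch.topk(grow_scores.view(-1), n_update).indices
    mask.view(-1)[grow_idx] = 1
    with torch.no_grad():
        param.view(-1)[grow_idx] = 0.0   # newly grown weights start at zero
    return mask
```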
Highlights:
- End-to-end
- Supports grow (pruned weights can come back)
Code:
- tf:https://github.com/google-research/rigl
- pt:https://github.com/varun19299/rigl-reproducibility
《Do We Actually Need Dense Over-Parameterization? In-Time Over-Parameterization in Sparse Training》ICML 21
Proposes the In-Time Over-Parameterization (ITOP) rate: the fraction of all parameters in the dense model that have been explored (activated at least once) during training, i.e. R_T = |∪_{t≤T} active(θ_t)| / N for a dense model with N parameters.
The claim is that, given reliable exploration, the higher this rate (i.e. the more parameters the model explores during training), the better the final performance.
The training procedure is accordingly similar to that of 《Rigging the Lottery: Making All Tickets Winners》, except that it uses Sparse Evolutionary Training (SET) from 《Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science》, whose grow step is random. The authors pick SET because:
SET activates new weights in a random fashion which naturally considers all possible parameters to explore. It also helps to avoid the dense over-parameterization bias introduced by the gradient-based methods e.g., The Rigged Lottery (RigL) (Evci et al., 2020a) and Sparse Networks from Scratch (SNFS) (Dettmers & Zettlemoyer, 2019), as the latter utilize dense gradients in the backward pass to explore new weights
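To make the metric concrete, here is a small tracker of the ITOP rate, under the assumption that connectivity is kept as 0/1 masks updated by drop/grow steps (my own helper, not the paper's code):

```python
import torch

class ITOPTracker:
    """Tracks the In-Time Over-Parameterization rate: the fraction of all
    weights that have been active (explored) at least once so far."""
    def __init__(self, masks):
        # masks: {name: current 0/1 connectivity mask}
        self.explored = {n: m.bool().clone() for n, m in masks.items()}

    def update(self, masks):
        # call after every drop/grow step
        for n, m in masks.items():
            self.explored[n] |= m.bool()

    def rate(self):
        explored = sum(int(e.sum()) for e in self.explored.values())
        total = sum(e.numel() for e in self.explored.values())
        return explored / total
```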
Highlights:
- Degree of exploration (the ITOP rate)
Code: https://github.com/Shiweiliuiiiiiii/In-Time-Over-Parameterization
《EFFECTIVE MODEL SPARSIFICATION BY SCHEDULED GROW-AND-PRUNE METHODS》ICLR 2022
Approach:
CYCLIC GAP:
- Split the model into k partitions; each partition (?) is randomly sparsified at ratio r.
- Initially, make one of the partitions dense.
- Run k steps, with T epochs between consecutive steps. At each step, sparsify the currently dense partition (magnitude-based) and make the next partition dense.
- After the k steps, sparsify the remaining dense partition, then fine-tune (see the sketch after this list).
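A sketch of one plausible reading of this cyclic schedule; `train_interval` and `finetune` are hypothetical stand-ins for the actual training loops, which re-apply the masks after every optimizer step:

```python
import torch

def random_mask(param, r):
    """0/1 mask that keeps a random (1 - r) fraction of entries."""
    mask = torch.ones(param.numel())
    mask[torch.randperm(param.numel())[:int(param.numel() * r)]] = 0
    return mask.view_as(param)

def magnitude_mask(param, r):
    """0/1 mask that keeps the largest-magnitude (1 - r) fraction of entries."""
    mask = torch.ones(param.numel())
    k = int(param.numel() * r)
    if k > 0:
        idx = torch.topk(param.abs().view(-1), k, largest=False).indices
        mask[idx] = 0
    return mask.view_as(param)

def cyclic_gap(param_groups, r, train_interval, finetune):
    """param_groups: k disjoint lists of weight tensors (the k partitions)."""
    k = len(param_groups)
    masks = [[random_mask(p, r) for p in grp] for grp in param_groups]
    masks[0] = [torch.ones_like(p) for p in param_groups[0]]  # one partition starts dense
    for i in range(k):
        train_interval(masks)                                        # T epochs
        masks[i] = [magnitude_mask(p, r) for p in param_groups[i]]   # prune the dense partition
        if i + 1 < k:
            masks[i + 1] = [torch.ones_like(p) for p in param_groups[i + 1]]  # densify the next one
    finetune(masks)   # all k partitions are now sparse
```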
PARALLEL GAP: k partitions and k nodes; each node keeps a different partition dense.
Highlights:
- Enlarges the exploration space
- Includes experiments on WMT14 De-En translation
Code: https://github.com/Shiweiliuiiiiiii/In-Time-Over-Parameterization
《Learning to Win Lottery Tickets in BERT Transfer via Task-agnostic Mask Training》 NAACL 2022
Approach: learn the 0/1 pruning mask directly by gradient-based training on task-agnostic (pre-training) objectives, rather than deriving it from weight magnitudes; a sketch of this kind of trainable masking follows below.
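As a sketch of what a trainable mask means in practice: freeze the pretrained weights and learn a real-valued score per weight, binarized on the forward pass with a straight-through estimator. This is a generic recipe, not necessarily the exact implementation in the TAMT repo:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLinear(nn.Module):
    """Linear layer with frozen pretrained weights and a trainable mask.
    Real-valued scores are binarized in the forward pass; gradients reach
    the scores through the straight-through estimator."""
    def __init__(self, linear, sparsity):
        super().__init__()
        self.weight = nn.Parameter(linear.weight.detach().clone(), requires_grad=False)
        self.bias = nn.Parameter(linear.bias.detach().clone(), requires_grad=False)
        self.scores = nn.Parameter(torch.rand_like(self.weight))  # mask logits
        self.sparsity = sparsity

    def forward(self, x):
        k = max(1, int(self.scores.numel() * self.sparsity))
        threshold = self.scores.view(-1).kthvalue(k).values
        hard = (self.scores > threshold).float()           # 0/1 mask, no gradient
        mask = hard + self.scores - self.scores.detach()   # straight-through trick
        return F.linear(x, self.weight * mask, self.bias)
```

Training such masks against a task-agnostic objective (e.g. the pre-training MLM loss) yields the subnetwork; the same mechanic, with one mask per task, shows up in the AISTATS 2020 paper below.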
Highlights:
- The mask is trained, rather than magnitude-based
- Task-agnostic mask training (a task-specific variant might be worth a look next?)
Code: https://github.com/llyx97/TAMT
《How fine can fine-tuning be? Learning efficient language models》AISTATS 2020
Method:
- L0-close fine-tuning: preliminary experiments show that, for certain layers and modules, the parameters change little from their pre-trained values after downstream fine-tuning, so these are excluded from the fine-tuning procedure proposed in the paper
- Sparsification as fine-tuning: train a 0/1 mask for each task
Highlights:
- Trains a mask to achieve the effect of fine-tuning the pretrained model
Code: none
However, https://github.com/llyx97/TAMT also contains mask-training code, which might serve as a reference.