The lottery ticket hypothesis: finding sparse, trainable neural networks
2022-06-10 17:49:00 【Upward a Peng】
Lottery ticket hypothesis
The lottery ticket hypothesis predicts that ∃ m for which j′ ≤ j (commensurate training time), a′ ≥ a (commensurate accuracy), and ‖m‖₀ ≪ |θ| (fewer parameters), where j and a are the training iterations and test accuracy of the dense network f(x; θ), and j′ and a′ are those of the subnetwork f(x; m ⊙ θ0).
In other words, a dense network always contains a small subnetwork that, when trained in isolation for at most the same number of iterations, can match the test accuracy of the original network.
The general procedure for identifying a winning ticket:
- Randomly initialize a neural network f(x; θ0), where θ0 ∼ D_θ.
- Train the network for j iterations, arriving at parameters θj.
- Prune p% of the parameters in θj, creating a mask m.
- Reset the remaining parameters to their values in θ0, creating the winning ticket f(x; m ⊙ θ0), i.e. the network obtained after pruning and resetting to the original initialization.
Iterative pruning is used in this process: to remove p% of the weights, the network is repeatedly trained, pruned, and reset over n rounds, with each round pruning p^(1/n)% of the weights that survive the previous round. Compared with one-shot pruning, this approach finds smaller subnetworks that still reach the accuracy of the original network. The effect is shown in the figure.
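As a concrete illustration, here is a minimal sketch of iterative magnitude pruning with rewinding to θ0, assuming a PyTorch model with linear layers. The function name, the `train(model, masks)` callback, the layer selection, and the round/sparsity values are placeholders for illustration, not the paper's exact setup.

```python
# Minimal sketch of iterative magnitude pruning with rewinding to θ0 (PyTorch).
# `train(model, masks)` stands in for "train for j iterations while keeping
# masked weights at zero"; hyperparameters are illustrative.
import copy
import torch
import torch.nn as nn

def find_winning_ticket(model, train, rounds=5, final_sparsity=0.8):
    theta0 = copy.deepcopy(model.state_dict())                 # remember the original init θ0
    prunable = [m for m in model.modules() if isinstance(m, nn.Linear)]
    masks = {id(m): torch.ones_like(m.weight) for m in prunable}
    keep_per_round = (1.0 - final_sparsity) ** (1.0 / rounds)  # compounds to the target sparsity
    kept = 1.0
    for _ in range(rounds):
        train(model, masks)                                    # train the masked network for j iterations
        kept *= keep_per_round
        for m in prunable:                                     # keep only the largest surviving weights
            w = m.weight.detach().abs().flatten()
            k = max(1, int(kept * w.numel()))
            threshold = torch.topk(w, k, largest=True).values.min()
            masks[id(m)] = (m.weight.detach().abs() >= threshold).float()
        model.load_state_dict(theta0)                          # rewind the surviving weights to θ0
        for m in prunable:
            m.weight.data.mul_(masks[id(m)])                   # zero out the pruned weights
    return model, masks
```

Setting `rounds=1` in this sketch gives the one-shot variant; the iterative schedule is the one the post says finds smaller tickets at the same accuracy.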
When constructing the winning ticket, the pruned subnetwork must be reset to the network's original initial values θ0; only with this original initialization does it train into a good winning ticket. If the subnetwork is randomly reinitialized instead, its performance falls far short of the version reset to θ0. The effect is shown in the figure.
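For comparison, the random-reinitialization control described above can be sketched as follows, reusing the masks produced by the hypothetical `find_winning_ticket()` helper from the previous sketch; the Kaiming initializer is an illustrative choice, not necessarily the one used in the paper.

```python
# Minimal sketch of the random-reinitialization control: keep the pruning
# mask, but re-sample the surviving weights instead of rewinding them to θ0.
import torch.nn as nn

def random_reinit_control(model, masks):
    for m in model.modules():
        if isinstance(m, nn.Linear):
            nn.init.kaiming_uniform_(m.weight)   # fresh random weights (illustrative initializer)
            m.weight.data.mul_(masks[id(m)])     # keep the same sparsity pattern
    return model
```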
During pruning, the learning rate also matters. Too high a learning rate can prevent winning tickets from being found: when the learning rate is high, the reset subnetwork performs no better than a randomly reinitialized one. Meanwhile, learning-rate warmup effectively improves test-set accuracy. The effect is shown in the figure.
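As an illustration, a simple linear warmup can be attached to any PyTorch optimizer as sketched below; the base learning rate and warmup length are illustrative values, not the paper's settings.

```python
# Minimal sketch of linear learning-rate warmup for a PyTorch optimizer.
import torch

def make_warmup_scheduler(optimizer, warmup_steps=10000):
    # Scale the learning rate linearly from near 0 up to its base value over
    # warmup_steps iterations, then hold it constant.
    def lr_lambda(step):
        return min(1.0, (step + 1) / warmup_steps)
    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Usage (illustrative):
# optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# scheduler = make_warmup_scheduler(optimizer)
# then call scheduler.step() after each optimizer.step()
```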
Notes for CNN networks:
When pruning CNNs, using dropout effectively improves the pruned network's test accuracy: dropout can guide pruning and may make winning tickets easier to find. The effect is shown in the figure.
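For illustration, dropout can be added to a small conv net as sketched below; the architecture, input size, and dropout rate are illustrative, not the specific models used in the paper.

```python
# Minimal sketch of a small conv net with dropout before the dense layers
# (assumes 3x32x32 inputs; all sizes are illustrative).
import torch.nn as nn

class SmallConvNet(nn.Module):
    def __init__(self, num_classes=10, p_drop=0.5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 64 x 16 x 16 feature map
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(p_drop),                   # dropout before the first dense layer
            nn.Linear(64 * 16 * 16, 256), nn.ReLU(),
            nn.Dropout(p_drop),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```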
When applied to CNNs, global pruning is the better choice, because global pruning finds smaller winning-ticket subnetworks than layer-wise pruning does. The effect is shown in the figure.
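The difference can be sketched as follows, assuming a PyTorch model: global pruning picks one magnitude threshold across all layers, so layers with many small weights are pruned more heavily, while layer-wise pruning removes the same fraction from every layer. Function names and the prune fraction are illustrative.

```python
# Minimal sketch contrasting global and layer-wise magnitude-pruning masks.
import torch
import torch.nn as nn

def global_prune_masks(model, prune_fraction=0.2):
    # One threshold over all prunable weights in the network.
    # (For very large models, a top-k based threshold avoids torch.quantile size limits.)
    weights = [m.weight.detach().abs().flatten()
               for m in model.modules() if isinstance(m, (nn.Conv2d, nn.Linear))]
    threshold = torch.quantile(torch.cat(weights), prune_fraction)
    return {id(m): (m.weight.detach().abs() >= threshold).float()
            for m in model.modules() if isinstance(m, (nn.Conv2d, nn.Linear))}

def layerwise_prune_masks(model, prune_fraction=0.2):
    # A separate threshold per layer, so every layer loses the same fraction.
    masks = {}
    for m in model.modules():
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            w = m.weight.detach().abs()
            threshold = torch.quantile(w.flatten(), prune_fraction)
            masks[id(m)] = (w >= threshold).float()
    return masks
```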
Notes:
1. Overparameterized networks are easier to prune, because they contain more candidate combinations of winning tickets.
2. Only when a certain degree of sparsity is reached can a highly overparameterized network be reinitialized successfully; apart from this, extremely pruned networks and networks that are not overparameterized can maintain accuracy only through a chance initialization.
3. The structure of a winning ticket encodes an inductive bias customized to the task being learned.