The lottery ticket hypothesis: finding sparse, trainable neural networks
2022-06-10 17:49:00 【Upward a Peng】
Lottery ticket hypothesis
The lottery ticket hypothesis predicts that ∃ m for which j′ ≤ j (commensurate training time), a′ ≥ a (commensurate accuracy), and ‖m‖₀ ≪ |θ| (fewer parameters), where j and a are the training iterations and test accuracy of the dense network f(x; θ), and j′ and a′ are those of the subnetwork f(x; m ⊙ θ0).
In other words, a dense network always contains a small subnetwork that, when trained in isolation for at most the same number of iterations, can match the test accuracy of the original network.
The general procedure for identifying a winning ticket:
- Randomly initialize a neural network f(x; θ0), where θ0 ∼ D_θ.
- Train the network for j iterations, arriving at parameters θj.
- Prune p% of the parameters in θj, creating a mask m.
- Reset the remaining parameters to their values in θ0, creating the winning ticket f(x; m ⊙ θ0), i.e. the network obtained after pruning and resetting to the original initialization.
Iterative pruning is used in this process: to remove p% of the weights, the network is repeatedly trained, pruned, and reset over n rounds, with each round pruning p^(1/n)% of the weights that survive the previous round. Compared with one-shot pruning, this approach finds smaller subnetworks that still reach the accuracy of the original network. The effect is shown in the figure.
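As a concrete illustration, here is a minimal sketch of iterative magnitude pruning with rewinding to θ0, assuming a PyTorch model with linear layers. The function name, the `train(model, masks)` callback, the layer selection, and the round/sparsity values are placeholders for illustration, not the paper's exact setup.

```python
# Minimal sketch of iterative magnitude pruning with rewinding to θ0 (PyTorch).
# `train(model, masks)` stands in for "train for j iterations while keeping
# masked weights at zero"; hyperparameters are illustrative.
import copy
import torch
import torch.nn as nn

def find_winning_ticket(model, train, rounds=5, final_sparsity=0.8):
    theta0 = copy.deepcopy(model.state_dict())                 # remember the original init θ0
    prunable = [m for m in model.modules() if isinstance(m, nn.Linear)]
    masks = {id(m): torch.ones_like(m.weight) for m in prunable}
    keep_per_round = (1.0 - final_sparsity) ** (1.0 / rounds)  # compounds to the target sparsity
    kept = 1.0
    for _ in range(rounds):
        train(model, masks)                                    # train the masked network for j iterations
        kept *= keep_per_round
        for m in prunable:                                     # keep only the largest surviving weights
            w = m.weight.detach().abs().flatten()
            k = max(1, int(kept * w.numel()))
            threshold = torch.topk(w, k, largest=True).values.min()
            masks[id(m)] = (m.weight.detach().abs() >= threshold).float()
        model.load_state_dict(theta0)                          # rewind the surviving weights to θ0
        for m in prunable:
            m.weight.data.mul_(masks[id(m)])                   # zero out the pruned weights
    return model, masks
```

Setting `rounds=1` in this sketch gives the one-shot variant; the iterative schedule is the one the post says finds smaller tickets at the same accuracy.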
When constructing the winning ticket, the pruned subnetwork must be reset to the network's original initial values θ0; only with this original initialization does it train into a good winning ticket. If the subnetwork is randomly reinitialized instead, its performance falls far short of the version reset to θ0. The effect is shown in the figure.
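For comparison, the random-reinitialization control described above can be sketched as follows, reusing the masks produced by the hypothetical `find_winning_ticket()` helper from the previous sketch; the Kaiming initializer is an illustrative choice, not necessarily the one used in the paper.

```python
# Minimal sketch of the random-reinitialization control: keep the pruning
# mask, but re-sample the surviving weights instead of rewinding them to θ0.
import torch.nn as nn

def random_reinit_control(model, masks):
    for m in model.modules():
        if isinstance(m, nn.Linear):
            nn.init.kaiming_uniform_(m.weight)   # fresh random weights (illustrative initializer)
            m.weight.data.mul_(masks[id(m)])     # keep the same sparsity pattern
    return model
```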
During pruning, the learning rate also matters. Too high a learning rate can prevent winning tickets from being found: when the learning rate is high, the reset subnetwork performs no better than a randomly reinitialized one. Meanwhile, learning-rate warmup effectively improves test-set accuracy. The effect is shown in the figure.
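As an illustration, a simple linear warmup can be attached to any PyTorch optimizer as sketched below; the base learning rate and warmup length are illustrative values, not the paper's settings.

```python
# Minimal sketch of linear learning-rate warmup for a PyTorch optimizer.
import torch

def make_warmup_scheduler(optimizer, warmup_steps=10000):
    # Scale the learning rate linearly from near 0 up to its base value over
    # warmup_steps iterations, then hold it constant.
    def lr_lambda(step):
        return min(1.0, (step + 1) / warmup_steps)
    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Usage (illustrative):
# optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# scheduler = make_warmup_scheduler(optimizer)
# then call scheduler.step() after each optimizer.step()
```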
Notes for CNN networks:
When pruning CNNs, using dropout effectively improves the pruned network's test accuracy: dropout can guide pruning and may make winning tickets easier to find. The effect is shown in the figure.
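For illustration, dropout can be added to a small conv net as sketched below; the architecture, input size, and dropout rate are illustrative, not the specific models used in the paper.

```python
# Minimal sketch of a small conv net with dropout before the dense layers
# (assumes 3x32x32 inputs; all sizes are illustrative).
import torch.nn as nn

class SmallConvNet(nn.Module):
    def __init__(self, num_classes=10, p_drop=0.5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 64 x 16 x 16 feature map
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(p_drop),                   # dropout before the first dense layer
            nn.Linear(64 * 16 * 16, 256), nn.ReLU(),
            nn.Dropout(p_drop),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```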
When applied to CNNs, global pruning is the better choice, because global pruning finds smaller winning-ticket subnetworks than layer-wise pruning does. The effect is shown in the figure.
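The difference can be sketched as follows, assuming a PyTorch model: global pruning picks one magnitude threshold across all layers, so layers with many small weights are pruned more heavily, while layer-wise pruning removes the same fraction from every layer. Function names and the prune fraction are illustrative.

```python
# Minimal sketch contrasting global and layer-wise magnitude-pruning masks.
import torch
import torch.nn as nn

def global_prune_masks(model, prune_fraction=0.2):
    # One threshold over all prunable weights in the network.
    # (For very large models, a top-k based threshold avoids torch.quantile size limits.)
    weights = [m.weight.detach().abs().flatten()
               for m in model.modules() if isinstance(m, (nn.Conv2d, nn.Linear))]
    threshold = torch.quantile(torch.cat(weights), prune_fraction)
    return {id(m): (m.weight.detach().abs() >= threshold).float()
            for m in model.modules() if isinstance(m, (nn.Conv2d, nn.Linear))}

def layerwise_prune_masks(model, prune_fraction=0.2):
    # A separate threshold per layer, so every layer loses the same fraction.
    masks = {}
    for m in model.modules():
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            w = m.weight.detach().abs()
            threshold = torch.quantile(w.flatten(), prune_fraction)
            masks[id(m)] = (w >= threshold).float()
    return masks
```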
Notes:
1. Overparameterized networks are easier to prune, because they contain more candidate combinations of winning tickets.
2. Only when a certain degree of sparsity is reached can a highly overparameterized network be reinitialized successfully; apart from this, extremely pruned networks and networks that are not overparameterized can maintain accuracy only through a chance initialization.
3. The structure of a winning ticket encodes an inductive bias customized to the task being learned.