Cut 20% of ImageNet's data and model performance doesn't drop! Meta, Stanford et al. propose a new method that uses knowledge distillation to slim down the dataset
2022-07-05 09:59:00 【QbitAI】
Ming Min, from Aofeisi
Qubits | Official account QbitAI
Over the past couple of days, a bounty posted on Twitter has stirred things up.
An AI company is offering $250,000 (roughly 1.67 million RMB) as a reward for tasks on which making a model bigger makes its performance worse.

The comment section has seen heated discussion.

This is not just a stunt, but a way of probing large models further.
After all, over the past two years it has become increasingly clear that AI models cannot simply compete on being "big".
On one hand, as model scale grows, training costs have begun to climb exponentially;

On the other hand, performance gains are gradually hitting a bottleneck: shaving even 1% more off the error demands large increments of data and compute.
For a Transformer, for instance, lowering the cross-entropy loss from 3.4 nats to 2.8 nats requires roughly 10 times the original amount of training data.
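As a rough back-of-the-envelope check (our own arithmetic, not from the paper, assuming the simplest power law L(D) ∝ D^(-α) with no irreducible loss term), those numbers imply a tiny scaling exponent:

$$ \frac{L(10D)}{L(D)} = 10^{-\alpha} = \frac{2.8}{3.4} \;\Longrightarrow\; \alpha = \log_{10}\frac{3.4}{2.8} \approx 0.084 $$

With an exponent that small, every further drop in loss demands another order of magnitude of data, which is exactly why scaling by data alone gets so expensive.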
To address these issues, AI researchers have been searching for solutions in many directions.
Researchers from Meta and Stanford recently thought of starting from the dataset itself.
They propose distilling the dataset: making it smaller while keeping model performance from declining.
Experiments verify that after cutting 20% of ImageNet's data, ResNets reach accuracy essentially on par with training on the full dataset.
The researchers say this also opens up a new route toward realizing AGI.

Large datasets are not that efficient
The method proposed in the paper is, in essence, to streamline and optimize the original dataset.
The researchers note that many earlier works have shown that a large share of training examples are highly redundant, so in principle the dataset can be "cut" smaller.
Recent studies have also proposed metrics that rank training examples by difficulty or importance; by retaining only some of these examples, data pruning can be carried out.
Building on those earlier findings, the authors this time put forward some concrete methods.
First, they proposed an analytic approach under which a model trained on only part of the data can achieve the same performance.

From this analysis, the researchers drew a preliminary conclusion:
How is a dataset best pruned? That depends on its own size.
The more initial data there is, the more one should keep the hard examples;
The less initial data there is, the more one should keep the easy examples (see the sketch just below).
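Below is a minimal Python sketch of this size-dependent rule (our own illustration, not code from the paper; the difficulty scores are assumed to come from whatever example-difficulty metric is available, e.g. a margin- or forgetting-based score):

```python
import numpy as np

def prune_dataset(indices, difficulty_scores, keep_fraction, large_dataset):
    """Keep a fraction of the examples; which ones depends on dataset size.

    indices           : array of example indices
    difficulty_scores : array of the same length, higher = harder example
    keep_fraction     : fraction of examples to retain, in (0, 1]
    large_dataset     : True -> keep the hardest examples, False -> keep the easiest
    """
    n_keep = int(len(indices) * keep_fraction)
    order = np.argsort(difficulty_scores)      # sorted easy -> hard
    kept = order[-n_keep:] if large_dataset else order[:n_keep]
    return np.asarray(indices)[kept]

# Toy usage: keep 80% of a "large" dataset, retaining the hardest examples.
idx = np.arange(10)
scores = np.random.rand(10)
print(prune_dataset(idx, scores, keep_fraction=0.8, large_dataset=True))
```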

With data pruning that retains the hard examples, the relationship between model error and data scale can break free of power-law scaling.
The oft-cited 80/20 rule is itself rooted in power laws:
that is, 20% of the causes account for 80% of the results.

In this setting, an optimum can also be found in the sense of Pareto optimality.
Pareto optimality here refers to an ideal state of resource allocation.
Given a fixed group of people and a fixed pool of allocable resources, it means moving from one allocation to another that makes at least one person better off without making anyone worse off.
In this paper, adjusting the allocation state can be understood as choosing what fraction of the dataset to prune.
The researchers then ran experiments to verify this theory.

The experimental results show that the larger the dataset, the more pronounced the effect of pruning.
On SVHN, CIFAR-10, and ImageNet, ResNet's overall error rate scales inversely with the size of the pruned dataset.
On ImageNet in particular, when 80% of the dataset is retained, the error rate is essentially the same as when training on the full dataset.
The resulting curve also approaches the Pareto-optimal frontier.
Next, the researchers focused on ImageNet and ran a large-scale benchmark across 10 different settings.
It turned out that random pruning and some of the pruning metrics did not perform well enough on ImageNet.

Going a step further, the researchers also proposed a self-supervised way to prune data.
Namely knowledge distillation (the teacher-student setup), a common model-compression technique.
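As a hedged sketch of what such a self-supervised difficulty score might look like (our own illustration under assumed details: a pretrained teacher encoder provides embeddings, the embeddings are clustered with k-means, and each image's distance to its nearest centroid serves as its difficulty; the paper's exact pipeline may differ):

```python
import numpy as np
from sklearn.cluster import KMeans

def self_supervised_difficulty(teacher_embeddings, n_clusters=100, seed=0):
    """Score examples by distance to the nearest k-means centroid of teacher
    embeddings; examples far from every centroid are treated as 'hard'.

    teacher_embeddings : (n_examples, dim) features from a pretrained
                         self-supervised encoder (the 'teacher').
    """
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10)
    km.fit(teacher_embeddings)
    # transform() returns distances to every centroid; keep the smallest one
    return km.transform(teacher_embeddings).min(axis=1)  # higher = harder

# Toy usage with random features standing in for real teacher embeddings.
feats = np.random.randn(1000, 128).astype(np.float32)
difficulty = self_supervised_difficulty(feats, n_clusters=10)
```

These scores could then be plugged into a pruning rule like the one sketched earlier.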

The results show that the self-supervised method does a good job of identifying the easy/hard examples in a dataset.

After pruning the data with the self-supervised method, accuracy improves markedly (the light blue line in panel C).

Some problems remain
Still, the researchers note in the paper that although the above approach can prune a dataset without sacrificing performance, several issues deserve attention.
For example, once the dataset has been shrunk, training a model to the same performance may take longer.
So when pruning a dataset, one has to balance the smaller data scale against the longer training time.
At the same time, pruning a dataset inevitably discards samples from some groups, which may leave the model with weaknesses in certain respects.
This can easily raise moral and ethical concerns.
Research team
One of the paper's authors, Surya Ganguli, is a scientist working on quantum neural networks.

Earlier, as an undergraduate, he studied computer science, mathematics, and physics at the same time, and then earned a master's degree in electrical engineering and computer science.
Paper link:
https://arxiv.org/abs/2206.14486