Cut 20% of ImageNet's data and model performance doesn't drop! Meta, Stanford et al. propose a new method that uses knowledge distillation to slim down the dataset
2022-07-05 09:59:00 【QbitAI】
Ming Min, from Aofeisi
Qubits | Official account QbitAI
Over the past couple of days, a bounty posted on Twitter has stirred things up.
An AI company is offering $250,000 (roughly 1.67 million RMB) as a reward for tasks on which making a model bigger makes its performance worse.

The comment section has seen heated discussion.

This is not just a stunt, but a way of probing large models further.
After all, over the past two years it has become increasingly clear that AI models cannot simply compete on being "big".
On one hand, as model scale grows, training costs have begun to climb exponentially;

On the other hand, performance gains are gradually hitting a bottleneck: shaving even 1% more off the error demands large increments of data and compute.
For a Transformer, for instance, lowering the cross-entropy loss from 3.4 nats to 2.8 nats requires roughly 10 times the original amount of training data.
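As a rough back-of-the-envelope check (our own arithmetic, not from the paper, assuming the simplest power law L(D) ∝ D^(-α) with no irreducible loss term), those numbers imply a tiny scaling exponent:

$$ \frac{L(10D)}{L(D)} = 10^{-\alpha} = \frac{2.8}{3.4} \;\Longrightarrow\; \alpha = \log_{10}\frac{3.4}{2.8} \approx 0.084 $$

With an exponent that small, every further drop in loss demands another order of magnitude of data, which is exactly why scaling by data alone gets so expensive.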
To address these issues, AI researchers have been searching for solutions in many directions.
Researchers from Meta and Stanford recently thought of starting from the dataset itself.
They propose distilling the dataset: making it smaller while keeping model performance from declining.
Experiments verify that after cutting 20% of ImageNet's data, ResNets reach accuracy essentially on par with training on the full dataset.
The researchers say this also opens up a new route toward realizing AGI.

Large datasets are not that efficient
The method proposed in the paper is, in essence, to streamline and optimize the original dataset.
The researchers note that many earlier works have shown that a large share of training examples are highly redundant, so in principle the dataset can be "cut" smaller.
Recent studies have also proposed metrics that rank training examples by difficulty or importance; by retaining only some of these examples, data pruning can be carried out.
Building on those earlier findings, the authors this time put forward some concrete methods.
First, they proposed an analytic approach under which a model trained on only part of the data can achieve the same performance.

From this analysis, the researchers drew a preliminary conclusion:
How is a dataset best pruned? That depends on its own size.
The more initial data there is, the more one should keep the hard examples;
The less initial data there is, the more one should keep the easy examples (see the sketch just below).
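Below is a minimal Python sketch of this size-dependent rule (our own illustration, not code from the paper; the difficulty scores are assumed to come from whatever example-difficulty metric is available, e.g. a margin- or forgetting-based score):

```python
import numpy as np

def prune_dataset(indices, difficulty_scores, keep_fraction, large_dataset):
    """Keep a fraction of the examples; which ones depends on dataset size.

    indices           : array of example indices
    difficulty_scores : array of the same length, higher = harder example
    keep_fraction     : fraction of examples to retain, in (0, 1]
    large_dataset     : True -> keep the hardest examples, False -> keep the easiest
    """
    n_keep = int(len(indices) * keep_fraction)
    order = np.argsort(difficulty_scores)      # sorted easy -> hard
    kept = order[-n_keep:] if large_dataset else order[:n_keep]
    return np.asarray(indices)[kept]

# Toy usage: keep 80% of a "large" dataset, retaining the hardest examples.
idx = np.arange(10)
scores = np.random.rand(10)
print(prune_dataset(idx, scores, keep_fraction=0.8, large_dataset=True))
```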

With data pruning that retains the hard examples, the relationship between model error and data scale can break free of power-law scaling.
The oft-cited 80/20 rule is itself rooted in power laws:
that is, 20% of the causes account for 80% of the results.

In this setting, an optimum can also be found in the sense of Pareto optimality.
Pareto optimality here refers to an ideal state of resource allocation.
Given a fixed group of people and a fixed pool of allocable resources, it means moving from one allocation to another that makes at least one person better off without making anyone worse off.
In this paper, adjusting the allocation state can be understood as choosing what fraction of the dataset to prune.
The researchers then ran experiments to verify this theory.

The experimental results show that the larger the dataset, the more pronounced the effect of pruning.
On SVHN, CIFAR-10, and ImageNet, ResNet's overall error rate scales inversely with the size of the pruned dataset.
On ImageNet in particular, when 80% of the dataset is retained, the error rate is essentially the same as when training on the full dataset.
The resulting curve also approaches the Pareto-optimal frontier.
Next, the researchers focused on ImageNet and ran a large-scale benchmark across 10 different settings.
It turned out that random pruning and some of the pruning metrics did not perform well enough on ImageNet.

Going a step further, the researchers also proposed a self-supervised way to prune data.
Namely knowledge distillation (the teacher-student setup), a common model-compression technique.
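As a hedged sketch of what such a self-supervised difficulty score might look like (our own illustration under assumed details: a pretrained teacher encoder provides embeddings, the embeddings are clustered with k-means, and each image's distance to its nearest centroid serves as its difficulty; the paper's exact pipeline may differ):

```python
import numpy as np
from sklearn.cluster import KMeans

def self_supervised_difficulty(teacher_embeddings, n_clusters=100, seed=0):
    """Score examples by distance to the nearest k-means centroid of teacher
    embeddings; examples far from every centroid are treated as 'hard'.

    teacher_embeddings : (n_examples, dim) features from a pretrained
                         self-supervised encoder (the 'teacher').
    """
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10)
    km.fit(teacher_embeddings)
    # transform() returns distances to every centroid; keep the smallest one
    return km.transform(teacher_embeddings).min(axis=1)  # higher = harder

# Toy usage with random features standing in for real teacher embeddings.
feats = np.random.randn(1000, 128).astype(np.float32)
difficulty = self_supervised_difficulty(feats, n_clusters=10)
```

These scores could then be plugged into a pruning rule like the one sketched earlier.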

The results show that the self-supervised method does a good job of identifying the easy/hard examples in a dataset.

After pruning the data with the self-supervised method, accuracy improves markedly (the light blue line in panel C).

Some problems remain
Still, the researchers note in the paper that although the above approach can prune a dataset without sacrificing performance, several issues deserve attention.
For example, once the dataset has been shrunk, training a model to the same performance may take longer.
So when pruning a dataset, one has to balance the smaller data scale against the longer training time.
At the same time, pruning a dataset inevitably discards samples from some groups, which may leave the model with weaknesses in certain respects.
This can easily raise moral and ethical concerns.
Research team
One of the paper's authors, Surya Ganguli, is a scientist working on quantum neural networks.

Earlier, as an undergraduate, he studied computer science, mathematics, and physics at the same time, and then earned a master's degree in electrical engineering and computer science.
Paper link:
https://arxiv.org/abs/2206.14486