Cut 20% of ImageNet's data and model performance doesn't drop! Meta, Stanford et al. propose a new method that uses knowledge distillation to slim down datasets
2022-07-05 09:59:00 【QbitAI】
Ming Min, from Aofei Temple
Qubit | Official account QbitAI
In the past couple of days, a bounty posted on Twitter has been causing quite a stir.
An AI company is offering $250,000 (about 1.67 million RMB) as a reward for finding tasks on which the bigger the model, the worse its performance.
The comment section has seen heated discussion.
But this is not just a stunt; the point is to probe large models further.
After all, over the past two years it has become increasingly clear that AI models cannot simply compete on being "big".
On the one hand, as models scale up, training costs have begun to grow exponentially;
on the other hand, performance gains have gradually hit a bottleneck, and shaving even 1% more off the error calls for large increments of data and compute.
For example, for a Transformer, lowering the cross-entropy loss from 3.4 nats to 2.8 nats requires 10 times the original amount of training data.
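To see why such gains get so expensive, here is a back-of-the-envelope calculation. It assumes a pure power law L(D) = c·D^(−α) with no irreducible loss term, which is a simplification of real scaling curves:

```latex
% Pure power-law fit (simplifying assumption: no irreducible loss term)
\[
\frac{L(10D)}{L(D)} = 10^{-\alpha} = \frac{2.8}{3.4}
\quad\Longrightarrow\quad
\alpha = \log_{10}\frac{3.4}{2.8} \approx 0.084
\]
% At an exponent this small, even halving the loss would take
\[
\frac{D'}{D} = 2^{1/\alpha} \approx 2^{11.9} \approx 3.8 \times 10^{3}
\]
% i.e. thousands of times more data.
```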
To tackle these problems, AI researchers have been searching for solutions in many directions.
Scholars from Meta and Stanford recently hit on the idea of starting with the dataset itself.
They propose distilling the dataset: making it smaller while keeping model performance from declining.
Experiments verify that after cutting 20% of ImageNet's data, ResNets perform nearly as accurately as when trained on the full dataset.
The researchers say this also opens a new path toward realizing AGI.
Large datasets are not that efficient
The method proposed in the paper is, in essence, a way to optimize and streamline the original dataset.
The researchers note that much prior work has shown that many training examples are highly redundant, so in theory a dataset can be "cut" smaller.
Recent studies have also proposed metrics that rank training examples by difficulty or importance; retaining a subset of the hard examples then completes the data pruning.
Building on these earlier findings, the scholars now put forward some concrete methods.
First, they developed a data-analysis method under which a model can learn from only part of the data and still achieve the same performance.
Through this analysis, the researchers reached a preliminary conclusion:
how a dataset is best pruned depends on its own scale.
The more initial data there is, the more the hard examples should be kept;
the less initial data there is, the more the low-difficulty examples should be kept.
After pruning the data while retaining the hard examples, the scaling relationship between model error and dataset size can break the power-law distribution.
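To make that rule concrete, here is a minimal sketch of difficulty-based pruning. It is illustrative only: the difficulty scores stand in for whichever ranking metric is used, and the data-rich/data-poor switch encodes the paper's rule of thumb rather than its exact procedure.

```python
import numpy as np

def prune_dataset(difficulty, keep_frac, data_rich):
    """Keep a keep_frac fraction of examples ranked by difficulty.

    difficulty : 1-D array with one score per training example
                 (e.g. a margin, forgetting count, or prediction error).
    keep_frac  : fraction of examples to retain, in (0, 1].
    data_rich  : if True, keep the hardest examples (large initial dataset);
                 if False, keep the easiest (small initial dataset).
    """
    n_keep = int(len(difficulty) * keep_frac)
    order = np.argsort(difficulty)        # sorted easy -> hard
    kept = order[-n_keep:] if data_rich else order[:n_keep]
    return np.sort(kept)                  # indices into the original dataset

# Toy usage: cut 20% of a large dataset, keeping the hardest 80%.
scores = np.random.rand(1_000_000)        # placeholder difficulty scores
kept_idx = prune_dataset(scores, keep_frac=0.8, data_rich=True)
```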
The oft-cited 80/20 rule is based on the power law: namely, that 20% of the causes account for 80% of the results.
And in this setting, an extreme point can also be found under Pareto optimality.
Pareto optimality here refers to an ideal state of resource allocation: given a fixed group of people and a fixed pool of allocable resources, moving from one allocation to another makes at least one person better off without making anyone worse off.
In this paper, adjusting the allocation state can be understood as choosing what proportion of the dataset to prune.
The researchers then ran experiments to verify this theory.
The experimental results show that the larger the dataset, the more pronounced the effect of pruning.
On SVHN, CIFAR-10, and ImageNet, ResNet's overall error rate varies inversely with the scale of dataset pruning.
On ImageNet, one can see that with 80% of the dataset retained, the error rate is essentially the same as when training on the original dataset.
This curve also approximates Pareto optimality.
Next, the researchers focused on ImageNet and ran large-scale benchmarks across 10 different conditions.
The results show that random pruning and several of the pruning metrics do not perform well enough on ImageNet.
Going a step further, the researchers therefore also proposed a self-supervised method for pruning data:
knowledge distillation (the teacher-student model), a common model-compression technique.
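For reference, the teacher-student recipe usually looks like the following. This is a generic sketch of the standard distillation loss, not code from the paper; the temperature and mixing weight are typical illustrative values.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Teacher-student distillation: match the teacher's softened
    output distribution (KL term) while still fitting the hard labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                           # rescale soft-term gradients by T^2
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```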
The results show that the self-supervised method performs well at identifying the easy and hard examples in a dataset.
After pruning the data with the self-supervised method, accuracy improves significantly (the light-blue line in panel C).
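One way such a self-supervised difficulty score can be computed is to embed each image with a self-supervised encoder, cluster the embeddings, and treat the distance to the nearest cluster centroid as difficulty (prototypical examples score low, outliers score high). The sketch below assumes precomputed embeddings and scikit-learn; the cluster count is an illustrative choice, not the paper's exact setting.

```python
import numpy as np
from sklearn.cluster import KMeans

def self_supervised_difficulty(embeddings, n_clusters=100, seed=0):
    """Score each example by its distance to the nearest k-means centroid
    of self-supervised embeddings: low = prototypical/easy, high = hard.

    embeddings : (n_examples, dim) array from any self-supervised encoder.
    """
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10)
    km.fit(embeddings)
    # Distance from each example to its assigned (nearest) centroid.
    return np.linalg.norm(embeddings - km.cluster_centers_[km.labels_], axis=1)

# Toy usage: random vectors stand in for real encoder outputs; the
# resulting scores could feed prune_dataset() above as `difficulty`.
emb = np.random.randn(10_000, 128).astype(np.float32)
difficulty = self_supervised_difficulty(emb, n_clusters=50)
```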
Some problems remain
In the paper, however, the researchers also note that although the approach above can prune a dataset without sacrificing performance, some issues still deserve attention.
For example, once the dataset is shrunk, training a model to the same performance may take longer.
So when pruning a dataset, one has to balance the smaller scale against the longer training time.
Meanwhile, pruning a dataset inevitably discards samples from some groups, which may leave the model deficient in certain respects, and this can easily raise moral and ethical concerns.
Research team
One of the paper's authors, Surya Ganguli, is a quantum neural network scientist.
Earlier, as an undergraduate at Stanford, he studied computer science, mathematics, and physics at the same time, and afterwards earned a master's degree in electrical engineering and computer science.
Paper link:
https://arxiv.org/abs/2206.14486