Cut 20% of ImageNet's data volume and model performance doesn't decline! Meta, Stanford et al. propose a new method that uses knowledge distillation to slim down datasets
2022-07-05 09:59:00 【QbitAI】
Ming Min | From Aofei Temple
QbitAI | Official account QbitAI
Over the past couple of days, a bounty on Twitter has been stirring things up.
An AI company is offering $250,000 (about 1.67 million yuan) as a reward for tasks on which the bigger the model, the worse the performance.
The comment section erupted in heated discussion.
But this is not just a stunt; the point is to probe large models further.
After all, over the past two years it has become increasingly clear that AI models cannot simply compete on being "big".
On one hand, as model scale grows, training costs start to climb exponentially;
On the other hand, performance gains are approaching a bottleneck: shaving even 1% more off the error demands large increments of data and compute.
For a Transformer, for example, lowering the cross-entropy loss from 3.4 nats to 2.8 nats requires 10 times the original amount of training data.
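A quick back-of-the-envelope sketch makes the pain concrete. Assuming the loss follows a simple power law L(N) = c·N^(−α) (an assumption for illustration; the exponent below is inferred from just the two reported points):

```python
import math

# Assume cross-entropy loss follows a power law in dataset size N:
#   L(N) = c * N**(-alpha)
# Two reported points: L(N0) = 3.4 nats and L(10 * N0) = 2.8 nats.
# Dividing the two cancels c, so the implied exponent is:
alpha = math.log(3.4 / 2.8) / math.log(10)
print(f"implied exponent alpha ≈ {alpha:.3f}")

# With such a small exponent, each comparable loss reduction needs
# another large jump in data volume:
multiplier = (2.8 / 2.4) ** (1 / alpha)  # data needed to reach 2.4 nats
print(f"data multiplier to go from 2.8 to 2.4 nats ≈ {multiplier:.0f}x")
```

The implied exponent is tiny (around 0.08), which is why every further sliver of improvement costs a multiple of the data.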
To address these issues, AI researchers have been searching for solutions in many directions.
Scholars from Meta and Stanford recently thought of starting with the dataset itself: cutting it down.
They propose distilling the dataset, making it smaller while keeping model performance from declining.
Experiments verify that after cutting 20% of ImageNet's data volume, ResNets perform nearly as accurately as they do on the full dataset.
The researchers say this also charts a new path toward realizing AGI.
Large datasets are inefficient
The method proposed in the paper essentially optimizes and streamlines the original dataset.
The researchers note that many past studies have shown training examples to be highly redundant; in theory, a dataset can be "cut" smaller.
Recent work has also proposed metrics that rank training examples by difficulty or importance; by retaining only some of the hard examples, data pruning can be carried out.
Building on these earlier findings, the scholars now go a step further with concrete methods.
First, they propose an analytical method showing that a model can learn from only part of the data and still reach the same performance.
Through this analysis, the researchers reach a preliminary conclusion:
How is a dataset best pruned? It depends on the dataset's own scale.
The more initial data there is, the more the hard examples should be kept;
The less initial data there is, the more the easy examples should be kept.
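A minimal sketch of this rule, assuming per-example difficulty scores are already available (the scores below are hypothetical; the paper derives its metrics from training dynamics or embeddings):

```python
import numpy as np

def prune_dataset(difficulty, keep_frac, large_dataset=True):
    """Keep a fraction of examples by per-example difficulty score.

    difficulty:    1-D array, higher = harder (e.g. a margin or loss score).
    keep_frac:     fraction of examples to retain after pruning.
    large_dataset: if True (data-rich regime), keep the hardest examples;
                   if False (data-poor regime), keep the easiest ones.
    Returns the sorted indices of the retained examples.
    """
    n_keep = int(len(difficulty) * keep_frac)
    order = np.argsort(difficulty)             # indices from easiest to hardest
    kept = order[order.size - n_keep:] if large_dataset else order[:n_keep]
    return np.sort(kept)

# Hypothetical difficulty scores for 10 examples
scores = np.array([0.9, 0.1, 0.5, 0.7, 0.3, 0.8, 0.2, 0.6, 0.4, 0.95])
print(prune_dataset(scores, 0.8, large_dataset=True))   # drop the 2 easiest
print(prune_dataset(scores, 0.2, large_dataset=False))  # keep the 2 easiest
```

In the data-rich regime the easy, redundant examples are dropped; in the data-poor regime the easy examples are what the model most needs.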
After pruning the data by retaining the hard examples, the relationship between model error and data scale can break free of the power-law distribution.
The oft-cited 80/20 rule is itself rooted in power laws:
that is, 20% of causes drive 80% of results.
And in this setting, an extremum can also be found under Pareto optimality.
Pareto optimality here refers to an ideal state of resource allocation.
It assumes a fixed group of people and a fixed pool of allocable resources; moving from one allocation to another makes at least one person better off without making anyone worse off.
In this paper, adjusting the allocation can be understood as choosing what proportion of the dataset to prune.
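To illustrate what an optimum under Pareto optimality means for pruning, a small sketch that filters hypothetical (fraction of data kept, test error) trade-off points down to the undominated ones (the numbers are illustrative, not from the paper):

```python
def pareto_front(points):
    """Return the undominated (kept_fraction, error) pairs.

    A point is dominated if some other point keeps no more data AND has
    no higher error; the survivors form the Pareto front of the trade-off.
    """
    front = []
    for frac, err in points:
        dominated = any(
            f <= frac and e <= err and (f, e) != (frac, err)
            for f, e in points
        )
        if not dominated:
            front.append((frac, err))
    return sorted(front)

# Hypothetical (fraction of data kept, test error) trade-off points
pts = [(1.0, 0.10), (0.8, 0.10), (0.8, 0.12), (0.6, 0.15), (0.6, 0.13), (0.4, 0.20)]
print(pareto_front(pts))  # → [(0.4, 0.2), (0.6, 0.13), (0.8, 0.1)]
```

Here (1.0, 0.10) is dominated by (0.8, 0.10): keeping only 80% of the data gives the same error, so pruning 20% is a "free" improvement.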
The researchers then ran experiments to verify this theory.
The experimental results show that the larger the dataset, the more pronounced the effect of pruning.
On the SVHN, CIFAR-10, and ImageNet datasets, ResNet's overall error rate is inversely related to how much of the dataset is pruned.
On ImageNet, with 80% of the dataset retained, the error rate is essentially the same as when training on the full dataset.
This curve also approximates Pareto optimality.
Next, the researchers focused on ImageNet, carrying out large-scale benchmarking across 10 different settings.
The results show that random pruning and some existing pruning metrics do not perform well enough on ImageNet.
So, going a step further, the researchers also proposed a self-supervised method for pruning data.
Namely knowledge distillation (the teacher-student model), a common approach to model compression.
Results show that the self-supervised method performs well at finding the easy/hard examples in a dataset.
After pruning data with the self-supervised method, accuracy improves markedly (the light blue line in panel C).
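As an illustrative sketch of self-supervised difficulty scoring: cluster unlabeled embeddings and treat distance to the nearest cluster prototype as difficulty. The tiny k-means and toy 2-D embeddings below are assumptions for the demo, not the paper's exact pipeline:

```python
import numpy as np

def self_supervised_difficulty(embeddings, n_clusters=3, n_iter=20):
    """Score example difficulty without labels (illustrative sketch).

    Runs a tiny k-means (deterministic init on the first n_clusters points),
    then uses each example's distance to its nearest centroid as a difficulty
    score: prototypical (easy) examples sit close to a centroid, unusual
    (hard) examples sit far away.
    """
    centroids = embeddings[:n_clusters].astype(float).copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(embeddings[:, None] - centroids[None], axis=-1)
        assign = dists.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(assign == k):
                centroids[k] = embeddings[assign == k].mean(axis=0)
    dists = np.linalg.norm(embeddings[:, None] - centroids[None], axis=-1)
    return dists.min(axis=1)  # higher = farther from every prototype = harder

# Toy 2-D "embeddings": two tight clusters plus one outlier
emb = np.array([[0., 0.], [0.1, 0.], [5., 5.], [5.1, 5.], [20., 0.]])
scores = self_supervised_difficulty(emb, n_clusters=2)
print(scores.argmax())  # the outlier (index 4) gets the highest difficulty
```

In a real pipeline the embeddings would come from a pretrained self-supervised encoder; the point of the sketch is only that easy/hard examples can be separated without any labels.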
Some problems remain
Still, the researchers also note in the paper that although the method above can prune a dataset without sacrificing performance, some problems deserve attention.
For example, once the dataset shrinks, training a model to the same performance may take longer.
So when pruning a dataset, the reduction in scale must be balanced against the increase in training time.
Meanwhile, pruning a dataset inevitably discards samples from some groups, which may leave the model deficient in certain respects.
This can easily raise moral and ethical concerns.
The research team
One of the paper's authors, Surya Ganguli, is a quantum neural network scientist.
Earlier, as an undergraduate at Stanford, he studied computer science, mathematics, and physics simultaneously, then earned a master's degree in electrical engineering and computer science.
Paper link:
https://arxiv.org/abs/2206.14486