[Paper reading] Mean teachers are better role models
2022-07-24 13:08:00 【The next day is expected 1314】
1. Abstract
The recently proposed Temporal Ensembling has achieved state-of-the-art results on several semi-supervised learning benchmarks. It maintains an exponential moving average (EMA) of the label predictions for each training example, and penalizes predictions that are inconsistent with this target. However, because each target changes only once per epoch, Temporal Ensembling becomes unwieldy when learning from large datasets. To overcome this problem, the authors propose Mean Teacher, a method that averages model weights rather than label predictions. As an additional benefit, Mean Teacher improves test accuracy and can be trained with fewer labels than Temporal Ensembling.
2. Background knowledge
When I first read the abstract, much of it was unclear to me, simply because I lacked the background knowledge in this field. This section introduces the background I filled in while working through the paper.
2.1. Temporal Ensembling
The name translates roughly as "time integration". The first sentence of the abstract acknowledges this earlier work, which was the state of the art when it was proposed. For the details of that paper, see the previous Blog.
2.2. EMA
EMA (exponential moving average) is a type of average commonly used in time series analysis. Put simply, EMA is a weighted average whose key property is that the weight of older observations decays exponentially as time passes. Equation 1 gives the recurrence formula of EMA; details can be found in Blog.
$$
S_t = \begin{cases} S_0, & t = 1 \\ (1-\alpha)S_{t-1} + \alpha X_t, & t \geq 2 \end{cases} \tag{1}
$$
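As a rough illustration of Equation 1, the recurrence can be written in a few lines of Python. This is only a sketch: the function name `ema` and the default of using the first observation as $S_0$ are assumptions made here, not something the paper or the post specifies.

```python
from typing import List, Optional, Sequence


def ema(xs: Sequence[float], alpha: float, s0: Optional[float] = None) -> List[float]:
    """Return S_1..S_n following Equation 1.

    Assumption: if s0 is not given, the first observation is used as S_0.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not xs:
        return []
    # t = 1: S_1 = S_0
    s = xs[0] if s0 is None else s0
    out = [s]
    # t >= 2: S_t = (1 - alpha) * S_{t-1} + alpha * X_t
    for x in xs[1:]:
        s = (1.0 - alpha) * s + alpha * x
        out.append(s)
    return out


print(ema([1.0, 2.0, 3.0], alpha=0.5))  # [1.0, 1.5, 2.25]
```

A smaller `alpha` gives old observations more weight, so the average reacts more slowly to new values.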
3. Algorithm description

The algorithm proposed in this paper builds on the paper covered in the previous article. The main change is the consistency cost, which can be understood as the unsupervised loss. In essence, the algorithm maintains two models, Teacher and Student: the same input is fed to both, and the L2 norm of the difference between their outputs is used as the unsupervised loss. The $\Pi$ model in essence maintains only one model, with Dropout applied. Temporal Ensembling also maintains only one model, but takes an EMA of that model's outputs across epochs. This paper is more direct: it maintains two models outright and takes an EMA over the model parameters, with the Teacher's parameters being an EMA of the Student's. At a high level, this can be seen as one model passing its experience on to another, which the paper calls Mean Teacher.
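The two ingredients described above can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, weights and outputs are flattened into lists of floats, the consistency cost is taken here as the mean squared difference, and `alpha` follows the convention of Equation 1 (it weights the new Student value), which may differ from the notation used in the paper.

```python
from typing import List, Sequence


def consistency_loss(student_out: Sequence[float], teacher_out: Sequence[float]) -> float:
    """Unsupervised loss: mean squared difference between the two models' outputs."""
    if len(student_out) != len(teacher_out) or not student_out:
        raise ValueError("outputs must be non-empty and of equal length")
    return sum((s - t) ** 2 for s, t in zip(student_out, teacher_out)) / len(student_out)


def update_teacher(teacher_w: Sequence[float], student_w: Sequence[float],
                   alpha: float) -> List[float]:
    """EMA over parameters: teacher <- (1 - alpha) * teacher + alpha * student."""
    if len(teacher_w) != len(student_w):
        raise ValueError("weight vectors must have equal length")
    return [(1.0 - alpha) * t + alpha * s for t, s in zip(teacher_w, student_w)]


# Hypothetical toy values: the Student has already taken a gradient step,
# then the Teacher is moved toward it.
teacher = [0.0, 2.0]
student = [1.0, 2.0]
teacher = update_teacher(teacher, student, alpha=0.5)
print(teacher)                                       # [0.5, 2.0]
print(consistency_loss([1.0, 0.0], [0.0, 0.0]))      # 0.5
```

In a training loop under these assumptions, the Student would be updated by gradient descent on the supervised loss plus the consistency loss, and `update_teacher` would be called after each Student step, so the Teacher changes every step rather than once per epoch.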