
Surpassing the strongest ResNet variants! Google proposes CoAtNet, a new convolution + attention network, reaching 89.77% accuracy!

2022-06-13 02:32:00 Prodigal son's private dishes

Paper: https://arxiv.org/abs/2106.04803

Although Transformers have crossed over into computer vision with some notable results, in most settings they still lag behind state-of-the-art convolutional networks.

Now Google has proposed a model called CoAtNet. As the name suggests, it is a hybrid of Convolution and Attention.

The model achieves 86.0% top-1 accuracy on ImageNet, and 89.77% when additionally pre-trained on the JFT dataset, outperforming all existing convolutional networks and Transformers!

Combining convolution with self-attention: stronger generalization and higher model capacity

How did the researchers decide to combine convolutional networks with Transformers into a new model?

First, they observed that convolutional networks and Transformers each have their own strengths in two fundamental aspects of machine learning: generalization and model capacity.

Because convolutional layers carry a strong inductive bias, convolutional networks generalize better and converge faster, whereas Transformers, thanks to the attention mechanism, have higher model capacity and can benefit more from large datasets.

Combining convolutional layers and attention layers should therefore deliver better generalization and larger model capacity at the same time!

Good. Here comes the key question: how can the two be combined effectively, striking a better balance between accuracy and efficiency?
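The paper's core answer is to merge the two at the attention-weight level: pre-softmax relative attention adds a static, translation-equivariant bias w_{i-j} (the "convolution" part) to the input-dependent dot-product score before the softmax. Below is a minimal NumPy sketch of that idea for a 1-D token sequence; the function name and shapes are illustrative, not the authors' code.

```python
import numpy as np

def relative_attention(x, w):
    """Single-head pre-softmax relative self-attention over a 1-D sequence.

    x: (L, d) input tokens.
    w: (2L - 1,) learnable bias indexed by the offset i - j; this static,
       translation-equivariant term plays the role of a convolution kernel,
       while the x_i . x_j dot product is the dynamic attention weight.
    """
    L, d = x.shape
    scores = x @ x.T / np.sqrt(d)                             # dynamic part
    offsets = np.arange(L)[:, None] - np.arange(L)[None, :]   # matrix of i - j
    scores = scores + w[offsets + L - 1]                      # add static bias
    scores -= scores.max(axis=1, keepdims=True)               # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)                   # rows sum to 1
    return attn @ x

# Toy usage: 6 tokens of dimension 8, zero-initialized relative biases.
x = np.random.default_rng(0).standard_normal((6, 8))
y = relative_attention(x, np.zeros(11))
print(y.shape)  # (6, 8)
```

With the biases zeroed the layer reduces to plain dot-product attention; as the biases grow dominant it behaves more like a (dynamic-free) convolution, which is exactly the trade-off the paper exploits.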


Copyright notice
This article was written by [Prodigal son's private dishes]; please include the original link when reposting. Thank you.
https://yzsam.com/2022/02/202202280540496980.html