Surpassing the strongest ResNet variant! Google proposes CoAtNet, a new convolution + attention network, with 89.77% accuracy
2022-06-13 02:32:00 【Prodigal son's private dishes】
Paper: https://arxiv.org/abs/2106.04803
Although Transformers have crossed over into computer vision with some good results, in most cases they still lag behind state-of-the-art convolutional networks.
Now Google has proposed a family of models called CoAtNets. As the name suggests, these models combine Convolution and Attention.
The model achieves 86.0% top-1 accuracy on ImageNet, and 89.77% when the JFT dataset is also used, outperforming all existing convolutional networks and Transformers!
Convolution combined with self-attention: stronger generalization and higher model capacity
How did the researchers decide to combine convolutional networks and Transformers into a new model?
First, the researchers observed that convolutional networks and Transformers each have an advantage in one of two fundamental aspects of machine learning: generalization and model capacity.
Because convolutional layers have a strong inductive bias, convolutional networks generalize better and converge faster, while Transformers, with their attention mechanism, have higher model capacity and can benefit from large datasets.
Combining convolutional layers and attention layers should therefore yield better generalization and larger model capacity at the same time!
So here comes the key question: how can the two be combined effectively, with a better trade-off between accuracy and efficiency?
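To make the contrast concrete, here is a minimal toy sketch in plain Python, using 1D sequences with scalar features. It is an illustration under assumptions, not the implementation described in this article: the function names (`conv_weights`, `attention_weights`, `combined_weights`, `mix`) and the `rel_bias` table are hypothetical. It shows that convolution weights depend only on the relative offset between positions, that attention weights depend on the input content, and that one simple way to merge the two is to add an offset-dependent bias to the attention scores before the softmax.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def conv_weights(i, n, kernel):
    """Convolution-style weights for output position i.

    The weight given to input j depends only on the offset i - j and
    never on the input values. `kernel` maps an offset to a weight;
    offsets missing from it get weight 0 (local receptive field).
    """
    return [kernel.get(i - j, 0.0) for j in range(n)]


def attention_weights(i, x):
    """Self-attention-style weights for output position i.

    The weight given to input j depends on the content score x[i] * x[j],
    and every position can contribute (global receptive field).
    """
    return softmax([x[i] * x[j] for j in range(len(x))])


def combined_weights(i, x, rel_bias):
    """Hypothetical combination: content score plus an offset bias,
    summed before the softmax. `rel_bias` maps an offset to a bias."""
    n = len(x)
    return softmax([x[i] * x[j] + rel_bias.get(i - j, 0.0) for j in range(n)])


def mix(weights, x):
    """Weighted sum of the inputs, i.e. the output at one position."""
    return sum(w * v for w, v in zip(weights, x))


if __name__ == "__main__":
    x = [0.5, -1.0, 2.0, 0.0]
    kernel = {-1: 0.25, 0: 0.5, 1: 0.25}   # toy 3-tap kernel (assumed values)
    rel_bias = {-1: 1.0, 0: 2.0, 1: 1.0}   # toy offset bias (assumed values)

    for i in range(len(x)):
        print(i,
              mix(conv_weights(i, len(x), kernel), x),
              mix(attention_weights(i, x), x),
              mix(combined_weights(i, x, rel_bias), x))
```

In this sketch the convolution weights for neighbouring positions are shifted copies of each other regardless of `x`, which is the inductive bias mentioned above, while the attention weights change whenever `x` changes. With an empty `rel_bias`, `combined_weights` reduces to plain attention.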