NASViT: Neural Architecture Search of Efficient Vision Transformers with Gradient Conflict-Aware Supernet Training
2022-07-03 03:00:00 (Zhiyuan Community)
Paper title: NASViT: Neural Architecture Search for Efficient Vision Transformers with Gradient Conflict aware Supernet Training
Paper link: https://openreview.net/forum?id=Qaw16njk6L
Designing accurate and efficient vision Transformers (ViTs) is an important but challenging task. Supernet-based one-shot neural architecture search (NAS) enables fast architecture optimization and has achieved state-of-the-art (SOTA) results on convolutional neural networks (CNNs). However, directly applying supernet-based NAS to optimize ViTs leads to poor performance, even worse than training a single ViT. In this work, we observe that the poor performance stems from a gradient conflict issue: in ViTs, the gradients of different subnetworks conflict with the gradient of the supernet more severely than in CNNs, which causes training to saturate early and converge poorly. To alleviate this problem, we propose a series of techniques, including a gradient projection algorithm, a switchable layer scaling design, and a simplified data augmentation and regularization training recipe. These techniques significantly improve the convergence and performance of all subnetworks.

The resulting family of hybrid ViT models, called NASViT, achieves 78.2% to 81.8% top-1 accuracy on ImageNet at 200M to 800M FLOPs, outperforming all prior CNNs and ViTs, including AlphaNet and LeViT. When transferred to semantic segmentation, NASViT also outperforms previous backbones on the Cityscapes and ADE20K datasets, reaching 73.2% and 37.9% mIoU respectively with only 5G FLOPs.
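To make "gradient conflict" and "gradient projection" concrete, here is a minimal PCGrad-style sketch over plain Python lists. It assumes two gradients conflict when their dot product is negative, and that the fix is to project one gradient onto the normal plane of the other before summing. The abstract does not give the exact rule, so the choice to keep the subnetwork gradient intact and project the supernet gradient, and all function names here (`is_conflicting`, `project_if_conflicting`, `combined_update`, `conflict_ratio`), are hypothetical illustrations rather than NASViT's actual implementation.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two flattened gradient vectors."""
    return sum(x * y for x, y in zip(a, b))


def is_conflicting(g: Sequence[float], g_ref: Sequence[float]) -> bool:
    """Two gradients conflict when they point in opposing directions (negative dot product)."""
    return dot(g, g_ref) < 0.0


def project_if_conflicting(g: Sequence[float], g_ref: Sequence[float]) -> Vector:
    """If g conflicts with g_ref, remove g's component along g_ref:
        g' = g - (g . g_ref / ||g_ref||^2) * g_ref
    so that g' is orthogonal to g_ref. Otherwise return g unchanged."""
    d = dot(g, g_ref)
    norm_sq = dot(g_ref, g_ref)
    if d >= 0.0 or norm_sq == 0.0:
        return list(g)
    scale = d / norm_sq
    return [x - scale * r for x, r in zip(g, g_ref)]


def combined_update(g_super: Sequence[float], g_sub: Sequence[float]) -> Vector:
    """Hypothetical combination rule: keep the subnetwork gradient as-is and
    add the supernet gradient after projecting away its conflicting component."""
    g_super_proj = project_if_conflicting(g_super, g_sub)
    return [s + p for s, p in zip(g_sub, g_super_proj)]


def conflict_ratio(pairs: Sequence[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Fraction of (supernet, subnetwork) gradient pairs that conflict."""
    if not pairs:
        return 0.0
    return sum(is_conflicting(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    g_super, g_sub = [-1.0, 1.0], [1.0, 0.0]
    print(is_conflicting(g_super, g_sub))          # True
    print(project_if_conflicting(g_super, g_sub))  # [0.0, 1.0]
    print(combined_update(g_super, g_sub))         # [1.0, 1.0]
```

In a real training loop these vectors would be the flattened per-parameter gradients from the supernet and sampled-subnetwork losses; the projected component is orthogonal to the kept gradient, so the summed update no longer pushes against it.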
