NASViT: Neural Architecture Search for Efficient Vision Transformers with Gradient Conflict-Aware Supernet Training
2022-07-03 03:00:00 【Zhiyuan community】
Paper title: NASViT: Neural Architecture Search for Efficient Vision Transformers with Gradient Conflict aware Supernet Training
Paper link: https://openreview.net/forum?id=Qaw16njk6L
Designing accurate and efficient vision Transformers (ViTs) is an important but challenging task. Supernet-based one-shot neural architecture search (NAS) enables fast architecture optimization and has achieved state-of-the-art (SOTA) results on convolutional neural networks (CNNs). However, directly applying supernet-based NAS to optimize ViTs leads to poor performance, even worse than training ViTs individually. In this work, the authors observe that the poor performance is caused by a gradient conflict problem: the gradients of different sub-networks conflict with the supernet gradient more severely in ViTs than in CNNs, which leads to early saturation in training and poor convergence. To alleviate the problem, they propose a series of techniques, including a gradient projection algorithm, a switchable layer scaling design, and a simplified data augmentation and regularization training recipe. The proposed techniques significantly improve the convergence and performance of all sub-networks. The discovered family of hybrid ViT models, called NASViT, achieves top-1 accuracy from 78.2% to 81.8% on ImageNet at 200M to 800M FLOPs, and outperforms all prior CNNs and ViTs, including AlphaNet and LeViT. When transferred to semantic segmentation tasks, NASViT also outperforms previous backbone networks on the Cityscape and ADE20K datasets, achieving 73.2% and 37.9% mIoU, respectively, with only 5G FLOPs.
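
The abstract names a gradient projection algorithm but does not spell out the rule. As a rough illustration only, the sketch below assumes the common form of projection: two gradients are treated as conflicting when their inner product is negative, and the sub-network gradient then has its component along the supernet gradient removed. The function names and the plain-list vector type are hypothetical, and the paper's actual procedure may differ.

```python
from typing import List

Vector = List[float]  # flattened gradient; plain lists keep the sketch dependency-free


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equally sized gradient vectors."""
    return sum(x * y for x, y in zip(a, b))


def is_conflicting(g_sub: Vector, g_super: Vector) -> bool:
    """Assumed conflict criterion: the two gradients point in opposing directions."""
    return dot(g_sub, g_super) < 0.0


def project_away_conflict(g_sub: Vector, g_super: Vector) -> Vector:
    """Remove from g_sub its component along g_super, but only when they conflict.

    Assumption: this is the generic orthogonal projection
    g_sub - (<g_sub, g_super> / ||g_super||^2) * g_super,
    not necessarily the exact update used in the paper.
    """
    inner = dot(g_sub, g_super)
    norm_sq = dot(g_super, g_super)
    if inner >= 0.0 or norm_sq == 0.0:
        # No conflict (or a zero supernet gradient): leave the gradient unchanged.
        return list(g_sub)
    scale = inner / norm_sq
    return [s - scale * p for s, p in zip(g_sub, g_super)]


if __name__ == "__main__":
    g_super = [1.0, 0.0]   # hypothetical supernet gradient
    g_sub = [-1.0, 1.0]    # hypothetical sub-network gradient that opposes it
    projected = project_away_conflict(g_sub, g_super)
    print(is_conflicting(g_sub, g_super), projected, dot(projected, g_super))
```

After projection the sub-network gradient is orthogonal to the supernet gradient, so summing the two no longer cancels part of the supernet update. Non-conflicting gradients pass through untouched.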
