NASViT: Neural Architecture Search for Efficient Vision Transformers with Gradient Conflict-Aware Supernet Training
2022-07-03 03:00:00 [Zhiyuan community]
Paper title: NASViT: Neural Architecture Search for Efficient Vision Transformers with Gradient Conflict-aware Supernet Training
Paper link: https://openreview.net/forum?id=Qaw16njk6L
Designing accurate and efficient vision Transformers (ViTs) is an important but challenging task. Supernet-based one-shot neural architecture search (NAS) enables fast architecture optimization and has achieved state-of-the-art (SOTA) results on convolutional neural networks (CNNs). However, directly applying supernet-based NAS to optimize ViTs leads to poor performance, even worse than training a single ViT. In this work, we observe that the poor performance stems from a gradient conflict problem: the gradients of different sub-networks conflict with that of the supernet more severely in ViTs than in CNNs, which leads to early saturation in training and poor convergence. To alleviate this problem, we propose a series of techniques, including a gradient projection algorithm, a switchable layer scaling design, and a simplified data augmentation and regularization training recipe. The proposed techniques significantly improve the convergence and performance of all sub-networks. Our discovered hybrid ViT model family, called NASViT, achieves top-1 accuracy from 78.2% to 81.8% on ImageNet at 200M to 800M FLOPs, and outperforms all prior CNNs and ViTs, including AlphaNet and LeViT. When transferred to semantic segmentation, NASViT also outperforms previous backbones on both the Cityscapes and ADE20K datasets, achieving 73.2% and 37.9% mIoU respectively with only 5G FLOPs.
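The abstract names a gradient projection algorithm but does not spell it out. The sketch below illustrates one common form of the idea, a PCGrad-style projection, swapped in here as an assumption; the paper's exact rule may differ. Two gradients are treated as conflicting when their dot product (equivalently, their cosine similarity) is negative, and the conflicting component is removed so the result lies on the normal plane of the other gradient. Which gradient gets projected (here the supernet gradient is projected so the sub-network's direction takes priority) is an assumption, and the helper names `cosine`, `conflict_ratio`, `project_onto_normal_plane` and `combine_gradients` are hypothetical. Gradients are flattened to plain Python lists to keep the example dependency-free.

```python
# PCGrad-style sketch (assumption); not the NASViT reference implementation.
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot(a, b) / (na * nb)


def conflict_ratio(supernet_grad: Vector, subnet_grads: List[Vector]) -> float:
    """Hypothetical metric: fraction of sampled sub-networks whose gradient
    points away from the supernet gradient (negative cosine similarity)."""
    if not subnet_grads:
        return 0.0
    hits = sum(1 for g in subnet_grads if cosine(supernet_grad, g) < 0.0)
    return hits / len(subnet_grads)


def project_onto_normal_plane(g: Vector, ref: Vector) -> Vector:
    """If g conflicts with ref (negative dot product), subtract g's component
    along ref so the result is orthogonal to ref; otherwise return g unchanged."""
    d = dot(g, ref)
    ref_sq = dot(ref, ref)
    if d >= 0.0 or ref_sq == 0.0:
        return list(g)
    scale = d / ref_sq
    return [x - scale * r for x, r in zip(g, ref)]


def combine_gradients(supernet_grad: Vector, subnet_grad: Vector) -> Vector:
    """Assumption: prioritise the sub-network by projecting the supernet
    gradient onto the normal plane of the sub-network gradient, then summing."""
    projected = project_onto_normal_plane(supernet_grad, subnet_grad)
    return [a + b for a, b in zip(projected, subnet_grad)]


if __name__ == "__main__":
    g_super = [1.0, 1.0]
    g_sub = [-1.0, 0.5]  # dot product with g_super is -0.5, so they conflict
    print(cosine(g_super, g_sub))
    print(conflict_ratio(g_super, [g_sub, [0.5, 0.5]]))
    print(combine_gradients(g_super, g_sub))
```

In this toy case the projected supernet gradient becomes orthogonal to the sub-network gradient, so the combined update no longer pushes against the sub-network's descent direction. A real training loop would apply the same arithmetic to flattened parameter gradients (for example, per layer or over the whole model), which is a design choice the abstract does not specify.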