Beyond ConvNeXt and RepLKNet: how 51×51 convolution kernels break through
2022-07-25 17:11:00 【Tom Hardy】

Author: ChaucerG
Source: 集智书童 (Jizhi Shutong)

Since the appearance of Vision Transformers (ViT), Transformers have rapidly shone in the field of computer vision, and the leading role of convolutional neural networks (CNNs) seems increasingly challenged by Transformer-based models. Recently, some advanced convolutional models have fought back with large-kernel convolution modules designed around the mechanism of local large attention, showing attractive performance and efficiency. One of them, RepLKNet, scales the kernel size up to 31×31 with improved performance; however, compared with the scaling trend of advanced ViTs such as Swin Transformer, its performance starts to saturate as the kernel size keeps growing.
In this paper, the authors explore the possibility of training extreme convolutions larger than 31×31 and test whether the performance gap can be eliminated by strategically enlarging the convolution. The study finally arrives at a recipe for applying extreme kernels from the perspective of sparsity, which can smoothly scale kernels up to 61×61 with better performance. Based on this recipe, the authors propose the Sparse Large Kernel Network (SLaK), a pure CNN architecture equipped with 51×51 kernels whose performance matches state-of-the-art hierarchical Transformers and modern ConvNet architectures (such as ConvNeXt and RepLKNet) on ImageNet classification and typical downstream tasks.
1 Scaling convolution kernels beyond 31×31
The authors first study the performance of extreme kernel sizes larger than 31×31 and summarize three main observations. The recently developed CNN architecture ConvNeXt, trained on ImageNet-1K, serves as the benchmark for this study.
Following recent work, Mixup, CutMix, RandAugment, and Random Erasing are used for data augmentation, and stochastic depth and label smoothing are applied for regularization, with the same hyperparameters as ConvNeXt. Models are trained with AdamW. In this section, all models are trained for 120 epochs, just long enough to observe the scaling trend of large kernel sizes.
Observation 1: existing techniques cannot scale convolutions beyond 31×31.
Recently, RepLKNet successfully scaled convolutions to 31×31. The authors further increase the kernel size to 51×51 and 61×61 to see whether even larger kernels bring more benefits. Following the design of RepLKNet, the kernel sizes of the four stages are set in turn to [51, 49, 47, 13] and [61, 59, 57, 13].

The test accuracy is shown in Table 1. As expected, naively increasing the kernel size from 7×7 to 31×31 significantly degrades performance, whereas RepLKNet overcomes this problem and improves accuracy by 0.5%. However, this trend does not hold for larger kernels: increasing the kernel size to 51×51 starts to hurt performance.
A reasonable explanation is that although the receptive field can be enlarged by extremely large kernels such as 51×51 and 61×61, such kernels may fail to maintain certain desirable properties, such as locality. Because the stem cell of standard ResNet and ConvNeXt downsamples the input image by 4×, a 51×51 extreme kernel is already roughly equivalent to global convolution on a typical 224×224 ImageNet image. This observation is meaningful because a similar mechanism exists in ViTs, where local attention usually outperforms global attention. It suggests an opportunity to solve the problem by introducing locality while retaining the ability to capture global relations.
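The "already roughly global" claim is easy to sanity-check with the numbers given above (a minimal sketch; the 4× stem stride and 224×224 input size come from the text):

```python
# After the 4x-downsampling stem of ResNet/ConvNeXt, a 224x224 image
# becomes a 56x56 feature map, so a 51x51 kernel is nearly global.
input_size = 224
stem_stride = 4
feature_size = input_size // stem_stride   # 56
kernel_size = 51
coverage = kernel_size / feature_size      # fraction of the map the kernel spans
print(f"{kernel_size}x{kernel_size} kernel covers {coverage:.0%} "
      f"of the {feature_size}x{feature_size} feature map")
```

A 51×51 kernel spans over 90% of the 56×56 feature map, which is why it behaves like global convolution rather than a local operator.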
Observation 2: decomposing a square large kernel into two parallel rectangular kernels smoothly scales the kernel size up to 61.

Although using a medium-sized kernel (e.g., 31×31) seems to sidestep the problem directly, the authors want to see whether (globally) extreme kernels can further boost the performance of CNNs. The approach used here is to approximate the large M×M kernel with a combination of two parallel rectangular convolutions of kernel sizes M×N and N×M (where N < M), as shown in Figure 1. Following RepLKNet, a 5×5 layer is kept parallel to the large kernel, and the outputs are summed after a batch-norm layer.

This decomposition not only inherits the large kernel's ability to capture long-range dependencies, but can also extract local context features along its short edge. More importantly, as the kernel size of depthwise convolution increases, existing large-kernel training techniques suffer from quadratic computation and memory overhead.

In sharp contrast, the cost of this method grows only linearly with kernel size (Figure 4). The performance of kernel decomposition with N = 5 is reported as the "decomposed" group in Table 2. Because decomposition reduces FLOPs, the network is expected to sacrifice some accuracy compared with the re-parameterized structure (RepLKNet) at the medium kernel size of 31×31. However, as the convolution grows toward global convolution, this decomposition can amazingly scale the kernel size up to 61 while improving performance.
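The quadratic-versus-linear cost contrast can be checked with a quick per-channel parameter count (a sketch with depthwise weights only; N = 5 as in the paper):

```python
def dense_params(m):
    """Weights per channel of a dense depthwise m x m kernel (quadratic in m)."""
    return m * m

def decomposed_params(m, n=5):
    """Weights per channel of two parallel rectangular kernels,
    m x n and n x m (linear in m)."""
    return 2 * m * n

for m in (31, 51, 61):
    ratio = decomposed_params(m) / dense_params(m)
    print(f"M={m}: dense {dense_params(m)}, "
          f"decomposed {decomposed_params(m)} ({ratio:.1%} of dense)")
```

At M = 61 the decomposed pair keeps roughly one sixth of the dense weights, and the gap keeps widening as M grows.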
Observation 3: "use sparse groups, expand more" significantly improves model capacity.
The recently proposed ConvNeXt revisits the principle introduced in ResNeXt, which splits convolution filters into small but more numerous groups. Instead of standard group convolution, ConvNeXt simply uses depthwise convolution with increased width to achieve the goal of "use more groups, expand width". In this paper, the authors attempt to extend this principle from an alternative perspective: "use sparse groups, expand more".
Specifically, dense convolutions are first replaced with sparse convolutions, where the sparse kernels are randomly constructed following the layer-wise sparsity ratios of SNIP. After construction, the sparse model is trained with dynamic sparsity, in which the sparse weights with the smallest magnitude are pruned during training and the same number of weights are randomly regrown for dynamic adjustment. This allows the sparse weights to adapt dynamically, yielding better local features.
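The prune-and-grow loop described above can be sketched as follows (a simplified illustration, not the authors' implementation; real dynamic-sparsity training applies this to convolution weight tensors every few hundred steps):

```python
import random

def prune_and_grow(weights, mask, prune_frac=0.3):
    """One dynamic-sparsity step: deactivate the smallest-magnitude active
    weights, then randomly reactivate the same number of inactive positions,
    keeping the total number of active weights fixed."""
    active = [i for i, m in enumerate(mask) if m]
    n_adjust = int(len(active) * prune_frac)
    # prune: drop the active weights with the smallest magnitude
    for i in sorted(active, key=lambda i: abs(weights[i]))[:n_adjust]:
        mask[i] = 0
    # grow: randomly reactivate the same number of inactive positions
    inactive = [i for i, m in enumerate(mask) if not m]
    for i in random.sample(inactive, n_adjust):
        mask[i] = 1
        weights[i] = 0.0  # regrown weights start from zero
    return weights, mask
```

Because every pruned weight is matched by a regrown one, the sparsity level (and hence the parameter count and FLOPs) stays constant throughout training, while the sparse connectivity pattern keeps adapting.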
Because the kernels are sparse throughout training, the corresponding parameter count and training/inference FLOPs are only proportional to those of the dense model. For evaluation, the decomposed kernels are sparsified at 40% sparsity and the performance is reported as the "sparse decomposed" group. As shown in the middle column of Table 2, dynamic sparsity significantly reduces FLOPs by more than 2.0G, at the cost of a temporary performance drop.

Next, the authors show that this efficiency of dynamic sparsity can be effectively transferred into model scalability: dynamic sparsity makes it friendly to scale the model up. For example, with the same sparsity (40%), the model width can be expanded by 1.3× while keeping the parameter count and FLOPs roughly the same as the dense model. This brings a significant performance improvement, from 81.3% to 81.6% with extreme 51×51 kernels. Impressively, when equipped with 61×61 kernels, this method outperforms the previous RepLKNet while saving 55% of FLOPs.
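The "expand width 1.3× at 40% sparsity" bookkeeping works out as follows (a back-of-the-envelope sketch; it assumes layer parameters scale quadratically with width, since the channel count enters both the input and output dimensions):

```python
def relative_params(width_scale, sparsity):
    """Parameter count relative to the dense baseline: width scales the
    number of weights quadratically (in-channels x out-channels), and
    sparsity then removes that fraction of them."""
    return width_scale ** 2 * (1.0 - sparsity)

r = relative_params(1.3, 0.40)
print(f"1.3x width at 40% sparsity -> {r:.3f}x the dense parameter count")
```

The result is within about 1% of the dense baseline, which is why the 1.3×-wide sparse model can be compared with the dense one at matched parameters and FLOPs.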
2 Sparse Large Kernel Network (SLaK)
So far, the method presented in this paper has been shown to scale the kernel size up to 61 without backfiring on performance. It consists of two sparsity-inspired designs.
At the macro level, an intrinsically sparse network is built and then further widened, improving network capacity while keeping a similar model size.
At the micro level, each dense large kernel is decomposed into two complementary kernels with dynamic sparsity, improving the scalability of large kernels.
Unlike traditional post-training pruning, the network is trained directly from scratch, without any pre-training or fine-tuning. On this basis the Sparse Large Kernel Network (SLaK) is proposed, a pure CNN architecture with extreme 51×51 kernels.
SLaK is built on the architecture of ConvNeXt. The stage compute ratio and stem-cell design are inherited from ConvNeXt. The number of blocks per stage is [3, 3, 9, 3] for SLaK-T and [3, 3, 27, 3] for SLaK-S/B. The stem cell is simply a convolution layer with kernel size 4×4 and stride 4. The authors increase the kernel sizes of the ConvNeXt stages to [51, 49, 47, 13] and replace each M×M kernel with a combination of M×5 and 5×M kernels, as shown in Figure 1. They found it crucial to add a BatchNorm layer directly after each decomposed kernel, before summing the outputs.
Following the "use sparse groups, expand more" guideline, the whole network is further sparsified and the stage widths are expanded by 1.3×, yielding SLaK-T/S/B. Although there is clearly room to improve SLaK's performance by tuning the trade-off between model width and sparsity, for simplicity the width of all models is kept at 1.3× and the sparsity of all models is set to 40%.
Although the models are configured with extreme 51×51 kernels, the overall parameter count and FLOPs do not grow much, and thanks to the excellent implementation provided by RepLKNet, they are very efficient in practice.
3 Experiments
3.1 Classification experiments

3.2 Semantic segmentation

4 References
[1] More ConvNets in the 2020s: Scaling up Kernels Beyond 51×51 using Sparsity.