[2022 freshmen learning] key points of the third week
2022-07-29 05:03:00 【AI frontier theory group @ouc】
1、Batch Normalization
Things to watch when using BN:
- During training the mean and variance are computed from the current batch, but at test time the accumulated historical statistics should be used instead, so the mode has to be set explicitly. In PyTorch this is controlled with model.train() and model.eval() (the same applies to Dropout).
- Set the batch size as large as your hardware allows: the larger the batch, the closer its mean and variance are to the true distribution of the whole dataset.
- Place the BN layer between the convolution layer (Conv) and the activation layer (e.g. ReLU), and do not use a bias in the convolution layer, because BN subtracts the mean and a constant bias would simply cancel out.
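The first and third points can be illustrated with a toy, pure-Python sketch. This is not PyTorch's implementation: `ToyBatchNorm` is a hypothetical class handling a single feature with no learnable scale or shift, and the momentum-style running update is an assumption chosen for illustration.

```python
import math


class ToyBatchNorm:
    """Toy batch norm over one feature (hypothetical, for illustration only)."""

    def __init__(self, momentum=0.1, eps=1e-5):
        self.momentum = momentum
        self.eps = eps
        self.running_mean = 0.0
        self.running_var = 1.0
        self.training = True

    def train(self):
        self.training = True

    def eval(self):
        self.training = False

    def __call__(self, batch):
        if self.training:
            # Training: use the statistics of the current batch ...
            mean = sum(batch) / len(batch)
            var = sum((x - mean) ** 2 for x in batch) / len(batch)
            # ... and fold them into the historical (running) statistics.
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * var
        else:
            # Evaluation: use only the historical statistics.
            mean, var = self.running_mean, self.running_var
        return [(x - mean) / math.sqrt(var + self.eps) for x in batch]


batch = [1.0, 2.0, 3.0, 4.0]

bn = ToyBatchNorm()
out_plain = bn(batch)

# Adding a constant "convolution bias" before BN changes nothing,
# because the batch mean absorbs it.
bn_with_bias = ToyBatchNorm()
out_biased = bn_with_bias([x + 5.0 for x in batch])

# In eval mode even a single sample can be normalized,
# and the running statistics are left untouched.
bn.eval()
out_eval = bn([10.0])
```

Here `out_plain` and `out_biased` are identical, which is why the bias before BN is redundant, and calling `bn.eval()` freezes the statistics gathered during training.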
2、Group Convolution

Group convolution splits the input feature map into groups and convolves each group separately. With G groups, the parameter count drops to 1/G of the original.
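The 1/G reduction can be checked with a little arithmetic. The sketch below assumes a square kernel and channel counts divisible by the number of groups; `conv_params` is a hypothetical helper, not a library function.

```python
def conv_params(c_in, c_out, kernel_size, groups=1, bias=False):
    """Count the parameters of a 2-D convolution with a square kernel."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each of the c_out filters only sees c_in / groups input channels.
    weights = c_out * (c_in // groups) * kernel_size * kernel_size
    return weights + (c_out if bias else 0)


standard = conv_params(64, 128, 3)           # 64 * 128 * 9
grouped = conv_params(64, 128, 3, groups=4)  # 16 * 128 * 9
ratio = standard // grouped                  # equals the number of groups
```

With 64 input channels, 128 output channels and a 3×3 kernel, the weight count falls from 73728 to 18432 when G = 4.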
Speed: in theory group convolution makes the network faster, but in practice it may not be as efficient as an ordinary 3×3 convolution, because PyTorch is specifically optimized for 3×3 convolution and grouping loses that optimization.
Accuracy: grouping maps the features into several subspaces, giving a more comprehensive view of the image information. This is similar to Multi-Head Self-Attention in the Transformer: there it is the attention computation that is grouped, which is called "multi-head", whereas group convolution groups the convolution.

The Transformer comes from natural language processing. In real language, each word relates differently to different other words, and separate attention heads can be used to capture these different relationships. The figure above shows three attention heads, i.e. three subspaces; such relationships are learned better within subspaces.

AlexNet contains a classic finding as well. In the figure above, the first three rows are the filters learned on GPU 1 and the last three rows are the filters learned on GPU 2. One mainly learns texture and gradient information while the other mainly learns color information, which can be understood as different subspaces.
3、Res2Net

Res2Net comes from Prof. Ming-Ming Cheng's group at Nankai University and neatly combines feature grouping with multi-scale representation. Two experiments in the paper examine how the features are grouped. They show that as the scale increases, accuracy improves while speed drops. Moreover, with more than 4 groups the gain over 4 groups is not very noticeable. So more feature groups is not always better: increasing the number of groups raises the computational cost, and a balance has to be struck.

