Interpretation of transpose convolution theory (input-output size analysis)
2022-07-07 19:42:00 · Midnight rain
Transposed convolution
Transposed convolution is an upsampling operation commonly used in image segmentation; it appears in U-Net, FCN, and similar networks. Its main purpose is to upsample a low-resolution feature map back to the resolution of the original image, so that a segmentation result can be produced at that resolution. Transposed convolution is also called deconvolution or fractionally strided convolution, but it is not actually the inverse of convolution: only the shapes of input and output correspond. For example, a convolution with stride 2 reduces the input feature map to 1/4 of its original area, and the matching transposed convolution restores the reduced feature map to the original size, but the values at corresponding positions are not recovered.
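The shape round trip just described can be sketched in a few lines of plain Python (the function names and the 4 → 2 → 4 example sizes are assumptions for illustration, not from the original text):

```python
# Output width of an ordinary convolution (formula (1) below)
def conv_out(W, F, S, P):
    return (W + 2 * P - F) // S + 1

# Output width of the transposed convolution defined by the forward
# parameters F, S, P (a = extra padding for the non-divisible case)
def tconv_out(W, F, S, P, a=0):
    return S * (W - 1) - 2 * P + F + a

W1 = 4
W2 = conv_out(W1, F=2, S=2, P=0)      # 4 -> 2: area shrinks to 1/4
back = tconv_out(W2, F=2, S=2, P=0)   # 2 -> 4: shape restored, values not
print(W2, back)                       # 2 4
```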
The connection between convolution and transposed convolution
To introduce transposed convolution, we first review the principle of convolution. In general, a convolution operation establishes a local connection between a rectangular region of the input image and a single value of the output feature map; it is a region-to-element mapping [1]. The following example illustrates this:
From this picture alone it is hard to see how convolution could be inverted or transposed (frequency-domain methods aside). We can, however, express convolution in the more regular form of a matrix multiplication: flatten the input image $X$ and the output feature map $Y$ into vectors, and embed the elements of the convolution kernel into the corresponding positions of a sparse weight matrix $W$. Convolution can then be written as [2]:
$$Y = WX$$
See the figure below for a demonstration of the process [3]:
Conversely, if we want to recover $X$ from $Y$, the most direct idea is to left-multiply by the inverse of the weight matrix. But $W$ is not square, so no inverse exists; the best we can do is construct a new matrix which, left-multiplied with $Y$, yields an output with the same shape as $X$, and which satisfies shape($W'$) = shape($W$):
$$X = W'^T Y$$
At the same time, this matrix must realize an element-to-region mapping, with the positional correspondence between input and output preserved. A matrix satisfying these conditions can itself be written as a convolution kernel applied to $Y$, so the matrix multiplication can again be carried out as a convolution. This is why transposed convolution is also called deconvolution, although it involves neither matrix inversion nor true deconvolution, only the transpose of the weight matrix (and not exactly that either, since the values differ; only the shape is the same).
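The matrix view above can be made concrete with a small NumPy sketch (the sizes, a length-5 input and a length-3 kernel at stride 1, are assumed for illustration): building the sparse matrix $W$, checking that $WX$ matches the sliding-window convolution, and observing that $W^T$ maps an output-shaped vector back to an input-shaped one:

```python
import numpy as np

x = np.arange(5.0)             # flattened input X, length 5
k = np.array([1.0, 2.0, 3.0])  # kernel, length 3, stride 1, no padding

# Embed the kernel into the sparse weight matrix W (3 x 5)
W = np.zeros((3, 5))
for i in range(3):
    W[i, i:i + 3] = k

y = W @ x                      # convolution as Y = W X
x_back = W.T @ y               # transposed map: output shape = input shape

# The matrix product matches the direct sliding-window convolution
y_direct = np.array([x[i:i + 3] @ k for i in range(3)])
assert np.allclose(y, y_direct)
print(y.shape, x_back.shape)   # (3,) (5,)
```

Note that `x_back` has the shape of `x` but not its values, exactly as the text warns.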
Transposed convolution output shape analysis
The previous section matches most analyses found online. The analysis of transposed-convolution input and output shapes, however, lacks a unified treatment: it usually starts either from an existing library function or from transposed convolution in isolation, without connecting it back to the forward convolution. This section therefore derives, from a theoretical starting point, the relationship between the parameters of a transposed convolution and those of its forward convolution, and from it the output-size formula.
Convolution input and output parameters
First, for an ordinary convolution, determining the output size is straightforward; it satisfies the following formula:
$$W_2 = \frac{W_1 + 2P_1 - F_1}{S_1} + 1 \tag{1}$$
This formula is easy to understand: $W_1 + 2P_1$ is the length of the padded input, $W_1 + 2P_1 - F_1$ is the last position at which a full convolution kernel still fits, and $\frac{W_1 + 2P_1 - F_1}{S_1}$ counts how many moves of size $S_1$ the kernel needs to reach that last position. Adding 1 for the initial position gives the formula.
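As a sanity check of formula (1), one can enumerate the kernel's start positions explicitly (the example numbers $W_1 = 5$, $F_1 = 3$, $S_1 = 2$, $P_1 = 1$ are assumed):

```python
W1, F1, S1, P1 = 5, 3, 2, 1
padded = W1 + 2 * P1                           # length after padding: 7
starts = list(range(0, padded - F1 + 1, S1))   # valid kernel start positions
W2 = (W1 + 2 * P1 - F1) // S1 + 1              # formula (1)
print(starts, W2)                              # [0, 2, 4] 3
assert len(starts) == W2
```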
Transposed convolution input and output parameters
Since a transposed convolution is itself a kind of convolution, its input and output must likewise satisfy:
$$W_1 = \frac{W_2 + 2P_2 - F_2}{S_2} + 1 \tag{2}$$
Once the forward convolution is fixed, we use the region-to-element correspondence between the forward and transposed convolutions to determine the parameters of the transposed convolution.
The shape of the convolution kernel does not change: the forward and transposed weight matrices have the same number of non-zero entries in corresponding positions, which is exactly what allows both operations to be realized as convolutions. Hence $F_2 = F_1$.
For the stride parameter: in the forward convolution, the stride determines the overlap between the input regions $X_i, X_{i+1}$ of two adjacent kernel positions (the overlap has width $F - S$); that is, considering only the first row of the kernel, only $F - S$ input values contribute to both outputs at once. For the transposed convolution this means that two adjacent inputs $Y_i, Y_{i+1}$ should jointly contribute to only $F - S$ outputs. To achieve this, we insert zero values between adjacent inputs, slowing down the motion of the kernel. This is why transposed convolution is also called fractionally strided convolution: $S_1 - 1$ zeros must be inserted to preserve the input-output correspondence, so that where the forward convolution crossed $S$ elements in one step, the transposed convolution now takes $S$ steps to cross one element, i.e. $S_2 = \frac{1}{S_1}$. With such a fractional stride, formula (2) no longer applies directly, so we instead take $S_2 = 1$ and insert $S_1 - 1$ zero values between inputs; the formula becomes:
$$W_1 = \frac{W_2 + 2P_2 - F_2 + (W_2 - 1)(S_1 - 1)}{1} + 1$$
With the first two parameters determined, combining the two formulas yields the relationship between the padding values:
$$\begin{aligned}
S_1(W_2 - 1) + F - 2P_1 &= W_2 + 2P_2 - F + (W_2 - 1)(S_1 - 1) + 1 \\
F - 2P_1 &= 2P_2 - F + 2 \\
P_2 &= F - P_1 - 1
\end{aligned}$$
This correspondence of the three parameters ensures that the element correspondence is unchanged through the forward and transposed convolutions; readers can verify this themselves [4].
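The parameter correspondence can indeed be verified numerically. The following NumPy sketch (with assumed values $F = 3$, $S = 2$, $P = 0$ and an arbitrary input) checks that inserting $S - 1$ zeros, padding with $P_2 = F - P - 1$ zeros per side, and convolving at stride 1 with the flipped kernel reproduces $W^T Y$ exactly:

```python
import numpy as np

F, S, P = 3, 2, 0               # forward kernel size, stride, padding
k = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0])        # transposed-conv input (W2 = 2)

# Ground truth: build the forward weight matrix W (2 x 5) and apply W^T
W = np.zeros((2, 5))
for i in range(2):
    W[i, i * S:i * S + F] = k
ref = W.T @ y

# Convolution view: insert S-1 zeros between inputs, pad P2 = F - P - 1
# zeros per side, slide the flipped kernel at stride 1
y_up = np.zeros(S * (len(y) - 1) + 1)
y_up[::S] = y                                   # [4, 0, 5]
y_pad = np.pad(y_up, F - P - 1)                 # two zeros on each side
out = np.array([y_pad[j:j + F] @ k[::-1]
                for j in range(len(y_pad) - F + 1)])

assert np.allclose(out, ref)
print(out)                      # values [4, 8, 17, 10, 15]
```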
Transposed convolution input-output relationship
A transposed convolution is usually also defined by a kernel size, stride, and padding, $F, S, P$. Here $S$ is really the reciprocal of the true stride, and $S - 1$ is the number of zeros inserted between adjacent elements. This is also the most confusing point: when defining a transposed convolution, to emphasize that it is the transpose of some convolution, we describe it with the parameters of that forward convolution. From the parameter correspondence derived in the previous two sections, the output size of the transposed convolution is:
$$\begin{aligned}
O &= S(W - 1) + 1 + 2(F - P - 1) - F + 1 \\
  &= S(W - 1) - 2P + F
\end{aligned}$$
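The simplification in the last step can be double-checked by brute force over small parameter ranges (the ranges themselves are arbitrary):

```python
# Step-by-step size: zero-inserted length S(W-1)+1, plus 2(F-P-1) padding,
# minus F, plus 1 -- compared against the closed form S(W-1) - 2P + F.
for W in range(2, 8):
    for F in range(1, 5):
        for S in range(1, 4):
            for P in range(F):        # keep P2 = F - P - 1 >= 0
                stepwise = (S * (W - 1) + 1) + 2 * (F - P - 1) - F + 1
                closed = S * (W - 1) - 2 * P + F
                assert stepwise == closed
print("closed form confirmed")
```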
The non-divisible case
Formula (1) is actually not exact, because $\frac{W_1 + 2P_1 - F_1}{S_1}$ may not be an integer, in which case the division is rounded down by default. This means that all values of $W_1 + 2P_1 - F_1$ in a range $[NS, NS + a]$, $a \in \{0, 1, \ldots, S - 1\}$, map to output results of the same size. To recover this part of the size, we must compute the number of omitted positions [5]:
$$a = (W_1 + 2P_1 - F_1) \bmod S$$
Then, when computing the transposed convolution's padding, add $a$ extra zeros. Whether they go on the top-left or the bottom-right side depends on the library function used.
Summary
In summary, a transposed convolution is defined by four parameters, all taken from its corresponding forward convolution: the forward stride, kernel size, padding, and remainder, $S, F, P, a$. With these, the forward-convolution input size, which is the transposed-convolution output size, is:
$$O = S(W - 1) - 2P + F + a$$
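A round-trip check in plain Python ties everything together (the example values $F = 3$, $S = 2$, $P = 1$ are arbitrary; the role of $a$ corresponds to what some frameworks expose as an "output padding" argument, e.g. `output_padding` in PyTorch's `ConvTranspose2d`):

```python
def conv_out(W1, F, S, P):
    return (W1 + 2 * P - F) // S + 1          # formula (1), floor division

def tconv_out(W2, F, S, P, a):
    return S * (W2 - 1) - 2 * P + F + a       # final formula

F, S, P = 3, 2, 1
for W1 in range(4, 20):
    a = (W1 + 2 * P - F) % S                  # omitted remainder
    W2 = conv_out(W1, F, S, P)
    assert tconv_out(W2, F, S, P, a) == W1    # original size recovered
print("round trip OK")
```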
References
1. Understand deconvolution and transposed convolution in one article
2. Detailed derivation of deconvolution (Transposed Convolution)
3. A Comprehensive Introduction to Different Types of Convolutions in Deep Learning
4. The principle of ConvTranspose2d: how do deep networks upsample?
5. Detailed explanation and derivation of the relationship between convolution and deconvolution (also called transposed convolution or fractionally strided convolution)