Transposed convolution theory explained (input/output size analysis)
2022-07-07 19:42:00, by Midnight rain
Transposed convolution
Transposed convolution is an upsampling method commonly used in image segmentation; it appears in U-Net, FCN, and similar networks. Its main purpose is to upsample a low-resolution feature map back to the resolution of the original image, so that the segmentation result can be produced at that resolution. Transposed convolution is also called deconvolution or fractionally strided convolution, but it is not actually the inverse of convolution: only the input and output shapes correspond. For example, a convolution with stride 2 shrinks the input feature map to 1/4 of its original size, and the corresponding transposed convolution restores the shrunken feature map to the original size, but the values at corresponding positions are not the same.
The connection between convolution and transposed convolution
To introduce transposed convolution, we first need to review how ordinary convolution works. In general, a convolution operation establishes a local connection between a rectangular region of the input image and a single value of the output feature map; it is a region-to-individual mapping [1]. An example is shown below:
Of course, from this form it is hard to see how the convolution operation could be inverted or undone (setting aside frequency-domain methods). We therefore express convolution in the more regular form of matrix multiplication: flatten the input image $X$ and the output feature map $Y$ into vectors, and embed the elements of the convolution kernel into the corresponding positions of a sparse matrix to form a weight matrix $W$. Convolution can then be written as [2]:
$$Y = WX$$
The specific process is illustrated in the following figure [3]:
Accordingly, if we want to recover $X$ from $Y$, the direct idea is to left-multiply by the inverse of the weight matrix. But the weight matrix is not square, so the best we can do is construct a new matrix which, when left-multiplied by $Y$, produces an output with the same shape as $X$, satisfying shape($W'$) = shape($W$):
$$X = W'^{T} Y$$
At the same time, this matrix's mapping must be individual-to-region, with the positional correspondence between input and output unchanged. A new matrix satisfying these conditions can itself be written as a convolution kernel acting on $Y$, so the matrix multiplication can be converted back into a convolution operation. This is why transposed convolution is also called deconvolution, even though it involves neither matrix inversion nor true deconvolution, only the transpose of the weight matrix (and not exactly the transpose, since the values differ; only the shape is the same).
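The region-to-individual picture above can be made concrete with a small NumPy sketch. The input length 4, kernel size 3, stride 1, and zero padding here are arbitrary choices of mine for illustration:

```python
import numpy as np

# 1-D input of length 4 and a kernel of size 3 (stride 1, no padding).
x = np.array([1.0, 2.0, 3.0, 4.0])
k = np.array([1.0, 0.0, -1.0])

# Embed the kernel into a sparse weight matrix W, one row per output
# position: convolution becomes the matrix product y = W @ x.
out_len = len(x) - len(k) + 1            # 4 - 3 + 1 = 2
W = np.zeros((out_len, len(x)))
for i in range(out_len):
    W[i, i:i + len(k)] = k

y = W @ x                                 # same values as sliding the kernel
direct = np.array([x[i:i + len(k)] @ k for i in range(out_len)])
assert np.allclose(y, direct)

# "Transposed convolution": left-multiply by W.T. The result has the same
# shape as x, but its values are not the original x (it is not an inverse).
x_back = W.T @ y
print(x_back.shape)   # (4,)
```

Note that `x_back` only restores the shape of `x`; its values differ, which is exactly the point made above about transposed convolution not being an inverse.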
Transposed convolution output shape analysis
The previous section resembles analyses found elsewhere online. However, analyses of the input/output shape of transposed convolution lack a unified treatment: they usually start from an existing library function or from transposed convolution on its own, without using the internal relationship between transposed convolution and convolution. This section therefore derives, from a theoretical standpoint, the relationship between the input and output parameters of transposed convolution, and from it the size formula.
Convolution input and output parameters
First, for an ordinary convolution, the input and output sizes are related by a simple formula:
$$W_2=\frac{W_1+2P_1-F_1}{S_1}+1\tag{1}$$
This formula is easy to understand: $W_1+2P_1$ is the length of the padded input, $W_1+2P_1-F_1$ is the last position at which a convolution kernel still fits, and $\frac{W_1+2P_1-F_1}{S_1}$ counts how many moves of stride $S_1$ the kernel needs to reach that last valid position. Adding the initial position (the $+1$) gives the formula.
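Formula (1) translates directly into a one-line helper (the function name and the ResNet-stem example values are my own):

```python
def conv_out_size(w, f, s, p):
    """Output length of a convolution: floor((W + 2P - F) / S) + 1."""
    return (w + 2 * p - f) // s + 1

# A 224-pixel input with a 7x7 kernel, stride 2, padding 3
# (the typical ResNet stem) halves the spatial size:
print(conv_out_size(224, 7, 2, 3))  # 112
```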
Transposed convolution input and output parameters
Since transposed convolution is itself a kind of convolution, its input and output must also satisfy the formula:
$$W_1=\frac{W_2+2P_2-F_2}{S_2}+1\tag{2}$$
Once the forward convolution is fixed, we use the region-to-individual correspondence between the forward and transposed operations to determine the parameters of the transposed convolution.
The shape of the convolution kernel does not change: the number and positions of the nonzero entries in the weight matrix are the same for the forward and transposed operations, which is precisely what allows both to be expressed as convolutions. Hence $F_2=F_1$.
Next, the stride parameter. In the forward convolution, the stride determines the overlap between the input regions $X_i, X_{i+1}$ of two adjacent convolution windows (the overlap has width $F-S$); that is, considering only the first row of the kernel, only $F-S$ input values contribute to both outputs at once. Correspondingly, in the transposed convolution, two adjacent inputs $Y_i, Y_{i+1}$ should jointly influence only $F-S$ outputs. To achieve this, we insert zero values between adjacent inputs, slowing down the step of the moving kernel. This is why transposed convolution is also called fractionally strided convolution: $S_1-1$ zeros must be inserted to satisfy the input/output correspondence, so where the forward convolution crosses $S$ elements in one step, the transposed convolution takes $S$ steps to cross one element, i.e. $S_2=\frac{1}{S_1}$. Since the size formula above no longer applies with a fractional stride, we instead set $S_2=1$ and insert $S_1-1$ zero values between adjacent elements, and the formula becomes:
$$W_1=\frac{W_2+2P_2-F_2+(W_2-1)(S_1-1)}{1}+1$$
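The zero-insertion step can be sketched in a few lines of Python (the helper name is mine). Inserting $S-1$ zeros between adjacent elements turns a length-$W$ input into length $S(W-1)+1$:

```python
def insert_zeros(x, s):
    """Insert s-1 zeros between adjacent elements of x."""
    out = []
    for i, v in enumerate(x):
        out.append(v)
        if i < len(x) - 1:       # no zeros after the last element
            out.extend([0] * (s - 1))
    return out

# A length-3 input with stride 2 becomes length 2*(3-1)+1 = 5:
print(insert_zeros([1, 2, 3], 2))  # [1, 0, 2, 0, 3]
```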
With the first two parameters determined, the relationship between the padding values follows by combining formulas (1) and (2):
$$\begin{aligned} S_1(W_2-1)+F-2P_1 &= W_2+2P_2-F+(W_2-1)(S_1-1)+1\\ F-2P_1 &= 2P_2-F+2\\ P_2 &= F-P_1-1 \end{aligned}$$
The one-to-one correspondence of these three parameters ensures that the element-wise correspondence does not change between the forward and transposed convolutions; readers can verify this themselves [4].
Transposed convolution input-output relationship
A transposed convolution is usually also defined by a kernel size, stride, and padding value, $F, S, P$. Among these, $S$ is actually the reciprocal of the true stride, and $S-1$ is the number of zero values inserted between adjacent elements. Here comes the most confusing point: when defining a transposed convolution, to emphasize that it is transposed, we define it by the parameters of its corresponding forward convolution. From the parameter correspondences derived in the previous two sections, the output size of a transposed convolution is then:
$$\begin{aligned} O&=S(W-1)+1+2(F-P-1)-F+1 \\ &= S(W-1)-2P+F \end{aligned}$$
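The two size formulas can be paired in code to check the round trip (function names and the example values are my own):

```python
def conv_out_size(w, f, s, p):
    """Forward convolution output length: floor((W + 2P - F) / S) + 1."""
    return (w + 2 * p - f) // s + 1

def transposed_out_size(w, f, s, p):
    """Transposed convolution output length: S(W - 1) - 2P + F."""
    return s * (w - 1) - 2 * p + f

# Round trip: a stride-2 convolution maps 8 -> 4, and the matching
# transposed convolution maps 4 back to 8. The trip is exact here
# because (W + 2P - F) is divisible by S.
w_in, f, s, p = 8, 4, 2, 1
w_down = conv_out_size(w_in, f, s, p)
print(w_down, transposed_out_size(w_down, f, s, p))  # 4 8
```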
The more complicated case
Actually, formula (1) is not exact, because $\frac{W_1+2P_1-F_1}{S_1}$ may not be an integer, in which case it is rounded down by default. This means every value of $W_1+2P_1-F_1$ in the range $[NS, NS+a]$, $a\in\{0,1,...,S-1\}$, corresponds to an output of the same size. To recover a size in this range, we must compute the number of omitted positions [5]:
$$a=(W_1+2P_1-F_1)\bmod S$$
Then, when computing the transposed convolution padding, an extra $a$ zero values are added; whether they are added on the top-left or the bottom-right depends on the library function used.
Summary
In all, a transposed convolution is defined by four parameters, all taken from its corresponding forward convolution: the stride, kernel size, padding, and remainder, $S, F, P, a$. With these, the forward-convolution input size, which equals the transposed-convolution output size, is:
$$O=S(W-1)-2P+F+a$$
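The full formula, including the remainder $a$, can be sketched as follows. The parameter `a` plays the same role as `output_padding` in PyTorch's `ConvTranspose2d`; the helper names and example values are my own:

```python
def conv_out_size(w, f, s, p):
    """Forward convolution output length: floor((W + 2P - F) / S) + 1."""
    return (w + 2 * p - f) // s + 1

def transposed_out_size(w, f, s, p, a=0):
    """O = S(W - 1) - 2P + F + a, where a = (W_1 + 2P - F) mod S
    recovers the size lost to the floor division in the forward pass."""
    return s * (w - 1) - 2 * p + f + a

# A stride-2 convolution maps inputs of size 8 and 9 to the SAME
# output size 4, so the reverse mapping is ambiguous without a:
f, s, p = 4, 2, 1
assert conv_out_size(8, f, s, p) == conv_out_size(9, f, s, p) == 4
a = (9 + 2 * p - f) % s                    # 7 % 2 = 1
print(transposed_out_size(4, f, s, p, 0))  # 8
print(transposed_out_size(4, f, s, p, a))  # 9
```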
References
1. Understanding deconvolution (transposed convolution) in one article
2. Detailed derivation of deconvolution (Transposed Convolution)
3. A Comprehensive Introduction to Different Types of Convolutions in Deep Learning
4. The principle of ConvTranspose2d: how does a deep network upsample?
5. Super-detailed description and derivation of the relationship between convolution and deconvolution (deconvolution is also called transposed convolution or fractionally strided convolution)