Transposed Convolution Theory Explained (Input/Output Size Analysis)
2022-07-07 19:42:00 【Midnight rain】
Transposed convolution
Transposed convolution is an upsampling operation commonly used in image segmentation; it appears in U-Net, FCN, and similar architectures. Its main purpose is to upsample a low-resolution feature map back to the resolution of the original image, so that a segmentation result can be produced at the original scale. Transposed convolution is also called deconvolution or fractionally-strided convolution, but it is not actually the inverse of convolution: only the input and output shapes correspond. For example, a convolution with stride 2 shrinks the input feature map to 1/4 of its original size, and the corresponding transposed convolution restores the shrunken feature map to the original size, but the values at corresponding positions are not the same.
The connection between convolution and transposed convolution
To introduce transposed convolution, we first need to review how convolution works. In general, a convolution operation establishes a local connection between a rectangular region of the input image and a single value in the output feature map; it is a region -> point mapping [1]. The example below illustrates this:
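(The original post illustrates this with a figure; as a stand-in, here is a minimal NumPy sketch of one region collapsing to one value. The 4x4 input and 3x3 averaging kernel are illustrative assumptions, not from the original.)

```python
# A minimal sketch of the "region -> single value" mapping.
import numpy as np

x = np.arange(16, dtype=float).reshape(4, 4)  # 4x4 input image
k = np.ones((3, 3)) / 9.0                     # 3x3 averaging kernel

# One output value: the top-left 3x3 region collapsed to a single number.
y00 = np.sum(x[0:3, 0:3] * k)
print(y00)  # each output element comes from exactly one rectangular input region
```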
From this form it is hard to see how convolution could be inverted (frequency-domain methods aside), so we rewrite the convolution as a more regular matrix multiplication: flatten the input image $X$ and the output feature map $Y$, and embed the elements of the convolution kernel into the corresponding positions of a sparse weight matrix $W$. Convolution can then be written as [2]:
$$Y = WX$$
See the figure below for a step-by-step demonstration of this process [3]:
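As a concrete check of $Y=WX$, here is a small sketch that builds the sparse weight matrix by hand. The sizes (4x4 input, 3x3 kernel, stride 1, no padding, giving a 4x16 matrix) are illustrative assumptions.

```python
# Sketch of Y = WX: flatten a 4x4 input, embed a 3x3 kernel into a sparse
# 4x16 weight matrix, and check against a directly computed convolution.
import numpy as np

W1, F, S = 4, 3, 1
W2 = (W1 - F) // S + 1  # output side length: 2

x = np.random.randn(W1, W1)
k = np.random.randn(F, F)

# Build the sparse weight matrix W of shape (W2*W2, W1*W1).
W = np.zeros((W2 * W2, W1 * W1))
for i in range(W2):
    for j in range(W2):
        for di in range(F):
            for dj in range(F):
                W[i * W2 + j, (i * S + di) * W1 + (j * S + dj)] = k[di, dj]

y_matmul = (W @ x.reshape(-1)).reshape(W2, W2)

# Reference: direct (cross-correlation style) convolution.
y_direct = np.array([[np.sum(x[i:i + F, j:j + F] * k) for j in range(W2)]
                     for i in range(W2)])
assert np.allclose(y_matmul, y_direct)
```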
Accordingly, if we want to recover $X$ from $Y$, the most direct idea is to left-multiply by the inverse of the weight matrix. But the weight matrix is not square, so the best we can do is construct a new matrix that, when left-multiplied with $Y$, produces an output of the same shape as $X$, and that satisfies $\text{shape}(W') = \text{shape}(W)$:
$$X = W'^{T} Y$$
At the same time, this matrix must realize a point -> region mapping, with the positional correspondence between input and output unchanged. A new matrix satisfying these conditions can itself be arranged as a convolution kernel acting on $Y$, so the matrix multiplication can be converted back into a convolution operation. This is why transposed convolution is also called deconvolution, even though it involves neither matrix inversion nor true deconvolution: it uses the transpose of the weight matrix (and even that only loosely, since the values differ; only the shape is the same).
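Continuing the sketch above (it reuses `W`, `x`, and `y_matmul` from the previous block), left-multiplying by $W^T$ restores the shape but not the values:

```python
# W.T maps the flattened 2x2 output back to a 4x4 shape. The values are
# NOT the original x; only the shape correspondence is restored.
x_back = (W.T @ y_matmul.reshape(-1)).reshape(W1, W1)
print(x_back.shape)             # (4, 4) -- same shape as x
print(np.allclose(x_back, x))   # False in general: not a true inverse
```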
Transposed convolution output shape analysis
The previous section matches the analyses commonly found online. However, those analyses of the input and output shapes of transposed convolution lack a unified treatment: they often start from an existing library function or from transposed convolution in isolation, without connecting transposed convolution back to convolution. This section therefore starts from theory and derives the relationship between the input and output parameters of transposed convolution, arriving at the size formula.
Convolution input and output parameters
First, for an ordinary convolution, determining the output size is straightforward; it satisfies the following formula:
$$W_2=\frac{W_1+2P_1-F_1}{S_1}+1\tag{1}$$
This formula is easy to understand: $W_1+2P_1$ is the side length of the padded image, $W_1+2P_1-F_1$ is the last position in the image where a convolution kernel still fits, and $\frac{W_1+2P_1-F_1}{S_1}$ counts how many moves of stride $S_1$ the kernel needs to reach that last position. Adding 1 for the starting position gives the formula.
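A quick numerical check of formula (1), written as a sketch assuming PyTorch is available; the concrete sizes are my own choice:

```python
# Verify formula (1) against PyTorch's Conv2d.
import torch
import torch.nn as nn

W1, F1, S1, P1 = 32, 3, 2, 1
conv = nn.Conv2d(1, 1, kernel_size=F1, stride=S1, padding=P1)
y = conv(torch.randn(1, 1, W1, W1))

W2 = (W1 + 2 * P1 - F1) // S1 + 1  # formula (1), with floor division
assert y.shape[-1] == W2           # 16 for these numbers
```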
Transposed convolution input and output parameters
Since transposed convolution is itself a kind of convolution, its input and output must also satisfy a formula of the same form:
$$W_1=\frac{W_2+2P_2-F_2}{S_2}+1\tag{2}$$
Once the forward convolution is fixed, we use the region -> point correspondence between the forward and transposed passes to determine the parameters of the transposed convolution.
The shape of the convolution kernel does not change: the forward convolution and the transposed convolution have the same number of non-zero terms in the same positions, which is precisely what allows both to be expressed as convolution operations. Hence $F_2=F_1$.
For the stride parameter: in the forward convolution, the stride determines the overlap between the input regions of two adjacent convolution windows $X_i, X_{i+1}$ (the overlap has width $F-S$). That is, considering only the first row of the kernel, only $F-S$ input values contribute to both outputs at once. In the transposed convolution this means that only $F-S$ outputs may involve two adjacent inputs $Y_i, Y_{i+1}$ at the same time. To achieve this, we insert zero values between adjacent inputs, slowing down the movement of the kernel. This is why transposed convolution is also called fractionally-strided convolution: we must insert $S-1$ zeros to satisfy the input-output correspondence, so where the kernel originally jumped over $S$ elements in one step, it now takes $S$ steps to cross one element, i.e. $S_2=\frac{1}{S_1}$. With a fractional stride the earlier size formula no longer applies directly, so instead we set $S_2=1$ by default and insert $S_1-1$ zero values between adjacent inputs; the formula becomes:
$$W_1=\frac{W_2+2P_2-F_2+(W_2-1)(S_1-1)}{1}+1$$
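A sketch of this "fractional stride via zero insertion" idea, assuming PyTorch; the numbers are illustrative:

```python
# Spread a W2-length input out by inserting S-1 zeros between adjacent
# elements, then check that the stride-1 output length matches the formula.
import torch
import torch.nn.functional as Fn

W2, F, S, P2 = 16, 3, 2, 0
y = torch.randn(1, 1, W2, W2)

# Insert S-1 zeros between adjacent elements (both spatial dimensions).
side = (W2 - 1) * S + 1
z = torch.zeros(1, 1, side, side)
z[..., ::S, ::S] = y

k = torch.randn(1, 1, F, F)
out = Fn.conv2d(z, k, stride=1, padding=P2)
expected = W2 + 2 * P2 - F + (W2 - 1) * (S - 1) + 1
assert out.shape[-1] == expected  # 29 for these numbers
```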
With the first two parameters determined, the relationship between the padding values follows by combining the two formulas:
$$\begin{aligned} S_1(W_2-1)+F-2P_1 &= W_2+2P_2-F+(W_2-1)(S_1-1)+1 \\ F-2P_1 &= 2P_2-F+2 \\ P_2 &= F-P_1-1 \end{aligned}$$
The one-to-one correspondence of these three parameters guarantees that the element correspondence is preserved across the forward and transposed convolutions; readers can verify this themselves [4]. A numerical sketch follows.
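The sketch below checks $P_2=F-P_1-1$ with PyTorch: a stride-$S_1$ forward convolution followed by zero insertion plus a stride-1 convolution padded with $P_2$ returns to the original size. It verifies shapes only (a value-level transposed convolution would also flip the kernel), and $W_1=33$ is chosen so the remainder discussed later is zero.

```python
import torch
import torch.nn.functional as Fn

W1, F, S1, P1 = 33, 3, 2, 1
P2 = F - P1 - 1                                  # = 1 here

x = torch.randn(1, 1, W1, W1)
k = torch.randn(1, 1, F, F)

y = Fn.conv2d(x, k, stride=S1, padding=P1)       # forward: 33 -> 17
W2 = y.shape[-1]

side = (W2 - 1) * S1 + 1                         # zero insertion
z = torch.zeros(1, 1, side, side)
z[..., ::S1, ::S1] = y
x_back = Fn.conv2d(z, k, stride=1, padding=P2)   # back: 17 -> 33
assert x_back.shape[-1] == W1
```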
Transposed convolution input-output relationship
A transposed convolution is usually also defined by a kernel size, stride, and padding, i.e. the three parameters $F, S, P$, where $S$ is actually the reciprocal of the true stride and $S-1$ is the number of zero values inserted between adjacent elements. Here is the confusing part: when defining a transposed convolution, to emphasize that it is the transpose, we describe it with the parameters of its corresponding forward convolution. From the parameter correspondence derived in the previous two sections, the output size of the transposed convolution is then:
$$\begin{aligned} O&=S(W-1)+1+2(F-P-1)-F+1 \\ &= S(W-1)-2P+F \end{aligned}$$
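This matches the convention used by PyTorch's ConvTranspose2d, whose kernel_size/stride/padding arguments are exactly the forward-convolution parameters; a quick sketch with illustrative numbers:

```python
# Compare the derived formula O = S(W-1) - 2P + F with ConvTranspose2d.
import torch
import torch.nn as nn

W, F, S, P = 16, 3, 2, 1
deconv = nn.ConvTranspose2d(1, 1, kernel_size=F, stride=S, padding=P)
out = deconv(torch.randn(1, 1, W, W))
assert out.shape[-1] == S * (W - 1) - 2 * P + F   # 31 here
```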
A more complex case
Actually, formula (1) is not exact, because $\frac{W_1+2P_1-F_1}{S_1}$ may not be an integer, in which case it is rounded down by default. That means every value of $W_1+2P_1-F_1$ in the range $[NS, NS+a],\ a\in\{0,1,...,S-1\}$ corresponds to an output of the same size. To recover this part of the size, we need to count the values dropped by the floor [5]:
$$a=(W_1+2P_1-F_1)\bmod S$$
and then add $a$ extra zero values when padding for the transposed convolution. Whether they go on the top-left or the bottom-right side depends on the library used.
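A sketch of adding the extra $a$ zeros by hand with asymmetric padding, assuming PyTorch; here they go on the bottom/right, but as noted above, which side a given library pads is an implementation choice:

```python
import torch
import torch.nn.functional as Fn

W1, F, S, P1 = 32, 3, 2, 1
P2 = F - P1 - 1                        # 1
a = (W1 + 2 * P1 - F) % S              # 1: the remainder floored away

y = torch.randn(1, 1, 16, 16)          # W2 = 16 from the forward pass
z = torch.zeros(1, 1, 31, 31)          # (W2-1)*S + 1 after zero insertion
z[..., ::S, ::S] = y
z = Fn.pad(z, (P2, P2 + a, P2, P2 + a))  # (left, right, top, bottom)

k = torch.randn(1, 1, F, F)
out = Fn.conv2d(z, k, stride=1)
assert out.shape[-1] == W1             # 32 recovered
```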
Summary
In summary, a transposed convolution is defined by 4 parameters, all taken from the corresponding forward convolution: the forward stride, kernel size, padding, and remainder, i.e. $S, F, P, a$. With these parameters, the forward input size, which is the transposed convolution output size, is:
$$O=S(W-1)-2P+F+a$$
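In PyTorch the remainder $a$ corresponds to the output_padding argument of ConvTranspose2d, which adds the extra size to one side of each output dimension. A final sketch, with illustrative numbers, tying all four parameters together:

```python
# Round trip: forward conv shrinks 32 -> 16 and floors away a = 1 pixel;
# ConvTranspose2d with output_padding=a restores the original 32.
import torch
import torch.nn as nn

W1, F, S, P = 32, 3, 2, 1
W2 = (W1 + 2 * P - F) // S + 1        # 16
a = (W1 + 2 * P - F) % S              # 1

deconv = nn.ConvTranspose2d(1, 1, kernel_size=F, stride=S,
                            padding=P, output_padding=a)
out = deconv(torch.randn(1, 1, W2, W2))
assert out.shape[-1] == W1            # O = S(W-1) - 2P + F + a = 32
```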
References
[1] Understanding deconvolution and transposed convolution in one article
[2] Detailed derivation of deconvolution (Transposed Convolution)
[3] A Comprehensive Introduction to Different Types of Convolutions in Deep Learning
[4] The principle of ConvTranspose2d: how do deep networks upsample?
[5] A detailed explanation and derivation of the relationship between convolution and deconvolution (also called transposed convolution or fractionally-strided convolution)