Use FFT, matrix multiplication and conv2d to calculate convolution based on pytorch
2022-07-28 22:53:00 【leu_mon】
Computing convolution with FFT, matrix multiplication, and Conv2d in PyTorch
Goal: compute the convolution Y = X * H of a 64*64 matrix X with a 3*3 matrix H.
Section 1: Importing libraries and creating the data
# Import the required modules
import torch
import torch.nn as nn
from timeit import Timer

# Create a four-dimensional random tensor as the image: 1 sample, 1 channel, size 64*64
x_n = torch.tensor(torch.randint(0,128,[1,1,64,64]), dtype=torch.float32)
# Create a four-dimensional random tensor as the convolution kernel: 1 sample, 1 channel, size 3*3
h_n = torch.randn(1,1,3,3)

>>> <ipython-input-1-d7583688b6eb>:6: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  x_n = torch.tensor(torch.randint(0,128,[1,1,64,64]), dtype=torch.float32)
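The UserWarning comes from re-wrapping an existing tensor in torch.tensor(). A minimal warning-free sketch (same values and dtype, using a plain cast instead):

```python
import torch

# Draw random integers in [0, 128) and cast to float32 directly,
# instead of wrapping the result in torch.tensor() again.
x_n = torch.randint(0, 128, [1, 1, 64, 64]).to(torch.float32)
h_n = torch.randn(1, 1, 3, 3)

print(x_n.shape, x_n.dtype)
```

Using clone().detach(), as the warning suggests, works as well; the direct cast simply skips the intermediate tensor.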
Section 2: Defining the functions
2.1 Computing convolution with FFT
def fft_test():
    """ Compute convolution with FFT """
    x_n_fft = torch.fft.fft2(x_n[0,0])          # FFT of the image x_n
    h_trans = torch.flip(h_n[0,0], [1])         # flip the kernel h_n left-right
    h_trans = torch.flipud(h_trans)             # flip the kernel h_n up-down
    pad = nn.ZeroPad2d(padding=(0, 61, 0, 61))  # zero padding sizes (left, right, top, bottom)
    h_n_pad = pad(h_trans)                      # expand the kernel to 64*64
    h_n_fft = torch.fft.fft2(h_n_pad)           # FFT of the padded kernel
    res = x_n_fft.mul(h_n_fft)                  # element-wise product of the two spectra
    # inverse FFT, then keep the real 62*62 'valid' part
    res = torch.real(torch.fft.ifft2(res)[2:,2:]).view(1,1,62,62)
    return res
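The sizes above (61-pixel padding, the [2:,2:] crop, the 62*62 view) are hardcoded for a 64*64 image and a 3*3 kernel. A hedged sketch of a size-generic variant — fft_conv2d is a name introduced here, not from the original — checked against torch.nn.functional.conv2d:

```python
import torch
import torch.nn.functional as F

def fft_conv2d(x, h):
    """'valid' cross-correlation of a 2-D image x (n*n) with kernel h (k*k) via FFT."""
    n, k = x.shape[-1], h.shape[-1]
    # Flip the kernel in both axes so the FFT product matches cross-correlation.
    h_flip = torch.flip(h, [0, 1])
    # Zero-pad the flipped kernel up to the image size.
    h_pad = F.pad(h_flip, (0, n - k, 0, n - k))
    # Pointwise product in the frequency domain = circular convolution in space.
    res = torch.fft.ifft2(torch.fft.fft2(x) * torch.fft.fft2(h_pad))
    # Drop the k-1 wrapped-around rows/columns to keep the 'valid' part.
    return torch.real(res)[k - 1:, k - 1:]

x = torch.randint(0, 128, [32, 32]).float()
h = torch.randn(3, 3)
ref = F.conv2d(x.view(1, 1, 32, 32), h.view(1, 1, 3, 3)).squeeze()
print(torch.allclose(fft_conv2d(x, h), ref, atol=1e-2))  # expect True
```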
2.2 Use matrix multiplication to calculate convolution
def multi_test():
    """ Compute convolution by matrix multiplication """
    res = []                                   # list to collect the results
    for i in range(62):
        for j in range(62):                    # slide over the image matrix
            x_n_n = x_n[0,0,i:i+3,j:j+3]       # take the corresponding 3*3 block of the image
            res0 = torch.sum(x_n_n.mul(h_n))   # element-wise product with the kernel, then sum
            res.append(res0)                   # store one output value
    res = torch.Tensor(res).view([1,1,62,62])  # reshape the list into the output
    return res
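The double Python loop is what makes this version slow. The same computation can be written as one actual matrix multiplication using torch.nn.functional.unfold (the im2col trick); this is a sketch added here, not part of the original post:

```python
import torch
import torch.nn.functional as F

x_n = torch.randint(0, 128, [1, 1, 64, 64]).float()
h_n = torch.randn(1, 1, 3, 3)

# unfold extracts every 3*3 patch as a column: shape [1, 9, 62*62]
patches = F.unfold(x_n, kernel_size=3)
# One matrix multiplication: [1, 1, 9] @ [1, 9, 3844] -> [1, 1, 3844]
res = h_n.view(1, 1, 9) @ patches
res = res.view(1, 1, 62, 62)

# Matches the built-in (bias-free) convolution.
ref = F.conv2d(x_n, h_n)
print(torch.allclose(res, ref, atol=1e-3))
```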
2.3 Calculate convolution using built-in functions
def conv2d_test():
    """ Compute convolution with the built-in nn.Conv2d """
    conv = nn.Conv2d(1,1,3)   # (in_channels, out_channels, kernel_size); note bias=True by default
    conv.weight.data = h_n    # load the custom kernel h_n
    res = conv(x_n)           # convolve x_n with h_n
    return res
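nn.Conv2d creates a randomly initialized bias in addition to the weight, so overwriting only conv.weight leaves a constant offset in the output. A sketch with the bias disabled (an assumption about the intended comparison, not the original code):

```python
import torch
import torch.nn as nn

x_n = torch.randint(0, 128, [1, 1, 64, 64]).float()
h_n = torch.randn(1, 1, 3, 3)

conv = nn.Conv2d(1, 1, 3, bias=False)  # (in_channels, out_channels, kernel_size)
with torch.no_grad():
    conv.weight.copy_(h_n)             # load the custom kernel
res = conv(x_n)

# Without the bias, the result agrees with the sliding-window computation.
manual = torch.stack([
    (x_n[0, 0, i:i+3, j:j+3] * h_n[0, 0]).sum()
    for i in range(62) for j in range(62)
]).view(1, 1, 62, 62)
print(torch.allclose(res, manual, atol=1e-2))
```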
2.4 Test the running time of the function
def test_time(test_name=""):
    """ Measure the running time of a function; input: the name of the function to test """
    test_import = "from __main__ import " + test_name
    test_name = test_name + "()"          # build the statement for the Timer
    test = Timer(test_name, test_import)  # Timer(statement to time, setup that imports it)
    time = test.timeit(number=1000)       # run the statement 1000 times
    # timeit returns seconds for the 1000 runs, which equals milliseconds per single run
    print("%s is run %.3f ms"%(test_name,time))
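Timer also accepts a callable directly, which avoids assembling the import string. A minimal sketch with a hypothetical stand-in workload:

```python
from timeit import Timer

def workload():
    # stand-in for fft_test / multi_test / conv2d_test
    return sum(i * i for i in range(1000))

t = Timer(workload).timeit(number=1000)  # seconds for 1000 runs == ms per run
print("workload runs in %.3f ms" % t)
```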
Section 3: Tests
3.1 Run time test
test_time("fft_test")
test_time("multi_test")
test_time("conv2d_test")
>>> fft_test() is run 0.262 ms
multi_test() is run 61.161 ms
conv2d_test() is run 0.187 ms
Conclusion: the matrix-multiplication (double-loop) implementation is by far the slowest; the FFT approach is only slightly slower than computing the convolution directly with the built-in function.
3.2 Output of each method
fft_test()
>>> tensor([[[[ 186.7925, 64.8263, -217.0269, ..., -54.8264, -16.5303,
-57.4855],
[ 105.5789, 103.1986, -45.4856, ..., 108.9532, -11.3397,
-74.9136],
[ 37.4062, -159.4092, -82.7995, ..., -18.7437, 103.2889,
-56.7241],
...,
[ -57.3688, 205.4553, 114.1560, ..., 61.6281, -30.6847,
123.7901],
[ 107.0706, 222.8759, 189.9121, ..., 118.7491, 61.0583,
154.0859],
[ 157.7202, 55.5320, -37.9096, ..., -37.6815, 39.8024,
-123.9654]]]])
multi_test()
>>> tensor([[[[ 186.7925, 64.8263, -217.0269, ..., -54.8264, -16.5303,
-57.4854],
[ 105.5789, 103.1987, -45.4857, ..., 108.9533, -11.3397,
-74.9136],
[ 37.4062, -159.4092, -82.7995, ..., -18.7437, 103.2889,
-56.7241],
...,
[ -57.3689, 205.4553, 114.1560, ..., 61.6282, -30.6847,
123.7902],
[ 107.0706, 222.8759, 189.9121, ..., 118.7491, 61.0583,
154.0859],
[ 157.7202, 55.5320, -37.9096, ..., -37.6815, 39.8024,
-123.9654]]]])
conv2d_test()
>>> tensor([[[[ 187.0928, 65.1265, -216.7267, ..., -54.5261, -16.2301,
-57.1852],
[ 105.8792, 103.4989, -45.1854, ..., 109.2535, -11.0395,
-74.6133],
[ 37.7064, -159.1090, -82.4993, ..., -18.4435, 103.5891,
-56.4239],
...,
[ -57.0686, 205.7555, 114.4562, ..., 61.9284, -30.3845,
124.0904],
[ 107.3709, 223.1761, 190.2123, ..., 119.0493, 61.3585,
154.3862],
[ 158.0205, 55.8323, -37.6093, ..., -37.3813, 40.1026,
-123.6652]]]], grad_fn=<ThnnConv2DBackward>)
Conclusion: the FFT result matches the matrix-multiplication result. The built-in conv2d output differs by a constant offset because nn.Conv2d adds a randomly initialized bias term by default (only the weight was overwritten); apart from that offset the results are consistent.
3.3 Analysis of computational complexity
# Import the required modules
import torch
import torch.nn as nn
from timeit import Timer
import big_o

# The image is now built inside a generator and passed in as a parameter, e.g.
# x_n = torch.tensor(torch.randint(0,128,[1,1,n,n]), dtype=torch.float32)
# Create the 3*3 convolution kernel as before
h_n = torch.randn(1,1,3,3)
3.3.1 Estimating the time complexity of the FFT approach
def fft_test(x_n):
    """ Compute convolution with FFT """
    x_n_fft = torch.fft.fft2(x_n[0,0])          # FFT of the image x_n
    h_trans = torch.flip(h_n[0,0], [1])         # flip the kernel left-right
    h_trans = torch.flipud(h_trans)             # flip the kernel up-down
    pad = nn.ZeroPad2d(padding=(0, 57, 0, 57))  # zero padding sizes (left, right, top, bottom)
    h_n_pad = pad(h_trans)                      # expand the kernel to 60*60 (hardcoded for a 60*60 image)
    h_n_fft = torch.fft.fft2(h_n_pad)           # FFT of the padded kernel
    res = x_n_fft.mul(h_n_fft)                  # element-wise product of the two spectra
    # inverse FFT, then keep the real 58*58 'valid' part
    res = torch.real(torch.fft.ifft2(res)[2:,2:]).view(1,1,58,58)
    return res
- Estimated time complexity (with the image size n*n as the parameter)
The dominant costs are two FFTs and one inverse FFT. For an n*n image (with a fixed 3*3 kernel), each 2-D FFT costs O(n^2 log n), so the estimated overall complexity is O(n^2 log n).
- Time complexity measured by the program (with the image passed as the parameter)
Using the big_o module for the estimate; modifying the matrix size in the generator gives the following groups of results:
# pass the image matrix as the parameter
positive_int_generator = lambda n : torch.tensor(torch.randint(0,128,[1,1,64,64]),dtype=torch.float32)
best, other = big_o.big_o(fft_test, positive_int_generator, n_repeats=100)
print(best)
>>><ipython-input-87-b8f03f88cb0c>:1: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
positive_int_generator = lambda n : torch.tensor(torch.randint(0,128,[1,1,520,520]),dtype=torch.float32)
>>>Logarithmic: time = 0.089 + -0.0031*log(n) (sec)
positive_int_generator = lambda n : torch.tensor(torch.randint(0,128,[1,1,220,220]),dtype=torch.float32)
>>>Logarithmic: time = 0.11 + -0.0043*log(n) (sec)
positive_int_generator = lambda n : torch.tensor(torch.randint(0,128,[1,1,120,120]),dtype=torch.float32)
>>>Logarithmic: time = 0.59 + -0.019*log(n) (sec)
positive_int_generator = lambda n : torch.tensor(torch.randint(0,128,[1,1,60,60]),dtype=torch.float32)
>>>Logarithmic: time = 0.089 + -0.0054*log(n) (sec)
Across several image sizes big_o reports a curve closer to O(log n). Note, however, that the generator ignores the n that big_o passes in and always returns a fixed-size image, so the measured times are nearly constant; the fit does not reflect the true O(n^2 log n) scaling of the FFT approach.
3.3.2 Estimate the time complexity of matrix multiplication
def multi_test(x_n):
    """ Compute convolution by matrix multiplication """
    res = []                                   # list to collect the results
    for i in range(62):
        for j in range(62):                    # slide over the image (sizes hardcoded for 64*64)
            x_n_n = x_n[0,0,i:i+3,j:j+3]       # take the corresponding 3*3 block of the image
            res0 = torch.sum(x_n_n.mul(h_n))   # element-wise product with the kernel, then sum
            res.append(res0)                   # store one output value
    res = torch.Tensor(res).view([1,1,62,62])  # reshape the list into the output
    return res
- Estimated time complexity (with the image size n*n as the parameter)
The cost is dominated by the doubly nested loop. For an n*n image and a 3*3 kernel, each loop runs (n-2) times, giving (n-2)^2 iterations; each iteration slices out a fixed 3*3 block and computes a constant-size product and sum. Ignoring other factors, the complexity is therefore O(n^2) for a fixed 3*3 kernel (O(n^2 k^2) for a general k*k kernel).
- Time complexity measured by the program (with the image passed as the parameter)
best,other = big_o.big_o(multi_test,positive_int_generator,n_repeats=1)
print(best)
>>><ipython-input-9-ae97cb9d6fcc>:1: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
positive_int_generator = lambda n : torch.tensor(torch.randint(0,128,[1,1,160,160]),dtype=torch.float32)
>>>Cubic: time = 0.43 + -2.4E-17*n^3 (sec)
positive_int_generator = lambda n : torch.tensor(torch.randint(0,128,[1,1,100,100]),dtype=torch.float32)
>>>Cubic: time = 0.2 + -2.4E-17*n^3 (sec)
positive_int_generator = lambda n : torch.tensor(torch.randint(0,128,[1,1,64,64]),dtype=torch.float32)
>>>Cubic: time = 0.062 + 1.9E-17*n^3 (sec)
big_o reports a Cubic fit for several image sizes, but the fitted coefficients (on the order of 1e-17) are essentially zero: as in 3.3.1, the generator returns a fixed-size image regardless of n, so the measured times are nearly constant and the fit says little about the true scaling.
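Because the generators above return a fixed-size tensor no matter what n big_o passes in, the fitted curves mostly reflect constant runtime. A hedged sketch that varies the image side directly and times a loop_conv helper (a name introduced here) equivalent to multi_test; for a fixed 3*3 kernel the time should grow roughly 4x when n doubles, consistent with O(n^2):

```python
import time
import torch

h = torch.randn(3, 3)

def loop_conv(x):
    """Direct sliding-window 'valid' convolution, as in multi_test."""
    n = x.shape[-1]
    out = torch.empty(n - 2, n - 2)
    for i in range(n - 2):
        for j in range(n - 2):
            out[i, j] = (x[i:i+3, j:j+3] * h).sum()
    return out

for n in (32, 64, 128):
    x = torch.randint(0, 128, [n, n]).float()
    t0 = time.perf_counter()
    loop_conv(x)
    dt = time.perf_counter() - t0
    print("n=%4d  %.4f s" % (n, dt))
```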
PS: the computational-complexity analysis in this article is a beginner's personal take; if there are mistakes, corrections are welcome. Thank you!