Paper reproduction: the super-resolution network FSRCNN
2022-07-28 01:38:00 【GIS and climate】
SRCNN performs image super-resolution with three convolutional layers, which carry out feature extraction, feature mapping, and image reconstruction.
However, its input must first be interpolated up to the target size, which limits its speed and makes it hard to apply in real-time settings. Dong et al. (the first author of SRCNN) therefore improved SRCNN further, producing FSRCNN, where the F stands for Fast.
Abstract of the original paper
As a successful deep model applied in image super-resolution (SR), the Super-Resolution Convolutional Neural Network (SRCNN) has demonstrated superior performance to the previous hand-crafted models either in speed and restoration quality. However, the high computational cost still hinders it from practical usage that demands real-time performance (24 fps). In this paper, we aim at accelerating the current SRCNN, and propose a compact hourglass-shape CNN structure for faster and better SR. We re-design the SRCNN structure mainly in three aspects. First, we introduce a deconvolution layer at the end of the network, then the mapping is learned directly from the original low-resolution image (without interpolation) to the high-resolution one. Second, we reformulate the mapping layer by shrinking the input feature dimension before mapping and expanding back afterwards. Third, we adopt smaller filter sizes but more mapping layers. The proposed model achieves a speed up of more than 40 times with even superior restoration quality. Further, we present the parameter settings that can achieve real-time performance on a generic CPU while still maintaining good performance. A corresponding transfer strategy is also proposed for fast training and testing across different upscaling factors.
Its new network architecture is:

Compared with the original SRCNN, its improvements mainly include:
- A deconvolution layer is added at the end of the network for upsampling, so the input image no longer needs to be interpolated first: the LR image goes in directly and the HR image comes out (a rough sketch of this difference follows after this list).
- In the middle feature-mapping stage, the feature dimension is first shrunk and then expanded back afterwards.
- Smaller convolution kernels and more feature-mapping layers are used.
- The activation function is PReLU instead of ReLU.
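The sketch below (my own illustration, not code from the paper or its release) contrasts the two ways of handling the input: SRCNN-style bicubic pre-interpolation versus FSRCNN-style learned upsampling with a single deconvolution at the end. The sizes, scale factor, and layer settings here are arbitrary choices for the illustration.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    lr = torch.randn(1, 1, 32, 32)   # a random single-channel "LR" patch
    scale = 3

    # SRCNN-style: interpolate to the target size first, then convolve the large image
    srcnn_input = F.interpolate(lr, scale_factor=scale, mode='bicubic', align_corners=False)

    # FSRCNN-style: keep the small image and learn the upsampling with one deconvolution
    deconv = nn.ConvTranspose2d(1, 1, kernel_size=9, stride=scale, padding=3)
    fsrcnn_output = deconv(lr)

    print(srcnn_input.shape, fsrcnn_output.shape)   # both are (1, 1, 96, 96)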
The model code
As mentioned before, writing code with PyTorch is like stacking building blocks: look at the network structure and write it out piece by piece. The following is clearer (from reference link 【2】):

Its structure is like an hourglass: symmetrical overall, thick at both ends, and thin in the middle. As the paper puts it:
Interestingly, the new structure looks like an hourglass, which is symmetrical on the whole, thick at the ends, and thin in the middle.
import torch.nn as nn

class FSRCNN(nn.Module):
    def __init__(self, inchannels):
        super(FSRCNN, self).__init__()
        # Feature extraction: a 5x5 conv applied directly to the LR image
        self.features = nn.Sequential(
            nn.Conv2d(in_channels=inchannels, out_channels=64, kernel_size=5, stride=1, padding=2, padding_mode='replicate'),
            nn.PReLU()
        )
        # Shrinking: a 1x1 conv compresses the feature dimension from 64 to 32
        self.shrinking = nn.Sequential(
            nn.Conv2d(in_channels=64, out_channels=32, kernel_size=1, stride=1, padding=0, padding_mode='replicate'),
            nn.PReLU()
        )
        # Non-linear mapping: four 3x3 convs followed by a single PReLU
        self.mapping = nn.Sequential(
            nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, stride=1, padding=1, padding_mode='replicate'),
            nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, stride=1, padding=1, padding_mode='replicate'),
            nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, stride=1, padding=1, padding_mode='replicate'),
            nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, stride=1, padding=1, padding_mode='replicate'),
            nn.PReLU()
        )
        # Expanding: a 1x1 conv restores the feature dimension from 32 back to 64
        self.expanding = nn.Sequential(
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=1, stride=1, padding=0),
            nn.PReLU()
        )
        # Deconvolution: upsamples the features back to an HR image.
        # ConvTranspose2d only supports zero padding, so no padding_mode is passed here.
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(in_channels=64, out_channels=inchannels, kernel_size=9, stride=3, padding=4)
        )

    def forward(self, x):
        x = self.features(x)
        x = self.shrinking(x)
        x = self.mapping(x)
        x = self.expanding(x)
        x = self.deconv(x)
        return x
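A quick sanity check of the class above (my own test, not part of the original post; it assumes the FSRCNN class just defined, and the 32x32 input size is an arbitrary choice):

    import torch

    model = FSRCNN(inchannels=1)
    x = torch.randn(1, 1, 32, 32)   # one single-channel 32x32 LR patch
    y = model(x)
    print(y.shape)   # torch.Size([1, 1, 94, 94]): (32 - 1) * 3 - 2 * 4 + 9 = 94, so not exactly 3x here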
Code interpretation
Overall, FSRCNN consists of the five parts above:
- features extracts features from the original LR image; for images this is generally done with convolutions. Note the sensitive parameter here (the paper calls it a sensitive variable; I understand it as a parameter you can change to suit your own needs), namely the number of filters, i.e. the number of output channels of this convolution layer.
- shrinking, according to the original authors, exists to reduce the amount of computation. Personally I see it as compressing the feature dimension: in my code above, the 64-dimensional features are compressed to 32 dimensions, which naturally reduces the computation (see the rough parameter count after this list).
- mapping is the non-linear mapping proper: it applies further convolutions to transform the features non-linearly. According to the authors, this part matters most for the model's results, as the paper discusses later.
- expanding restores the compressed feature dimension.
- deconv (deconvolution) turns the features processed above back into the HR image.
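A back-of-the-envelope count (my own numbers, not a figure from the paper) of why shrinking reduces computation: compare the parameter count of the four mapping convolutions with and without the 64 -> 32 shrink.

    def conv_params(cin, cout, k):
        return cin * cout * k * k + cout      # weights + biases of one conv layer

    with_shrink = 4 * conv_params(32, 32, 3)      # four 3x3 convs on 32 channels
    without_shrink = 4 * conv_params(64, 64, 3)   # four 3x3 convs on 64 channels
    print(with_shrink, without_shrink)            # 36992 vs 147712, roughly 4x fewer parameters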
A small summary
Several parameters in the model can be changed to suit your own research, including:
- Number of feature maps: the number of output channels of the first convolution layer.
- Compressed feature dimension: how many dimensions the features above are compressed down to.
- Number of non-linear mapping layers: how many convolution layers the mapping module contains (note that this does not change the channel count).
- Deconvolution factor: how many times you want to enlarge the image.
A configurable version of the model with these settings exposed is sketched after this list.
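The following is a parameterized sketch of the same network (my own refactor, not the original code or the authors' implementation), exposing the settings above as constructor arguments: d is the number of feature maps, s the compressed dimension, m the number of mapping layers, and scale the upscaling factor. The output_padding value is chosen so the output is exactly scale times the input.

    import torch.nn as nn

    class FSRCNNConfigurable(nn.Module):
        def __init__(self, inchannels=1, d=64, s=32, m=4, scale=3):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(inchannels, d, kernel_size=5, padding=2, padding_mode='replicate'),
                nn.PReLU())
            self.shrinking = nn.Sequential(nn.Conv2d(d, s, kernel_size=1), nn.PReLU())
            layers = [nn.Conv2d(s, s, kernel_size=3, padding=1, padding_mode='replicate')
                      for _ in range(m)]
            layers.append(nn.PReLU())            # a single PReLU after the m conv layers
            self.mapping = nn.Sequential(*layers)
            self.expanding = nn.Sequential(nn.Conv2d(s, d, kernel_size=1), nn.PReLU())
            # output_padding = scale - 1 makes the output exactly `scale` times the input size
            self.deconv = nn.ConvTranspose2d(d, inchannels, kernel_size=9, stride=scale,
                                             padding=4, output_padding=scale - 1)

        def forward(self, x):
            x = self.features(x)
            x = self.shrinking(x)
            x = self.mapping(x)
            x = self.expanding(x)
            return self.deconv(x)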
The first four modules of the whole model all serve to extract features from the low-resolution (LR) image, and the last module upsamples the image. You can modify that last module for whatever upscaling factor you need, which is why the authors say the model lends itself well to transfer learning. The original authors state that the padding scheme has a negligible effect on the final result, so they simply zero-pad according to the kernel size; I think this step can still be adjusted to fit your own research, especially if you care about exact pixel values. To train faster, the original authors also used two-stage training: first train on the 91-image dataset, and once it converges (saturates?) add the General-100 dataset for fine-tuning. The learning rate is 1e-3 for the convolution layers and 1e-4 for the final deconvolution layer. Several of the parameters (sensitive variables) used in the code above differ from the original paper. A minimal sketch of that per-layer learning-rate setup follows below.
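A sketch of the two-learning-rate setup using optimizer parameter groups (my own code; it assumes the FSRCNN class from the code section above, and the choice of SGD with momentum is mine rather than taken from the paper):

    import torch.optim as optim

    model = FSRCNN(inchannels=1)
    deconv_group = [p for name, p in model.named_parameters() if name.startswith('deconv')]
    conv_group = [p for name, p in model.named_parameters() if not name.startswith('deconv')]
    optimizer = optim.SGD([
        {'params': conv_group, 'lr': 1e-3},    # convolution layers
        {'params': deconv_group, 'lr': 1e-4},  # final deconvolution layer
    ], momentum=0.9)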
Reflection
What exactly does the so-called feature extraction extract? And what does the so-called non-linear mapping actually do?
Doubts and notes
1. In the structure described in the paper, the non-linear mapping layer should be m conv layers followed by a single PReLU activation, but many implementations online put a PReLU after every conv layer. This difference needs to be tested...
2. The last parameters of the deconvolution layer should be calculated carefully according to the desired output scale (see the worked check below).
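For reference, here is how that calculation works out for the settings used in the code above (a worked check based on the standard ConvTranspose2d output-size formula with dilation = 1; the input size 32 is just an example):

    # out = (in - 1) * stride - 2 * padding + kernel_size + output_padding
    in_size = 32
    for output_padding in (0, 2):
        out = (in_size - 1) * 3 - 2 * 4 + 9 + output_padding
        print(output_padding, out)   # 0 -> 94, 2 -> 96 (exactly 3 * 32)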

References
【1】Dong C, Loy C C, Tang X. Accelerating the Super-Resolution Convolutional Neural Network[C]//Computer Vision – ECCV 2016. Springer International Publishing, 2016: 391-407. DOI: 10.1007/978-3-319-46475-6_25.
【2】https://towardsdatascience.com/review-fsrcnn-super-resolution-80ca2ee14da4
【3】https://github.com/Lornatang/FSRCNN-PyTorch