Paper Reproduction: The Super-Resolution Network FSRCNN
2022-07-27 23:31:00 【GIS与Climate】
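SRCNN performs image super-resolution with three convolutional layers, which handle feature extraction, feature mapping, and image reconstruction respectively.
However, its input must first be interpolated to the target size. This costs speed and makes SRCNN hard to use in real-time work, so Dong et al. (Dong is also the first author of SRCNN) improved it and proposed FSRCNN, where the F stands for Fast.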
Original abstract
As a successful deep model applied in image super-resolution (SR), the Super-Resolution Convolutional Neural Network (SRCNN) has demonstrated superior performance to the previous hand-crafted models either in speed and restoration quality. However, the high computational cost still hinders it from practical usage that demands real-time performance (24 fps). In this paper, we aim at accelerating the current SRCNN, and propose a compact hourglass-shape CNN structure for faster and better SR. We re-design the SRCNN structure mainly in three aspects. First, we introduce a deconvolution layer at the end of the network, then the mapping is learned directly from the original low-resolution image (without interpolation) to the high-resolution one. Second, we reformulate the mapping layer by shrinking the input feature dimension before mapping and expanding back afterwards. Third, we adopt smaller filter sizes but more mapping layers. The proposed model achieves a speed up of more than 40 times with even superior restoration quality. Further, we present the parameter settings that can achieve real-time performance on a generic CPU while still maintaining good performance. A corresponding transfer strategy is also proposed for fast training and testing across different upscaling factors.
The new network architecture is:

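Compared with the original SRCNN, the main improvements are:
- A deconvolution layer is added at the end of the network to do the upsampling, so the input no longer needs to be interpolated first: the LR image is fed in directly and the HR image comes out.
- The intermediate feature-mapping layers first shrink the feature dimension and then expand it back.
- Smaller convolution kernels and more mapping layers are used.
- The activation function is the parametric ReLU, i.e. PReLU.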
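To get a feel for why skipping the interpolation step matters, here is a rough back-of-the-envelope sketch that counts the multiply-accumulate operations (MACs) of the first layer only. The image size and the helper `conv_macs` are made up for illustration; the layer settings (9x9 kernels with 64 filters for SRCNN, 5x5 kernels with 56 filters for FSRCNN) follow the commonly cited default configurations of the two papers.

```python
def conv_macs(height, width, kernel_size, in_channels, out_channels):
    """Multiply-accumulate count of one stride-1, same-padded convolution layer."""
    return height * width * kernel_size ** 2 * in_channels * out_channels


# Hypothetical example: a 100x100 single-channel LR image and a 3x upscaling factor.
lr_h, lr_w, scale = 100, 100, 3

# SRCNN-style first layer: runs on the image after it was interpolated to HR size.
srcnn_first = conv_macs(lr_h * scale, lr_w * scale, 9, 1, 64)

# FSRCNN-style first layer: runs on the LR image directly.
fsrcnn_first = conv_macs(lr_h, lr_w, 5, 1, 56)

print(f"SRCNN first layer:  {srcnn_first:,} MACs")
print(f"FSRCNN first layer: {fsrcnn_first:,} MACs")
print(f"ratio: {srcnn_first / fsrcnn_first:.1f}x")
```

Every convolution that runs on the interpolated image pays a factor of scale² in spatial size, which is the cost FSRCNN avoids by upsampling only in its last layer.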
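Model code
As mentioned before, writing code in PyTorch is like stacking building blocks: follow the network structure and write it piece by piece. The figure below makes the structure clearer (from reference 【2】):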

Its structure is like an hourglass: symmetrical on the whole, thick at both ends, and thin in the middle.
Interestingly, the new structure looks like an hourglass, which is symmetrical on the whole, thick at the ends, and thin in the middle.
import torch.nn as nn


class FSRCNN(nn.Module):
    def __init__(self, inchannels):
        super(FSRCNN, self).__init__()
        # Feature extraction on the LR input
        self.features = nn.Sequential(
            nn.Conv2d(in_channels=inchannels, out_channels=64, kernel_size=5, stride=1, padding=2, padding_mode='replicate'),
            nn.PReLU()
        )
        # Shrink the feature dimension from 64 to 32 with a 1x1 convolution
        self.shrinking = nn.Sequential(
            nn.Conv2d(in_channels=64, out_channels=32, kernel_size=1, stride=1, padding=0, padding_mode='replicate'),
            nn.PReLU()
        )
        # Non-linear mapping: four 3x3 convolutions followed by a single PReLU
        self.mapping = nn.Sequential(
            nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, stride=1, padding=1, padding_mode='replicate'),
            nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, stride=1, padding=1, padding_mode='replicate'),
            nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, stride=1, padding=1, padding_mode='replicate'),
            nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, stride=1, padding=1, padding_mode='replicate'),
            nn.PReLU()
        )
        # Expand the feature dimension back from 32 to 64
        self.expanding = nn.Sequential(
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=1, stride=1, padding=0),
            nn.PReLU()
        )
        # Upsample by 3x. ConvTranspose2d only supports zero padding, so no
        # padding_mode is passed; output_padding=2 makes the output exactly
        # 3x the input size (without it the output is 3*H - 2).
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(in_channels=64, out_channels=inchannels, kernel_size=9, stride=3, padding=4, output_padding=2)
        )

    def forward(self, x):
        x = self.features(x)
        x = self.shrinking(x)
        x = self.mapping(x)
        x = self.expanding(x)
        x = self.deconv(x)
        return x
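Code explanation
Overall, FSRCNN consists of the five parts above:
- features: extracts features from the original LR image. Image features are generally extracted by convolution. Note one sensitive parameter here (the paper calls it a sensitive variable, which I take to mean a parameter you can change to suit your needs): the number of filters, i.e. the number of output channels of this convolution layer.
- shrinking: according to the original authors, this part reduces the amount of computation. As I see it, it simply compresses the feature dimension. In my code above, this layer compresses the 64-dimensional features to 32 dimensions, which naturally reduces the computation.
- mapping: strictly speaking this is the non-linear mapping. It applies a non-linear transformation to the features by stacking convolution after convolution. According to the authors, this part matters most for the model's results. I will cover it in a later post.
- expanding: restores the feature dimension that was compressed above.
- deconvolution: turns the processed features back into the HR image.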
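As a rough check of the "shrinking reduces computation" point, the sketch below counts learnable parameters for two alternatives: mapping directly on 64-channel features versus shrinking to 32 channels, mapping, and expanding back. The helper names `count_params` and `mapping_stack` are made up for this illustration, and the channel numbers follow the code in this post rather than the paper.

```python
import torch.nn as nn


def count_params(module):
    """Total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())


def mapping_stack(channels, num_layers=4):
    """A stack of 3x3 convolutions that keeps the channel count unchanged."""
    return nn.Sequential(*[
        nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        for _ in range(num_layers)
    ])


wide = mapping_stack(64)    # mapping directly on the 64-channel features
narrow = nn.Sequential(     # shrink to 32, map, expand back to 64
    nn.Conv2d(64, 32, kernel_size=1),
    mapping_stack(32),
    nn.Conv2d(32, 64, kernel_size=1),
)

print("mapping at 64 channels: ", count_params(wide))
print("shrink -> map -> expand:", count_params(narrow))
```

Because a 3x3 convolution's weight count grows with the product of input and output channels, halving the channels cuts each mapping layer to about a quarter, and the two extra 1x1 layers add comparatively little.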
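Brief summary
Several parameters of the model can be changed to suit your own research, including:
- Number of feature maps: the number of output channels of the first convolution layer.
- Shrunken feature dimension: how many dimensions the features above are compressed to.
- Number of non-linear mapping layers: how many convolution layers the mapping module contains (note that these do not change the number of channels).
- Deconvolution factor: how many times the image is to be enlarged.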
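The four tunable settings above can be exposed as arguments of a small builder function. This is only a sketch under a few assumptions: the function name `build_fsrcnn` is made up, zero padding is used everywhere, the defaults d=56, s=12, m=4 are the default setting reported in the paper, and the mapping part follows this post's layout of m convolutions followed by a single PReLU.

```python
import torch
import torch.nn as nn


def build_fsrcnn(in_channels=1, d=56, s=12, m=4, scale=3):
    """Assemble an FSRCNN-like network from the four tunable settings.

    d: feature maps of the first conv, s: shrunken feature dimension,
    m: number of mapping convs, scale: upscaling factor.
    """
    layers = [
        nn.Conv2d(in_channels, d, kernel_size=5, padding=2), nn.PReLU(),  # features
        nn.Conv2d(d, s, kernel_size=1), nn.PReLU(),                       # shrinking
    ]
    for _ in range(m):                                                    # mapping
        layers.append(nn.Conv2d(s, s, kernel_size=3, padding=1))
    layers += [
        nn.PReLU(),
        nn.Conv2d(s, d, kernel_size=1), nn.PReLU(),                       # expanding
        # output_padding = scale - 1 makes the output exactly `scale` times larger
        nn.ConvTranspose2d(d, in_channels, kernel_size=9, stride=scale,
                           padding=4, output_padding=scale - 1),
    ]
    return nn.Sequential(*layers)


model = build_fsrcnn(in_channels=1, d=56, s=12, m=4, scale=3)
lr_image = torch.randn(1, 1, 24, 24)    # dummy low-resolution batch
with torch.no_grad():
    sr_image = model(lr_image)
print(sr_image.shape)
```

Only the last layer depends on `scale`, so changing the upscaling factor leaves the first four parts untouched.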
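- Taken together, the first four modules extract features from the low-resolution (LR) image, and only the last module upsamples the image. The last module can be modified for a different upscaling factor, which is why the authors say the model lends itself to transfer learning.
- The original authors say the padding scheme has a negligible effect on the final result, so they zero-pad every layer according to its kernel size. Personally, I think you can adjust the padding scheme to suit your own research, especially if you care about pixel values.
- To train faster, the authors also use a two-step training strategy: first train on the 91-image dataset, and once training has saturated, add the General-100 dataset for fine-tuning.
- The learning rate is 1e-3 for the convolution layers and 1e-4 for the final deconvolution layer.
- Several of the parameters (sensitive variables) in the code above differ from those in the original paper.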
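The two learning rates and the "swap the last module" idea can be sketched with optimizer parameter groups. Everything here is a toy setup rather than the authors' training script: the tiny `body` network is a stand-in for the four convolutional modules, and the choice of SGD with momentum 0.9, as well as freezing the body during transfer, are assumptions.

```python
import torch
import torch.nn as nn

# Tiny stand-in network; only the split into "conv part" and "deconv part" matters.
body = nn.Sequential(nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.PReLU())
deconv = nn.ConvTranspose2d(8, 1, kernel_size=9, stride=3, padding=4, output_padding=2)

# One parameter group per learning rate: 1e-3 for conv layers, 1e-4 for the deconv.
optimizer = torch.optim.SGD(
    [
        {"params": body.parameters(), "lr": 1e-3},
        {"params": deconv.parameters(), "lr": 1e-4},
    ],
    lr=1e-3,        # default value, overridden by the per-group values above
    momentum=0.9,   # assumption, not taken from this post
)

# Transfer to another upscaling factor: keep (and here freeze) the body,
# then train only a freshly created deconvolution layer for 2x.
for p in body.parameters():
    p.requires_grad = False
deconv_x2 = nn.ConvTranspose2d(8, 1, kernel_size=9, stride=2, padding=4, output_padding=1)
finetune_optimizer = torch.optim.SGD(deconv_x2.parameters(), lr=1e-4, momentum=0.9)

x = torch.randn(1, 1, 16, 16)
print(deconv_x2(body(x)).shape)
```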
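Thoughts
What exactly does the so-called feature extraction extract? What exactly does the so-called non-linear mapping do?
Open questions and points to note
1. As I read the structure in the original paper, the non-linear mapping part is m conv layers followed by a single PReLU activation. However, many implementations online put a PReLU after every conv layer. Whether this makes a difference remains to be tested.
2. The parameters of the final deconvolution layer need to be worked out carefully from the desired upscaling factor.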
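For point 2, the output size of `nn.ConvTranspose2d` along one axis (with dilation 1) is (in − 1) × stride − 2 × padding + kernel_size + output_padding. The sketch below applies that formula and compares it with what PyTorch actually produces. The helper `deconv_out_size` is a made-up name, and kernel_size=9 with padding=4 is just one common choice.

```python
import torch
import torch.nn as nn


def deconv_out_size(size_in, kernel_size, stride, padding, output_padding=0):
    """Output length of a transposed convolution along one axis (dilation = 1)."""
    return (size_in - 1) * stride - 2 * padding + kernel_size + output_padding


h = 24
for scale in (2, 3, 4):
    # Without output_padding the result falls short of scale * h.
    without = deconv_out_size(h, kernel_size=9, stride=scale, padding=4)
    with_op = deconv_out_size(h, kernel_size=9, stride=scale, padding=4,
                              output_padding=scale - 1)
    layer = nn.ConvTranspose2d(1, 1, kernel_size=9, stride=scale, padding=4,
                               output_padding=scale - 1)
    actual = layer(torch.randn(1, 1, h, h)).shape[-1]
    print(f"scale={scale}: without={without}, with={with_op}, actual={actual}")
```

With these settings, output_padding = scale − 1 is what makes the output exactly scale times the input size.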

References
【1】DONG C, LOY C C, TANG X. Accelerating the Super-Resolution Convolutional Neural Network[C]//Computer Vision – ECCV 2016. Springer International Publishing, 2016: 391-407. DOI: 10.1007/978-3-319-46475-6_25.
【2】https://towardsdatascience.com/review-fsrcnn-super-resolution-80ca2ee14da4
【3】https://github.com/Lornatang/FSRCNN-PyTorch