PyTorch Learning Notes -- SEResNet50 Construction
2022-07-25 15:41:00 【whut_ L】
Catalog
1--Introduction to ResNet50
1-1--Stem Block
1-2--Stage
1-3--ResNet50 core code
2--Introduction to SENet
3--Introduction to SEResNet50
4--Example: classifying the CIFAR10 dataset with SEResNet50
5--References
1--Introduction to ResNet50

Analysis: The figure above shows the overall structure of ResNet50. Besides the Input and Output, it consists of the Stem Block, the four Stages (Stage1-4), and the subsequent processing (pooling and classification).
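As a quick cross-check (not from the original post), the stock resnet50 in torchvision follows the same layout; printing its top-level children shows the Stem (conv1, bn1, relu, maxpool), the four Stages (layer1-4), and the subsequent processing (avgpool, fc):

import torchvision

# Top-level modules of torchvision's ResNet50, in forward order
for name, _ in torchvision.models.resnet50().named_children():
    print(name)  # conv1, bn1, relu, maxpool, layer1, layer2, layer3, layer4, avgpool, fc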
1-1--Stem Block

Analysis: The input of the Stem Block is a three-channel image (C = 3, W = 224, H = 224). It first goes through a convolution (kernel_size = 7 x 7, stride = 2, 64 kernels), batch normalization, and ReLU, followed by max pooling, giving an output of (C = 64, W = 56, H = 56). This part can be understood as preprocessing before the four Stages. The core code is as follows:
self.Stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size = 7, stride = 2, padding = 3, bias = False),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size = 3, stride = 2, padding = 1)
)
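A quick shape check of the Stem Block (a standalone sketch of the snippet above):

import torch
from torch import nn

stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size = 7, stride = 2, padding = 3, bias = False),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size = 3, stride = 2, padding = 1)
)
x = torch.randn(1, 3, 224, 224)  # a dummy (C = 3, W = 224, H = 224) image
print(stem(x).shape)             # torch.Size([1, 64, 56, 56]): the conv halves 224 to 112, the pooling to 56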
1-2--Stage

The figure above shows the network structure of the four Stages. Each Stage is built from two kinds of blocks: the Conv Block and the Identity Block.
Taking Stage1 as an example, the differences between the two structures are:

As the figure shows, the Conv Block has one extra Conv and BN operation on its right-hand (shortcut) branch compared with the Identity Block. The core code for building both blocks is as follows:
## Import third-party libraries
import torch
from torch import nn

# Build the Conv Block and Identity Block network structures
class Block(nn.Module):
    def __init__(self, in_channels, filters, stride = 1, is_1x1conv = False):
        super(Block, self).__init__()
        # Output channels of each sub-block in the Stage (filter1 = filter2 = filter3 / 4)
        filter1, filter2, filter3 = filters
        self.is_1x1conv = is_1x1conv         # whether this is a Conv Block
        self.relu = nn.ReLU(inplace = True)  # ReLU operation
        # First sub-block: stride = 1 (Stage1) or stride = 2 (Stage2-4)
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_channels, filter1, kernel_size = 1, stride = stride, bias = False),
            nn.BatchNorm2d(filter1),
            nn.ReLU()
        )
        # Middle sub-block
        self.conv2 = nn.Sequential(
            nn.Conv2d(filter1, filter2, kernel_size = 3, stride = 1, padding = 1, bias = False),
            nn.BatchNorm2d(filter2),
            nn.ReLU()
        )
        # Last sub-block: no ReLU here
        self.conv3 = nn.Sequential(
            nn.Conv2d(filter2, filter3, kernel_size = 1, stride = 1, bias = False),
            nn.BatchNorm2d(filter3),
        )
        # The Conv Block's shortcut needs an extra Conv and BN (see the Conv Block diagram)
        if is_1x1conv:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, filter3, kernel_size = 1, stride = stride, bias = False),
                nn.BatchNorm2d(filter3)
            )

    def forward(self, x):
        x_shortcut = x        # the shortcut branch in the network diagram
        x1 = self.conv1(x)    # first sub-block
        x1 = self.conv2(x1)   # middle sub-block
        x1 = self.conv3(x1)   # last sub-block
        if self.is_1x1conv:   # Conv Block: apply the extra Conv and BN to the shortcut
            x_shortcut = self.shortcut(x_shortcut)
        x1 = x1 + x_shortcut  # element-wise addition
        x1 = self.relu(x1)    # final ReLU
        return x1

Detailed analysis:
- The output of each Stage serves as the input of the next Stage;
- The Conv Blocks of Stage1 and of Stage2-4 use different stride values in the first sub-block of both branches (Stage1: stride = 1; Stage2-4: stride = 2);
- In both the Conv Block and the Identity Block, the channel count of the last sub-block is four times that of the first and middle sub-blocks;
- In both the Conv Block and the Identity Block, the last sub-block has no ReLU (unlike the first two sub-blocks); these points are verified in the sketch below.
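A quick shape check of the Block class above confirms these points (a standalone sketch; the channel tuples are those of Stage1 and Stage2):

# Conv Block of Stage1 (stride = 1) and of Stage2 (stride = 2)
stage1_block = Block(64, (64, 64, 256), stride = 1, is_1x1conv = True)
stage2_block = Block(256, (128, 128, 512), stride = 2, is_1x1conv = True)
x = torch.randn(1, 64, 56, 56)
y = stage1_block(x)
print(y.shape)                # torch.Size([1, 256, 56, 56]): stride 1 keeps 56 x 56; channels 64 -> 256 (4x)
print(stage2_block(y).shape)  # torch.Size([1, 512, 28, 28]): stride 2 halves H x W; 512 = 4 x 128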

1-3--ResNet50 core code:
# Build ResNet50
class Resnet(nn.Module):
    def __init__(self, cfg):
        super(Resnet, self).__init__()
        classes = cfg['classes']  # number of classes
        num = cfg['num']          # ResNet50: [3, 4, 6, 3], the number of Conv Blocks and Identity Blocks per Stage
        # Stem Block
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size = 7, stride = 2, padding = 3, bias = False),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size = 3, stride = 2, padding = 1)
        )
        # Stage1
        filters = (64, 64, 256)  # channels
        self.Stage1 = self._make_layer(in_channels = 64, filters = filters, num = num[0], stride = 1)
        # Stage2
        filters = (128, 128, 512)  # channels
        self.Stage2 = self._make_layer(in_channels = 256, filters = filters, num = num[1], stride = 2)
        # Stage3
        filters = (256, 256, 1024)  # channels
        self.Stage3 = self._make_layer(in_channels = 512, filters = filters, num = num[2], stride = 2)
        # Stage4
        filters = (512, 512, 2048)  # channels
        self.Stage4 = self._make_layer(in_channels = 1024, filters = filters, num = num[3], stride = 2)
        # Global average pooling
        self.global_average_pool = nn.AdaptiveAvgPool2d((1, 1))
        # Fully connected layer: the subsequent processing stage after the four Stages
        self.fc = nn.Sequential(
            nn.Linear(2048, classes)
        )

    # Build a single Stage
    def _make_layer(self, in_channels, filters, num, stride = 1):
        layers = []
        # Conv Block
        block_1 = Block(in_channels, filters, stride = stride, is_1x1conv = True)
        layers.append(block_1)
        # Stack the Identity Blocks, based on [3, 4, 6, 3]
        for i in range(1, num):
            layers.append(Block(filters[2], filters, stride = 1, is_1x1conv = False))
        # Return the Conv Block and Identity Blocks together as one Stage
        return nn.Sequential(*layers)

    def forward(self, x):
        # Stem Block
        x = self.conv1(x)
        # The four Stages
        x = self.Stage1(x)
        x = self.Stage2(x)
        x = self.Stage3(x)
        x = self.Stage4(x)
        # Subsequent processing
        x = self.global_average_pool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x
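A minimal smoke test of the assembled network (a sketch; the cfg keys match what the constructor reads):

cfg = {'num': (3, 4, 6, 3), 'classes': 10}  # ResNet50 layout, 10 output classes
net = Resnet(cfg)
x = torch.randn(1, 3, 224, 224)
print(net(x).shape)  # torch.Size([1, 10]): one logit per class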
2--Introduction to SENet

Analysis: The figure above is taken from the paper 《Squeeze-and-Excitation Networks》. Intuitively, SENet weights all channels of the feature map; the figure below shows the main steps of the weighting operation.

Analysis: As the figure shows, the weight of each channel is obtained through global average pooling, a fully connected linear layer, ReLU, another fully connected linear layer, and a Sigmoid; a Scale (element-wise product) operation then completes the weighting. In this example, 1 x 1 convolution layers replace the fully connected layers to reduce the loss of semantic information. The core code for computing the channel weights is as follows:
# SENet (see the SENet diagram)
self.se = nn.Sequential(
    nn.AdaptiveAvgPool2d((1, 1)),                      # global average pooling
    nn.Conv2d(filter3, filter3 // 16, kernel_size=1),  # 16 is r; filter3 // 16 is C/r; a conv layer replaces the FC layer
    nn.ReLU(),
    nn.Conv2d(filter3 // 16, filter3, kernel_size=1),
    nn.Sigmoid()
)
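A standalone sketch of this branch shows what it produces: one weight per channel, in (0, 1), which the Scale step broadcasts over H x W (filter3 = 256 is an arbitrary choice here):

import torch
from torch import nn

filter3 = 256
se = nn.Sequential(
    nn.AdaptiveAvgPool2d((1, 1)),
    nn.Conv2d(filter3, filter3 // 16, kernel_size=1),
    nn.ReLU(),
    nn.Conv2d(filter3 // 16, filter3, kernel_size=1),
    nn.Sigmoid()
)
x = torch.randn(2, filter3, 56, 56)
w = se(x)
print(w.shape)        # torch.Size([2, 256, 1, 1]): one weight per channel
print((x * w).shape)  # torch.Size([2, 256, 56, 56]): the Scale (broadcast product) step
print(float(w.min()), float(w.max()))  # all weights lie in (0, 1) because of the Sigmoid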
3--Introduction to SEResNet50

Analysis: The figure above adds SENet to the residual module: the SE module is inserted after the last sub-block of both the Conv Block and the Identity Block of ResNet. The core code for building the SENet-based Conv Block and Identity Block is as follows:
# Build the SENet-based Conv Block and Identity Block network structures
class Block(nn.Module):
    def __init__(self, in_channels, filters, stride = 1, is_1x1conv = False):
        super(Block, self).__init__()
        # Output channels of each sub-block in the Stage (filter1 = filter2 = filter3 / 4)
        filter1, filter2, filter3 = filters
        self.is_1x1conv = is_1x1conv         # whether this is a Conv Block
        self.relu = nn.ReLU(inplace = True)  # ReLU operation
        # First sub-block: stride = 1 (Stage1) or stride = 2 (Stage2-4)
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_channels, filter1, kernel_size = 1, stride = stride, bias = False),
            nn.BatchNorm2d(filter1),
            nn.ReLU()
        )
        # Middle sub-block
        self.conv2 = nn.Sequential(
            nn.Conv2d(filter1, filter2, kernel_size = 3, stride = 1, padding = 1, bias = False),
            nn.BatchNorm2d(filter2),
            nn.ReLU()
        )
        # Last sub-block: no ReLU here
        self.conv3 = nn.Sequential(
            nn.Conv2d(filter2, filter3, kernel_size = 1, stride = 1, bias = False),
            nn.BatchNorm2d(filter3),
        )
        # The Conv Block's shortcut needs an extra Conv and BN (see the Conv Block diagram)
        if is_1x1conv:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, filter3, kernel_size = 1, stride = stride, bias = False),
                nn.BatchNorm2d(filter3)
            )
        # SENet (see the SENet diagram)
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d((1, 1)),                      # global average pooling
            nn.Conv2d(filter3, filter3 // 16, kernel_size=1),  # 16 is r; filter3 // 16 is C/r; conv replaces FC
            nn.ReLU(),
            nn.Conv2d(filter3 // 16, filter3, kernel_size=1),
            nn.Sigmoid()
        )

    def forward(self, x):
        x_shortcut = x
        x1 = self.conv1(x)    # first sub-block
        x1 = self.conv2(x1)   # middle sub-block
        x1 = self.conv3(x1)   # last sub-block
        x2 = self.se(x1)      # SENet: compute the weight of each channel
        x1 = x1 * x2          # weight the channels
        if self.is_1x1conv:   # Conv Block: extra Conv and BN on the shortcut
            x_shortcut = self.shortcut(x_shortcut)
        x1 = x1 + x_shortcut  # element-wise addition
        x1 = self.relu(x1)    # ReLU
        return x1
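A quick check that the SE-based blocks keep the same shapes as their plain ResNet counterparts (a sketch; the values match Stage2):

conv_block = Block(256, (128, 128, 512), stride = 2, is_1x1conv = True)
identity_block = Block(512, (128, 128, 512), stride = 1, is_1x1conv = False)
x = torch.randn(1, 256, 56, 56)
y = conv_block(x)
print(y.shape)                  # torch.Size([1, 512, 28, 28]): the SE weighting does not change the shape
print(identity_block(y).shape)  # torch.Size([1, 512, 28, 28])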
4--Example: classifying the CIFAR10 dataset with SEResNet50
Complete runnable code (with detailed comments):
## Import third-party libraries
from torch import nn
import time
import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import torch.optim as optim
# Build the SENet-based Conv Block and Identity Block network structures
class Block(nn.Module):
    def __init__(self, in_channels, filters, stride = 1, is_1x1conv = False):
        super(Block, self).__init__()
        # Output channels of each sub-block in the Stage (filter1 = filter2 = filter3 / 4)
        filter1, filter2, filter3 = filters
        self.is_1x1conv = is_1x1conv         # whether this is a Conv Block
        self.relu = nn.ReLU(inplace = True)  # ReLU operation
        # First sub-block: stride = 1 (Stage1) or stride = 2 (Stage2-4)
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_channels, filter1, kernel_size = 1, stride = stride, bias = False),
            nn.BatchNorm2d(filter1),
            nn.ReLU()
        )
        # Middle sub-block
        self.conv2 = nn.Sequential(
            nn.Conv2d(filter1, filter2, kernel_size = 3, stride = 1, padding = 1, bias = False),
            nn.BatchNorm2d(filter2),
            nn.ReLU()
        )
        # Last sub-block: no ReLU here
        self.conv3 = nn.Sequential(
            nn.Conv2d(filter2, filter3, kernel_size = 1, stride = 1, bias = False),
            nn.BatchNorm2d(filter3),
        )
        # The Conv Block's shortcut needs an extra Conv and BN (see the Conv Block diagram)
        if is_1x1conv:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, filter3, kernel_size = 1, stride = stride, bias = False),
                nn.BatchNorm2d(filter3)
            )
        # SENet (see the SENet diagram)
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d((1, 1)),                      # global average pooling
            nn.Conv2d(filter3, filter3 // 16, kernel_size=1),  # 16 is r; filter3 // 16 is C/r; conv replaces FC
            nn.ReLU(),
            nn.Conv2d(filter3 // 16, filter3, kernel_size=1),
            nn.Sigmoid()
        )

    def forward(self, x):
        x_shortcut = x
        x1 = self.conv1(x)    # first sub-block
        x1 = self.conv2(x1)   # middle sub-block
        x1 = self.conv3(x1)   # last sub-block
        x2 = self.se(x1)      # SENet: compute the weight of each channel
        x1 = x1 * x2          # weight the channels
        if self.is_1x1conv:   # Conv Block: extra Conv and BN on the shortcut
            x_shortcut = self.shortcut(x_shortcut)
        x1 = x1 + x_shortcut  # element-wise addition
        x1 = self.relu(x1)    # ReLU
        return x1
# Build SEResNet50
class SEResnet(nn.Module):
    def __init__(self, cfg):
        super(SEResnet, self).__init__()
        classes = cfg['classes']  # number of classes
        num = cfg['num']          # ResNet50: [3, 4, 6, 3], the number of Conv Blocks and Identity Blocks per Stage
        # Stem Block
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size = 7, stride = 2, padding = 3, bias = False),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size = 3, stride = 2, padding = 1)
        )
        # Stage1
        filters = (64, 64, 256)  # channels
        self.Stage1 = self._make_layer(in_channels = 64, filters = filters, num = num[0], stride = 1)
        # Stage2
        filters = (128, 128, 512)  # channels
        self.Stage2 = self._make_layer(in_channels = 256, filters = filters, num = num[1], stride = 2)
        # Stage3
        filters = (256, 256, 1024)  # channels
        self.Stage3 = self._make_layer(in_channels = 512, filters = filters, num = num[2], stride = 2)
        # Stage4
        filters = (512, 512, 2048)  # channels
        self.Stage4 = self._make_layer(in_channels = 1024, filters = filters, num = num[3], stride = 2)
        # Adaptive average pooling; (1, 1) is the output size (H x W)
        self.global_average_pool = nn.AdaptiveAvgPool2d((1, 1))
        # Fully connected layer: the subsequent processing stage after the four Stages
        self.fc = nn.Sequential(
            nn.Linear(2048, classes)
        )

    # Build a single Stage
    def _make_layer(self, in_channels, filters, num, stride = 1):
        layers = []
        # Conv Block
        block_1 = Block(in_channels, filters, stride = stride, is_1x1conv = True)
        layers.append(block_1)
        # Stack the Identity Blocks, based on [3, 4, 6, 3]
        for i in range(1, num):
            layers.append(Block(filters[2], filters, stride = 1, is_1x1conv = False))
        # Return the Conv Block and Identity Blocks together as one Stage
        return nn.Sequential(*layers)

    def forward(self, x):
        # Stem Block
        x = self.conv1(x)
        # The four Stages
        x = self.Stage1(x)
        x = self.Stage2(x)
        x = self.Stage3(x)
        x = self.Stage4(x)
        # Subsequent processing
        x = self.global_average_pool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x
# SEResNet50 configuration (calling this function indirectly instantiates SEResnet; it is kept separate to make switching to other ResNet variants easy)
def SeResNet50():
    cfg = {
        'num': (3, 4, 6, 3),  # ResNet50: the number of Blocks in the four Stages (1 Conv Block each, the rest Identity Blocks)
        'classes': 10         # number of dataset classes
    }
    return SEResnet(cfg)  # build the SEResnet network
## Load the dataset
def load_dataset(batch_size):
    # Download the training set
    train_set = torchvision.datasets.CIFAR10(
        root = "data/cifar-10", train = True,
        download = True, transform = transforms.ToTensor()
    )
    # Download the test set
    test_set = torchvision.datasets.CIFAR10(
        root = "data/cifar-10", train = False,
        download = True, transform = transforms.ToTensor()
    )
    train_iter = torch.utils.data.DataLoader(
        train_set, batch_size = batch_size, shuffle = True, num_workers = 4
    )
    test_iter = torch.utils.data.DataLoader(
        test_set, batch_size = batch_size, shuffle = True, num_workers = 4
    )
    return train_iter, test_iter
# Train the model
def train(net, train_iter, criterion, optimizer, num_epochs, device, num_print, lr_scheduler = None, test_iter = None):
    net.train()            # training mode
    record_train = list()  # training-set accuracy of each epoch
    record_test = list()   # test-set accuracy of each epoch
    for epoch in range(num_epochs):
        print("========== epoch: [{}/{}] ==========".format(epoch + 1, num_epochs))
        total, correct, train_loss = 0, 0, 0
        start = time.time()
        for i, (X, y) in enumerate(train_iter):
            X, y = X.to(device), y.to(device)  # move to GPU or CPU
            output = net(X)                    # forward pass
            loss = criterion(output, y)        # compute the loss
            optimizer.zero_grad()              # zero the gradients
            loss.backward()                    # backpropagate
            optimizer.step()                   # update the parameters
            train_loss += loss.item()          # accumulate the loss
            total += y.size(0)                 # accumulate the sample count
            correct += (output.argmax(dim=1) == y).sum().item()  # accumulate the correct predictions
            train_acc = 100.0 * correct / total  # compute the accuracy
            if (i + 1) % num_print == 0:
                print("step: [{}/{}], train_loss: {:.3f} | train_acc: {:6.3f}% | lr: {:.6f}" \
                    .format(i + 1, len(train_iter), train_loss / (i + 1), \
                    train_acc, get_cur_lr(optimizer)))
        # Adjust the learning rate
        if lr_scheduler is not None:
            lr_scheduler.step()
        # Print the training time
        print("--- cost time: {:.4f}s ---".format(time.time() - start))
        if test_iter is not None:  # evaluate on the test set after each epoch (see the test function below)
            record_test.append(test(net, test_iter, criterion, device))
        record_train.append(train_acc)
    # Return the per-epoch training-set and test-set accuracies
    return record_train, record_test
# Evaluate the model
def test(net, test_iter, criterion, device):
    total, correct = 0, 0
    net.eval()  # evaluation mode
    with torch.no_grad():  # no gradients needed
        print("*************** test ***************")
        for X, y in test_iter:
            X, y = X.to(device), y.to(device)  # move to GPU or CPU
            output = net(X)              # forward pass
            loss = criterion(output, y)  # compute the loss
            total += y.size(0)           # accumulate the number of test samples
            correct += (output.argmax(dim=1) == y).sum().item()  # accumulate the correct predictions
    test_acc = 100.0 * correct / total  # test-set accuracy
    # Print the test loss (of the last batch) and the test accuracy
    print("test_loss: {:.3f} | test_acc: {:6.3f}%" \
        .format(loss.item(), test_acc))
    print("************************************\n")
    # Back to training mode (the test set is used once per epoch, so restore training mode before the next epoch)
    net.train()
    return test_acc

# Return the current learning rate of the optimizer
def get_cur_lr(optimizer):
    for param_group in optimizer.param_groups:
        return param_group['lr']
# Plot the per-epoch training-set and test-set accuracies
def learning_curve(record_train, record_test = None):
    plt.style.use("ggplot")
    plt.plot(range(1, len(record_train) + 1), record_train, label = "train acc")
    if record_test is not None:
        plt.plot(range(1, len(record_test) + 1), record_test, label = "test acc")
    plt.legend(loc = 4)
    plt.title("learning curve")
    plt.xticks(range(0, len(record_train) + 1, 5))
    plt.yticks(range(0, 101, 5))
    plt.xlabel("epoch")
    plt.ylabel("accuracy")
    plt.show()
BATCH_SIZE = 128       # batch size
NUM_EPOCHS = 12        # number of epochs
NUM_CLASSES = 10       # number of classes
LEARNING_RATE = 0.01   # SGD learning rate
MOMENTUM = 0.9         # momentum
WEIGHT_DECAY = 0.0005  # weight-decay coefficient
NUM_PRINT = 100        # print interval (steps)
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"  # run on GPU if available, otherwise CPU
def main():
    net = SeResNet50()
    net = net.to(DEVICE)  # move the model to GPU or CPU
    train_iter, test_iter = load_dataset(BATCH_SIZE)  # load the training and test sets
    criterion = nn.CrossEntropyLoss()  # loss function
    # Optimizer
    optimizer = optim.SGD(
        net.parameters(),
        lr = LEARNING_RATE,
        momentum = MOMENTUM,
        weight_decay = WEIGHT_DECAY,
        nesterov = True
    )
    # Learning-rate schedule (step_size: update the learning rate every step_size epochs; gamma: multiplicative factor)
    lr_scheduler = optim.lr_scheduler.StepLR(optimizer, step_size = 5, gamma = 0.1)
    record_train, record_test = train(net, train_iter, criterion, optimizer, NUM_EPOCHS, DEVICE, NUM_PRINT,
                                      lr_scheduler, test_iter)
    learning_curve(record_train, record_test)  # plot the accuracy curves
# The __main__ guard is needed because the DataLoaders use num_workers = 4 (multiprocessing)
if __name__ == "__main__":
    main()
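After training, one might persist the weights at the end of main() and reload them later for inference, roughly along these lines (a sketch; the file name is an arbitrary choice):

# Inside main(), after train(): save the learned weights (saving the state_dict is the usual PyTorch convention)
torch.save(net.state_dict(), "seresnet50_cifar10.pth")

# Later, rebuild the model and load the weights for inference
net = SeResNet50()
net.load_state_dict(torch.load("seresnet50_cifar10.pth", map_location = DEVICE))
net = net.to(DEVICE)
net.eval()
with torch.no_grad():
    x = torch.randn(1, 3, 32, 32).to(DEVICE)  # a CIFAR10-sized dummy input
    print(net(x).argmax(dim = 1))             # predicted class index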

5--References
SENet: 《Squeeze-and-Excitation Networks》 (Hu et al., CVPR 2018)