MXNet implementation of DenseNet (densely connected network)
2022-07-25 13:40:00 【Yinque Guangqian】
Paper: Densely Connected Convolutional Networks
DenseNet is actually quite similar to the ResNet covered earlier. We know that in ResNet the gradient can flow directly from later layers to earlier layers through the identity mapping (the cross-layer input is added to the output before the activation function). However, combining features by summation may impede the flow of information through the network. DenseNet therefore makes an improvement: each layer's input is carried across layers to every subsequent layer; in other words, every later layer receives direct input from every earlier layer. These inputs are not summed but concatenated along the channel dimension. This is easy to see in the paper's figure: for a network with L layers, the number of connections is L(L+1)/2. For example, 3 layers give 6 connections and 4 layers give 10, whereas a traditional L-layer network has only L connections.
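To make the contrast concrete, here is a minimal standalone sketch (shapes chosen arbitrarily for illustration) of summation versus channel-dimension concatenation with MXNet's ndarray API:

from mxnet import nd
X = nd.random.uniform(shape=(1, 3, 8, 8))   # pretend input with 3 channels
Y = nd.random.uniform(shape=(1, 3, 8, 8))   # pretend layer output, same shape
print((X + Y).shape)                 # ResNet-style sum: (1, 3, 8, 8), channels unchanged
print(nd.concat(X, Y, dim=1).shape)  # DenseNet-style concat: (1, 6, 8, 8), channels add up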

Because each layer is densely connected to the others, such a model is called a "densely connected network" or a "dense convolutional network".
The paper also mentions a "bottleneck" design (the same idea as in ResNet), which is very effective for DenseNet as well: a 1x1 convolution is inserted before each 3x3 convolution, and the resulting model is called DenseNet-B.
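The code later in this post uses the plain BN-ReLU-3x3 block, so purely as an illustrative sketch, a DenseNet-B bottleneck block might look like this (the 4x factor on the 1x1 convolution's output channels follows the paper's choice of 4k feature maps; bottleneck_block is a hypothetical helper, not used below):

from mxnet.gluon import nn
def bottleneck_block(num_channels):  # hypothetical helper, not used later in this post
    blk = nn.Sequential()
    blk.add(nn.BatchNorm(), nn.Activation('relu'),
            nn.Conv2D(4 * num_channels, kernel_size=1),          # 1x1 conv outputs 4k channels
            nn.BatchNorm(), nn.Activation('relu'),
            nn.Conv2D(num_channels, kernel_size=3, padding=1))   # 3x3 conv outputs k channels
    return blk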
To further improve the compactness of the model, we can reduce the number of feature maps at the transition layers. If a dense block produces m feature maps, we let the following transition layer generate ⌊θm⌋ output feature maps, where 0 < θ ≤ 1 is the compression factor. When θ = 1, the number of feature maps across the transition layer remains unchanged; a DenseNet with θ < 1 is called DenseNet-C.
If both the bottleneck layer and a transition layer with θ < 1 are used, the model is called DenseNet-BC. DenseNet-BC models have very few parameters yet perform very well: a model with 0.8M parameters is about as accurate as a 1001-layer (pre-activation) ResNet with 10.2M parameters.
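As another illustrative sketch (the code below instead hard-codes θ = 0.5 by simply halving the channel count), a transition layer with an explicit compression factor could be written as follows; compressed_transition_block is a hypothetical helper:

from mxnet.gluon import nn
def compressed_transition_block(num_input_channels, theta=0.5):  # hypothetical helper
    # outputs int(theta * m) feature maps for m input feature maps,
    # and halves the height and width with stride-2 average pooling
    blk = nn.Sequential()
    blk.add(nn.BatchNorm(), nn.Activation('relu'),
            nn.Conv2D(int(theta * num_input_channels), kernel_size=1),
            nn.AvgPool2D(pool_size=2, strides=2))
    return blk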
The paper compares results on a variety of datasets, in particular against the closely related ResNet, and shows that DenseNets use fewer parameters while achieving lower error rates (see the comparison figure in the paper).

The overall architecture of the full dense network is given in the paper's architecture table; the code below builds the model step by step.
Building dense blocks
import d2lzh as d2l
from mxnet import gluon, init, nd
from mxnet.gluon import nn

# Improved convolution block from ResNet:
# BN -> ReLU -> 3x3 convolution
def conv_block(num_channels):
    blk = nn.Sequential()
    blk.add(nn.BatchNorm(), nn.Activation('relu'),
            nn.Conv2D(num_channels, kernel_size=3, padding=1))
    return blk
# Dense block: composed of multiple conv_blocks, each with the same number
# of output channels. In the forward pass, the input and output of each
# block are concatenated along the channel dimension, so each block
# receives the outputs of all preceding blocks directly.
class DenseBlock(nn.Block):
    def __init__(self, num_convs, num_channels, **kwargs):
        super(DenseBlock, self).__init__(**kwargs)
        self.net = nn.Sequential()
        for _ in range(num_convs):
            self.net.add(conv_block(num_channels))

    def forward(self, X):
        for blk in self.net:
            Y = blk(X)
            X = nd.concat(X, Y, dim=1)  # concatenate input and output on the channel dimension
        return X
# Observe the shape change, especially the number of channels
blk = DenseBlock(4, 10)  # 4 convolution blocks, each with 10 output channels
blk.initialize()
X = nd.random.uniform(shape=(4, 5, 22, 22))
XX = blk(X)
print(XX.shape)  # channels: 5 + 4*10 = 45
#(4, 45, 22, 22)

We can see that the number of channels has increased (from 5 to 45). If it grows too much, the model becomes overly complex. Here we use a transition layer to deal with this: a 1x1 convolution layer reduces the number of channels, and an average pooling layer with stride 2 halves the height and width, further reducing the complexity of the model.
Transition layer
def transition_block(num_channels):
    blk = nn.Sequential()
    blk.add(nn.BatchNorm(), nn.Activation('relu'),
            nn.Conv2D(num_channels, kernel_size=1),
            nn.AvgPool2D(pool_size=2, strides=2))
    return blk

blk = transition_block(10)
blk.initialize()
print(blk(XX).shape)
#(4, 10, 11, 11)

Building and training the DenseNet model
# DenseNet model
net = nn.Sequential()
net.add(nn.Conv2D(64, kernel_size=7, strides=2, padding=3),
        nn.BatchNorm(), nn.Activation('relu'),
        nn.MaxPool2D(pool_size=3, strides=2, padding=1))
# num_channels is the current number of channels; it is halved by each
# transition layer. growth_rate is the number of output channels of each
# convolution block inside a dense block.
num_channels, growth_rate = 64, 32
num_convs_in_dense_blocks = [4, 4, 4, 4]  # 4 dense blocks, 4 convolution blocks each
for i, num_convs in enumerate(num_convs_in_dense_blocks):
    net.add(DenseBlock(num_convs, growth_rate))
    # number of output channels of the dense block just added
    num_channels += num_convs * growth_rate
    # a transition layer that halves the channel count is added between dense blocks
    if i != len(num_convs_in_dense_blocks) - 1:
        num_channels //= 2
        net.add(transition_block(num_channels))
# finally attach the global pooling layer and the fully connected layer
net.add(nn.BatchNorm(), nn.Activation('relu'), nn.GlobalAvgPool2D(), nn.Dense(10))
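Before training, one can optionally sanity-check the channel bookkeeping (64 → 192 → 96 → 224 → 112 → 240 → 120 → 248 across the dense blocks and transition layers) by pushing a dummy input through the network and printing each layer's output shape. A quick sketch, assuming the net built above:

X = nd.random.uniform(shape=(1, 1, 48, 48))  # dummy single-channel 48x48 input
net.initialize()  # temporary init; training below re-initializes with force_reinit=True
for layer in net:
    X = layer(X)
    print(layer.name, 'output shape:', X.shape)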
# Train the model. Because the network is deep, we reduce the input height
# and width from 224 to 48 to simplify computation; otherwise a memory
# overflow error is raised.
lr, num_epochs, batch_size, ctx = 0.1, 5, 256, d2l.try_gpu()
net.initialize(force_reinit=True, ctx=ctx, init=init.Xavier())
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': lr})
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=48)
d2l.train_ch5(net, train_iter, test_iter, batch_size, trainer, ctx, num_epochs)
'''
epoch 1, loss 0.4878, train acc 0.821, test acc 0.871, time 35.3 sec
epoch 2, loss 0.3063, train acc 0.885, test acc 0.862, time 32.2 sec
epoch 3, loss 0.2618, train acc 0.902, test acc 0.865, time 32.2 sec
epoch 4, loss 0.2367, train acc 0.911, test acc 0.909, time 31.9 sec
epoch 5, loss 0.2146, train acc 0.919, test acc 0.905, time 31.8 sec
'''