Convolutional neural network model: VGG-16 network structure and code implementation

2022-07-25 13:08:00 【1 + 1= Wang】

Contents

A brief introduction to VGGNet
VGG16 network structure
Building VGG16 with PyTorch

A brief introduction to VGGNet

Original VGG paper: Very Deep Convolutional Networks for Large-Scale Image Recognition: https://arxiv.org/pdf/1409.1556.pdf

VGG was proposed in 2014 by the Visual Geometry Group at Oxford University. In that year's ImageNet competition it took first place in the localization task and second place in the classification task.

Compared with AlexNet, VGG uses several consecutive 3x3 convolution kernels in place of AlexNet's larger kernels.

In VGG, three stacked 3x3 kernels replace one 7x7 kernel, and two stacked 3x3 kernels replace one 5x5 kernel. This keeps the receptive field the same while increasing the depth of the network, which improves the network's performance to some extent.
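The receptive-field claim can be checked with a little arithmetic. The sketch below is not from the original post: the helper names `stacked_receptive_field` and `conv_weights` are hypothetical, and it assumes stride-1 convolutions with the same number of channels going in and out, with biases ignored. It also suggests a second benefit, namely that the stacked 3x3 layers need fewer weights than the single large kernel they replace.

```python
def stacked_receptive_field(num_layers, kernel_size=3):
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    rf = 1
    for _ in range(num_layers):
        rf += kernel_size - 1  # each stride-1 layer widens the field by k - 1
    return rf


def conv_weights(kernel_size, channels, num_layers=1):
    """Weight count (biases ignored) with `channels` inputs and outputs per layer."""
    return num_layers * kernel_size * kernel_size * channels * channels


C = 64  # illustrative channel count, chosen only for this example

print(stacked_receptive_field(2))  # 5, same as one 5x5 kernel
print(stacked_receptive_field(3))  # 7, same as one 7x7 kernel
print(conv_weights(3, C, num_layers=2), conv_weights(5, C))  # 73728 102400
print(conv_weights(3, C, num_layers=3), conv_weights(7, C))  # 110592 200704
```

Under these assumptions, three 3x3 layers cost 27C² weights against 49C² for a single 7x7 layer, while inserting two extra non-linearities.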
In the paper, the authors evaluated six network configurations (A, A-LRN, B, C, D, and E) with depths of 11, 11, 13, 16, 16, and 19 weight layers. They are listed in Table 1 of the paper.

The most commonly used are VGG16 and VGG19, so we take VGG16 as an example and analyze its network structure.

VGG16 network structure

The 16 in VGG16 means that the network has 16 weight layers (13 convolutional layers + 3 fully connected layers; pooling layers are not counted).

The input to VGG is a 224x224x3 three-channel color image, and the output covers 1000 categories.
The convolutional layers use a kernel size of 3 and padding of 1; the pooling layers use a kernel_size of 2 and a stride of 2.

Therefore:

A convolutional layer changes only the number of channels of the feature map, not its size: (W - 3 + 2*1) / 1 + 1 = W.
A pooling layer does not change the number of channels of the feature map; its width and height are halved.
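The size rule used here is the usual output-size formula, out = floor((W - K + 2P) / S) + 1. A minimal sketch, using the hypothetical helper name `out_size`, applies it to VGG's two layer types:

```python
def out_size(w, kernel_size, stride=1, padding=0):
    """Output width (or height) of a convolution or pooling layer."""
    return (w - kernel_size + 2 * padding) // stride + 1


# 3x3 convolution with padding 1 and stride 1: size is preserved.
conv_out = out_size(224, kernel_size=3, stride=1, padding=1)

# 2x2 max pooling with stride 2 and no padding: size is halved.
pool_out = out_size(224, kernel_size=2, stride=2)

print(conv_out, pool_out)  # 224 112
```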

VGG has an obvious block structure. VGG16 can be divided into the following six blocks:

Two convolutions + one pooling layer: conv3-64 + conv3-64 + maxpool
Two convolutions + one pooling layer: conv3-128 + conv3-128 + maxpool
Three convolutions + one pooling layer: conv3-256 + conv3-256 + conv3-256 + maxpool
Three convolutions + one pooling layer: conv3-512 + conv3-512 + conv3-512 + maxpool
Three convolutions + one pooling layer: conv3-512 + conv3-512 + conv3-512 + maxpool
Three fully connected layers: fc-4096 + fc-4096 + fc-1000 (corresponding to the 1000 categories)
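To see where the 7x7 spatial size at the input of the fully connected layers comes from, the five convolutional blocks can be traced by hand. The sketch below is illustrative only; `BLOCKS` and `trace_blocks` are hypothetical names, and it assumes the 224x224 input and the padded 3x3 convolutions described above.

```python
# (number of 3x3 convolutions, output channels) for the five convolutional blocks
BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def trace_blocks(blocks, size=224):
    """Return the (channels, height, width) after each block and the conv count."""
    shapes = []
    num_convs = 0
    for convs, channels in blocks:
        num_convs += convs   # padded 3x3 convolutions keep height and width
        size //= 2           # the closing 2x2 max pool halves them
        shapes.append((channels, size, size))
    return shapes, num_convs


shapes, num_convs = trace_blocks(BLOCKS)
for shape in shapes:
    print(shape)
# (64, 112, 112)
# (128, 56, 56)
# (256, 28, 28)
# (512, 14, 14)
# (512, 7, 7)
print(num_convs)  # 13
```

The last shape, 512 x 7 x 7, is the feature map that gets flattened and passed to the first fully connected layer.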

Building VGG16 with PyTorch

For ease of understanding, we divide the forward pass into two parts:

One is the feature extraction part (features), which contains the 13 convolutional layers;
The other is the classification part (classifier), which contains the 3 fully connected layers.

features

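def make_features(self):
    # Numbers are conv output channels; 'MaxPool' marks a 2x2 max-pooling layer.
    cfgs = [64, 64, 'MaxPool', 128, 128, 'MaxPool', 256, 256, 256, 'MaxPool',
            512, 512, 512, 'MaxPool', 512, 512, 512, 'MaxPool']

    layers = []
    in_channel = 3  # RGB input

    for cfg in cfgs:
        if cfg == "MaxPool":  # pooling layer: halves height and width
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:  # 3x3 convolution followed by ReLU; spatial size unchanged
            layers += [nn.Conv2d(in_channels=in_channel, out_channels=cfg, kernel_size=3, padding=1)]
            layers += [nn.ReLU(inplace=True)]
            in_channel = cfg  # the next convolution takes this layer's output channels
    return nn.Sequential(*layers)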

classifier

[Note]: Before the fully connected layers, the three-dimensional feature map output by the convolutional layers must be flattened to one dimension per sample (the batch dimension is kept).

x = torch.flatten(x, start_dim=1)

self.classifier = nn.Sequential(
    nn.Linear(512 * 7 * 7, 4096),
    nn.ReLU(True),
    nn.Dropout(p=0.5),
    nn.Linear(4096, 4096),
    nn.ReLU(True),
    nn.Dropout(p=0.5),
    nn.Linear(4096, 1000)
)
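The flatten step and the first `nn.Linear` have to agree on the feature count. The sketch below mirrors only the shape bookkeeping of `torch.flatten(x, start_dim=1)` in plain Python, without creating tensors; `flatten_shape` is a hypothetical helper and the batch size of 8 is an arbitrary assumption.

```python
from math import prod


def flatten_shape(shape, start_dim=1):
    """Shape after merging every dimension from `start_dim` onward into one."""
    return tuple(shape[:start_dim]) + (prod(shape[start_dim:]),)


feature_shape = (8, 512, 7, 7)  # (batch, channels, height, width); batch size assumed
flat_shape = flatten_shape(feature_shape)

print(flat_shape)  # (8, 25088)

# The second entry must equal the in_features of the first Linear layer.
assert flat_shape[1] == 512 * 7 * 7
```

If the input image is not 224x224, the feature map is no longer 7x7 and the first fully connected layer as written would reject the flattened tensor.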

Complete code

"""
#-*-coding:utf-8-*-
# @author: wangyu a beginner programmer, striving to be the strongest.
# @date: 2022/7/1 15:01
"""
import torch
import torch.nn as nn

class VGG(nn.Module):
    def __init__(self):
        super(VGG, self).__init__()

        self.features = self.make_features()

        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 1000)
        )

    def forward(self,x):
        x = self.features(x)
        x = torch.flatten(x,start_dim=1)
        x = self.classifier(x)

        return x


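    def make_features(self):
        # Numbers are conv output channels; 'MaxPool' marks a 2x2 max-pooling layer.
        cfgs = [64, 64, 'MaxPool', 128, 128, 'MaxPool', 256, 256, 256, 'MaxPool',
                512, 512, 512, 'MaxPool', 512, 512, 512, 'MaxPool']

        layers = []
        in_channel = 3  # RGB input

        for cfg in cfgs:
            if cfg == "MaxPool":  # pooling layer: halves height and width
                layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
            else:  # 3x3 convolution followed by ReLU; spatial size unchanged
                layers += [nn.Conv2d(in_channels=in_channel, out_channels=cfg, kernel_size=3, padding=1)]
                layers += [nn.ReLU(inplace=True)]
                in_channel = cfg  # the next convolution takes this layer's output channels
        return nn.Sequential(*layers)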

net = VGG()
print(net)
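As a sanity check on the finished model, the number of learnable parameters can be worked out by hand from the same configuration list. This is a sketch rather than part of the original code: `count_parameters` is a hypothetical helper, and it assumes every convolution and linear layer has a bias, which is the PyTorch default. The total should agree with `sum(p.numel() for p in net.parameters())` for the network above.

```python
cfgs = [64, 64, 'MaxPool', 128, 128, 'MaxPool', 256, 256, 256, 'MaxPool',
        512, 512, 512, 'MaxPool', 512, 512, 512, 'MaxPool']


def count_parameters(cfgs, in_channels=3, feature_size=7, hidden=4096, num_classes=1000):
    """Return (conv parameters, fully connected parameters), biases included."""
    conv_params = 0
    channels = in_channels
    for cfg in cfgs:
        if cfg == 'MaxPool':
            continue  # pooling layers have no parameters
        conv_params += channels * cfg * 3 * 3 + cfg  # 3x3 weights + biases
        channels = cfg

    fc_params = 0
    in_features = channels * feature_size * feature_size  # 512 * 7 * 7
    for out_features in (hidden, hidden, num_classes):
        fc_params += in_features * out_features + out_features
        in_features = out_features
    return conv_params, fc_params


conv_params, fc_params = count_parameters(cfgs)
print(conv_params)              # 14714688
print(fc_params)                # 123642856
print(conv_params + fc_params)  # 138357544
```

Most of the roughly 138 million parameters sit in the three fully connected layers, and the first of them alone accounts for over 100 million.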