[Pattern Recognition Course Project]
2022-06-30 11:46:00 【2345VOR】
Pattern Recognition Course Project
- 1. Topic: Research on a CIFAR-10 recognition method based on an improved LeNet-5 and a ViT neural network
- 2. Assignment requirements
- 3. Description of the experimental environment
- 4. LeNet-5 code design and description
- 5. ViT + CIFAR-10 code design and description
- 6. Run screenshots
- 7. Comparison and analysis of experimental results
- 8. Experience and summary
- 9. Experimental data
1. Topic: Research on a CIFAR-10 recognition method based on an improved LeNet-5 and a ViT neural network
2. Assignment requirements
2.1. Briefly describe the pattern recognition system
A complete pattern recognition system consists of three basic parts: data acquisition, data processing, and classification decision (or model matching). When designing such a system, attention must be paid to the definition of the pattern classes, the application scenario, the pattern representation, feature extraction and selection, cluster analysis, classifier design and learning, the selection of training and test samples, performance evaluation, and so on. For different applications the contents of the three parts can differ greatly. In particular, in data processing and pattern classification a knowledge base (a set of rules) is often added to correct possible errors and improve the reliability of the recognition results, or constraints are introduced to greatly shrink the search space of candidate patterns in the model base and thereby reduce the amount of matching computation. A minimal sketch of this three-stage pipeline is given below.
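As an illustration only, here is a minimal sketch of the acquire → process → classify pipeline; the stage names and the nearest-mean classifier are illustrative choices of ours, not part of the assignment:

# Illustrative sketch of the three-stage pattern recognition pipeline
# (acquire -> process/extract features -> classify). The nearest-mean
# classifier is a placeholder, not the method used later in this project.
import numpy as np

def acquire(raw_images):
    """Data acquisition: stack raw samples into an array."""
    return np.asarray(raw_images, dtype=np.float32)

def extract_features(x):
    """Data processing: a trivial feature -- mean intensity per sample."""
    return x.reshape(x.shape[0], -1).mean(axis=1, keepdims=True)

class NearestMeanClassifier:
    """Classification decision: assign each sample to the closest class mean."""
    def fit(self, feats, labels):
        self.classes_ = np.unique(labels)
        self.means_ = np.stack([feats[labels == c].mean(axis=0)
                                for c in self.classes_])
        return self

    def predict(self, feats):
        d = np.linalg.norm(feats[:, None, :] - self.means_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]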
2.2. Introduce the basic principles of convolutional and ViT neural networks
A convolutional neural network (CNN) is a feedforward neural network whose artificial neurons respond to surrounding units within a local receptive field, which makes it well suited to large-scale image processing. A typical CNN is composed of three kinds of layers: convolutional layers, pooling layers, and fully connected layers. Described simply: the convolutional layers extract local features from the image; the pooling layers greatly reduce the parameter magnitude (dimensionality reduction); and the fully connected layers, similar to a traditional neural network, produce the desired output. (link)
Vision Transformer (ViT)
In the field of computer vision, most algorithms keep the overall CNN structure intact, either adding attention modules to the CNN or using attention modules to replace some of its components. Some researchers asked: why keep depending on CNNs at all? The authors therefore proposed the ViT algorithm, showing that a pure Transformer structure can also perform well on image classification tasks. Inspired by the success of Transformers in NLP, ViT applies a standard Transformer directly to images, with as few modifications to the overall image classification pipeline as possible. Specifically, ViT splits the whole image into small patches, feeds the sequence of linear embeddings of these patches into a Transformer network, and then trains for image classification with supervised learning. (link)
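A minimal sketch of the patch-splitting step, using a strided Conv2d as the patch embedding (this mirrors the self.embedding layer in model.py further below); the sizes here are illustrative:

# Minimal sketch: turn an image into a sequence of patch embeddings.
import torch
from torch import nn

patch_size, emb_dim = 16, 768
# A Conv2d with kernel_size == stride == patch_size embeds each
# non-overlapping 16x16 patch into a 768-d vector in one shot.
to_patches = nn.Conv2d(3, emb_dim, kernel_size=patch_size, stride=patch_size)

img = torch.randn(1, 3, 224, 224)    # one 224x224 RGB image
emb = to_patches(img)                # (1, 768, 14, 14)
seq = emb.flatten(2).transpose(1, 2) # (1, 196, 768): 196 patch tokens
print(seq.shape)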
3. Description of the experimental environment
Experimental environment:
The computer is an ASUS TUF Gaming ("Flying Fortress") laptop with a GTX 1650 4 GB GPU and an i5-9300H at 2.4 GHz.
The neural networks are implemented in PyTorch: an improved LeNet-5 and ViT, both on CIFAR-10.
4. LeNet-5 code design and description
The LeNet-5 network structure is as follows: (figure omitted)
Specific parameters: (figure omitted)
The improved LeNet-5 parameters are as follows: (figure omitted)
Usage tutorial for the improved LeNet-5 + CIFAR-10
4.1. Preparation
4.1.1 Download the CIFAR-10 dataset
https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
Download it and unzip it into the data folder.
Create a cifar_img folder and put in the pictures you want to predict, sized 3×32×32.

4.1.2 Install the dependent libraries
Install everything with one pip command (note: PIL is installed as pillow, and pickle ships with the Python standard library and needs no install):
pip install torch pillow tensorboardX torchvision imageio
4.2. Debug the network
4.2.1 Run net.py
net.py
# Build the neural network (improved LeNet-5 for CIFAR-10)
import torch
from torch import nn
from torchsummary import summary


class cifar_model(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.model1 = nn.Sequential(
            nn.Conv2d(3, 32, 5, padding=2),   # -> 32 x 32 x 32
            nn.MaxPool2d(2),                  # -> 32 x 16 x 16
            nn.Conv2d(32, 32, 5, padding=2),  # -> 32 x 16 x 16
            nn.MaxPool2d(2),                  # -> 32 x 8 x 8
            nn.Conv2d(32, 64, 5, padding=2),  # -> 64 x 8 x 8
            nn.MaxPool2d(2),                  # -> 64 x 4 x 4
            nn.Flatten(),                     # -> 1024
            nn.Linear(1024, 500),
            nn.Linear(500, 64),
            nn.Linear(64, 10)
        )

    def forward(self, x):
        x = self.model1(x)
        return x


if __name__ == '__main__':
    test_cifar = cifar_model()
    input = torch.ones((64, 3, 32, 32))
    output = test_cifar(input)
    print(output.shape)  # torch.Size([64, 10]) -- sanity-check the model
    device = torch.device('cuda:0')
    test_cifar.to(device)
    summary(test_cifar, (3, 32, 32))  # print the network structure
Running it prints the network structure:
torch.Size([64, 10])
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 32, 32, 32] 2,432
MaxPool2d-2 [-1, 32, 16, 16] 0
Conv2d-3 [-1, 32, 16, 16] 25,632
MaxPool2d-4 [-1, 32, 8, 8] 0
Conv2d-5 [-1, 64, 8, 8] 51,264
MaxPool2d-6 [-1, 64, 4, 4] 0
Flatten-7 [-1, 1024] 0
Linear-8 [-1, 500] 512,500
Linear-9 [-1, 64] 32,064
Linear-10 [-1, 10] 650
================================================================
Total params: 624,542
Trainable params: 624,542
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.01
Forward/backward pass size (MB): 0.44
Params size (MB): 2.38
Estimated Total Size (MB): 2.84
----------------------------------------------------------------
Process finished with exit code 0
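As a sanity check on the summary above, each layer's parameter count follows from (kernel_h × kernel_w × in_channels + 1 bias) × out_channels for convolutions and (in_features + 1) × out_features for linear layers; this short check of ours reproduces the table:

# Verify the parameter counts reported by torchsummary above.
print((5 * 5 * 3 + 1) * 32)    # Conv2d-1:  2432
print((5 * 5 * 32 + 1) * 32)   # Conv2d-3:  25632
print((5 * 5 * 32 + 1) * 64)   # Conv2d-5:  51264
print((1024 + 1) * 500)        # Linear-8:  512500
print((500 + 1) * 64)          # Linear-9:  32064
print((64 + 1) * 10)           # Linear-10: 650
print(2432 + 25632 + 51264 + 512500 + 32064 + 650)  # Total: 624542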
4.2.2 Run train.py
train.py
import time
import os

import torch
import torchvision
from tensorboardX import SummaryWriter
from torch import nn
from torch.utils.data import DataLoader

from net import cifar_model

# Select the training device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Prepare the datasets (note train=False for the test split)
train_data = torchvision.datasets.CIFAR10("./data", train=True,
                                          transform=torchvision.transforms.ToTensor(), download=False)
test_data = torchvision.datasets.CIFAR10("./data", train=False,
                                         transform=torchvision.transforms.ToTensor(), download=False)

# Dataset lengths
train_data_size = len(train_data)  # 50000
test_data_size = len(test_data)    # 10000
print("Length of the training set: {}".format(train_data_size))
print("Length of the test set: {}".format(test_data_size))

# Load the datasets with DataLoader
train_dataloader = DataLoader(train_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)

# Build the model and move it to the GPU
cifarr = cifar_model()
cifarr = cifarr.to(device)

# Loss function
loss_fn = nn.CrossEntropyLoss()
loss_fn = loss_fn.to(device)

# Optimizer
learning_rate = 1e-2
optimizer = torch.optim.SGD(cifarr.parameters(), lr=learning_rate)

# Bookkeeping for the training loop
total_train_step = 0  # number of training steps taken
total_test_step = 0   # number of evaluation rounds
epoch = 30            # number of training epochs

# Add tensorboard logging
folder = 'cifar_log'
if not os.path.exists(folder):
    os.mkdir('cifar_log')
writer = SummaryWriter("./cifar_log")

start_time = time.time()
min_acc = 0
for i in range(epoch):
    print("———————— Epoch {} starts ————————".format(i + 1))
    # Training phase
    cifarr.train()
    for data in train_dataloader:
        imgs, targets = data
        imgs = imgs.to(device)
        targets = targets.to(device)
        outputs = cifarr(imgs)
        loss = loss_fn(outputs, targets)
        # Optimize the model
        optimizer.zero_grad()  # zero the gradients
        loss.backward()        # backpropagate to compute gradients
        optimizer.step()       # update the weights from the gradients
        total_train_step = total_train_step + 1
        if total_train_step % 100 == 0:
            end_time = time.time()
            print(end_time - start_time)
            print("Training step: {}, Loss: {}".format(total_train_step, loss.item()))
            writer.add_scalar("train_loss", loss.item(), total_train_step)

    # Evaluation phase
    cifarr.eval()
    total_test_loss = 0
    total_accuracy = 0  # overall accuracy
    with torch.no_grad():  # no gradient computation, no graph construction
        for data in test_dataloader:
            imgs, targets = data
            imgs = imgs.to(device)
            targets = targets.to(device)
            outputs = cifarr(imgs)
            loss = loss_fn(outputs, targets)
            total_test_loss = total_test_loss + loss.item()
            accuracy = (outputs.argmax(1) == targets).sum()
            total_accuracy = total_accuracy + accuracy

    print("Loss on the whole test set: {}".format(total_test_loss))
    print("Accuracy on the whole test set: {}".format(total_accuracy / test_data_size))
    writer.add_scalar("test_loss", total_test_loss, total_test_step)
    writer.add_scalar("test_accuracy", total_accuracy / test_data_size, total_test_step)
    total_test_step = total_test_step + 1

    a = total_accuracy / test_data_size
    # Save the best model weights
    if a > min_acc:
        folder = 'cifar_weight'
        if not os.path.exists(folder):
            os.mkdir('cifar_weight')
        min_acc = a
        print('save best model')
        torch.save(cifarr.state_dict(), "cifar_weight/best_model.pth")
    # Save the last weights
    if i == epoch - 1:
        torch.save(cifarr.state_dict(), "cifar_weight/last_model.pth")
writer.close()
The flow: import the libraries — load the training and test sets — log with tensorboard — train iteratively — save the weights and logs.
Training for 30 epochs takes about 15 minutes and reaches an accuracy of 76.01%, which is acceptable. The output of the 30th epoch is printed below:
———————— Epoch 30 starts ————————
545.7393746376038
Training step: 22700, Loss: 0.5816277861595154
546.7848184108734
Training step: 22800, Loss: 0.6448397040367126
547.836345911026
Training step: 22900, Loss: 0.7739525437355042
548.8866136074066
Training step: 23000, Loss: 0.48977798223495483
549.9396405220032
Training step: 23100, Loss: 0.5046865940093994
550.9778225421906
Training step: 23200, Loss: 0.7065502405166626
552.0327830314636
Training step: 23300, Loss: 0.6883654594421387
553.0864734649658
Training step: 23400, Loss: 0.54579758644104
Loss on the whole test set: 541.3202195465565
Accuracy on the whole test set: 0.7601000070571899
save best model
Process finished with exit code 0
Open a terminal and view the training results in tensorboard:
tensorboard --logdir "cifar_log"
Visualizations of the training loss, test accuracy, and test loss: (screenshots omitted)
5. ViT + CIFAR-10 code design and description
This project uses ViT to perform an image classification task on the CIFAR-10 dataset. The ViT implementation and pre-trained weights come from https://github.com/asyml/vision-transformer-pytorch.
5.1. Installation
Install everything with one pip command (argparse ships with the Python standard library and needs no install):
pip install torch tqdm scheduler matplotlib
5.2. Dataset
Download CIFAR-10 from http://www.cs.toronto.edu/~kriz/cifar.html or get it from https://pan.baidu.com/s/1ogAFopdVzswge2Aaru_lvw (code: k5v8). Create a data folder and extract cifar-10-python.tar.gz under './data', the same as before.
5.3. Pre-trained model
Download the pre-trained weights from https://pan.baidu.com/s/1CuUj-XIXwecxWMEcLoJzPg (code: ox9n), create a Vit_weights folder at ./Vit_weights and put the pre-trained file in it. The method is based on transfer learning.
Run main.py with batch_size set to 4 for just one epoch, which takes about 1 h 15 min.
main.py
# -*- coding: utf-8 -*-
# @File    : main.py
# @Author  : Kaicheng Yang
# @Time    : 2022/01/26 11:03:50
import argparse

import torch
from torchvision import datasets, transforms
from torchvision.transforms import Resize, ToTensor, Normalize

from train import train_model


def main():
    parser = argparse.ArgumentParser()
    # Optimizer parameters
    parser.add_argument("--learning_rate", default=2e-5, type=float,
                        help="The initial learning rate for Adam. 5e-5")
    parser.add_argument('--opt-eps', default=None, type=float, metavar='EPSILON',
                        help='Optimizer Epsilon (default: None, use opt default)')
    parser.add_argument("--beta1", type=float, default=0.99, help="Adam beta 1.")
    parser.add_argument("--beta2", type=float, default=0.99, help="Adam beta 2.")
    parser.add_argument("--eps", type=float, default=1e-6, help="Adam epsilon.")
    parser.add_argument('--momentum', type=float, default=0.9, metavar='M',
                        help='Optimizer momentum (default: 0.9)')
    parser.add_argument('--weight_decay', type=float, default=2e-5,
                        help='weight decay (default: 2e-5)')
    parser.add_argument("--warmup", type=int, default=500,
                        help="Number of steps to warmup for.")
    parser.add_argument("--batch_size", type=int, default=4, help="Batch size.")
    parser.add_argument("--epoches", type=int, default=1, help="Number of training epochs.")
    # ViT params
    parser.add_argument("--output", default='./output', type=str)
    parser.add_argument("--vit_model",
                        default='./Vit_weights/imagenet21k+imagenet2012_ViT-B_16-224.pth',
                        type=str)
    parser.add_argument("--image_size", type=int, default=224,
                        help="input image size", choices=[224, 384])
    parser.add_argument("--num-classes", type=int, default=10,
                        help="number of classes in dataset")
    parser.add_argument("--patch_size", type=int, default=16)
    parser.add_argument("--emb_dim", type=int, default=768)
    parser.add_argument("--mlp_dim", type=int, default=3072)
    parser.add_argument("--num_heads", type=int, default=12)
    parser.add_argument("--num_layers", type=int, default=12)
    parser.add_argument("--attn_dropout_rate", type=float, default=0.0)
    parser.add_argument("--dropout_rate", type=float, default=0.1)
    args = parser.parse_args()

    normalize = Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    transform = transforms.Compose([
        Resize((224, 224)),  # upscale CIFAR-10 images to the ViT input size
        ToTensor(),
        normalize,
    ])
    trainset = datasets.CIFAR10(root='./data', train=True,
                                download=True, transform=transform)
    testset = datasets.CIFAR10(root='./data', train=False,  # use the test split
                               download=True, transform=transform)
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=args.batch_size,
                                              shuffle=True, num_workers=2)
    testloader = torch.utils.data.DataLoader(testset, batch_size=args.batch_size,
                                             shuffle=False, num_workers=2)
    train_model(args, trainloader, testloader)


if __name__ == '__main__':
    main()
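train.py, which provides the train_model function imported above, is not reproduced in the original post. The following is only a minimal sketch of what such a fine-tuning loop could look like, assuming Adam with the parsed hyper-parameters and checkpoints saved under args.output named by their accuracy (the file name ./output/0.9912199974060059.pt loaded in predict1.py below suggests that convention):

# Hypothetical sketch of train.py / train_model -- NOT the original code.
import os

import torch
import torch.nn as nn
from tqdm import tqdm

from model import CAFIA_Transformer


def train_model(args, trainloader, testloader):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = CAFIA_Transformer(args).to(device)
    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=args.learning_rate,
                                 betas=(args.beta1, args.beta2), eps=args.eps,
                                 weight_decay=args.weight_decay)
    os.makedirs(args.output, exist_ok=True)
    for epoch in range(args.epoches):
        model.train()
        for imgs, targets in tqdm(trainloader):
            imgs, targets = imgs.to(device), targets.to(device)
            loss = loss_fn(model(imgs), targets)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # Evaluate, then save a checkpoint named by its accuracy
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for imgs, targets in testloader:
                imgs, targets = imgs.to(device), targets.to(device)
                correct += (model(imgs).argmax(1) == targets).sum().item()
                total += targets.size(0)
        acc = correct / total
        torch.save(model.state_dict(), os.path.join(args.output, f"{acc}.pt"))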
model.py
# -*- coding: utf-8 -*-
# @File    : model.py
# @Author  : Kaicheng Yang
# @Time    : 2022/01/26 11:03:24
import torch
import torch.nn as nn
import torch.nn.functional as F


class PositionEmbs(nn.Module):
    def __init__(self, num_patches, emb_dim, dropout_rate=0.1):
        super(PositionEmbs, self).__init__()
        # one learnable position vector per patch token plus the class token
        self.pos_embedding = nn.Parameter(torch.randn(1, num_patches + 1, emb_dim))
        if dropout_rate > 0:
            self.dropout = nn.Dropout(dropout_rate)
        else:
            self.dropout = None

    def forward(self, x):
        out = x + self.pos_embedding
        if self.dropout:
            out = self.dropout(out)
        return out


class MlpBlock(nn.Module):
    """ Transformer Feed-Forward Block """
    def __init__(self, in_dim, mlp_dim, out_dim, dropout_rate=0.1):
        super(MlpBlock, self).__init__()
        # init layers
        self.fc1 = nn.Linear(in_dim, mlp_dim)
        self.fc2 = nn.Linear(mlp_dim, out_dim)
        self.act = nn.GELU()
        if dropout_rate > 0.0:
            self.dropout1 = nn.Dropout(dropout_rate)
            self.dropout2 = nn.Dropout(dropout_rate)
        else:
            self.dropout1 = None
            self.dropout2 = None

    def forward(self, x):
        out = self.fc1(x)
        out = self.act(out)
        if self.dropout1:
            out = self.dropout1(out)
        out = self.fc2(out)
        if self.dropout2:  # guard, like dropout1, so dropout_rate=0 does not crash
            out = self.dropout2(out)
        return out


class LinearGeneral(nn.Module):
    def __init__(self, in_dim=(768,), feat_dim=(12, 64)):
        super(LinearGeneral, self).__init__()
        self.weight = nn.Parameter(torch.randn(*in_dim, *feat_dim))
        self.bias = nn.Parameter(torch.zeros(*feat_dim))

    def forward(self, x, dims):
        a = torch.tensordot(x, self.weight, dims=dims) + self.bias
        return a


class SelfAttention(nn.Module):
    def __init__(self, in_dim, heads=8, dropout_rate=0.1):
        super(SelfAttention, self).__init__()
        self.heads = heads
        self.head_dim = in_dim // heads
        self.scale = self.head_dim ** 0.5
        self.query = LinearGeneral((in_dim,), (self.heads, self.head_dim))
        self.key = LinearGeneral((in_dim,), (self.heads, self.head_dim))
        self.value = LinearGeneral((in_dim,), (self.heads, self.head_dim))
        self.out = LinearGeneral((self.heads, self.head_dim), (in_dim,))
        if dropout_rate > 0:
            self.dropout = nn.Dropout(dropout_rate)
        else:
            self.dropout = None

    def forward(self, x):
        b, n, _ = x.shape
        q = self.query(x, dims=([2], [0]))
        k = self.key(x, dims=([2], [0]))
        v = self.value(x, dims=([2], [0]))
        # (b, n, heads, head_dim) -> (b, heads, n, head_dim)
        q = q.permute(0, 2, 1, 3)
        k = k.permute(0, 2, 1, 3)
        v = v.permute(0, 2, 1, 3)
        # scaled dot-product attention
        attn_weights = torch.matmul(q, k.transpose(-2, -1)) / self.scale
        attn_weights = F.softmax(attn_weights, dim=-1)
        out = torch.matmul(attn_weights, v)
        out = out.permute(0, 2, 1, 3)
        out = self.out(out, dims=([2, 3], [0, 1]))
        return out


class EncoderBlock(nn.Module):
    def __init__(self, in_dim, mlp_dim, num_heads, dropout_rate=0.1, attn_dropout_rate=0.1):
        super(EncoderBlock, self).__init__()
        self.norm1 = nn.LayerNorm(in_dim)
        self.attn = SelfAttention(in_dim, heads=num_heads, dropout_rate=attn_dropout_rate)
        if dropout_rate > 0:
            self.dropout = nn.Dropout(dropout_rate)
        else:
            self.dropout = None
        self.norm2 = nn.LayerNorm(in_dim)
        self.mlp = MlpBlock(in_dim, mlp_dim, in_dim, dropout_rate)

    def forward(self, x):
        # pre-norm attention with a residual connection
        residual = x
        out = self.norm1(x)
        out = self.attn(out)
        if self.dropout:
            out = self.dropout(out)
        out += residual
        # pre-norm MLP with a residual connection
        residual = out
        out = self.norm2(out)
        out = self.mlp(out)
        out += residual
        return out


class Encoder(nn.Module):
    def __init__(self, num_patches, emb_dim, mlp_dim, num_layers=12, num_heads=12,
                 dropout_rate=0.1, attn_dropout_rate=0.0):
        super(Encoder, self).__init__()
        # positional embedding
        self.pos_embedding = PositionEmbs(num_patches, emb_dim, dropout_rate)
        # encoder blocks
        in_dim = emb_dim
        self.encoder_layers = nn.ModuleList()
        for _ in range(num_layers):
            layer = EncoderBlock(in_dim, mlp_dim, num_heads, dropout_rate, attn_dropout_rate)
            self.encoder_layers.append(layer)
        self.norm = nn.LayerNorm(in_dim)

    def forward(self, x):
        out = self.pos_embedding(x)
        for layer in self.encoder_layers:
            out = layer(out)
        out = self.norm(out)
        return out


class VisionTransformer(nn.Module):
    """ Vision Transformer """
    def __init__(self,
                 image_size=(256, 256),
                 patch_size=(16, 16),
                 emb_dim=768,
                 mlp_dim=3072,
                 num_heads=12,
                 num_layers=12,
                 attn_dropout_rate=0.0,
                 dropout_rate=0.1):
        super(VisionTransformer, self).__init__()
        h, w = image_size
        # embedding layer: a conv with stride = patch size splits the image into patches
        fh, fw = patch_size
        gh, gw = h // fh, w // fw
        num_patches = gh * gw
        self.embedding = nn.Conv2d(3, emb_dim, kernel_size=(fh, fw), stride=(fh, fw))
        # class token
        self.cls_token = nn.Parameter(torch.zeros(1, 1, emb_dim))
        # transformer
        self.transformer = Encoder(
            num_patches=num_patches,
            emb_dim=emb_dim,
            mlp_dim=mlp_dim,
            num_layers=num_layers,
            num_heads=num_heads,
            dropout_rate=dropout_rate,
            attn_dropout_rate=attn_dropout_rate)

    def forward(self, x):
        emb = self.embedding(x)        # (n, c, gh, gw)
        emb = emb.permute(0, 2, 3, 1)  # (n, gh, gw, c)
        b, h, w, c = emb.shape
        emb = emb.reshape(b, h * w, c)
        # prepend class token
        cls_token = self.cls_token.repeat(b, 1, 1)
        emb = torch.cat([cls_token, emb], dim=1)
        # transformer
        feat = self.transformer(emb)
        return feat


class CAFIA_Transformer(nn.Module):
    def __init__(self, args):
        super(CAFIA_Transformer, self).__init__()
        self.vit = VisionTransformer(
            image_size=(args.image_size, args.image_size),
            patch_size=(args.patch_size, args.patch_size),
            emb_dim=args.emb_dim,
            mlp_dim=args.mlp_dim,
            num_heads=args.num_heads,
            num_layers=args.num_layers,
            attn_dropout_rate=args.attn_dropout_rate,
            dropout_rate=args.dropout_rate)
        self.init_weight(args)
        # replace the pre-trained head with a classifier for the 10 CIFAR classes
        self.classifier = nn.Linear(args.emb_dim, args.num_classes)

    def init_weight(self, args):
        # transfer learning: load the pre-trained ViT weights, drop the old head
        state_dict = torch.load(args.vit_model)['state_dict']
        del state_dict['classifier.weight']
        del state_dict['classifier.bias']
        self.vit.load_state_dict(state_dict)

    def forward(self, batch_X):
        feat = self.vit(batch_X)
        output = self.classifier(feat[:, 0])  # classify from the class token
        return output
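A quick, illustrative smoke test of the model definition (our own snippet, not part of the original project); it exercises VisionTransformer only, with random weights, so the pre-trained checkpoint required by CAFIA_Transformer.init_weight is bypassed:

# Smoke test: 224x224 input, 16x16 patches -> 196 patch tokens + 1 class token.
import torch
from model import VisionTransformer

vit = VisionTransformer(image_size=(224, 224), patch_size=(16, 16))
x = torch.randn(2, 3, 224, 224)
feat = vit(x)
print(feat.shape)  # expected: torch.Size([2, 197, 768])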
predict1.py
import os
import json
import argparse

import torch
from PIL import Image
from torchvision import transforms
import matplotlib.pyplot as plt

from model import CAFIA_Transformer

# Get the prediction result
# classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')


def predict():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    data_transform = transforms.Compose(
        [transforms.Resize(256),
         transforms.CenterCrop(224),
         transforms.ToTensor(),
         transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])])

    # load image
    img_path = "./image/bird.png"
    assert os.path.exists(img_path), "file: '{}' does not exist.".format(img_path)
    img = Image.open(img_path).convert('RGB')
    plt.imshow(img)
    # [N, C, H, W]
    img = data_transform(img)
    # expand batch dimension
    img = torch.unsqueeze(img, dim=0)

    # read class_indict
    json_path = './class_indices.json'
    assert os.path.exists(json_path), "file: '{}' does not exist.".format(json_path)
    json_file = open(json_path, "r")
    class_indict = json.load(json_file)

    # create model
    parser = argparse.ArgumentParser()
    parser.add_argument("--output", default='./output', type=str)
    parser.add_argument("--vit_model", default='./Vit_weights/imagenet21k+imagenet2012_ViT-B_16-224.pth', type=str)
    parser.add_argument("--image_size", type=int, default=224, help="input image size", choices=[224, 384])
    parser.add_argument("--num-classes", type=int, default=10, help="number of classes in dataset")
    parser.add_argument("--patch_size", type=int, default=16)
    parser.add_argument("--emb_dim", type=int, default=768)
    parser.add_argument("--mlp_dim", type=int, default=3072)
    parser.add_argument("--num_heads", type=int, default=12)
    parser.add_argument("--num_layers", type=int, default=12)
    parser.add_argument("--attn_dropout_rate", type=float, default=0.0)
    parser.add_argument("--dropout_rate", type=float, default=0.1)
    args = parser.parse_args()
    model = CAFIA_Transformer(args)
    model.to(device)  # same settings as training (train.py)

    # load model weights
    model_weight_path = "./output/0.9912199974060059.pt"
    model.load_state_dict(torch.load(model_weight_path, map_location=device))
    model.eval()
    with torch.no_grad():
        # predict class
        output = torch.squeeze(model(img.to(device))).cpu()
        predict = torch.softmax(output, dim=0)
        predict_cla = torch.argmax(predict).numpy()

    # below a probability of 0.5, report "no match"
    if predict[predict_cla].numpy() < 0.5:
        print_res = "class: {}   prob: {:.3}".format('no match',
                                                     predict[predict_cla].numpy())
    else:
        print_res = "class: {}   prob: {:.3}".format(class_indict[str(predict_cla)],
                                                     predict[predict_cla].numpy())
    plt.title(print_res)
    print(print_res)
    plt.show()


if __name__ == '__main__':
    predict()
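predict1.py expects a ./class_indices.json file mapping class indices to names. That file is not shown in the post, but given the CIFAR-10 classes listed in the commented-out tuple at the top, a helper like the following (ours, hypothetical) would presumably generate it:

# Hypothetical helper to create the class_indices.json that predict1.py reads;
# the index->name mapping follows the commented-out classes tuple above.
import json

classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog',
           'frog', 'horse', 'ship', 'truck')
with open('class_indices.json', 'w') as f:
    json.dump({str(i): name for i, name in enumerate(classes)}, f, indent=4)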
5.4. Results
Based on the pre-trained weights, after one epoch we reach an accuracy of 99.12%.
6. Run screenshots
6.1 Improved LeNet-5
Run the improved LeNet-5 main prediction program main.py. It returns the following: the image is predicted as dog, with a probability of 77.8%.
torch.Size([3, 32, 32])
class: plane prob: 0.000458
class: car prob: 1.94e-08
class: bird prob: 0.124
class: cat prob: 0.0532
class: deer prob: 0.0414
class: dog prob: 0.778
class: frog prob: 5.11e-05
class: horse prob: 0.0019
class: ship prob: 0.000887
class: truck prob: 1.69e-07
Process finished with exit code 0
Effect: (screenshot omitted)
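The LeNet-5 prediction script itself is not reproduced in the post; the following is only a minimal sketch of what it could look like, assuming the cifar_weight/best_model.pth checkpoint saved by train.py and an image in cifar_img (the file name here is illustrative):

# Hypothetical prediction script for the improved LeNet-5 -- not the original.
import torch
import torchvision.transforms as transforms
from PIL import Image

from net import cifar_model

classes = ('plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')

transform = transforms.Compose([transforms.Resize((32, 32)),
                                transforms.ToTensor()])
img = transform(Image.open('cifar_img/dog.png').convert('RGB'))
print(img.shape)  # torch.Size([3, 32, 32])

model = cifar_model()
model.load_state_dict(torch.load('cifar_weight/best_model.pth', map_location='cpu'))
model.eval()
with torch.no_grad():
    probs = torch.softmax(model(img.unsqueeze(0)), dim=1).squeeze(0)
for name, p in zip(classes, probs):
    print("class: {:5}   prob: {:.3}".format(name, p.item()))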
6.2 ViT + CIFAR-10
First add a few png pictures to the image folder. (screenshots omitted)
Run predict1.py. With the default prediction picture 502.png, it returns no match:
class: no match   prob: 0.222
Change the prediction picture to bird.png; the returned prediction probability is 99.6%, which is already very high:
class: bird   prob: 0.996
7. Comparison and analysis of experimental results
(The results of the two methods are compared: accuracy, training time, and so on.)
| Neural network | Accuracy | Training time | Batch size | Epochs |
|---|---|---|---|---|
| Improved LeNet-5 | 76.01% | 15 min | 64 | 30 |
| ViT | 99.12% | 1 h 15 min | 4 | 1 |
8. Experience and summary
First of all, I would like to thank Mr. Cheng for teaching us the pattern recognition course and giving us a systematic understanding of pattern recognition. From the beginning with means and variances to the later search for good features, we gained a deeper understanding of patterns and of recognition. Along the way I was also moved by the many groups that shared wonderful network structures and experience. We expect neural networks to keep advancing pattern recognition!
The two neural networks in this experiment come from the internet. Because they had to be adapted to my own computer, the relevant parameters needed adjusting, and many bugs inevitably appeared. But working through such small obstacles also yields a lot, for example the tensorboard visualizations and displaying the prediction output produced from the trained weights.
From the above experiments we can see that although the Transformer architecture has become the de facto standard for natural language processing tasks, its application in computer vision is still limited. In vision, attention is either used in conjunction with convolutional networks, or used to replace some components of a convolutional network while keeping its overall structure. The experiments show that this dependence on CNNs is unnecessary: a pure Transformer applied directly to sequences of image patches can perform image classification tasks very well. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-10, VTAB, etc.), the Vision Transformer (ViT) attains excellent results compared with state-of-the-art convolutional networks while requiring substantially fewer computational resources.
9. Experimental data
CIFAR-10 contains 60,000 images at 32×32 resolution, divided into 10 classes: plane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The dataset has 50,000 training images and 10,000 test images.
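These statistics can be verified locally with a quick check (this snippet is ours, for illustration):

# Illustrative check of the CIFAR-10 statistics quoted above.
import torchvision

train = torchvision.datasets.CIFAR10("./data", train=True, download=False)
test = torchvision.datasets.CIFAR10("./data", train=False, download=False)
print(len(train), len(test))  # 50000 10000
print(train.classes)          # ['airplane', 'automobile', 'bird', ...]
print(train[0][0].size)       # (32, 32)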
References:
Vision-Transformer-ViT
[Building an improved LeNet-5 network model with PyTorch (Win11), continued]