[Pattern Recognition Course Project]
2022-06-30 11:46:00 【2345VOR】
Pattern Recognition Course Project
- 1. Topic: Research on a CIFAR-10 recognition method based on an improved LeNet-5 and a ViT neural network
- 2. Assignment requirements
- 3. Description of the experimental environment
- 4. LeNet-5 code design and description
- 5. ViT + CIFAR-10 code design and description
- 6. Run screenshots
- 7. Comparison and analysis of experimental results
- 8. Experience and summary
- 9. Experimental data
1. Topic: Research on a CIFAR-10 recognition method based on an improved LeNet-5 and a ViT neural network
2. Assignment requirements
2.1. Briefly describe the pattern recognition system
A complete pattern recognition system consists of three basic parts: data acquisition, data processing, and classification decision (or model matching). When designing such a system, attention must be paid to the definition of the pattern classes, the application scenario, the pattern representation, feature extraction and selection, cluster analysis, classifier design and learning, the selection of training and test samples, performance evaluation, and so on. For different applications the contents of the three parts can differ greatly. In particular, in data processing and pattern classification a knowledge base (a set of rules) is often added to correct possible errors and improve the reliability of the recognition results, or constraints are introduced to greatly shrink the search space of candidate patterns in the model base and thereby reduce the amount of matching computation. A minimal sketch of this three-stage pipeline is given below.
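As an illustration only, here is a minimal sketch of the acquire → process → classify pipeline; the stage names and the nearest-mean classifier are illustrative choices of ours, not part of the assignment:

# Illustrative sketch of the three-stage pattern recognition pipeline
# (acquire -> process/extract features -> classify). The nearest-mean
# classifier is a placeholder, not the method used later in this project.
import numpy as np

def acquire(raw_images):
    """Data acquisition: stack raw samples into an array."""
    return np.asarray(raw_images, dtype=np.float32)

def extract_features(x):
    """Data processing: a trivial feature -- mean intensity per sample."""
    return x.reshape(x.shape[0], -1).mean(axis=1, keepdims=True)

class NearestMeanClassifier:
    """Classification decision: assign each sample to the closest class mean."""
    def fit(self, feats, labels):
        self.classes_ = np.unique(labels)
        self.means_ = np.stack([feats[labels == c].mean(axis=0)
                                for c in self.classes_])
        return self

    def predict(self, feats):
        d = np.linalg.norm(feats[:, None, :] - self.means_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]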
2.2. Introduce the basic principles of convolutional and ViT neural networks
A convolutional neural network (CNN) is a feedforward neural network whose artificial neurons respond to surrounding units within a local receptive field, which makes it well suited to large-scale image processing. A typical CNN is composed of three kinds of layers: convolutional layers, pooling layers, and fully connected layers. Described simply: the convolutional layers extract local features from the image; the pooling layers greatly reduce the parameter magnitude (dimensionality reduction); and the fully connected layers, similar to a traditional neural network, produce the desired output. (link)
Vision Transformer (ViT)
In the field of computer vision, most algorithms keep the overall CNN structure intact, either adding attention modules to the CNN or using attention modules to replace some of its components. Some researchers asked: why keep depending on CNNs at all? The authors therefore proposed the ViT algorithm, showing that a pure Transformer structure can also perform well on image classification tasks. Inspired by the success of Transformers in NLP, ViT applies a standard Transformer directly to images, with as few modifications to the overall image classification pipeline as possible. Specifically, ViT splits the whole image into small patches, feeds the sequence of linear embeddings of these patches into a Transformer network, and then trains for image classification with supervised learning. (link)
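A minimal sketch of the patch-splitting step, using a strided Conv2d as the patch embedding (this mirrors the self.embedding layer in model.py further below); the sizes here are illustrative:

# Minimal sketch: turn an image into a sequence of patch embeddings.
import torch
from torch import nn

patch_size, emb_dim = 16, 768
# A Conv2d with kernel_size == stride == patch_size embeds each
# non-overlapping 16x16 patch into a 768-d vector in one shot.
to_patches = nn.Conv2d(3, emb_dim, kernel_size=patch_size, stride=patch_size)

img = torch.randn(1, 3, 224, 224)    # one 224x224 RGB image
emb = to_patches(img)                # (1, 768, 14, 14)
seq = emb.flatten(2).transpose(1, 2) # (1, 196, 768): 196 patch tokens
print(seq.shape)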
3. Description of the experimental environment
Experimental environment:
The computer is an ASUS TUF Gaming ("Flying Fortress") laptop with a GTX 1650 4 GB GPU and an i5-9300H at 2.4 GHz.
The neural networks are implemented in PyTorch: an improved LeNet-5 and ViT, both on CIFAR-10.
4. LeNet-5 code design and description
The LeNet-5 network structure is as follows: (figure omitted)
Specific parameters: (figure omitted)
The improved LeNet-5 parameters are as follows: (figure omitted)
Usage tutorial for the improved LeNet-5 + CIFAR-10
4.1. Preparation
4.1.1 Download the CIFAR-10 dataset
https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
Download it and unzip it into the data folder.
Create a cifar_img folder and put in the pictures you want to predict, sized 3×32×32.

4.1.2 Install the dependent libraries
Install everything with one pip command (note: PIL is installed as pillow, and pickle ships with the Python standard library and needs no install):
pip install torch pillow tensorboardX torchvision imageio
4.2. Debug the network
4.2.1 Run net.py
net.py
# Build the neural network (improved LeNet-5 for CIFAR-10)
import torch
from torch import nn
from torchsummary import summary


class cifar_model(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.model1 = nn.Sequential(
            nn.Conv2d(3, 32, 5, padding=2),   # -> 32 x 32 x 32
            nn.MaxPool2d(2),                  # -> 32 x 16 x 16
            nn.Conv2d(32, 32, 5, padding=2),  # -> 32 x 16 x 16
            nn.MaxPool2d(2),                  # -> 32 x 8 x 8
            nn.Conv2d(32, 64, 5, padding=2),  # -> 64 x 8 x 8
            nn.MaxPool2d(2),                  # -> 64 x 4 x 4
            nn.Flatten(),                     # -> 1024
            nn.Linear(1024, 500),
            nn.Linear(500, 64),
            nn.Linear(64, 10)
        )

    def forward(self, x):
        x = self.model1(x)
        return x


if __name__ == '__main__':
    test_cifar = cifar_model()
    input = torch.ones((64, 3, 32, 32))
    output = test_cifar(input)
    print(output.shape)  # torch.Size([64, 10]) -- sanity-check the model
    device = torch.device('cuda:0')
    test_cifar.to(device)
    summary(test_cifar, (3, 32, 32))  # print the network structure
Running it prints the network structure:
torch.Size([64, 10])
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 32, 32, 32] 2,432
MaxPool2d-2 [-1, 32, 16, 16] 0
Conv2d-3 [-1, 32, 16, 16] 25,632
MaxPool2d-4 [-1, 32, 8, 8] 0
Conv2d-5 [-1, 64, 8, 8] 51,264
MaxPool2d-6 [-1, 64, 4, 4] 0
Flatten-7 [-1, 1024] 0
Linear-8 [-1, 500] 512,500
Linear-9 [-1, 64] 32,064
Linear-10 [-1, 10] 650
================================================================
Total params: 624,542
Trainable params: 624,542
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.01
Forward/backward pass size (MB): 0.44
Params size (MB): 2.38
Estimated Total Size (MB): 2.84
----------------------------------------------------------------
Process finished with exit code 0
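As a sanity check on the summary above, each layer's parameter count follows from (kernel_h × kernel_w × in_channels + 1 bias) × out_channels for convolutions and (in_features + 1) × out_features for linear layers; this short check of ours reproduces the table:

# Verify the parameter counts reported by torchsummary above.
print((5 * 5 * 3 + 1) * 32)    # Conv2d-1:  2432
print((5 * 5 * 32 + 1) * 32)   # Conv2d-3:  25632
print((5 * 5 * 32 + 1) * 64)   # Conv2d-5:  51264
print((1024 + 1) * 500)        # Linear-8:  512500
print((500 + 1) * 64)          # Linear-9:  32064
print((64 + 1) * 10)           # Linear-10: 650
print(2432 + 25632 + 51264 + 512500 + 32064 + 650)  # Total: 624542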
4.2.2 Run train.py
train.py
import time
import os

import torch
import torchvision
from tensorboardX import SummaryWriter
from torch import nn
from torch.utils.data import DataLoader

from net import cifar_model

# Select the training device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Prepare the datasets (note train=False for the test split)
train_data = torchvision.datasets.CIFAR10("./data", train=True,
                                          transform=torchvision.transforms.ToTensor(), download=False)
test_data = torchvision.datasets.CIFAR10("./data", train=False,
                                         transform=torchvision.transforms.ToTensor(), download=False)

# Dataset lengths
train_data_size = len(train_data)  # 50000
test_data_size = len(test_data)    # 10000
print("Length of the training set: {}".format(train_data_size))
print("Length of the test set: {}".format(test_data_size))

# Load the datasets with DataLoader
train_dataloader = DataLoader(train_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)

# Build the model and move it to the GPU
cifarr = cifar_model()
cifarr = cifarr.to(device)

# Loss function
loss_fn = nn.CrossEntropyLoss()
loss_fn = loss_fn.to(device)

# Optimizer
learning_rate = 1e-2
optimizer = torch.optim.SGD(cifarr.parameters(), lr=learning_rate)

# Bookkeeping for the training loop
total_train_step = 0  # number of training steps taken
total_test_step = 0   # number of evaluation rounds
epoch = 30            # number of training epochs

# Add tensorboard logging
folder = 'cifar_log'
if not os.path.exists(folder):
    os.mkdir('cifar_log')
writer = SummaryWriter("./cifar_log")

start_time = time.time()
min_acc = 0
for i in range(epoch):
    print("———————— Epoch {} starts ————————".format(i + 1))
    # Training phase
    cifarr.train()
    for data in train_dataloader:
        imgs, targets = data
        imgs = imgs.to(device)
        targets = targets.to(device)
        outputs = cifarr(imgs)
        loss = loss_fn(outputs, targets)
        # Optimize the model
        optimizer.zero_grad()  # zero the gradients
        loss.backward()        # backpropagate to compute gradients
        optimizer.step()       # update the weights from the gradients
        total_train_step = total_train_step + 1
        if total_train_step % 100 == 0:
            end_time = time.time()
            print(end_time - start_time)
            print("Training step: {}, Loss: {}".format(total_train_step, loss.item()))
            writer.add_scalar("train_loss", loss.item(), total_train_step)

    # Evaluation phase
    cifarr.eval()
    total_test_loss = 0
    total_accuracy = 0  # overall accuracy
    with torch.no_grad():  # no gradient computation, no graph construction
        for data in test_dataloader:
            imgs, targets = data
            imgs = imgs.to(device)
            targets = targets.to(device)
            outputs = cifarr(imgs)
            loss = loss_fn(outputs, targets)
            total_test_loss = total_test_loss + loss.item()
            accuracy = (outputs.argmax(1) == targets).sum()
            total_accuracy = total_accuracy + accuracy

    print("Loss on the whole test set: {}".format(total_test_loss))
    print("Accuracy on the whole test set: {}".format(total_accuracy / test_data_size))
    writer.add_scalar("test_loss", total_test_loss, total_test_step)
    writer.add_scalar("test_accuracy", total_accuracy / test_data_size, total_test_step)
    total_test_step = total_test_step + 1

    a = total_accuracy / test_data_size
    # Save the best model weights
    if a > min_acc:
        folder = 'cifar_weight'
        if not os.path.exists(folder):
            os.mkdir('cifar_weight')
        min_acc = a
        print('save best model')
        torch.save(cifarr.state_dict(), "cifar_weight/best_model.pth")
    # Save the last weights
    if i == epoch - 1:
        torch.save(cifarr.state_dict(), "cifar_weight/last_model.pth")
writer.close()
The flow: import the libraries — load the training and test sets — log with tensorboard — train iteratively — save the weights and logs.
Training for 30 epochs takes about 15 minutes and reaches an accuracy of 76.01%, which is acceptable. The output of the 30th epoch is printed below:
———————— Epoch 30 starts ————————
545.7393746376038
Training step: 22700, Loss: 0.5816277861595154
546.7848184108734
Training step: 22800, Loss: 0.6448397040367126
547.836345911026
Training step: 22900, Loss: 0.7739525437355042
548.8866136074066
Training step: 23000, Loss: 0.48977798223495483
549.9396405220032
Training step: 23100, Loss: 0.5046865940093994
550.9778225421906
Training step: 23200, Loss: 0.7065502405166626
552.0327830314636
Training step: 23300, Loss: 0.6883654594421387
553.0864734649658
Training step: 23400, Loss: 0.54579758644104
Loss on the whole test set: 541.3202195465565
Accuracy on the whole test set: 0.7601000070571899
save best model
Process finished with exit code 0
Open a terminal and view the training results in tensorboard:
tensorboard --logdir "cifar_log"
Visualizations of the training loss, test accuracy, and test loss: (screenshots omitted)
5. ViT + CIFAR-10 code design and description
This project uses ViT to perform an image classification task on the CIFAR-10 dataset. The ViT implementation and pre-trained weights come from https://github.com/asyml/vision-transformer-pytorch.
5.1. Installation
Install everything with one pip command (argparse ships with the Python standard library and needs no install):
pip install torch tqdm scheduler matplotlib
5.2. Dataset
Download CIFAR-10 from http://www.cs.toronto.edu/~kriz/cifar.html or get it from https://pan.baidu.com/s/1ogAFopdVzswge2Aaru_lvw (code: k5v8). Create a data folder and extract cifar-10-python.tar.gz under './data', the same as before.
5.3. Pre-trained model
Download the pre-trained weights from https://pan.baidu.com/s/1CuUj-XIXwecxWMEcLoJzPg (code: ox9n), create a Vit_weights folder at ./Vit_weights and put the pre-trained file in it. The method is based on transfer learning.
Run main.py with batch_size set to 4 for just one epoch, which takes about 1 h 15 min.
main.py
# -*- coding: utf-8 -*-
# @File    : main.py
# @Author  : Kaicheng Yang
# @Time    : 2022/01/26 11:03:50
import argparse

import torch
from torchvision import datasets, transforms
from torchvision.transforms import Resize, ToTensor, Normalize

from train import train_model


def main():
    parser = argparse.ArgumentParser()
    # Optimizer parameters
    parser.add_argument("--learning_rate", default=2e-5, type=float,
                        help="The initial learning rate for Adam. 5e-5")
    parser.add_argument('--opt-eps', default=None, type=float, metavar='EPSILON',
                        help='Optimizer Epsilon (default: None, use opt default)')
    parser.add_argument("--beta1", type=float, default=0.99, help="Adam beta 1.")
    parser.add_argument("--beta2", type=float, default=0.99, help="Adam beta 2.")
    parser.add_argument("--eps", type=float, default=1e-6, help="Adam epsilon.")
    parser.add_argument('--momentum', type=float, default=0.9, metavar='M',
                        help='Optimizer momentum (default: 0.9)')
    parser.add_argument('--weight_decay', type=float, default=2e-5,
                        help='weight decay (default: 2e-5)')
    parser.add_argument("--warmup", type=int, default=500,
                        help="Number of steps to warmup for.")
    parser.add_argument("--batch_size", type=int, default=4, help="Batch size.")
    parser.add_argument("--epoches", type=int, default=1, help="Number of training epochs.")
    # ViT params
    parser.add_argument("--output", default='./output', type=str)
    parser.add_argument("--vit_model",
                        default='./Vit_weights/imagenet21k+imagenet2012_ViT-B_16-224.pth',
                        type=str)
    parser.add_argument("--image_size", type=int, default=224,
                        help="input image size", choices=[224, 384])
    parser.add_argument("--num-classes", type=int, default=10,
                        help="number of classes in dataset")
    parser.add_argument("--patch_size", type=int, default=16)
    parser.add_argument("--emb_dim", type=int, default=768)
    parser.add_argument("--mlp_dim", type=int, default=3072)
    parser.add_argument("--num_heads", type=int, default=12)
    parser.add_argument("--num_layers", type=int, default=12)
    parser.add_argument("--attn_dropout_rate", type=float, default=0.0)
    parser.add_argument("--dropout_rate", type=float, default=0.1)
    args = parser.parse_args()

    normalize = Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    transform = transforms.Compose([
        Resize((224, 224)),  # upscale CIFAR-10 images to the ViT input size
        ToTensor(),
        normalize,
    ])
    trainset = datasets.CIFAR10(root='./data', train=True,
                                download=True, transform=transform)
    testset = datasets.CIFAR10(root='./data', train=False,  # use the test split
                               download=True, transform=transform)
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=args.batch_size,
                                              shuffle=True, num_workers=2)
    testloader = torch.utils.data.DataLoader(testset, batch_size=args.batch_size,
                                             shuffle=False, num_workers=2)
    train_model(args, trainloader, testloader)


if __name__ == '__main__':
    main()
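train.py, which provides the train_model function imported above, is not reproduced in the original post. The following is only a minimal sketch of what such a fine-tuning loop could look like, assuming Adam with the parsed hyper-parameters and checkpoints saved under args.output named by their accuracy (the file name ./output/0.9912199974060059.pt loaded in predict1.py below suggests that convention):

# Hypothetical sketch of train.py / train_model -- NOT the original code.
import os

import torch
import torch.nn as nn
from tqdm import tqdm

from model import CAFIA_Transformer


def train_model(args, trainloader, testloader):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = CAFIA_Transformer(args).to(device)
    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=args.learning_rate,
                                 betas=(args.beta1, args.beta2), eps=args.eps,
                                 weight_decay=args.weight_decay)
    os.makedirs(args.output, exist_ok=True)
    for epoch in range(args.epoches):
        model.train()
        for imgs, targets in tqdm(trainloader):
            imgs, targets = imgs.to(device), targets.to(device)
            loss = loss_fn(model(imgs), targets)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # Evaluate, then save a checkpoint named by its accuracy
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for imgs, targets in testloader:
                imgs, targets = imgs.to(device), targets.to(device)
                correct += (model(imgs).argmax(1) == targets).sum().item()
                total += targets.size(0)
        acc = correct / total
        torch.save(model.state_dict(), os.path.join(args.output, f"{acc}.pt"))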
model.py
# -*- coding: utf-8 -*-
# @File    : model.py
# @Author  : Kaicheng Yang
# @Time    : 2022/01/26 11:03:24
import torch
import torch.nn as nn
import torch.nn.functional as F


class PositionEmbs(nn.Module):
    def __init__(self, num_patches, emb_dim, dropout_rate=0.1):
        super(PositionEmbs, self).__init__()
        # one learnable position vector per patch token plus the class token
        self.pos_embedding = nn.Parameter(torch.randn(1, num_patches + 1, emb_dim))
        if dropout_rate > 0:
            self.dropout = nn.Dropout(dropout_rate)
        else:
            self.dropout = None

    def forward(self, x):
        out = x + self.pos_embedding
        if self.dropout:
            out = self.dropout(out)
        return out


class MlpBlock(nn.Module):
    """ Transformer Feed-Forward Block """
    def __init__(self, in_dim, mlp_dim, out_dim, dropout_rate=0.1):
        super(MlpBlock, self).__init__()
        # init layers
        self.fc1 = nn.Linear(in_dim, mlp_dim)
        self.fc2 = nn.Linear(mlp_dim, out_dim)
        self.act = nn.GELU()
        if dropout_rate > 0.0:
            self.dropout1 = nn.Dropout(dropout_rate)
            self.dropout2 = nn.Dropout(dropout_rate)
        else:
            self.dropout1 = None
            self.dropout2 = None

    def forward(self, x):
        out = self.fc1(x)
        out = self.act(out)
        if self.dropout1:
            out = self.dropout1(out)
        out = self.fc2(out)
        if self.dropout2:  # guard, like dropout1, so dropout_rate=0 does not crash
            out = self.dropout2(out)
        return out


class LinearGeneral(nn.Module):
    def __init__(self, in_dim=(768,), feat_dim=(12, 64)):
        super(LinearGeneral, self).__init__()
        self.weight = nn.Parameter(torch.randn(*in_dim, *feat_dim))
        self.bias = nn.Parameter(torch.zeros(*feat_dim))

    def forward(self, x, dims):
        a = torch.tensordot(x, self.weight, dims=dims) + self.bias
        return a


class SelfAttention(nn.Module):
    def __init__(self, in_dim, heads=8, dropout_rate=0.1):
        super(SelfAttention, self).__init__()
        self.heads = heads
        self.head_dim = in_dim // heads
        self.scale = self.head_dim ** 0.5
        self.query = LinearGeneral((in_dim,), (self.heads, self.head_dim))
        self.key = LinearGeneral((in_dim,), (self.heads, self.head_dim))
        self.value = LinearGeneral((in_dim,), (self.heads, self.head_dim))
        self.out = LinearGeneral((self.heads, self.head_dim), (in_dim,))
        if dropout_rate > 0:
            self.dropout = nn.Dropout(dropout_rate)
        else:
            self.dropout = None

    def forward(self, x):
        b, n, _ = x.shape
        q = self.query(x, dims=([2], [0]))
        k = self.key(x, dims=([2], [0]))
        v = self.value(x, dims=([2], [0]))
        # (b, n, heads, head_dim) -> (b, heads, n, head_dim)
        q = q.permute(0, 2, 1, 3)
        k = k.permute(0, 2, 1, 3)
        v = v.permute(0, 2, 1, 3)
        # scaled dot-product attention
        attn_weights = torch.matmul(q, k.transpose(-2, -1)) / self.scale
        attn_weights = F.softmax(attn_weights, dim=-1)
        out = torch.matmul(attn_weights, v)
        out = out.permute(0, 2, 1, 3)
        out = self.out(out, dims=([2, 3], [0, 1]))
        return out


class EncoderBlock(nn.Module):
    def __init__(self, in_dim, mlp_dim, num_heads, dropout_rate=0.1, attn_dropout_rate=0.1):
        super(EncoderBlock, self).__init__()
        self.norm1 = nn.LayerNorm(in_dim)
        self.attn = SelfAttention(in_dim, heads=num_heads, dropout_rate=attn_dropout_rate)
        if dropout_rate > 0:
            self.dropout = nn.Dropout(dropout_rate)
        else:
            self.dropout = None
        self.norm2 = nn.LayerNorm(in_dim)
        self.mlp = MlpBlock(in_dim, mlp_dim, in_dim, dropout_rate)

    def forward(self, x):
        # pre-norm attention with a residual connection
        residual = x
        out = self.norm1(x)
        out = self.attn(out)
        if self.dropout:
            out = self.dropout(out)
        out += residual
        # pre-norm MLP with a residual connection
        residual = out
        out = self.norm2(out)
        out = self.mlp(out)
        out += residual
        return out


class Encoder(nn.Module):
    def __init__(self, num_patches, emb_dim, mlp_dim, num_layers=12, num_heads=12,
                 dropout_rate=0.1, attn_dropout_rate=0.0):
        super(Encoder, self).__init__()
        # positional embedding
        self.pos_embedding = PositionEmbs(num_patches, emb_dim, dropout_rate)
        # encoder blocks
        in_dim = emb_dim
        self.encoder_layers = nn.ModuleList()
        for _ in range(num_layers):
            layer = EncoderBlock(in_dim, mlp_dim, num_heads, dropout_rate, attn_dropout_rate)
            self.encoder_layers.append(layer)
        self.norm = nn.LayerNorm(in_dim)

    def forward(self, x):
        out = self.pos_embedding(x)
        for layer in self.encoder_layers:
            out = layer(out)
        out = self.norm(out)
        return out


class VisionTransformer(nn.Module):
    """ Vision Transformer """
    def __init__(self,
                 image_size=(256, 256),
                 patch_size=(16, 16),
                 emb_dim=768,
                 mlp_dim=3072,
                 num_heads=12,
                 num_layers=12,
                 attn_dropout_rate=0.0,
                 dropout_rate=0.1):
        super(VisionTransformer, self).__init__()
        h, w = image_size
        # embedding layer: a conv with stride = patch size splits the image into patches
        fh, fw = patch_size
        gh, gw = h // fh, w // fw
        num_patches = gh * gw
        self.embedding = nn.Conv2d(3, emb_dim, kernel_size=(fh, fw), stride=(fh, fw))
        # class token
        self.cls_token = nn.Parameter(torch.zeros(1, 1, emb_dim))
        # transformer
        self.transformer = Encoder(
            num_patches=num_patches,
            emb_dim=emb_dim,
            mlp_dim=mlp_dim,
            num_layers=num_layers,
            num_heads=num_heads,
            dropout_rate=dropout_rate,
            attn_dropout_rate=attn_dropout_rate)

    def forward(self, x):
        emb = self.embedding(x)        # (n, c, gh, gw)
        emb = emb.permute(0, 2, 3, 1)  # (n, gh, gw, c)
        b, h, w, c = emb.shape
        emb = emb.reshape(b, h * w, c)
        # prepend class token
        cls_token = self.cls_token.repeat(b, 1, 1)
        emb = torch.cat([cls_token, emb], dim=1)
        # transformer
        feat = self.transformer(emb)
        return feat


class CAFIA_Transformer(nn.Module):
    def __init__(self, args):
        super(CAFIA_Transformer, self).__init__()
        self.vit = VisionTransformer(
            image_size=(args.image_size, args.image_size),
            patch_size=(args.patch_size, args.patch_size),
            emb_dim=args.emb_dim,
            mlp_dim=args.mlp_dim,
            num_heads=args.num_heads,
            num_layers=args.num_layers,
            attn_dropout_rate=args.attn_dropout_rate,
            dropout_rate=args.dropout_rate)
        self.init_weight(args)
        # replace the pre-trained head with a classifier for the 10 CIFAR classes
        self.classifier = nn.Linear(args.emb_dim, args.num_classes)

    def init_weight(self, args):
        # transfer learning: load the pre-trained ViT weights, drop the old head
        state_dict = torch.load(args.vit_model)['state_dict']
        del state_dict['classifier.weight']
        del state_dict['classifier.bias']
        self.vit.load_state_dict(state_dict)

    def forward(self, batch_X):
        feat = self.vit(batch_X)
        output = self.classifier(feat[:, 0])  # classify from the class token
        return output
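A quick, illustrative smoke test of the model definition (our own snippet, not part of the original project); it exercises VisionTransformer only, with random weights, so the pre-trained checkpoint required by CAFIA_Transformer.init_weight is bypassed:

# Smoke test: 224x224 input, 16x16 patches -> 196 patch tokens + 1 class token.
import torch
from model import VisionTransformer

vit = VisionTransformer(image_size=(224, 224), patch_size=(16, 16))
x = torch.randn(2, 3, 224, 224)
feat = vit(x)
print(feat.shape)  # expected: torch.Size([2, 197, 768])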
predict1.py
import os
import json
import argparse

import torch
from PIL import Image
from torchvision import transforms
import matplotlib.pyplot as plt

from model import CAFIA_Transformer

# Get the prediction result
# classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')


def predict():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    data_transform = transforms.Compose(
        [transforms.Resize(256),
         transforms.CenterCrop(224),
         transforms.ToTensor(),
         transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])])

    # load image
    img_path = "./image/bird.png"
    assert os.path.exists(img_path), "file: '{}' does not exist.".format(img_path)
    img = Image.open(img_path).convert('RGB')
    plt.imshow(img)
    # [N, C, H, W]
    img = data_transform(img)
    # expand batch dimension
    img = torch.unsqueeze(img, dim=0)

    # read class_indict
    json_path = './class_indices.json'
    assert os.path.exists(json_path), "file: '{}' does not exist.".format(json_path)
    json_file = open(json_path, "r")
    class_indict = json.load(json_file)

    # create model
    parser = argparse.ArgumentParser()
    parser.add_argument("--output", default='./output', type=str)
    parser.add_argument("--vit_model", default='./Vit_weights/imagenet21k+imagenet2012_ViT-B_16-224.pth', type=str)
    parser.add_argument("--image_size", type=int, default=224, help="input image size", choices=[224, 384])
    parser.add_argument("--num-classes", type=int, default=10, help="number of classes in dataset")
    parser.add_argument("--patch_size", type=int, default=16)
    parser.add_argument("--emb_dim", type=int, default=768)
    parser.add_argument("--mlp_dim", type=int, default=3072)
    parser.add_argument("--num_heads", type=int, default=12)
    parser.add_argument("--num_layers", type=int, default=12)
    parser.add_argument("--attn_dropout_rate", type=float, default=0.0)
    parser.add_argument("--dropout_rate", type=float, default=0.1)
    args = parser.parse_args()
    model = CAFIA_Transformer(args)
    model.to(device)  # same settings as training (train.py)

    # load model weights
    model_weight_path = "./output/0.9912199974060059.pt"
    model.load_state_dict(torch.load(model_weight_path, map_location=device))
    model.eval()
    with torch.no_grad():
        # predict class
        output = torch.squeeze(model(img.to(device))).cpu()
        predict = torch.softmax(output, dim=0)
        predict_cla = torch.argmax(predict).numpy()

    # below a probability of 0.5, report "no match"
    if predict[predict_cla].numpy() < 0.5:
        print_res = "class: {}   prob: {:.3}".format('no match',
                                                     predict[predict_cla].numpy())
    else:
        print_res = "class: {}   prob: {:.3}".format(class_indict[str(predict_cla)],
                                                     predict[predict_cla].numpy())
    plt.title(print_res)
    print(print_res)
    plt.show()


if __name__ == '__main__':
    predict()
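predict1.py expects a ./class_indices.json file mapping class indices to names. That file is not shown in the post, but given the CIFAR-10 classes listed in the commented-out tuple at the top, a helper like the following (ours, hypothetical) would presumably generate it:

# Hypothetical helper to create the class_indices.json that predict1.py reads;
# the index->name mapping follows the commented-out classes tuple above.
import json

classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog',
           'frog', 'horse', 'ship', 'truck')
with open('class_indices.json', 'w') as f:
    json.dump({str(i): name for i, name in enumerate(classes)}, f, indent=4)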
5.4. Results
Based on the pre-trained weights, after one epoch we reach an accuracy of 99.12%.
6. Run screenshots
6.1 Improved LeNet-5
Run the improved LeNet-5 main prediction program main.py. It returns the following: the image is predicted as dog, with a probability of 77.8%.
torch.Size([3, 32, 32])
class: plane prob: 0.000458
class: car prob: 1.94e-08
class: bird prob: 0.124
class: cat prob: 0.0532
class: deer prob: 0.0414
class: dog prob: 0.778
class: frog prob: 5.11e-05
class: horse prob: 0.0019
class: ship prob: 0.000887
class: truck prob: 1.69e-07
Process finished with exit code 0
Effect: (screenshot omitted)
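The LeNet-5 prediction script itself is not reproduced in the post; the following is only a minimal sketch of what it could look like, assuming the cifar_weight/best_model.pth checkpoint saved by train.py and an image in cifar_img (the file name here is illustrative):

# Hypothetical prediction script for the improved LeNet-5 -- not the original.
import torch
import torchvision.transforms as transforms
from PIL import Image

from net import cifar_model

classes = ('plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')

transform = transforms.Compose([transforms.Resize((32, 32)),
                                transforms.ToTensor()])
img = transform(Image.open('cifar_img/dog.png').convert('RGB'))
print(img.shape)  # torch.Size([3, 32, 32])

model = cifar_model()
model.load_state_dict(torch.load('cifar_weight/best_model.pth', map_location='cpu'))
model.eval()
with torch.no_grad():
    probs = torch.softmax(model(img.unsqueeze(0)), dim=1).squeeze(0)
for name, p in zip(classes, probs):
    print("class: {:5}   prob: {:.3}".format(name, p.item()))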
6.2 ViT + CIFAR-10
First add a few png pictures to the image folder. (screenshots omitted)
Run predict1.py. With the default prediction picture 502.png, it returns no match:
class: no match   prob: 0.222
Change the prediction picture to bird.png; the returned prediction probability is 99.6%, which is already very high:
class: bird   prob: 0.996
7. Comparison and analysis of experimental results
(The results of the two methods are compared: accuracy, training time, and so on.)
| Neural network | Accuracy | Training time | Batch size | Epochs |
|---|---|---|---|---|
| Improved LeNet-5 | 76.01% | 15 min | 64 | 30 |
| ViT | 99.12% | 1 h 15 min | 4 | 1 |
8. Experience and summary
First of all, I would like to thank Mr. Cheng for teaching us the pattern recognition course and giving us a systematic understanding of pattern recognition. From the beginning with means and variances to the later search for good features, we gained a deeper understanding of patterns and of recognition. Along the way I was also moved by the many groups that shared wonderful network structures and experience. We expect neural networks to keep advancing pattern recognition!
The two neural networks in this experiment come from the internet. Because they had to be adapted to my own computer, the relevant parameters needed adjusting, and many bugs inevitably appeared. But working through such small obstacles also yields a lot, for example the tensorboard visualizations and displaying the prediction output produced from the trained weights.
From the above experiments we can see that although the Transformer architecture has become the de facto standard for natural language processing tasks, its application in computer vision is still limited. In vision, attention is either used in conjunction with convolutional networks, or used to replace some components of a convolutional network while keeping its overall structure. The experiments show that this dependence on CNNs is unnecessary: a pure Transformer applied directly to sequences of image patches can perform image classification tasks very well. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-10, VTAB, etc.), the Vision Transformer (ViT) attains excellent results compared with state-of-the-art convolutional networks while requiring substantially fewer computational resources.
9. Experimental data
CIFAR-10 contains 60,000 images at 32×32 resolution, divided into 10 classes: plane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The dataset has 50,000 training images and 10,000 test images.
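These statistics can be verified locally with a quick check (this snippet is ours, for illustration):

# Illustrative check of the CIFAR-10 statistics quoted above.
import torchvision

train = torchvision.datasets.CIFAR10("./data", train=True, download=False)
test = torchvision.datasets.CIFAR10("./data", train=False, download=False)
print(len(train), len(test))  # 50000 10000
print(train.classes)          # ['airplane', 'automobile', 'bird', ...]
print(train[0][0].size)       # (32, 32)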
References:
Vision-Transformer-ViT
[Building an improved LeNet-5 network model with PyTorch (Win11), continued]