AlexNet image classification on the Caltech101 dataset (PyTorch implementation)
2022-06-13 01:06:00 【Unstoppable~~~】
Contents
Main task
- Implement the AlexNet architecture with PyTorch
- Validate it on the Caltech101 dataset
- Dataset address
Data processing
Read the images from the 101 class folders (convert BGR to RGB and resize them; the original images vary in size, so resizing is required), and assign the 101 class labels (0-100).
def data_processor(size=65):
    """Read the images from the dataset folders and organize them into data and labels (101 classes in total)."""
    data = []
    labels = []
    label_name = []
    name2label = {}
    for idx, image_path in enumerate(image_paths):
        name = image_path.split(os.path.sep)[-2]  # the class name is the parent folder name
        # Read and preprocess the image
        image = cv2.imread(image_path)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # convert BGR to RGB
        image = cv2.resize(image, (size, size), interpolation=cv2.INTER_AREA)
        data.append(image)
        label_name.append(name)
    data = np.array(data)
    label_name = list(dict.fromkeys(label_name))  # deduplicate while keeping order
    label_name = np.array(label_name)
    # print(label_name)
    # Generate the class labels 0-100, one per folder name in label_name
    for idx, name in enumerate(label_name):
        name2label[name] = idx  # assign an integer label to each category
    for idx, image_path in enumerate(image_paths):
        labels.append(name2label[image_path.split(os.path.sep)[-2]])
    labels = np.array(labels)
    return data, name2label, labels
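A minimal usage sketch (not part of the original excerpt; it assumes the module-level image_paths list defined in the full script further below): the function returns the stacked image array, the name-to-index mapping, and the integer label array.

# Usage sketch: with the default size=65 the returned arrays look roughly like
#   data.shape   -> (num_images, 65, 65, 3)
#   labels.shape -> (num_images,) with values in 0-100
data, name2label, labels = data_processor(size=65)
print(data.shape, labels.shape, len(name2label))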
Apply image transforms, and split the data into training, validation, and test sets
# Define image transforms
train_transform = transforms.Compose(
    [transforms.ToPILImage(),
     # transforms.Resize((224, 224)),
     transforms.ToTensor(),
     transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet channel statistics
                          std=[0.229, 0.224, 0.225])])
val_transform = transforms.Compose(
    [transforms.ToPILImage(),
     # transforms.Resize((224, 224)),
     transforms.ToTensor(),
     transforms.Normalize(mean=[0.485, 0.456, 0.406],
                          std=[0.229, 0.224, 0.225])])
# Split the dataset into training, validation and test sets (roughly 60/20/20)
# x_train examples: (5205, 65, 65, 3)
# x_test examples: (1736, 65, 65, 3)
# x_val examples: (1736, 65, 65, 3)
(X, x_val, Y, y_val) = train_test_split(data, labels,
                                        test_size=0.2,
                                        stratify=labels,
                                        random_state=42)
(x_train, x_test, y_train, y_test) = train_test_split(X, Y,
                                                       test_size=0.25,
                                                       random_state=42)
print(f"x_train examples: {x_train.shape}\n"
      f"x_test examples: {x_test.shape}\n"
      f"x_val examples: {x_val.shape}")
Define a custom dataset class (subclassing Dataset) to make the data easy to load
class ImageDataset(Dataset):
    def __init__(self, images, labels=None, transforms=None):
        self.X = images
        self.y = labels
        self.transforms = transforms

    def __len__(self):
        return len(self.X)

    def __getitem__(self, i):
        data = self.X[i][:]
        if self.transforms:
            data = self.transforms(data)
        if self.y is not None:
            return (data, self.y[i])
        else:
            return data


train_data = ImageDataset(x_train, y_train, train_transform)
val_data = ImageDataset(x_val, y_val, val_transform)
test_data = ImageDataset(x_test, y_test, val_transform)
train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)
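As a quick sanity check before training (a small sketch, not part of the original script; it just uses the train_loader defined above), you can pull one batch and confirm the tensor shapes:

# Sanity check (optional): with size=65 and batch_size=64, one batch should be
# images [64, 3, 65, 65] and targets [64] (integer class ids 0-100).
images, targets = next(iter(train_loader))
print(images.shape, images.dtype)
print(targets.shape, targets.dtype)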
Network structure
The AlexNet model consists of 5 convolutional layers and 3 pooling layers, plus 3 fully connected layers. AlexNet is similar in structure to LeNet, but it uses more convolutional layers and a larger parameter space to fit the large-scale ImageNet dataset. It marks the dividing line between shallow and deep neural networks.
The AlexNet network structure is shown below:

Advantages of AlexNet:
- Uses ReLU as the CNN activation function and shows that it outperforms Sigmoid in deeper networks, successfully alleviating the vanishing-gradient problem that Sigmoid causes in deep networks.
- Uses Dropout to randomly drop some neurons during training to avoid overfitting. In AlexNet, Dropout is applied mainly to the last few fully connected layers.
- Uses overlapping max pooling. Earlier CNNs commonly used average pooling; AlexNet uses max pooling throughout, which avoids the blurring effect of average pooling. Moreover, the stride is smaller than the pooling kernel, so the outputs of the pooling layer overlap, enriching the features.
- Adds LRN (Local Response Normalization) layers, which create a competition mechanism among the activities of local neurons: values with larger responses become relatively larger while neurons with smaller feedback are suppressed, improving the model's generalization ability (a minimal PyTorch sketch follows this list).
- Uses CUDA to accelerate training of the deep convolutional network, exploiting the GPU's massive parallel computing power for the large number of matrix operations involved in training.
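For reference, a minimal sketch of an LRN layer in PyTorch (the parameter values below follow the original AlexNet paper and are only illustrative; the implementation in this article replaces LRN with BatchNorm2d):

import torch
from torch import nn

# Local Response Normalization as in the original AlexNet (illustrative hyperparameters).
lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)
x = torch.randn(1, 48, 27, 27)  # a hypothetical feature map
print(lrn(x).shape)             # the shape is unchanged: torch.Size([1, 48, 27, 27])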
The model below follows the AlexNet structure: five convolutional layers and three fully connected layers. The convolutional layers use ReLU activations and BatchNorm2d layers, the fully connected layers use Dropout to prevent overfitting, and init_weights initializes the convolution weights with kaiming_normal_, which gives better results.
# Network model definition
class AlexNet(nn.Module):
    def __init__(self, num_class=101, init_weights=False):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 48, kernel_size=11),
            # nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2),  # input[3, 224, 224] output[48, 55, 55]; fractional output sizes are floored
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.BatchNorm2d(48),
            nn.Conv2d(48, 128, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.BatchNorm2d(128),
            nn.Conv2d(128, 192, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 192, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),  # output[128, 6, 6] for a 65x65 input
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(6 * 6 * 128, 2048),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(2048, 2048),
            nn.ReLU(inplace=True),
            nn.Linear(2048, num_class),
        )
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        # print("x.shape", x.shape)  # torch.Size([batch_size, 128, 6, 6])
        x = torch.flatten(x, start_dim=1)  # flatten into a vector
        # print("x.flatten.shape", x.shape)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')  # Kaiming (He) initialization
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)  # initialize from a normal distribution
                nn.init.constant_(m.bias, 0)
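With a 65x65 input, the feature extractor produces a 128x6x6 map, which matches the 6 * 6 * 128 input of the first fully connected layer. A quick shape check (a sketch, assuming torch is imported and the AlexNet class above is available):

import torch

# Verify the feature-map and output shapes for a 65x65 input.
net = AlexNet(num_class=101, init_weights=True)
net.eval()
x = torch.randn(1, 3, 65, 65)
print(net.features(x).shape)  # expected: torch.Size([1, 128, 6, 6])
print(net(x).shape)           # expected: torch.Size([1, 101])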
Training and testing
# Training loop
def train(epoch):
    model.train()
    train_loss = 0.0
    train_acc = 0
    step = 0
    for batch_idx, (data, label) in enumerate(train_loader):
        data, label = data.to(device), label.to(device)
        label = label.to(torch.int64)  # type conversion
        # print("data.shape", data.shape)    # torch.Size([batch_size, 3, 65, 65])
        # print("label.shape", label.shape)  # torch.Size([batch_size]); no need to convert label to one-hot
        optimizer.zero_grad()
        outputs = model(data)  # torch.Size([batch_size, 101])
        loss = F.cross_entropy(outputs, label)
        # Accuracy on this batch
        acc = (outputs.argmax(dim=1) == label).sum().cpu().item() / len(label)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
        train_acc += acc
        step += 1
    # Average over the batches
    avg_train_acc = train_acc / step
    avg_train_loss = train_loss / step
    writer.add_scalars(
        "Training Loss", {"Training Loss": avg_train_loss},
        epoch
    )
    writer.flush()
    return avg_train_acc, avg_train_loss


def val():
    model.eval()
    val_loss = 0.0
    val_acc = 0
    step = 0
    with torch.no_grad():
        for batch_idx, (data, label) in enumerate(val_loader):
            data, label = data.to(device), label.to(device)
            label = label.to(torch.int64)  # type conversion
            outputs = model(data)  # torch.Size([batch_size, 101])
            loss = F.cross_entropy(outputs, label)
            # Accuracy on this batch
            acc = (outputs.argmax(dim=1) == label).sum().cpu().item() / len(label)
            val_loss += loss.item()
            val_acc += acc
            step += 1
    # Average over the batches
    avg_val_acc = val_acc / step
    avg_val_loss = val_loss / step
    return avg_val_acc, avg_val_loss
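The script builds a test_loader but never uses it; a minimal test-set evaluation along the same lines as val() could look like this (a sketch that reuses the model, device and test_loader defined above):

# Test-set evaluation (sketch, not part of the original script)
def test():
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for data, label in test_loader:
            data, label = data.to(device), label.to(device).to(torch.int64)
            outputs = model(data)
            correct += (outputs.argmax(dim=1) == label).sum().item()
            total += label.size(0)
    return correct / total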
Full code
alexnet.py (uncomment the TensorBoard-related lines to plot the various curves in TensorBoard)
from torch.nn import functional as F
from imutils import paths
import cv2
import os
import numpy as np
import torch
from torch import nn, optim
from torchvision.transforms import transforms
from torch.utils.data import DataLoader, Dataset
from sklearn.model_selection import train_test_split
from model import AlexNet
from torch.utils.tensorboard import SummaryWriter
#======================= Use tensorboard===================
writer = SummaryWriter('runs/alexnet-101-2')
#============= Parameters =======================
num_class = 101
epochs = 30
batch_size = 64
PATH = 'Xlnet.pth' # Save path of model parameters
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
path = 'caltech-101/101_ObjectCategories'
image_paths = list(paths.list_images(path))  # list of all image file paths under this directory
def data_processor(size=65):
    """Read the images from the dataset folders and organize them into data and labels (101 classes in total)."""
    data = []
    labels = []
    label_name = []
    name2label = {}
    for idx, image_path in enumerate(image_paths):
        name = image_path.split(os.path.sep)[-2]  # the class name is the parent folder name
        # Read and preprocess the image
        image = cv2.imread(image_path)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # convert BGR to RGB
        image = cv2.resize(image, (size, size), interpolation=cv2.INTER_AREA)
        data.append(image)
        label_name.append(name)
    data = np.array(data)
    label_name = list(dict.fromkeys(label_name))  # deduplicate while keeping order
    label_name = np.array(label_name)
    # print(label_name)
    # Generate the class labels 0-100, one per folder name in label_name
    for idx, name in enumerate(label_name):
        name2label[name] = idx  # assign an integer label to each category
    for idx, image_path in enumerate(image_paths):
        labels.append(name2label[image_path.split(os.path.sep)[-2]])
    labels = np.array(labels)
    return data, name2label, labels
# Returns the image data (shape (8677, 65, 65, 3) with size=65), the name-to-label mapping, and the 0-100 label array
data, name2label, labels = data_processor()
# print(data.shape)
# print("===========================")
# print(name2label)
# print("===========================")
# print(labels)
# Define image transforms
train_transform = transforms.Compose(
    [transforms.ToPILImage(),
     # transforms.Resize((224, 224)),
     transforms.ToTensor(),
     transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet channel statistics
                          std=[0.229, 0.224, 0.225])])
val_transform = transforms.Compose(
    [transforms.ToPILImage(),
     # transforms.Resize((224, 224)),
     transforms.ToTensor(),
     transforms.Normalize(mean=[0.485, 0.456, 0.406],
                          std=[0.229, 0.224, 0.225])])
# Split the dataset into training, validation and test sets (roughly 60/20/20)
# x_train examples: (5205, 65, 65, 3)
# x_test examples: (1736, 65, 65, 3)
# x_val examples: (1736, 65, 65, 3)
(X, x_val, Y, y_val) = train_test_split(data, labels,
                                        test_size=0.2,
                                        stratify=labels,
                                        random_state=42)
(x_train, x_test, y_train, y_test) = train_test_split(X, Y,
                                                       test_size=0.25,
                                                       random_state=42)
print(f"x_train examples: {x_train.shape}\n"
      f"x_test examples: {x_test.shape}\n"
      f"x_val examples: {x_val.shape}")
#============================== Data loading ===============================================
class ImageDataset(Dataset):
    def __init__(self, images, labels=None, transforms=None):
        self.X = images
        self.y = labels
        self.transforms = transforms

    def __len__(self):
        return len(self.X)

    def __getitem__(self, i):
        data = self.X[i][:]
        if self.transforms:
            data = self.transforms(data)
        if self.y is not None:
            return (data, self.y[i])
        else:
            return data


train_data = ImageDataset(x_train, y_train, train_transform)
val_data = ImageDataset(x_val, y_val, val_transform)
test_data = ImageDataset(x_test, y_test, val_transform)
train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)
#================= Build the model ==============================
model = AlexNet(init_weights=True).to(device)  # move the model to the GPU if available
criterion = nn.CrossEntropyLoss()  # cross-entropy loss (F.cross_entropy is used directly in train/val below)
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=0.0005)  # stochastic gradient descent with momentum
# Training loop
def train(epoch):
    model.train()
    train_loss = 0.0
    train_acc = 0
    step = 0
    for batch_idx, (data, label) in enumerate(train_loader):
        data, label = data.to(device), label.to(device)
        label = label.to(torch.int64)  # type conversion
        # print("data.shape", data.shape)    # torch.Size([batch_size, 3, 65, 65])
        # print("label.shape", label.shape)  # torch.Size([batch_size]); no need to convert label to one-hot
        optimizer.zero_grad()
        outputs = model(data)  # torch.Size([batch_size, 101])
        loss = F.cross_entropy(outputs, label)
        # Accuracy on this batch
        acc = (outputs.argmax(dim=1) == label).sum().cpu().item() / len(label)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
        train_acc += acc
        step += 1
    # Average over the batches
    avg_train_acc = train_acc / step
    avg_train_loss = train_loss / step
    writer.add_scalars(
        "Training Loss", {"Training Loss": avg_train_loss},
        epoch
    )
    writer.flush()
    return avg_train_acc, avg_train_loss


def val():
    model.eval()
    val_loss = 0.0
    val_acc = 0
    step = 0
    with torch.no_grad():
        for batch_idx, (data, label) in enumerate(val_loader):
            data, label = data.to(device), label.to(device)
            label = label.to(torch.int64)  # type conversion
            outputs = model(data)  # torch.Size([batch_size, 101])
            loss = F.cross_entropy(outputs, label)
            # Accuracy on this batch
            acc = (outputs.argmax(dim=1) == label).sum().cpu().item() / len(label)
            val_loss += loss.item()
            val_acc += acc
            step += 1
    # Average over the batches
    avg_val_acc = val_acc / step
    avg_val_loss = val_loss / step
    return avg_val_acc, avg_val_loss
def tensorboard_draw():
    # Create an all-zero dummy image batch
    images = torch.zeros((1, 3, 65, 65))
    # Draw the network structure graph
    writer.add_graph(model.to("cpu"), images)
    writer.flush()


def select_n_random(data, labels, n=100):
    assert len(data) == len(labels)
    perm = torch.randperm(len(data))
    return data[perm][:n], labels[perm][:n]
def run():
    print('start training')
    for epoch in range(epochs):
        train_acc, train_loss = train(epoch)
        print("EPOCH [{}/{}] Train acc {:.4f} Train loss {:.4f} ".format(epoch + 1, epochs, train_acc, train_loss))
        torch.save(model.state_dict(), PATH)  # save the model parameters
        val_acc, val_loss = val()
        print("val(): val acc {:.4f} val loss {:.4f} ".format(val_acc, val_loss))
    writer.close()


run()
# tensorboard_draw(train_loader)
# tensorboard_draw2()
model.py
import torch
import os
from torch import nn
from torch.nn import functional as F
from torch.autograd import Variable
import matplotlib.pyplot as plt
from torchvision.datasets import ImageFolder
import torch.optim as optim
import torch.utils.data
# Network model definition
class AlexNet(nn.Module):
    def __init__(self, num_class=101, init_weights=False):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 48, kernel_size=11),
            # nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2),  # input[3, 224, 224] output[48, 55, 55]; fractional output sizes are floored
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.BatchNorm2d(48),
            nn.Conv2d(48, 128, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.BatchNorm2d(128),
            nn.Conv2d(128, 192, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 192, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),  # output[128, 6, 6] for a 65x65 input
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(6 * 6 * 128, 2048),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(2048, 2048),
            nn.ReLU(inplace=True),
            nn.Linear(2048, num_class),
        )
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        # print("x.shape", x.shape)  # torch.Size([batch_size, 128, 6, 6])
        x = torch.flatten(x, start_dim=1)  # flatten into a vector
        # print("x.flatten.shape", x.shape)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')  # Kaiming (He) initialization
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)  # initialize from a normal distribution
                nn.init.constant_(m.bias, 0)
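A quick way to check the size of this reduced model (a sketch; it simply counts the trainable parameters of the AlexNet class defined above):

# Count the trainable parameters of the reduced AlexNet (sketch).
net = AlexNet(num_class=101)
n_params = sum(p.numel() for p in net.parameters() if p.requires_grad)
print(f"{n_params:,} trainable parameters")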
Training results

Due to hardware limitations, a simplified version of the AlexNet model was trained (the structure is unchanged; only the number of parameters is reduced, with the images resized from 224x224 to 65x65) for 30 epochs with a batch size of 64 (the model did not converge). With more powerful hardware, better results should be achievable.
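To reuse the trained weights (saved to Xlnet.pth by the training script), a single-image inference sketch along these lines should work; it relies on the AlexNet class, device, PATH, val_transform and name2label defined above, and the image path is only a placeholder:

# Single-image inference sketch (assumes the weights file produced by run()).
model = AlexNet(num_class=101).to(device)
model.load_state_dict(torch.load(PATH, map_location=device))
model.eval()

img = cv2.imread('some_image.jpg')  # placeholder path
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (65, 65), interpolation=cv2.INTER_AREA)
x = val_transform(img).unsqueeze(0).to(device)  # same preprocessing as validation

with torch.no_grad():
    pred = model(x).argmax(dim=1).item()
idx2name = {v: k for k, v in name2label.items()}
print("predicted class:", idx2name[pred])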