AlexNet for image classification on the Caltech101 dataset (PyTorch implementation)
2022-06-13 01:06:00 【Unstoppable~~~】
Contents
Main task
- Implement the AlexNet architecture with PyTorch
- Validate it on the Caltech101 dataset
- Dataset address
Data processing
Read the images from the 101 category folders (the original images vary in size, so each must be resized and converted to RGB) and assign the 101 class labels (0-100).
def data_processor(size=65):
    """Read the images from the folders and organize them into data and labels (101 classes in total)."""
    data = []
    labels = []
    label_name = []
    name2label = {}
    for idx, image_path in enumerate(image_paths):
        name = image_path.split(os.path.sep)[-2]  # class name = parent folder name
        # read and preprocess the image
        image = cv2.imread(image_path)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # convert BGR to RGB
        image = cv2.resize(image, (size, size), interpolation=cv2.INTER_AREA)
        data.append(image)
        label_name.append(name)
    data = np.array(data)
    label_name = list(dict.fromkeys(label_name))  # deduplicate while preserving order
    label_name = np.array(label_name)
    # print(label_name)
    # assign class labels 0-100, one per folder name in label_name
    for idx, name in enumerate(label_name):
        name2label[name] = idx
    for idx, image_path in enumerate(image_paths):
        labels.append(name2label[image_path.split(os.path.sep)[-2]])
    labels = np.array(labels)
    return data, name2label, labels
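data_processor relies on a module-level image_paths list; the full script later in this post builds it with imutils, roughly as in this minimal sketch (the Caltech-101 layout, one sub-folder per class, is assumed):

from imutils import paths

# assumed layout: caltech-101/101_ObjectCategories/<class_name>/*.jpg
path = 'caltech-101/101_ObjectCategories'
image_paths = list(paths.list_images(path))  # every image file under the class folders
print(len(image_paths))  # the post reports 8677 images in total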
Define the image transforms and split the data into training, validation and test sets.
# define the image transforms
train_transform = transforms.Compose(
    [transforms.ToPILImage(),
     # transforms.Resize((224, 224)),
     transforms.ToTensor(),
     transforms.Normalize(mean=[0.485, 0.456, 0.406],
                          std=[0.229, 0.224, 0.225])])
val_transform = transforms.Compose(
    [transforms.ToPILImage(),
     # transforms.Resize((224, 224)),
     transforms.ToTensor(),
     transforms.Normalize(mean=[0.485, 0.456, 0.406],
                          std=[0.229, 0.224, 0.225])])

# split the dataset into training, validation and test sets
# x_train examples: (5205, 200, 200, 3)
# x_test examples: (1736, 200, 200, 3)
# x_val examples: (1736, 200, 200, 3)
(X, x_val, Y, y_val) = train_test_split(data, labels,
                                        test_size=0.2,
                                        stratify=labels,
                                        random_state=42)
(x_train, x_test, y_train, y_test) = train_test_split(X, Y,
                                                      test_size=0.25,
                                                      random_state=42)
print(f"x_train examples: {x_train.shape}\nx_test examples: {x_test.shape}\nx_val examples: {x_val.shape}")
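The Normalize values above are the standard ImageNet channel statistics. If you would rather use statistics computed from Caltech-101 itself, a minimal sketch (assuming data is the uint8 array returned by data_processor) could look like this:

import numpy as np

# per-channel mean/std in [0, 1], matching the scaling that ToTensor applies
pixels = data.astype(np.float32) / 255.0   # shape (N, H, W, 3)
mean = pixels.mean(axis=(0, 1, 2))
std = pixels.std(axis=(0, 1, 2))
print(mean, std)  # plug these into transforms.Normalize instead of the ImageNet values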
Define a custom dataset class (inheriting from Dataset) to make loading the data easier.
class ImageDataset(Dataset):
    def __init__(self, images, labels=None, transforms=None):
        self.X = images
        self.y = labels
        self.transforms = transforms

    def __len__(self):
        return len(self.X)

    def __getitem__(self, i):
        data = self.X[i][:]
        if self.transforms:
            data = self.transforms(data)
        if self.y is not None:
            return (data, self.y[i])
        else:
            return data

train_data = ImageDataset(x_train, y_train, train_transform)
val_data = ImageDataset(x_val, y_val, val_transform)
test_data = ImageDataset(x_test, y_test, val_transform)
train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)
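A quick sanity check on one batch (a sketch; the spatial size depends on the size argument passed to data_processor):

images, targets = next(iter(train_loader))
print(images.shape)   # e.g. torch.Size([64, 3, 65, 65]) with batch_size=64 and size=65
print(targets.shape)  # torch.Size([64])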
Network structure
The AlexNet model consists of 5 convolutional layers and 3 pooling layers, followed by 3 fully connected layers. Its structure is similar to LeNet, but it uses more convolutional layers and a larger parameter space to fit the large-scale ImageNet dataset. It marks the dividing line between shallow and deep neural networks.
The AlexNet network structure is as follows:

Advantages of AlexNet:
- It uses ReLU as the CNN activation function and shows that it outperforms Sigmoid in deeper networks, addressing the vanishing-gradient problem that Sigmoid causes in deep networks.
- It uses Dropout to randomly drop neurons during training and so avoid overfitting. In AlexNet, Dropout is mainly applied to the last few fully connected layers.
- It uses overlapping max pooling. Earlier CNNs commonly used average pooling; AlexNet uses max pooling throughout, avoiding the blurring effect of average pooling. Moreover, the stride is smaller than the pooling kernel, so adjacent pooling outputs overlap, which enriches the features.
- It adds LRN (Local Response Normalization) layers, which set up a competition among local neuron activations: larger responses become relatively larger while neurons with smaller responses are suppressed, improving the model's generalization ability (a PyTorch sketch of LRN follows this list).
- It uses CUDA to accelerate training of the deep convolutional network, exploiting the GPU's parallel computing power for the many matrix operations involved in neural network training.
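For reference, the paper's LRN is available in PyTorch as nn.LocalResponseNorm (the model in this post substitutes BatchNorm2d). A minimal sketch of the first convolution block with LRN, using the paper's hyperparameters:

import torch.nn as nn

# LRN variant of the first block; this post's implementation uses nn.BatchNorm2d instead
lrn_block = nn.Sequential(
    nn.Conv2d(3, 48, kernel_size=11),
    nn.ReLU(inplace=True),
    nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),  # n=5, alpha=1e-4, beta=0.75, k=2 from the paper
    nn.MaxPool2d(kernel_size=3, stride=2),
)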
The network below follows the AlexNet structure: five convolutional layers and three fully connected layers. Each convolutional block uses ReLU activation plus a BatchNorm2d layer, the fully connected layers use Dropout to prevent overfitting, and init_weights initializes the convolution weights with kaiming_normal_, which works better in practice.
# network model definition
class AlexNet(nn.Module):
    def __init__(self, num_class=101, init_weights=False):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 48, kernel_size=11),
            # nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2),  # input[3, 224, 224] output[48, 55, 55], decimals are rounded down
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.BatchNorm2d(48),
            nn.Conv2d(48, 128, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.BatchNorm2d(128),
            nn.Conv2d(128, 192, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 192, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),  # output[128, 6, 6] for 65x65 inputs
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(6 * 6 * 128, 2048),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(2048, 2048),
            nn.ReLU(inplace=True),
            nn.Linear(2048, num_class),
        )
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        # print("x.shape", x.shape)
        x = torch.flatten(x, start_dim=1)  # flatten to one vector per sample
        # print("x.flatten.shape", x.shape)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')  # Kaiming initialization
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)  # normal-distribution initialization
                nn.init.constant_(m.bias, 0)
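The 6 * 6 * 128 input size of the first Linear layer only matches the 65x65 inputs used here; a dummy forward pass (a quick sketch) confirms the feature-map size before training:

model = AlexNet()
with torch.no_grad():
    feat = model.features(torch.zeros(1, 3, 65, 65))
print(feat.shape)  # expected: torch.Size([1, 128, 6, 6]), which flattens to 6*6*128 = 4608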
Training and testing
# training loop
def train(epoch):
    model.train()
    train_loss = 0.0
    train_acc = 0
    step = 0
    for batch_idx, (data, label) in enumerate(train_loader):
        data, label = data.to(device), label.to(device)
        label = label.to(torch.int64)  # type conversion
        # print("data.shape", data.shape)    # torch.Size([32, 3, 200, 200])
        # print("label.shape", label.shape)  # torch.Size([32]); no need to convert label to one-hot
        optimizer.zero_grad()
        outputs = model(data)  # torch.Size([32, 101])
        loss = F.cross_entropy(outputs, label)
        # accuracy of this batch
        acc = (outputs.argmax(dim=1) == label).sum().item() / label.size(0)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
        train_acc += acc
        step += 1
    # average over the batches
    avg_train_acc = train_acc / step
    avg_train_loss = train_loss / step
    writer.add_scalars(
        "Training Loss", {"Training Loss": avg_train_loss},
        epoch
    )
    writer.flush()
    return avg_train_acc, avg_train_loss


def val():
    model.eval()
    val_loss = 0.0
    val_acc = 0
    step = 0
    with torch.no_grad():
        for batch_idx, (data, label) in enumerate(val_loader):  # evaluate on the validation set
            data, label = data.to(device), label.to(device)
            label = label.to(torch.int64)  # type conversion
            outputs = model(data)  # torch.Size([32, 101])
            loss = F.cross_entropy(outputs, label)
            # accuracy of this batch
            acc = (outputs.argmax(dim=1) == label).sum().item() / label.size(0)
            val_loss += loss.item()
            val_acc += acc
            step += 1
    # average over the batches
    avg_val_acc = val_acc / step
    avg_val_loss = val_loss / step
    return avg_val_acc, avg_val_loss
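The script also builds a test_loader but never evaluates on it; a minimal test-set evaluation, assuming the trained model and the loaders defined above, could look like this sketch:

def test():
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for data, label in test_loader:
            data, label = data.to(device), label.to(device).to(torch.int64)
            outputs = model(data)
            correct += (outputs.argmax(dim=1) == label).sum().item()
            total += label.size(0)
    return correct / total  # overall test accuracy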
All the code
alexnet.py (uncomment the tensorboard-related lines to plot the various curves in TensorBoard)
from torch.nn import functional as F
from imutils import paths
import cv2
import os
import numpy as np
import torch
from torch import nn, optim
from torchvision.transforms import transforms
from torch.utils.data import DataLoader, Dataset
from sklearn.model_selection import train_test_split
from model import AlexNet
from torch.utils.tensorboard import SummaryWriter
#======================= Use tensorboard===================
writer = SummaryWriter('runs/alexnet-101-2')
#============= Parameters =======================
num_class = 101
epochs = 30
batch_size = 64
PATH = 'Xlnet.pth' # Save path of model parameters
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
path = 'caltech-101/101_ObjectCategories'
image_paths = list(paths.list_images(path))  # list of all image file paths under this directory
def data_processor(size=65):
    """Read the images from the folders and organize them into data and labels (101 classes in total)."""
    data = []
    labels = []
    label_name = []
    name2label = {}
    for idx, image_path in enumerate(image_paths):
        name = image_path.split(os.path.sep)[-2]  # class name = parent folder name
        # read and preprocess the image
        image = cv2.imread(image_path)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # convert BGR to RGB
        image = cv2.resize(image, (size, size), interpolation=cv2.INTER_AREA)
        data.append(image)
        label_name.append(name)
    data = np.array(data)
    label_name = list(dict.fromkeys(label_name))  # deduplicate while preserving order
    label_name = np.array(label_name)
    # print(label_name)
    # assign class labels 0-100, one per folder name in label_name
    for idx, name in enumerate(label_name):
        name2label[name] = idx
    for idx, image_path in enumerate(image_paths):
        labels.append(name2label[image_path.split(os.path.sep)[-2]])
    labels = np.array(labels)
    return data, name2label, labels
# returns the image data (e.g. shape (8677, 200, 200, 3)) and the 0-100 label indices
data, name2label, labels = data_processor()
# print(data.shape)
# print("===========================")
# print(name2label)
# print("===========================")
# print(labels)
# define the image transforms
train_transform = transforms.Compose(
    [transforms.ToPILImage(),
     # transforms.Resize((224, 224)),
     transforms.ToTensor(),
     transforms.Normalize(mean=[0.485, 0.456, 0.406],
                          std=[0.229, 0.224, 0.225])])
val_transform = transforms.Compose(
    [transforms.ToPILImage(),
     # transforms.Resize((224, 224)),
     transforms.ToTensor(),
     transforms.Normalize(mean=[0.485, 0.456, 0.406],
                          std=[0.229, 0.224, 0.225])])

# split the dataset into training, validation and test sets
# x_train examples: (5205, 200, 200, 3)
# x_test examples: (1736, 200, 200, 3)
# x_val examples: (1736, 200, 200, 3)
(X, x_val, Y, y_val) = train_test_split(data, labels,
                                        test_size=0.2,
                                        stratify=labels,
                                        random_state=42)
(x_train, x_test, y_train, y_test) = train_test_split(X, Y,
                                                      test_size=0.25,
                                                      random_state=42)
print(f"x_train examples: {x_train.shape}\nx_test examples: {x_test.shape}\nx_val examples: {x_val.shape}")
#============================== Data loading ===============================================
class ImageDataset(Dataset):
    def __init__(self, images, labels=None, transforms=None):
        self.X = images
        self.y = labels
        self.transforms = transforms

    def __len__(self):
        return len(self.X)

    def __getitem__(self, i):
        data = self.X[i][:]
        if self.transforms:
            data = self.transforms(data)
        if self.y is not None:
            return (data, self.y[i])
        else:
            return data

train_data = ImageDataset(x_train, y_train, train_transform)
val_data = ImageDataset(x_val, y_val, val_transform)
test_data = ImageDataset(x_test, y_test, val_transform)
train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)
#================= Build the model ==============================
model = AlexNet(init_weights=True).to(device)  # move the model to the GPU if available
criterion = nn.CrossEntropyLoss()  # cross-entropy loss (unused below; F.cross_entropy is called directly)
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=0.0005)  # stochastic gradient descent
# training loop
def train(epoch):
    model.train()
    train_loss = 0.0
    train_acc = 0
    step = 0
    for batch_idx, (data, label) in enumerate(train_loader):
        data, label = data.to(device), label.to(device)
        label = label.to(torch.int64)  # type conversion
        # print("data.shape", data.shape)    # torch.Size([32, 3, 200, 200])
        # print("label.shape", label.shape)  # torch.Size([32]); no need to convert label to one-hot
        optimizer.zero_grad()
        outputs = model(data)  # torch.Size([32, 101])
        loss = F.cross_entropy(outputs, label)
        # accuracy of this batch
        acc = (outputs.argmax(dim=1) == label).sum().item() / label.size(0)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
        train_acc += acc
        step += 1
    # average over the batches
    avg_train_acc = train_acc / step
    avg_train_loss = train_loss / step
    writer.add_scalars(
        "Training Loss", {"Training Loss": avg_train_loss},
        epoch
    )
    writer.flush()
    return avg_train_acc, avg_train_loss


def val():
    model.eval()
    val_loss = 0.0
    val_acc = 0
    step = 0
    with torch.no_grad():
        for batch_idx, (data, label) in enumerate(val_loader):  # evaluate on the validation set
            data, label = data.to(device), label.to(device)
            label = label.to(torch.int64)  # type conversion
            outputs = model(data)  # torch.Size([32, 101])
            loss = F.cross_entropy(outputs, label)
            # accuracy of this batch
            acc = (outputs.argmax(dim=1) == label).sum().item() / label.size(0)
            val_loss += loss.item()
            val_acc += acc
            step += 1
    # average over the batches
    avg_val_acc = val_acc / step
    avg_val_loss = val_loss / step
    return avg_val_acc, avg_val_loss
def tensorboard_draw():
    # a dummy all-zeros image
    images = torch.zeros((1, 3, 65, 65))
    # draw the network structure graph
    writer.add_graph(model.to("cpu"), images)
    writer.flush()


def select_n_random(data, labels, n=100):
    assert len(data) == len(labels)
    perm = torch.randperm(len(data))
    return data[perm][:n], labels[perm][:n]
def run():
    print('start training')
    for epoch in range(epochs):
        train_acc, train_loss = train(epoch)
        print("EPOCH [{}/{}] Train acc {:.4f} Train loss {:.4f} ".format(epoch + 1, epochs, train_acc, train_loss))
        torch.save(model.state_dict(), PATH)  # save the model parameters
        val_acc, val_loss = val()
        print("val(): val acc {:.4f} val loss {:.4f} ".format(val_acc, val_loss))
    writer.close()


run()
# tensorboard_draw(train_loader)
# tensorboard_draw2()
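Once training has written its event files under runs/, the curves can be viewed by running tensorboard --logdir runs from the project directory and opening the local URL it prints.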
model.py
import torch
import os
from torch import nn
from torch.nn import functional as F
from torch.autograd import Variable
import matplotlib.pyplot as plt
from torchvision.datasets import ImageFolder
import torch.optim as optim
import torch.utils.data
# network model definition
class AlexNet(nn.Module):
    def __init__(self, num_class=101, init_weights=False):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 48, kernel_size=11),
            # nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2),  # input[3, 224, 224] output[48, 55, 55], decimals are rounded down
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.BatchNorm2d(48),
            nn.Conv2d(48, 128, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.BatchNorm2d(128),
            nn.Conv2d(128, 192, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 192, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),  # output[128, 6, 6] for 65x65 inputs
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(6 * 6 * 128, 2048),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(2048, 2048),
            nn.ReLU(inplace=True),
            nn.Linear(2048, num_class),
        )
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        # print("x.shape", x.shape)
        x = torch.flatten(x, start_dim=1)  # flatten to one vector per sample
        # print("x.flatten.shape", x.shape)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')  # Kaiming initialization
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)  # normal-distribution initialization
                nn.init.constant_(m.bias, 0)
Training results

Due to hardware limitations, a simplified version of the AlexNet model was trained (the model structure is unchanged; only the parameter count is reduced by resizing the images from 224x224 to 65x65). It was trained for 30 epochs with a batch size of 64 (convergence was not reached). With more powerful hardware it should work better.
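If stronger hardware is available, one plausible way back to the standard 224x224 input (not tested here) is sketched below; with the commented stride-4 first convolution the final feature map is again 128x6x6, so the classifier does not need to change:

# hypothetical 224x224 variant (heavier in memory and compute)
data, name2label, labels = data_processor(size=224)
# in AlexNet.features, swap the first convolution for the commented version:
#   nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2)
# the final feature map is again [128, 6, 6], so Linear(6 * 6 * 128, 2048) stays the same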