AlexNet Image Classification on the Caltech-101 Dataset (PyTorch Implementation)
2022-06-13 01:06:00 【Unstoppable~~~】
Contents
Main tasks
- Implement the AlexNet architecture in PyTorch
- Validate it on the Caltech-101 dataset
- Dataset address
Data processing
Read the image data from the 101 category folders (converting to RGB and resizing — the original images vary in size, so resizing is required), and attach class labels 0-100.
def data_processor(size=65):
    """Read the images from the dataset folders and organize them into data and labels (101 classes)."""
    data = []
    labels = []
    label_name = []
    name2label = {}
    for idx, image_path in enumerate(image_paths):
        name = image_path.split(os.path.sep)[-2]  # the class name is the parent folder name
        # read and process the image
        image = cv2.imread(image_path)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # convert BGR to RGB
        image = cv2.resize(image, (size, size), interpolation=cv2.INTER_AREA)
        data.append(image)
        label_name.append(name)
    data = np.array(data)
    label_name = list(dict.fromkeys(label_name))  # deduplicate while preserving order
    label_name = np.array(label_name)
    # generate class labels 0-100 corresponding to the folder names in label_name
    for idx, name in enumerate(label_name):
        name2label[name] = idx  # assign each class a label
    for idx, image_path in enumerate(image_paths):
        labels.append(name2label[image_path.split(os.path.sep)[-2]])
    labels = np.array(labels)
    return data, name2label, labels
Define the image transforms, then split the data into training, validation, and test sets.
# define the image transforms
train_transform = transforms.Compose(
    [transforms.ToPILImage(),
     # transforms.Resize((224, 224)),
     transforms.ToTensor(),
     transforms.Normalize(mean=[0.485, 0.456, 0.406],
                          std=[0.229, 0.224, 0.225])])
val_transform = transforms.Compose(
    [transforms.ToPILImage(),
     # transforms.Resize((224, 224)),
     transforms.ToTensor(),
     transforms.Normalize(mean=[0.485, 0.456, 0.406],
                          std=[0.229, 0.224, 0.225])])
# split the dataset into training, validation, and test sets
# (example shapes recorded from a run with size=200)
# x_train examples: (5205, 200, 200, 3)
# x_test examples: (1736, 200, 200, 3)
# x_val examples: (1736, 200, 200, 3)
(X, x_val, Y, y_val) = train_test_split(data, labels,
                                        test_size=0.2,
                                        stratify=labels,
                                        random_state=42)
(x_train, x_test, y_train, y_test) = train_test_split(X, Y,
                                                      test_size=0.25,
                                                      random_state=42)
print(f"x_train examples: {x_train.shape}\nx_test examples: {x_test.shape}\nx_val examples: {x_val.shape}")
Define a custom dataset class (inheriting from Dataset) to make the data easy to load.
class ImageDataset(Dataset):
    def __init__(self, images, labels=None, transforms=None):
        self.X = images
        self.y = labels
        self.transforms = transforms

    def __len__(self):
        return len(self.X)

    def __getitem__(self, i):
        data = self.X[i][:]
        if self.transforms:
            data = self.transforms(data)
        if self.y is not None:
            return (data, self.y[i])
        else:
            return data

train_data = ImageDataset(x_train, y_train, train_transform)
val_data = ImageDataset(x_val, y_val, val_transform)
test_data = ImageDataset(x_test, y_test, val_transform)
train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)
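A quick peek at one batch confirms the tensor shapes the network will see — a sketch, assuming the loaders above, batch_size=64, and the default size=65:

images, targets = next(iter(train_loader))
print(images.shape)   # torch.Size([64, 3, 65, 65]) -- [batch, channels, height, width]
print(targets.shape)  # torch.Size([64]) -- class indices, no one-hot needed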
Network structure
The AlexNet model consists of 5 convolutional layers, 3 max-pooling layers, and 3 fully connected layers. AlexNet is structurally similar to LeNet, but uses more convolutional layers and a larger parameter space to fit the large-scale ImageNet dataset. It marks the dividing line between shallow and deep neural networks.
The AlexNet network structure is as follows:
(figure: AlexNet architecture diagram)
Advantages of AlexNet:
- It uses ReLU as the CNN activation function and verifies that ReLU outperforms Sigmoid in deeper networks, successfully solving the vanishing-gradient problem Sigmoid suffers from in deep networks.
- It uses Dropout to randomly drop some neurons during training to avoid overfitting; in AlexNet, Dropout is mainly applied in the last few fully connected layers.
- It uses overlapping max pooling. Earlier CNNs commonly used average pooling; AlexNet uses max pooling throughout, avoiding the blurring effect of average pooling. Moreover, the stride is smaller than the pooling kernel, so adjacent pooling outputs overlap, enriching the features.
- It uses LRN layers, which create a competition mechanism among local neuron activations: stronger responses become relatively larger while neurons with weaker feedback are suppressed, improving the model's generalization ability (a minimal PyTorch sketch follows this list).
- It uses CUDA to accelerate the training of the deep convolutional network, exploiting the GPU's massive parallelism for the heavy matrix operations involved in training.
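For reference, LRN is available in PyTorch as nn.LocalResponseNorm; a minimal sketch with the hyperparameters from the AlexNet paper (note that the model below substitutes BatchNorm2d for it):

import torch
from torch import nn

# Local Response Normalization with the AlexNet paper's hyperparameters
lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)
x = torch.randn(1, 48, 55, 55)  # a dummy feature map
print(lrn(x).shape)             # shape is unchanged: torch.Size([1, 48, 55, 55])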
The model follows the AlexNet structure with five convolutional layers and three fully connected layers. Each convolutional stage uses a ReLU activation followed by a BatchNorm2d layer, the fully connected layers use Dropout to prevent overfitting, and init_weights initializes the convolutional weights with kaiming_normal_, which gives better results.
# network model
class AlexNet(nn.Module):
    def __init__(self, num_class=101, init_weights=False):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 48, kernel_size=11),
            # nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2),  # input[3, 224, 224] output[48, 55, 55]; fractional sizes round down
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.BatchNorm2d(48),
            nn.Conv2d(48, 128, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.BatchNorm2d(128),
            nn.Conv2d(128, 192, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 192, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),  # output[128, 6, 6] for a 65x65 input
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(6 * 6 * 128, 2048),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(2048, 2048),
            nn.ReLU(inplace=True),
            nn.Linear(2048, num_class),
        )
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, start_dim=1)  # flatten to [batch, 6 * 6 * 128]
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')  # He (Kaiming) initialization
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)  # normal-distribution initialization
                nn.init.constant_(m.bias, 0)
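A quick shape sanity check — a minimal sketch, assuming the AlexNet class above: with a 65×65 input, the feature extractor should emit [128, 6, 6], matching the 6 * 6 * 128 input of the first Linear layer.

import torch

model = AlexNet()
model.eval()                    # use running stats in BatchNorm for the dummy input
x = torch.zeros(1, 3, 65, 65)   # one dummy 65x65 RGB image
print(model.features(x).shape)  # expected: torch.Size([1, 128, 6, 6])
print(model(x).shape)           # expected: torch.Size([1, 101])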
Training and testing
# training loop
def train(epoch):
    model.train()
    train_loss = 0.0
    train_acc = 0
    step = 0
    for batch_idx, (data, label) in enumerate(train_loader):
        data, label = data.to(device), label.to(device)
        label = label.to(torch.int64)  # cross_entropy expects class indices, so no one-hot conversion is needed
        optimizer.zero_grad()
        outputs = model(data)  # [batch_size, 101]
        loss = F.cross_entropy(outputs, label)
        # accuracy of this batch (divide by the batch size, not the full label array)
        acc = (outputs.argmax(dim=1) == label).sum().cpu().item() / len(label)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
        train_acc += acc
        step += 1
    # average over the batches
    avg_train_acc = train_acc / step
    avg_train_loss = train_loss / step
    writer.add_scalars(
        "Training Loss", {"Training Loss": avg_train_loss},
        epoch
    )
    writer.flush()
    return avg_train_acc, avg_train_loss
def val():
    model.eval()
    val_loss = 0.0
    val_acc = 0
    step = 0
    with torch.no_grad():
        for batch_idx, (data, label) in enumerate(val_loader):  # evaluate on val_loader, not train_loader
            data, label = data.to(device), label.to(device)
            label = label.to(torch.int64)  # type conversion
            outputs = model(data)  # [batch_size, 101]
            loss = F.cross_entropy(outputs, label)
            # accuracy of this batch
            acc = (outputs.argmax(dim=1) == label).sum().cpu().item() / len(label)
            val_loss += loss.item()
            val_acc += acc
            step += 1
    # average over the batches
    avg_val_acc = val_acc / step
    avg_val_loss = val_loss / step
    return avg_val_acc, avg_val_loss
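test_loader is built above but never exercised; a minimal test-set evaluation sketch under the same conventions, assuming model, test_loader, and device as defined in the full code below:

def test():
    """Evaluate the trained model on the held-out test set."""
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for data, label in test_loader:
            data, label = data.to(device), label.to(device).to(torch.int64)
            outputs = model(data)
            correct += (outputs.argmax(dim=1) == label).sum().item()
            total += len(label)
    return correct / total

# print("test acc {:.4f}".format(test()))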
Full code
alexnet.py (uncomment the TensorBoard sections to draw the various curves in TensorBoard)
from torch.nn import functional as F
from imutils import paths
import cv2
import os
import numpy as np
import torch
from torch import nn, optim
from torchvision.transforms import transforms
from torch.utils.data import DataLoader, Dataset
from sklearn.model_selection import train_test_split
from model import AlexNet
from torch.utils.tensorboard import SummaryWriter

# ======================= TensorBoard setup =======================
writer = SummaryWriter('runs/alexnet-101-2')

# ======================= Parameters =======================
num_class = 101
epochs = 30
batch_size = 64
PATH = 'Xlnet.pth'  # save path for the model parameters
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
path = 'caltech-101/101_ObjectCategories'
image_paths = list(paths.list_images(path))  # list of all image files under this directory
def data_processor(size=65):
    """Read the images from the dataset folders and organize them into data and labels (101 classes)."""
    data = []
    labels = []
    label_name = []
    name2label = {}
    for idx, image_path in enumerate(image_paths):
        name = image_path.split(os.path.sep)[-2]  # the class name is the parent folder name
        # read and process the image
        image = cv2.imread(image_path)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # convert BGR to RGB
        image = cv2.resize(image, (size, size), interpolation=cv2.INTER_AREA)
        data.append(image)
        label_name.append(name)
    data = np.array(data)
    label_name = list(dict.fromkeys(label_name))  # deduplicate while preserving order
    label_name = np.array(label_name)
    # generate class labels 0-100 corresponding to the folder names in label_name
    for idx, name in enumerate(label_name):
        name2label[name] = idx  # assign each class a label
    for idx, image_path in enumerate(image_paths):
        labels.append(name2label[image_path.split(os.path.sep)[-2]])
    labels = np.array(labels)
    return data, name2label, labels
# returns (8677, size, size, 3) image data and class labels 0-100
data, name2label, labels = data_processor()
# print(data.shape)
# print(name2label)
# print(labels)
# define the image transforms
train_transform = transforms.Compose(
    [transforms.ToPILImage(),
     # transforms.Resize((224, 224)),
     transforms.ToTensor(),
     transforms.Normalize(mean=[0.485, 0.456, 0.406],
                          std=[0.229, 0.224, 0.225])])
val_transform = transforms.Compose(
    [transforms.ToPILImage(),
     # transforms.Resize((224, 224)),
     transforms.ToTensor(),
     transforms.Normalize(mean=[0.485, 0.456, 0.406],
                          std=[0.229, 0.224, 0.225])])
# split the dataset into training, validation, and test sets
# (example shapes recorded from a run with size=200)
# x_train examples: (5205, 200, 200, 3)
# x_test examples: (1736, 200, 200, 3)
# x_val examples: (1736, 200, 200, 3)
(X, x_val, Y, y_val) = train_test_split(data, labels,
                                        test_size=0.2,
                                        stratify=labels,
                                        random_state=42)
(x_train, x_test, y_train, y_test) = train_test_split(X, Y,
                                                      test_size=0.25,
                                                      random_state=42)
print(f"x_train examples: {x_train.shape}\nx_test examples: {x_test.shape}\nx_val examples: {x_val.shape}")
# ======================= Data loading =======================
class ImageDataset(Dataset):
    def __init__(self, images, labels=None, transforms=None):
        self.X = images
        self.y = labels
        self.transforms = transforms

    def __len__(self):
        return len(self.X)

    def __getitem__(self, i):
        data = self.X[i][:]
        if self.transforms:
            data = self.transforms(data)
        if self.y is not None:
            return (data, self.y[i])
        else:
            return data

train_data = ImageDataset(x_train, y_train, train_transform)
val_data = ImageDataset(x_val, y_val, val_transform)
test_data = ImageDataset(x_test, y_test, val_transform)
train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)
# ======================= Build the model =======================
model = AlexNet(init_weights=True).to(device)  # move the model to the GPU if available
criterion = nn.CrossEntropyLoss()  # cross-entropy loss (train() calls F.cross_entropy directly)
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=0.0005)  # SGD with momentum
# training loop
def train(epoch):
    model.train()
    train_loss = 0.0
    train_acc = 0
    step = 0
    for batch_idx, (data, label) in enumerate(train_loader):
        data, label = data.to(device), label.to(device)
        label = label.to(torch.int64)  # cross_entropy expects class indices, so no one-hot conversion is needed
        optimizer.zero_grad()
        outputs = model(data)  # [batch_size, 101]
        loss = F.cross_entropy(outputs, label)
        # accuracy of this batch (divide by the batch size, not the full label array)
        acc = (outputs.argmax(dim=1) == label).sum().cpu().item() / len(label)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
        train_acc += acc
        step += 1
    # average over the batches
    avg_train_acc = train_acc / step
    avg_train_loss = train_loss / step
    writer.add_scalars(
        "Training Loss", {"Training Loss": avg_train_loss},
        epoch
    )
    writer.flush()
    return avg_train_acc, avg_train_loss
def val():
    model.eval()
    val_loss = 0.0
    val_acc = 0
    step = 0
    with torch.no_grad():
        for batch_idx, (data, label) in enumerate(val_loader):  # evaluate on val_loader, not train_loader
            data, label = data.to(device), label.to(device)
            label = label.to(torch.int64)  # type conversion
            outputs = model(data)  # [batch_size, 101]
            loss = F.cross_entropy(outputs, label)
            # accuracy of this batch
            acc = (outputs.argmax(dim=1) == label).sum().cpu().item() / len(label)
            val_loss += loss.item()
            val_acc += acc
            step += 1
    # average over the batches
    avg_val_acc = val_acc / step
    avg_val_loss = val_loss / step
    return avg_val_acc, avg_val_loss
def tensorboard_draw():
    # a dummy all-zero image, used only to trace the model graph
    images = torch.zeros((1, 3, 65, 65))
    # draw the network structure graph
    writer.add_graph(model.to("cpu"), images)
    writer.flush()

def select_n_random(data, labels, n=100):
    assert len(data) == len(labels)
    perm = torch.randperm(len(data))
    return data[perm][:n], labels[perm][:n]
def run():
    print('start training')
    for epoch in range(epochs):
        train_acc, train_loss = train(epoch)
        print("EPOCH [{}/{}] Train acc {:.4f} Train loss {:.4f}".format(epoch + 1, epochs, train_acc, train_loss))
        torch.save(model.state_dict(), PATH)  # save the model parameters each epoch
        val_acc, val_loss = val()
        print("val(): val acc {:.4f} val loss {:.4f}".format(val_acc, val_loss))
    writer.close()

run()
# tensorboard_draw()
model.py
import torch
from torch import nn
# network model
class AlexNet(nn.Module):
    def __init__(self, num_class=101, init_weights=False):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 48, kernel_size=11),
            # nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2),  # input[3, 224, 224] output[48, 55, 55]; fractional sizes round down
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.BatchNorm2d(48),
            nn.Conv2d(48, 128, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.BatchNorm2d(128),
            nn.Conv2d(128, 192, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 192, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),  # output[128, 6, 6] for a 65x65 input
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(6 * 6 * 128, 2048),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(2048, 2048),
            nn.ReLU(inplace=True),
            nn.Linear(2048, num_class),
        )
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, start_dim=1)  # flatten to [batch, 6 * 6 * 128]
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')  # He (Kaiming) initialization
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)  # normal-distribution initialization
                nn.init.constant_(m.bias, 0)
Training results
(figure: training log)
Due to hardware limitations, a simplified version of the AlexNet model was trained (the model structure is unchanged; only the parameter count is reduced, with the input images resized from 224×224 to 65×65) for 30 epochs with a batch size of 64, and it has not yet converged. With more powerful hardware it should perform better.
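To reload the saved weights later for inference, a minimal sketch (assumes model.py and the save path PATH = 'Xlnet.pth' as defined above):

import torch
from model import AlexNet

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
PATH = 'Xlnet.pth'
model = AlexNet(num_class=101)
model.load_state_dict(torch.load(PATH, map_location=device))
model.to(device)
model.eval()  # switch BatchNorm/Dropout to inference mode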