AlexNet implements image classification on the Caltech101 dataset (PyTorch implementation)
2022-06-13 01:06:00 【Unstoppable~~~】
Contents
Main task
- Implement the AlexNet architecture in PyTorch
- Validate it on the Caltech101 dataset
- Dataset address
Data processing
Read the images from the 101 class folders (the original images vary in size, so each must be resized, and the BGR channel order is converted to RGB), and assign each of the 101 classes an integer label (0-100).
def data_processor(size=65):
    """Read the images from the dataset folders and organize them into data and labels (101 classes in total).

    :return: image data, name-to-label mapping, label array
    """
    data = []
    labels = []
    label_name = []
    name2label = {}
    for idx, image_path in enumerate(image_paths):
        name = image_path.split(os.path.sep)[-2]  # the class name is the parent directory name
        # Read and preprocess the image
        image = cv2.imread(image_path)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # convert BGR to RGB
        image = cv2.resize(image, (size, size), interpolation=cv2.INTER_AREA)
        data.append(image)
        label_name.append(name)
    data = np.array(data)
    label_name = list(dict.fromkeys(label_name))  # deduplicate while preserving order
    label_name = np.array(label_name)
    # print(label_name)
    # Generate class labels 0-100, one per directory name in label_name
    for idx, name in enumerate(label_name):
        name2label[name] = idx  # assign an integer label to each class
    for idx, image_path in enumerate(image_paths):
        labels.append(name2label[image_path.split(os.path.sep)[-2]])
    labels = np.array(labels)
    return data, name2label, labels
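For reference, a quick usage sketch (assuming image_paths and the imports from the full listing below are in scope; the example shapes follow the counts reported later in this post):

# Build the full dataset in memory with the default 65x65 size
data, name2label, labels = data_processor(size=65)
print(data.shape)       # e.g. (8677, 65, 65, 3) -- 8677 images in total
print(labels.shape)     # (8677,)
print(len(name2label))  # 101 classes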
Define the image transforms, then split out the training, validation, and test sets.
# Define image transforms
train_transform = transforms.Compose(
    [transforms.ToPILImage(),
     # transforms.Resize((224, 224)),
     transforms.ToTensor(),
     transforms.Normalize(mean=[0.485, 0.456, 0.406],
                          std=[0.229, 0.224, 0.225])])
val_transform = transforms.Compose(
    [transforms.ToPILImage(),
     # transforms.Resize((224, 224)),
     transforms.ToTensor(),
     transforms.Normalize(mean=[0.485, 0.456, 0.406],
                          std=[0.229, 0.224, 0.225])])
# Split the dataset into training, validation, and test sets
# (example shapes from a run with size=200):
# x_train examples: (5205, 200, 200, 3)
# x_test examples: (1736, 200, 200, 3)
# x_val examples: (1736, 200, 200, 3)
(X, x_val, Y, y_val) = train_test_split(data, labels,
                                        test_size=0.2,
                                        stratify=labels,
                                        random_state=42)
(x_train, x_test, y_train, y_test) = train_test_split(X, Y,
                                                      test_size=0.25,
                                                      random_state=42)
print(f"x_train examples: {x_train.shape}\nx_test examples: {x_test.shape}\nx_val examples: {x_val.shape}")
Define a custom dataset class (inheriting from Dataset) so the data is easy to load.
class ImageDataset(Dataset):
    def __init__(self, images, labels=None, transforms=None):
        self.X = images
        self.y = labels
        self.transforms = transforms

    def __len__(self):
        return len(self.X)

    def __getitem__(self, i):
        data = self.X[i][:]
        if self.transforms:
            data = self.transforms(data)
        if self.y is not None:
            return data, self.y[i]
        else:
            return data
train_data = ImageDataset(x_train, y_train, train_transform)
val_data = ImageDataset(x_val, y_val, val_transform)
test_data = ImageDataset(x_test, y_test, val_transform)
train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)
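A quick sanity check (a minimal sketch, assuming the loaders defined above) to confirm the tensors coming out of the pipeline have the expected shapes:

# Pull one batch from the training loader and inspect its shapes
images, targets = next(iter(train_loader))
print(images.shape)   # torch.Size([64, 3, 65, 65]) with batch_size=64 and size=65
print(targets.shape)  # torch.Size([64])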
Network structure
The AlexNet model consists of 5 convolutional layers, 3 pooling layers, and 3 fully connected layers. AlexNet is structurally similar to LeNet, but uses more convolutional layers and a larger parameter space to fit the large-scale ImageNet dataset. It marks the dividing line between shallow and deep neural networks.
The AlexNet network structure is shown below:
[Figure: AlexNet network architecture]
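In case the figure does not render, here is a sketch (not from the original post) that traces the feature-map sizes of this post's 65x65 variant through the convolutional stack of the AlexNet class defined below (ReLU and BatchNorm layers are omitted because they do not change shapes):

import torch
from torch import nn

# Trace feature-map sizes for a 65x65 input through the conv stack
features = nn.Sequential(
    nn.Conv2d(3, 48, kernel_size=11),              # 65 -> 55
    nn.MaxPool2d(kernel_size=3, stride=2),         # 55 -> 27
    nn.Conv2d(48, 128, kernel_size=5, padding=2),  # 27 -> 27
    nn.MaxPool2d(kernel_size=3, stride=2),         # 27 -> 13
    nn.Conv2d(128, 192, kernel_size=3, padding=1),
    nn.Conv2d(192, 192, kernel_size=3, padding=1),
    nn.Conv2d(192, 128, kernel_size=3, padding=1),
    nn.MaxPool2d(kernel_size=3, stride=2),         # 13 -> 6
)
x = torch.zeros(1, 3, 65, 65)
for layer in features:
    x = layer(x)
    print(type(layer).__name__, tuple(x.shape))
# Final shape (1, 128, 6, 6) matches the 6 * 6 * 128 input of the first Linear layer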
Advantages of AlexNet:
- Uses ReLU as the CNN activation function and shows that it outperforms Sigmoid in deeper networks, successfully avoiding the vanishing-gradient problem Sigmoid suffers from in deep networks.
- Uses Dropout to randomly ignore some neurons during training and thus avoid overfitting. In AlexNet, Dropout is mainly applied to the last few fully connected layers.
- Uses overlapping max pooling in the CNN. Earlier CNNs commonly used average pooling; AlexNet uses max pooling throughout, avoiding the blurring effect of average pooling. The stride is also smaller than the pooling kernel, so adjacent pooling outputs overlap, enriching the features.
- Uses LRN (Local Response Normalization) layers, which create a competition mechanism among the activities of local neurons: values with larger responses become relatively larger while neurons with smaller feedback are suppressed, improving the model's generalization (see the sketch after this list).
- Uses CUDA to accelerate training of the deep convolutional network, exploiting the GPU's massive parallel computing power for the heavy matrix operations involved in neural network training.
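Note that the implementation in this post swaps LRN out for BatchNorm2d. For reference, a minimal sketch of LRN itself using PyTorch's nn.LocalResponseNorm (the hyperparameters below follow the AlexNet paper, not this post):

import torch
from torch import nn

# LRN as used in the original AlexNet paper (n=5, alpha=1e-4, beta=0.75, k=2)
lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)
x = torch.randn(1, 48, 27, 27)  # e.g. activations after the first conv+pool
print(lrn(x).shape)             # shape is unchanged: torch.Size([1, 48, 27, 27])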
The network follows the AlexNet structure: five convolutional layers and three fully connected layers. The convolutional layers are followed by ReLU activations (and, after the first two pooling stages, BatchNorm2d layers), while the fully connected layers use Dropout to prevent overfitting. When init_weights is set, the convolutional weights are initialized with kaiming_normal_, which gives better results.
# Network model definition
class AlexNet(nn.Module):
    def __init__(self, num_class=101, init_weights=False):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 48, kernel_size=11),
            # nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2),  # input[3, 224, 224] output[48, 55, 55]; fractional sizes are rounded down
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.BatchNorm2d(48),
            nn.Conv2d(48, 128, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.BatchNorm2d(128),
            nn.Conv2d(128, 192, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 192, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),  # output[128, 6, 6] for a 65x65 input
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(6 * 6 * 128, 2048),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(2048, 2048),
            nn.ReLU(inplace=True),
            nn.Linear(2048, num_class),
        )
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        # print("x.shape", x.shape)  # torch.Size([batch, 128, 6, 6]) for 65x65 inputs
        x = torch.flatten(x, start_dim=1)  # flatten each sample to a vector
        # print("x.flatten.shape", x.shape)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')  # Kaiming He initialization
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)  # initialize from a normal distribution
                nn.init.constant_(m.bias, 0)
Training and testing
# Training loop
def train(epoch):
    model.train()
    train_loss = 0.0
    train_acc = 0
    step = 0
    for batch_idx, (data, label) in enumerate(train_loader):
        data, label = data.to(device), label.to(device)
        label = label.to(torch.int64)  # type conversion
        # print("data.shape", data.shape)    # torch.Size([64, 3, 65, 65])
        # print("label.shape", label.shape)  # torch.Size([64]); no need to one-hot encode the labels
        optimizer.zero_grad()
        outputs = model(data)  # torch.Size([64, 101])
        loss = F.cross_entropy(outputs, label)
        # Accuracy on this batch (divide by the batch size, not the full label array)
        acc = (outputs.argmax(dim=1) == label).sum().cpu().item() / len(label)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
        train_acc += acc
        step += 1
    # Average over all batches
    avg_train_acc = train_acc / step
    avg_train_loss = train_loss / step
    writer.add_scalars(
        "Training Loss", {"Training Loss": avg_train_loss},
        epoch
    )
    writer.flush()
    return avg_train_acc, avg_train_loss
def val():
    model.eval()
    val_loss = 0.0
    val_acc = 0
    step = 0
    with torch.no_grad():
        for batch_idx, (data, label) in enumerate(val_loader):  # evaluate on the validation set
            data, label = data.to(device), label.to(device)
            label = label.to(torch.int64)  # type conversion
            outputs = model(data)  # torch.Size([64, 101])
            loss = F.cross_entropy(outputs, label)
            # Accuracy on this batch
            acc = (outputs.argmax(dim=1) == label).sum().cpu().item() / len(label)
            val_loss += loss.item()
            val_acc += acc
            step += 1
    # Average over all batches
    avg_val_acc = val_acc / step
    avg_val_loss = val_loss / step
    return avg_val_acc, avg_val_loss
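The test_loader built above is never actually consumed. A minimal evaluation sketch (not part of the original post; it assumes the model, device, and test_loader defined elsewhere in this post):

def test():
    # Evaluate the trained model on the held-out test set
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for data, label in test_loader:
            data, label = data.to(device), label.to(device).to(torch.int64)
            outputs = model(data)
            correct += (outputs.argmax(dim=1) == label).sum().item()
            total += label.size(0)
    print(f"test acc {correct / total:.4f}")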
All the code
alexnet.py (uncomment the TensorBoard-related lines to plot the various curves in TensorBoard)
from torch.nn import functional as F
from imutils import paths
import cv2
import os
import numpy as np
import torch
from torch import nn, optim
from torchvision.transforms import transforms
from torch.utils.data import DataLoader, Dataset
from sklearn.model_selection import train_test_split
from model import AlexNet
from torch.utils.tensorboard import SummaryWriter
#======================= Use tensorboard===================
writer = SummaryWriter('runs/alexnet-101-2')
#============= Parameters =======================
num_class = 101
epochs = 30
batch_size = 64
PATH = 'Xlnet.pth' # Save path of model parameters
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
path = 'caltech-101/101_ObjectCategories'
image_paths = list(paths.list_images(path))  # list of all image files under this directory
def data_processor(size=65):
    """Read the images from the dataset folders and organize them into data and labels (101 classes in total).

    :return: image data, name-to-label mapping, label array
    """
    data = []
    labels = []
    label_name = []
    name2label = {}
    for idx, image_path in enumerate(image_paths):
        name = image_path.split(os.path.sep)[-2]  # the class name is the parent directory name
        # Read and preprocess the image
        image = cv2.imread(image_path)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # convert BGR to RGB
        image = cv2.resize(image, (size, size), interpolation=cv2.INTER_AREA)
        data.append(image)
        label_name.append(name)
    data = np.array(data)
    label_name = list(dict.fromkeys(label_name))  # deduplicate while preserving order
    label_name = np.array(label_name)
    # print(label_name)
    # Generate class labels 0-100, one per directory name in label_name
    for idx, name in enumerate(label_name):
        name2label[name] = idx  # assign an integer label to each class
    for idx, image_path in enumerate(image_paths):
        labels.append(name2label[image_path.split(os.path.sep)[-2]])
    labels = np.array(labels)
    return data, name2label, labels
# returns image data of shape (8677, size, size, 3) and the 0-100 class labels
data, name2label, labels = data_processor()
# print(data.shape)
# print("===========================")
# print(name2label)
# print("===========================")
# print(labels)
# Define image transforms
train_transform = transforms.Compose(
    [transforms.ToPILImage(),
     # transforms.Resize((224, 224)),
     transforms.ToTensor(),
     transforms.Normalize(mean=[0.485, 0.456, 0.406],
                          std=[0.229, 0.224, 0.225])])
val_transform = transforms.Compose(
    [transforms.ToPILImage(),
     # transforms.Resize((224, 224)),
     transforms.ToTensor(),
     transforms.Normalize(mean=[0.485, 0.456, 0.406],
                          std=[0.229, 0.224, 0.225])])
# Split the dataset into training, validation, and test sets
# (example shapes from a run with size=200):
# x_train examples: (5205, 200, 200, 3)
# x_test examples: (1736, 200, 200, 3)
# x_val examples: (1736, 200, 200, 3)
(X, x_val, Y, y_val) = train_test_split(data, labels,
                                        test_size=0.2,
                                        stratify=labels,
                                        random_state=42)
(x_train, x_test, y_train, y_test) = train_test_split(X, Y,
                                                      test_size=0.25,
                                                      random_state=42)
print(f"x_train examples: {x_train.shape}\nx_test examples: {x_test.shape}\nx_val examples: {x_val.shape}")
#============================== Data loading ===============================================
class ImageDataset(Dataset):
    def __init__(self, images, labels=None, transforms=None):
        self.X = images
        self.y = labels
        self.transforms = transforms

    def __len__(self):
        return len(self.X)

    def __getitem__(self, i):
        data = self.X[i][:]
        if self.transforms:
            data = self.transforms(data)
        if self.y is not None:
            return data, self.y[i]
        else:
            return data
train_data = ImageDataset(x_train, y_train, train_transform)
val_data = ImageDataset(x_val, y_val, val_transform)
test_data = ImageDataset(x_test, y_test, val_transform)
train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)
#================= Build the model ==============================
model = AlexNet(init_weights=True).to(device)  # move the model to the GPU (if available)
criterion = nn.CrossEntropyLoss() # Cross entropy loss
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=0.0005) # Stochastic gradient descent
# Training loop
def train(epoch):
    model.train()
    train_loss = 0.0
    train_acc = 0
    step = 0
    for batch_idx, (data, label) in enumerate(train_loader):
        data, label = data.to(device), label.to(device)
        label = label.to(torch.int64)  # type conversion
        # print("data.shape", data.shape)    # torch.Size([64, 3, 65, 65])
        # print("label.shape", label.shape)  # torch.Size([64]); no need to one-hot encode the labels
        optimizer.zero_grad()
        outputs = model(data)  # torch.Size([64, 101])
        loss = F.cross_entropy(outputs, label)
        # Accuracy on this batch (divide by the batch size, not the full label array)
        acc = (outputs.argmax(dim=1) == label).sum().cpu().item() / len(label)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
        train_acc += acc
        step += 1
    # Average over all batches
    avg_train_acc = train_acc / step
    avg_train_loss = train_loss / step
    writer.add_scalars(
        "Training Loss", {"Training Loss": avg_train_loss},
        epoch
    )
    writer.flush()
    return avg_train_acc, avg_train_loss
def val():
    model.eval()
    val_loss = 0.0
    val_acc = 0
    step = 0
    with torch.no_grad():
        for batch_idx, (data, label) in enumerate(val_loader):  # evaluate on the validation set
            data, label = data.to(device), label.to(device)
            label = label.to(torch.int64)  # type conversion
            outputs = model(data)  # torch.Size([64, 101])
            loss = F.cross_entropy(outputs, label)
            # Accuracy on this batch
            acc = (outputs.argmax(dim=1) == label).sum().cpu().item() / len(label)
            val_loss += loss.item()
            val_acc += acc
            step += 1
    # Average over all batches
    avg_val_acc = val_acc / step
    avg_val_loss = val_loss / step
    return avg_val_acc, avg_val_loss
def tensorboard_draw():
    # Create an all-zero dummy image
    images = torch.zeros((1, 3, 65, 65))
    # Draw the network structure graph
    writer.add_graph(model.to("cpu"), images)
    writer.flush()

def select_n_random(data, labels, n=100):
    # Select n random samples and their labels
    assert len(data) == len(labels)
    perm = torch.randperm(len(data))
    return data[perm][:n], labels[perm][:n]
def run():
    print('start training')
    for epoch in range(epochs):
        train_acc, train_loss = train(epoch)
        print("EPOCH [{}/{}] Train acc {:.4f} Train loss {:.4f}".format(epoch + 1, epochs, train_acc, train_loss))
        torch.save(model.state_dict(), PATH)  # save the model parameters
        val_acc, val_loss = val()
        print("val(): val acc {:.4f} val loss {:.4f}".format(val_acc, val_loss))
    writer.close()

run()
# tensorboard_draw()
model.py
import torch
import os
from torch import nn
from torch.nn import functional as F
from torch.autograd import Variable
import matplotlib.pyplot as plt
from torchvision.datasets import ImageFolder
import torch.optim as optim
import torch.utils.data
# Network model definition
class AlexNet(nn.Module):
    def __init__(self, num_class=101, init_weights=False):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 48, kernel_size=11),
            # nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2),  # input[3, 224, 224] output[48, 55, 55]; fractional sizes are rounded down
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.BatchNorm2d(48),
            nn.Conv2d(48, 128, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.BatchNorm2d(128),
            nn.Conv2d(128, 192, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 192, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),  # output[128, 6, 6] for a 65x65 input
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(6 * 6 * 128, 2048),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(2048, 2048),
            nn.ReLU(inplace=True),
            nn.Linear(2048, num_class),
        )
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        # print("x.shape", x.shape)  # torch.Size([batch, 128, 6, 6]) for 65x65 inputs
        x = torch.flatten(x, start_dim=1)  # flatten each sample to a vector
        # print("x.flatten.shape", x.shape)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')  # Kaiming He initialization
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)  # initialize from a normal distribution
                nn.init.constant_(m.bias, 0)
Training results
[Figure: training results]
Due to hardware limitations, a slimmed-down AlexNet was trained (the model structure is unchanged; only the parameter count is reduced, with the images resized from 224x224 down to 65x65) for 30 epochs with a batch size of 64 (the model had not yet converged). With more powerful hardware support, the results should be better.