AlexNet for image classification on the Caltech101 dataset (PyTorch implementation)
2022-06-13 01:06:00 【Unstoppable~~~】
Contents
Main task
- Implement the AlexNet architecture with PyTorch
- Validate it on the Caltech101 dataset
- Dataset address
Data processing
Read the images from the 101 category folders (the original images vary in size, so each must be resized and converted to RGB) and assign the 101 class labels (0-100).
def data_processor(size=65):
    """Read the images from the folders and organize them into data and labels (101 classes in total)."""
    data = []
    labels = []
    label_name = []
    name2label = {}
    for idx, image_path in enumerate(image_paths):
        name = image_path.split(os.path.sep)[-2]  # class name = parent folder name
        # read and preprocess the image
        image = cv2.imread(image_path)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # convert BGR to RGB
        image = cv2.resize(image, (size, size), interpolation=cv2.INTER_AREA)
        data.append(image)
        label_name.append(name)
    data = np.array(data)
    label_name = list(dict.fromkeys(label_name))  # deduplicate while preserving order
    label_name = np.array(label_name)
    # print(label_name)
    # assign class labels 0-100, one per folder name in label_name
    for idx, name in enumerate(label_name):
        name2label[name] = idx
    for idx, image_path in enumerate(image_paths):
        labels.append(name2label[image_path.split(os.path.sep)[-2]])
    labels = np.array(labels)
    return data, name2label, labels
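data_processor relies on a module-level image_paths list; the full script later in this post builds it with imutils, roughly as in this minimal sketch (the Caltech-101 layout, one sub-folder per class, is assumed):

from imutils import paths

# assumed layout: caltech-101/101_ObjectCategories/<class_name>/*.jpg
path = 'caltech-101/101_ObjectCategories'
image_paths = list(paths.list_images(path))  # every image file under the class folders
print(len(image_paths))  # the post reports 8677 images in total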
Define the image transforms and split the data into training, validation and test sets.
# define the image transforms
train_transform = transforms.Compose(
    [transforms.ToPILImage(),
     # transforms.Resize((224, 224)),
     transforms.ToTensor(),
     transforms.Normalize(mean=[0.485, 0.456, 0.406],
                          std=[0.229, 0.224, 0.225])])
val_transform = transforms.Compose(
    [transforms.ToPILImage(),
     # transforms.Resize((224, 224)),
     transforms.ToTensor(),
     transforms.Normalize(mean=[0.485, 0.456, 0.406],
                          std=[0.229, 0.224, 0.225])])

# split the dataset into training, validation and test sets
# x_train examples: (5205, 200, 200, 3)
# x_test examples: (1736, 200, 200, 3)
# x_val examples: (1736, 200, 200, 3)
(X, x_val, Y, y_val) = train_test_split(data, labels,
                                        test_size=0.2,
                                        stratify=labels,
                                        random_state=42)
(x_train, x_test, y_train, y_test) = train_test_split(X, Y,
                                                      test_size=0.25,
                                                      random_state=42)
print(f"x_train examples: {x_train.shape}\nx_test examples: {x_test.shape}\nx_val examples: {x_val.shape}")
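The Normalize values above are the standard ImageNet channel statistics. If you would rather use statistics computed from Caltech-101 itself, a minimal sketch (assuming data is the uint8 array returned by data_processor) could look like this:

import numpy as np

# per-channel mean/std in [0, 1], matching the scaling that ToTensor applies
pixels = data.astype(np.float32) / 255.0   # shape (N, H, W, 3)
mean = pixels.mean(axis=(0, 1, 2))
std = pixels.std(axis=(0, 1, 2))
print(mean, std)  # plug these into transforms.Normalize instead of the ImageNet values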
Define a custom dataset class (inheriting from Dataset) to make loading the data easier.
class ImageDataset(Dataset):
    def __init__(self, images, labels=None, transforms=None):
        self.X = images
        self.y = labels
        self.transforms = transforms

    def __len__(self):
        return len(self.X)

    def __getitem__(self, i):
        data = self.X[i][:]
        if self.transforms:
            data = self.transforms(data)
        if self.y is not None:
            return (data, self.y[i])
        else:
            return data

train_data = ImageDataset(x_train, y_train, train_transform)
val_data = ImageDataset(x_val, y_val, val_transform)
test_data = ImageDataset(x_test, y_test, val_transform)
train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)
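A quick sanity check on one batch (a sketch; the spatial size depends on the size argument passed to data_processor):

images, targets = next(iter(train_loader))
print(images.shape)   # e.g. torch.Size([64, 3, 65, 65]) with batch_size=64 and size=65
print(targets.shape)  # torch.Size([64])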
Network structure
The AlexNet model consists of 5 convolutional layers and 3 pooling layers, followed by 3 fully connected layers. Its structure is similar to LeNet, but it uses more convolutional layers and a larger parameter space to fit the large-scale ImageNet dataset. It marks the dividing line between shallow and deep neural networks.
The AlexNet network structure is as follows:

Advantages of AlexNet:
- It uses ReLU as the CNN activation function and shows that it outperforms Sigmoid in deeper networks, addressing the vanishing-gradient problem that Sigmoid causes in deep networks.
- It uses Dropout to randomly drop neurons during training and so avoid overfitting. In AlexNet, Dropout is mainly applied to the last few fully connected layers.
- It uses overlapping max pooling. Earlier CNNs commonly used average pooling; AlexNet uses max pooling throughout, avoiding the blurring effect of average pooling. Moreover, the stride is smaller than the pooling kernel, so adjacent pooling outputs overlap, which enriches the features.
- It adds LRN (Local Response Normalization) layers, which set up a competition among local neuron activations: larger responses become relatively larger while neurons with smaller responses are suppressed, improving the model's generalization ability (a PyTorch sketch of LRN follows this list).
- It uses CUDA to accelerate training of the deep convolutional network, exploiting the GPU's parallel computing power for the many matrix operations involved in neural network training.
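For reference, the paper's LRN is available in PyTorch as nn.LocalResponseNorm (the model in this post substitutes BatchNorm2d). A minimal sketch of the first convolution block with LRN, using the paper's hyperparameters:

import torch.nn as nn

# LRN variant of the first block; this post's implementation uses nn.BatchNorm2d instead
lrn_block = nn.Sequential(
    nn.Conv2d(3, 48, kernel_size=11),
    nn.ReLU(inplace=True),
    nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),  # n=5, alpha=1e-4, beta=0.75, k=2 from the paper
    nn.MaxPool2d(kernel_size=3, stride=2),
)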
The network below follows the AlexNet structure: five convolutional layers and three fully connected layers. Each convolutional block uses ReLU activation plus a BatchNorm2d layer, the fully connected layers use Dropout to prevent overfitting, and init_weights initializes the convolution weights with kaiming_normal_, which works better in practice.
# network model definition
class AlexNet(nn.Module):
    def __init__(self, num_class=101, init_weights=False):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 48, kernel_size=11),
            # nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2),  # input[3, 224, 224] output[48, 55, 55], decimals are rounded down
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.BatchNorm2d(48),
            nn.Conv2d(48, 128, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.BatchNorm2d(128),
            nn.Conv2d(128, 192, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 192, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),  # output[128, 6, 6] for 65x65 inputs
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(6 * 6 * 128, 2048),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(2048, 2048),
            nn.ReLU(inplace=True),
            nn.Linear(2048, num_class),
        )
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        # print("x.shape", x.shape)
        x = torch.flatten(x, start_dim=1)  # flatten to one vector per sample
        # print("x.flatten.shape", x.shape)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')  # Kaiming initialization
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)  # normal-distribution initialization
                nn.init.constant_(m.bias, 0)
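The 6 * 6 * 128 input size of the first Linear layer only matches the 65x65 inputs used here; a dummy forward pass (a quick sketch) confirms the feature-map size before training:

model = AlexNet()
with torch.no_grad():
    feat = model.features(torch.zeros(1, 3, 65, 65))
print(feat.shape)  # expected: torch.Size([1, 128, 6, 6]), which flattens to 6*6*128 = 4608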
Training and testing
# training loop
def train(epoch):
    model.train()
    train_loss = 0.0
    train_acc = 0
    step = 0
    for batch_idx, (data, label) in enumerate(train_loader):
        data, label = data.to(device), label.to(device)
        label = label.to(torch.int64)  # type conversion
        # print("data.shape", data.shape)    # torch.Size([32, 3, 200, 200])
        # print("label.shape", label.shape)  # torch.Size([32]); no need to convert label to one-hot
        optimizer.zero_grad()
        outputs = model(data)  # torch.Size([32, 101])
        loss = F.cross_entropy(outputs, label)
        # accuracy of this batch
        acc = (outputs.argmax(dim=1) == label).sum().item() / label.size(0)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
        train_acc += acc
        step += 1
    # average over the batches
    avg_train_acc = train_acc / step
    avg_train_loss = train_loss / step
    writer.add_scalars(
        "Training Loss", {"Training Loss": avg_train_loss},
        epoch
    )
    writer.flush()
    return avg_train_acc, avg_train_loss


def val():
    model.eval()
    val_loss = 0.0
    val_acc = 0
    step = 0
    with torch.no_grad():
        for batch_idx, (data, label) in enumerate(val_loader):  # evaluate on the validation set
            data, label = data.to(device), label.to(device)
            label = label.to(torch.int64)  # type conversion
            outputs = model(data)  # torch.Size([32, 101])
            loss = F.cross_entropy(outputs, label)
            # accuracy of this batch
            acc = (outputs.argmax(dim=1) == label).sum().item() / label.size(0)
            val_loss += loss.item()
            val_acc += acc
            step += 1
    # average over the batches
    avg_val_acc = val_acc / step
    avg_val_loss = val_loss / step
    return avg_val_acc, avg_val_loss
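The script also builds a test_loader but never evaluates on it; a minimal test-set evaluation, assuming the trained model and the loaders defined above, could look like this sketch:

def test():
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for data, label in test_loader:
            data, label = data.to(device), label.to(device).to(torch.int64)
            outputs = model(data)
            correct += (outputs.argmax(dim=1) == label).sum().item()
            total += label.size(0)
    return correct / total  # overall test accuracy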
All the code
alexnet.py (uncomment the tensorboard-related lines to plot the various curves in TensorBoard)
from torch.nn import functional as F
from imutils import paths
import cv2
import os
import numpy as np
import torch
from torch import nn, optim
from torchvision.transforms import transforms
from torch.utils.data import DataLoader, Dataset
from sklearn.model_selection import train_test_split
from model import AlexNet
from torch.utils.tensorboard import SummaryWriter
#======================= Use tensorboard===================
writer = SummaryWriter('runs/alexnet-101-2')
#============= Parameters =======================
num_class = 101
epochs = 30
batch_size = 64
PATH = 'Xlnet.pth' # Save path of model parameters
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
path = 'caltech-101/101_ObjectCategories'
image_paths = list(paths.list_images(path))  # list of all image file paths under this directory
def data_processor(size=65):
    """Read the images from the folders and organize them into data and labels (101 classes in total)."""
    data = []
    labels = []
    label_name = []
    name2label = {}
    for idx, image_path in enumerate(image_paths):
        name = image_path.split(os.path.sep)[-2]  # class name = parent folder name
        # read and preprocess the image
        image = cv2.imread(image_path)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # convert BGR to RGB
        image = cv2.resize(image, (size, size), interpolation=cv2.INTER_AREA)
        data.append(image)
        label_name.append(name)
    data = np.array(data)
    label_name = list(dict.fromkeys(label_name))  # deduplicate while preserving order
    label_name = np.array(label_name)
    # print(label_name)
    # assign class labels 0-100, one per folder name in label_name
    for idx, name in enumerate(label_name):
        name2label[name] = idx
    for idx, image_path in enumerate(image_paths):
        labels.append(name2label[image_path.split(os.path.sep)[-2]])
    labels = np.array(labels)
    return data, name2label, labels
# returns the image data (e.g. shape (8677, 200, 200, 3)) and the 0-100 label indices
data, name2label, labels = data_processor()
# print(data.shape)
# print("===========================")
# print(name2label)
# print("===========================")
# print(labels)
# define the image transforms
train_transform = transforms.Compose(
    [transforms.ToPILImage(),
     # transforms.Resize((224, 224)),
     transforms.ToTensor(),
     transforms.Normalize(mean=[0.485, 0.456, 0.406],
                          std=[0.229, 0.224, 0.225])])
val_transform = transforms.Compose(
    [transforms.ToPILImage(),
     # transforms.Resize((224, 224)),
     transforms.ToTensor(),
     transforms.Normalize(mean=[0.485, 0.456, 0.406],
                          std=[0.229, 0.224, 0.225])])

# split the dataset into training, validation and test sets
# x_train examples: (5205, 200, 200, 3)
# x_test examples: (1736, 200, 200, 3)
# x_val examples: (1736, 200, 200, 3)
(X, x_val, Y, y_val) = train_test_split(data, labels,
                                        test_size=0.2,
                                        stratify=labels,
                                        random_state=42)
(x_train, x_test, y_train, y_test) = train_test_split(X, Y,
                                                      test_size=0.25,
                                                      random_state=42)
print(f"x_train examples: {x_train.shape}\nx_test examples: {x_test.shape}\nx_val examples: {x_val.shape}")
#============================== Data loading ===============================================
class ImageDataset(Dataset):
    def __init__(self, images, labels=None, transforms=None):
        self.X = images
        self.y = labels
        self.transforms = transforms

    def __len__(self):
        return len(self.X)

    def __getitem__(self, i):
        data = self.X[i][:]
        if self.transforms:
            data = self.transforms(data)
        if self.y is not None:
            return (data, self.y[i])
        else:
            return data

train_data = ImageDataset(x_train, y_train, train_transform)
val_data = ImageDataset(x_val, y_val, val_transform)
test_data = ImageDataset(x_test, y_test, val_transform)
train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)
#================= Build the model ==============================
model = AlexNet(init_weights=True).to(device)  # move the model to the GPU if available
criterion = nn.CrossEntropyLoss()  # cross-entropy loss (unused below; F.cross_entropy is called directly)
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=0.0005)  # stochastic gradient descent
# training loop
def train(epoch):
    model.train()
    train_loss = 0.0
    train_acc = 0
    step = 0
    for batch_idx, (data, label) in enumerate(train_loader):
        data, label = data.to(device), label.to(device)
        label = label.to(torch.int64)  # type conversion
        # print("data.shape", data.shape)    # torch.Size([32, 3, 200, 200])
        # print("label.shape", label.shape)  # torch.Size([32]); no need to convert label to one-hot
        optimizer.zero_grad()
        outputs = model(data)  # torch.Size([32, 101])
        loss = F.cross_entropy(outputs, label)
        # accuracy of this batch
        acc = (outputs.argmax(dim=1) == label).sum().item() / label.size(0)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
        train_acc += acc
        step += 1
    # average over the batches
    avg_train_acc = train_acc / step
    avg_train_loss = train_loss / step
    writer.add_scalars(
        "Training Loss", {"Training Loss": avg_train_loss},
        epoch
    )
    writer.flush()
    return avg_train_acc, avg_train_loss


def val():
    model.eval()
    val_loss = 0.0
    val_acc = 0
    step = 0
    with torch.no_grad():
        for batch_idx, (data, label) in enumerate(val_loader):  # evaluate on the validation set
            data, label = data.to(device), label.to(device)
            label = label.to(torch.int64)  # type conversion
            outputs = model(data)  # torch.Size([32, 101])
            loss = F.cross_entropy(outputs, label)
            # accuracy of this batch
            acc = (outputs.argmax(dim=1) == label).sum().item() / label.size(0)
            val_loss += loss.item()
            val_acc += acc
            step += 1
    # average over the batches
    avg_val_acc = val_acc / step
    avg_val_loss = val_loss / step
    return avg_val_acc, avg_val_loss
def tensorboard_draw():
    # a dummy all-zeros image
    images = torch.zeros((1, 3, 65, 65))
    # draw the network structure graph
    writer.add_graph(model.to("cpu"), images)
    writer.flush()


def select_n_random(data, labels, n=100):
    assert len(data) == len(labels)
    perm = torch.randperm(len(data))
    return data[perm][:n], labels[perm][:n]
def run():
    print('start training')
    for epoch in range(epochs):
        train_acc, train_loss = train(epoch)
        print("EPOCH [{}/{}] Train acc {:.4f} Train loss {:.4f} ".format(epoch + 1, epochs, train_acc, train_loss))
        torch.save(model.state_dict(), PATH)  # save the model parameters
        val_acc, val_loss = val()
        print("val(): val acc {:.4f} val loss {:.4f} ".format(val_acc, val_loss))
    writer.close()


run()
# tensorboard_draw(train_loader)
# tensorboard_draw2()
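Once training has written its event files under runs/, the curves can be viewed by running tensorboard --logdir runs from the project directory and opening the local URL it prints.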
model.py
import torch
import os
from torch import nn
from torch.nn import functional as F
from torch.autograd import Variable
import matplotlib.pyplot as plt
from torchvision.datasets import ImageFolder
import torch.optim as optim
import torch.utils.data
# network model definition
class AlexNet(nn.Module):
    def __init__(self, num_class=101, init_weights=False):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 48, kernel_size=11),
            # nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2),  # input[3, 224, 224] output[48, 55, 55], decimals are rounded down
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.BatchNorm2d(48),
            nn.Conv2d(48, 128, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.BatchNorm2d(128),
            nn.Conv2d(128, 192, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 192, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),  # output[128, 6, 6] for 65x65 inputs
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(6 * 6 * 128, 2048),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(2048, 2048),
            nn.ReLU(inplace=True),
            nn.Linear(2048, num_class),
        )
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        # print("x.shape", x.shape)
        x = torch.flatten(x, start_dim=1)  # flatten to one vector per sample
        # print("x.flatten.shape", x.shape)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')  # Kaiming initialization
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)  # normal-distribution initialization
                nn.init.constant_(m.bias, 0)
Training results

Due to hardware limitations, a simplified version of the AlexNet model was trained (the model structure is unchanged; only the parameter count is reduced by resizing the images from 224x224 to 65x65). It was trained for 30 epochs with a batch size of 64 (convergence was not reached). With more powerful hardware it should work better.
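If stronger hardware is available, one plausible way back to the standard 224x224 input (not tested here) is sketched below; with the commented stride-4 first convolution the final feature map is again 128x6x6, so the classifier does not need to change:

# hypothetical 224x224 variant (heavier in memory and compute)
data, name2label, labels = data_processor(size=224)
# in AlexNet.features, swap the first convolution for the commented version:
#   nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2)
# the final feature map is again [128, 6, 6], so Linear(6 * 6 * 128, 2048) stays the same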