14. Example - Multi-class classification problem

2022-07-27 05:59:00 【Pie star's favorite spongebob】

Contents

Network Architecture
Train
Complete code

Using cross-entropy loss to optimize a multi-class classification problem.

Network Architecture

The output layer has 10 units, one for each of the 10 classes.
Because linear layers have not been covered yet, some low-level tensor operations are used in their place.

Create three linear layers, each with a weight tensor w and a bias tensor b.
Note that in PyTorch the first dimension of a weight is out and the second is in.

w1,b1 = torch.randn(200,784,requires_grad=True),torch.zeros(200,requires_grad=True)
w2,b2 = torch.randn(200,200,requires_grad=True),torch.zeros(200,requires_grad=True)
w3,b3 = torch.randn(10,200,requires_grad=True),torch.zeros(10,requires_grad=True)

The first linear layer can be understood as reducing the 784 (28×28) input dimensions to 200. We must set requires_grad to True; otherwise autograd does not track these tensors and the backward pass reports an error.
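A minimal sketch of what requires_grad changes, using small stand-in tensors (the names x, w_plain, and w_tracked are illustrative and not part of the article's code). It assumes a recent PyTorch version, where calling backward on a graph in which nothing requires gradients raises a RuntimeError.

```python
import torch

x = torch.randn(4, 784)                  # dummy batch of 4 flattened images

# Weight created without requires_grad: autograd records nothing for it.
w_plain = torch.randn(200, 784)
try:
    (x @ w_plain.t()).sum().backward()
    backward_failed = False
except RuntimeError:
    backward_failed = True               # no tensor in the graph requires grad

# Weight created with requires_grad=True: backward fills in w_tracked.grad.
w_tracked = torch.randn(200, 784, requires_grad=True)
(x @ w_tracked.t()).sum().backward()
grad_shape = tuple(w_tracked.grad.shape) # gradient has the same shape as the weight
```

The gradient stored in w_tracked.grad is what the optimizer later reads when it updates the weight.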

The second hidden layer maps 200 to 200. It is only a feature transformation, with no dimensionality reduction.

The third layer is the output layer, which produces the final 10 classes.
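The shapes described above can be traced with a dummy batch. This is only a sketch: the batch size of 5 and the names h1, h2, and out are assumptions made for illustration. It also checks that F.linear, which uses the same [out, in] weight layout, gives the same result as x @ w.t() + b.

```python
import torch
import torch.nn.functional as F

batch = torch.randn(5, 784)              # hypothetical batch of 5 flattened 28x28 images

w1, b1 = torch.randn(200, 784), torch.zeros(200)
w2, b2 = torch.randn(200, 200), torch.zeros(200)
w3, b3 = torch.randn(10, 200), torch.zeros(10)

# Weights are stored as [out, in], so they are transposed before the matmul.
h1 = F.relu(batch @ w1.t() + b1)         # [5, 784] @ [784, 200] -> [5, 200]
h2 = F.relu(h1 @ w2.t() + b2)            # [5, 200] @ [200, 200] -> [5, 200]
out = h2 @ w3.t() + b3                   # [5, 200] @ [200, 10]  -> [5, 10]

# F.linear expects the weight in the same [out, in] layout.
same_as_linear = torch.allclose(
    F.linear(batch, w1, b1), batch @ w1.t() + b1, rtol=1e-4, atol=1e-3)
```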

def forward(x):
    x = x @ w1.t() + b1
    x = F.relu(x)
    x = x @ w2.t() + b2
    x = F.relu(x)
    x = x @ w3.t() + b3
    x = F.relu(x)
    return x

The ReLU on the last layer can also be omitted. Do not apply sigmoid or softmax here, because the loss function used later already includes softmax.
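To see why an extra softmax hurts, here is a small sketch with hand-picked logits for three classes (the values are made up for illustration). Passing probabilities instead of raw logits means softmax is applied twice, and the loss stays large even for a confident, correct prediction.

```python
import torch
import torch.nn.functional as F
from torch import nn

criteon = nn.CrossEntropyLoss()

logits = torch.tensor([[10.0, 0.0, 0.0]])   # confident prediction for class 0
target = torch.tensor([0])

# Correct usage: pass raw logits, the loss applies softmax internally.
loss_raw = criteon(logits, target).item()

# Incorrect usage: softmax is applied here and again inside the loss.
loss_double = criteon(F.softmax(logits, dim=1), target).item()
```

With three classes, the doubled softmax cannot push the loss below roughly 0.55, because its inputs are confined to the range [0, 1].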

These are the network's tensors and its forward pass.

Train

Next, define an optimizer. The optimization targets are the variables of the three fully connected layers: [w1,b1,w2,b2,w3,b3]

optimizer=optim.SGD([w1,b1,w2,b2,w3,b3],lr=learning_rate)
criteon=nn.CrossEntropyLoss()

nn.CrossEntropyLoss and F.cross_entropy do the same thing: both are softmax + log + nll_loss.
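This equivalence can be checked numerically. The sketch below uses random logits and arbitrary targets (both are illustrative assumptions) and compares the module form, the functional form, and the manual softmax + log + nll_loss chain.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 10)                 # 4 samples, 10 classes
target = torch.tensor([3, 0, 9, 1])         # arbitrary class labels

loss_module = nn.CrossEntropyLoss()(logits, target)
loss_functional = F.cross_entropy(logits, target)

# Manual chain: softmax, then log, then negative log-likelihood.
log_probs = torch.log(F.softmax(logits, dim=1))
loss_manual = F.nll_loss(log_probs, target)
```

In practice F.log_softmax is usually preferred over a separate softmax and log, since it is more stable numerically.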

for epoch in range(epochs):
    for batch_idx,(data,target) in enumerate(train_loader):
        data=data.view(-1,28*28)

        logits=forward(data)
        loss=criteon(logits,target)

        optimizer.zero_grad()
        loss.backward()

        optimizer.step()

        if batch_idx % 100 == 0:
            print('Train epoch:{}[{}/{} ({:.0f}%)]\tLoss:{:.6f}'.format(
                epoch,batch_idx*len(data),len(train_loader.dataset),
                100.*batch_idx/len(train_loader),loss.item()))
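The zero_grad / backward / step sequence in the loop can be illustrated on a toy problem. This is a sketch with a made-up two-element parameter and a quadratic loss, not the article's network. It shows that optimizer.step applies the plain SGD rule w = w - lr * grad, and that gradients accumulate when zero_grad is skipped.

```python
import torch
from torch import optim

w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()                    # d(loss)/dw = 2 * w
optimizer.zero_grad()                    # clear any old gradients
loss.backward()                          # w.grad = [2, -4]
expected = w.detach() - 0.1 * w.grad     # plain SGD update computed by hand
optimizer.step()
step_matches = torch.allclose(w.detach(), expected)

# Skipping zero_grad: the new gradient is added to the old one.
loss = (w ** 2).sum()
loss.backward()
accumulated = w.grad.clone()             # old grad [2, -4] + new grad [1.6, -3.2]
```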

A step is one batch, and an epoch is one pass over the entire dataset.
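As a quick count, the sketch below builds a stand-in dataset with 60,000 samples (the size of the MNIST training set) filled with zeros, so nothing needs to be downloaded. The tensor contents are placeholders; only the lengths matter.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

batch_size = 200
epochs = 10

# Placeholder data with the same number of samples as the MNIST training set.
fake_x = torch.zeros(60000, 1)
fake_y = torch.zeros(60000, dtype=torch.long)
loader = DataLoader(TensorDataset(fake_x, fake_y), batch_size=batch_size)

steps_per_epoch = len(loader)            # number of batches in one epoch
total_steps = steps_per_epoch * epochs   # parameter updates over the whole run
```

With these settings each epoch consists of 300 steps, so the print statement that fires every 100 batches reports three times per epoch.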

The loss stays stuck at 10% because of an initialization problem.
The fix is to initialize w1, w2, and w3 explicitly; the biases b were already initialized directly with torch.zeros.

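# kaiming_normal_ (with the trailing underscore) is the in-place initializer;
# the name without the underscore is deprecated
torch.nn.init.kaiming_normal_(w1)
torch.nn.init.kaiming_normal_(w2)
torch.nn.init.kaiming_normal_(w3)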
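A rough sketch of why the initialization matters, assuming the default arguments of kaiming_normal_, which draw weights with standard deviation sqrt(2 / fan_in). The helper logits_scale is hypothetical and exists only for this comparison; it omits the zero biases and the final ReLU. Weights drawn with torch.randn have standard deviation 1, so activations grow at every layer and the logits become very large.

```python
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)
shapes = [(200, 784), (200, 200), (10, 200)]

randn_ws = [torch.randn(*s) for s in shapes]
kaiming_ws = [torch.nn.init.kaiming_normal_(torch.empty(*s)) for s in shapes]

def logits_scale(ws, x):
    """Mean absolute logit after passing x through the three layers."""
    for i, w in enumerate(ws):
        x = x @ w.t()
        if i < len(ws) - 1:              # ReLU between layers only
            x = F.relu(x)
    return x.abs().mean().item()

x = torch.randn(64, 784)                 # stand-in for a normalized input batch
randn_scale = logits_scale(randn_ws, x)
kaiming_scale = logits_scale(kaiming_ws, x)

kaiming_std = kaiming_ws[0].std().item()
expected_std = math.sqrt(2.0 / 784)      # fan_in of the first layer is 784
```

Very large logits saturate the softmax inside the loss, which is one plausible reason training stalls with the torch.randn weights.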

Complete code

import torch
import torch.nn.functional as F
from torch import optim
from torch import nn
import torchvision  

batch_size = 200
learning_rate=0.01
epochs=10
train_loader = torch.utils.data.DataLoader(
    torchvision.datasets.MNIST('mnist_data', train=True, download=True,
                               transform=torchvision.transforms.Compose([
                                   torchvision.transforms.ToTensor(),
                                   torchvision.transforms.Normalize(
                                       (0.1307,), (0.3081,))
                               ])),
    batch_size=batch_size, shuffle=True)
# ToTensor converts the image from numpy format to a tensor
# Normalize shifts the data to be centered near 0, which can improve performance
test_loader = torch.utils.data.DataLoader(
    torchvision.datasets.MNIST('mnist_data/', train=False, download=True,
                               transform=torchvision.transforms.Compose([
                                   torchvision.transforms.ToTensor(),
                                   torchvision.transforms.Normalize(
                                       (0.1307,), (0.3081,))
                               ])),
    batch_size=batch_size, shuffle=False)


w1,b1 = torch.randn(200,784,requires_grad=True),torch.zeros(200,requires_grad=True)
w2,b2 = torch.randn(200,200,requires_grad=True),torch.zeros(200,requires_grad=True)
w3,b3 = torch.randn(10,200,requires_grad=True),torch.zeros(10,requires_grad=True)

# Kaiming initialization for the weights (in-place version, trailing underscore)
torch.nn.init.kaiming_normal_(w1)
torch.nn.init.kaiming_normal_(w2)
torch.nn.init.kaiming_normal_(w3)

def forward(x):
    x = x @ w1.t() + b1
    x = F.relu(x)
    x = x @ w2.t() + b2
    x = F.relu(x)
    x = x @ w3.t() + b3
    x = F.relu(x)
    return x

optimizer=optim.SGD([w1,b1,w2,b2,w3,b3],lr=learning_rate)
criteon=nn.CrossEntropyLoss()

for epoch in range(epochs):
    for batch_idx,(data,target) in enumerate(train_loader):
        data=data.view(-1,28*28)

        logits=forward(data)
        loss=criteon(logits,target)

        optimizer.zero_grad()
        loss.backward()

        optimizer.step()

        if batch_idx % 100 == 0:
            print('Train epoch:{}[{}/{} ({:.0f}%)]\tLoss:{:.6f}'.format(
                epoch,batch_idx*len(data),len(train_loader.dataset),
                100.*batch_idx/len(train_loader),loss.item()))

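    # Evaluate on the test set after each epoch; no gradients are needed here
    test_loss=0
    correct=0
    with torch.no_grad():
        for data,target in test_loader:
            data=data.view(-1,28*28)
            logits=forward(data)
            # criteon returns the mean loss of the batch
            test_loss+=criteon(logits,target).item()

            pred=logits.max(1)[1]  # index of the largest logit = predicted class
            correct+=pred.eq(target).sum().item()

    # Average the per-batch mean losses over the number of batches
    test_loss/=len(test_loader)
    print('\nTest set:Average loss:{:.4f},Accuracy:{}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))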
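The accuracy computation in the test loop can be followed on a tiny hand-made example. The logits and targets below are made up for illustration. logits.max(1) returns a (values, indices) pair, and the indices at position [1] are the predicted classes, the same result as argmax.

```python
import torch

logits = torch.tensor([[0.1, 2.0, -1.0],
                       [3.0, 0.2, 0.5],
                       [0.0, 0.1, 0.9]])
target = torch.tensor([1, 0, 1])

pred = logits.max(1)[1]                          # index of the largest logit per row
same_as_argmax = torch.equal(pred, logits.argmax(dim=1))

correct = pred.eq(target).sum().item()           # number of matching predictions
accuracy = 100.0 * correct / len(target)         # percentage over the 3 samples
```

Here two of the three predictions match the targets, so correct is 2.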