当前位置:网站首页>Li Mu D2L (V) -- multilayer perceptron
Li Mu D2L (V) -- multilayer perceptron
2022-07-26 09:09:00 【madkeyboard】
List of articles
One 、 perceptron
Concept
A given input x, The weight w, And offset b, The output of the sensor is as follows . The output of perceptron is a binary problem , The output of linear regression is a real number ,Softmax If there is n A class will output n Elements , It is a multi classification problem .
The perceptron cannot fit XOR problem , Because it can only produce linear split surfaces

Training perceptron
Its solution algorithm is equivalent to using a batch size of 1 The gradient of .

Convergence theorem

Two 、 Multilayer perceptron
Why is it called multilayer ? When the goal we want to achieve cannot be achieved at one time , Just learn a simple function first , Learn another simple function , Finally, another function is used to combine the two functions .

Single hidden layer - Single category
Input is n Dimension vector , The hidden layer is m x n Matrix , Offset is long bit m Vector . The output layer is also a long bit m Vector , The offset is a scalar .

Why do we need a nonlinear activation function ? Suppose the activation function is itself , namely σ(x) = x, You can see the output o Of w2T W1 x + b’ It is still a linear function , Then it is equivalent to a single-layer perceptron .

Sigmoid Activation function

Tanh Activation function

ReLU Activation function

Multilevel classification
Input and hidden layers are the same as single classification , The difference is that the output layer is a m x k Matrix , The offset is a length of k Vector .

3、 ... and 、 Multi layer perceptron code starts from scratch
import torch
from torch import nn
from d2l import torch as d2l
import os
os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"
batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
# 1 Implement a multi-layer perceptron with a single hidden layer , contain 256 Hidden units
num_inputs, num_outputs = 784, 10
num_hiddens = 256
w1 = nn.Parameter(torch.randn(num_inputs, num_hiddens, requires_grad=True) * 0.01) # randn It is normally distributed in (0,1) The reason why I multiply by 0.01 Is to limit the scope to (0,0.01)
b1 = nn.Parameter(torch.zeros(num_hiddens, requires_grad=True))
w2 = nn.Parameter(torch.randn(num_hiddens, num_outputs, requires_grad=True) * 0.01)
b2 = nn.Parameter(torch.zeros(num_outputs, requires_grad=True))
params = [w1, b1, w2, b2]
# 2 Realization RELU Activation function
def relu(X):
a = torch.zeros_like(X)
return torch.max(X, a)
# 3 Implementation model
def net(X):
X = X.reshape((-1, num_inputs))
H = relu(X @ w1 + b1) # @ Abbreviation for matrix multiplication
return (H @ w2 + b2)
# 4 Loss
loss = nn.CrossEntropyLoss(reduction='none')
# 5 Training process
num_epochs, lr = 10, 0.1
updater = torch.optim.SGD(params, lr=lr)
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, updater)
d2l.plt.show()
From the results, we can see that it is the same as last time sofmax comparison , its loss To reduce the , But the accuracy has not improved .

Four 、 Simple implementation
import torch
from torch import nn
from d2l import torch as d2l
net = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(),
nn.Linear(256, 10))
def init_weights(m):
if type(m) == nn.Linear:
nn.init.normal_(m.weight, std=0.01)
net.apply(init_weights);
batch_size, lr, num_epochs = 256, 0.1, 10
loss = nn.CrossEntropyLoss()
trainer = torch.optim.SGD(net.parameters(), lr=lr)
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)
边栏推荐
猜你喜欢
随机推荐
Clean the label folder
Database operation topic 1
[leetcode database 1050] actors and directors who have cooperated at least three times (simple question)
CSDN TOP1“一个处女座的程序猿“如何通过写作成为百万粉丝博主?
原根与NTT 五千字详解
JS closure: binding of functions to their lexical environment
Zipkin安装和使用
tornado之多进程服务
(2006,Mysql Server has gone away)问题处理
QtCreator报错:You need to set an executable in the custom run configuration.
Summary of common activation functions for deep learning
公告 | FISCO BCOS v3.0-rc4发布,新增Max版,可支撑海量交易上链
Study notes of canal
分布式跟踪系统选型与实践
2022流动式起重机司机考试题模拟考试题库模拟考试平台操作
Day06 homework - skill question 7
“No input file specified “问题的处理
第6天总结&数据库作业
Learn more about the difference between B-tree and b+tree
Pop up window in Win 11 opens with a new tab ---firefox









