Hands-on Deep Learning -- Implementing a Multilayer Perceptron from Scratch and Its Concise Implementation
2022-06-12 08:14:00 【Orange acridine 21】
Implementing the multilayer perceptron from scratch
# First, you need to import the required packages
import torch
import numpy as np
import sys
sys.path.append("..")
import d2lzh_pytorch as d2l
"""
1、 get data
Use FashionMNIST Data sets , Use a multi-layer perceptron to start classifying images
"""
batch_size=256 # The batch size is set to 256, That is, every time you read 256 A picture
train_iter,test_iter =d2l.load_data_fashion_mnist(batch_size)
# Set up an iterator for training set and test set
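# A quick look at one batch (an illustrative addition, not from the original post): each batch is
# expected to hold 256 images of shape 1x28x28 and 256 integer labels.
for X, y in train_iter:
    print(X.shape, y.shape)  # expected: torch.Size([256, 1, 28, 28]) torch.Size([256])
    break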
"""
2、 Define model parameters
stay softmax Regression starts from zero , mention Fashion-MNIST The image shape in the dataset is 28x28, The number of categories is 10.
This section still uses a length of 28x28=784 The vector of represents each image .
"""
num_inputs,num_outputs,num_hiddens=784,10,256
# Enter the number 784, Number of outputs 10 individual , Number of hyperparameter hidden cells 256
W1=torch.tensor(np.random.normal(0,0.01,(num_inputs,num_hiddens)),dtype=torch.float)
# The weight W1 Initialize to a value of Gaussian random distribution , The mean for 0, The variance of 0.01, Input layer
b1=torch.zeros(num_hiddens,dtype=torch.float)
# deviation b1 Is the number of hidden layers , Define data types
W2 = torch.tensor(np.random.normal(0, 0.01, (num_hiddens, num_outputs)), dtype=torch.float)
# weight W2 (hidden layer to output layer) is initialized from a Gaussian with mean 0 and standard deviation 0.01
b2 = torch.zeros(num_outputs, dtype=torch.float)
# bias b2 is a zero vector of length 10 (num_outputs)
# enable gradient tracking for the model parameters: every weight W and bias b needs a gradient
params = [W1, b1, W2, b2]
for param in params:
    param.requires_grad_(requires_grad=True)
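# A quick sanity check on the parameter shapes (an illustrative addition, not from the original post).
print([param.shape for param in params])
# expected: [torch.Size([784, 256]), torch.Size([256]), torch.Size([256, 10]), torch.Size([10])]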
"""
3、 Define activation function
Use here ReLU Activation function , Use basic max Function implementation ReLU, Instead of calling directly
"""
def relu(X):
return torch.max(input=X,other=torch.tensor(0.0))
# Input to X
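# A quick sanity check (an illustrative addition, not from the original post): relu should zero out
# negative entries and keep positive ones.
print(relu(torch.tensor([-1.0, 0.0, 2.0])))  # expected: tensor([0., 0., 2.])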
"""
4、 Defining models
Implement the calculation expression of multi-layer perceptron in the previous section
"""
def net(X):
X=X.view((-1,num_inputs))
# Use view Function to change the length of each original image to NUM_inputs Vector
H=relu(torch.matmul(X,W1)+b1) # Multiply first Enter times W1 On the plus b1
return torch.matmul(H,W2)+b2
# The output of the first layer is multiplied by the weight of the second layer, plus the deviation of the second layer
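# A small shape check (an illustrative addition, not from the original post): a fake batch of two
# 1x28x28 images should produce one row of 10 class scores per image.
X_demo = torch.randn(2, 1, 28, 28)
print(net(X_demo).shape)  # expected: torch.Size([2, 10])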
"""
5、 Define the loss function
"""
loss =torch.nn.CrossEntropyLoss()
"""
6、 Training models
The training process and softmax The training process for returning is the same
Let's go straight ⽤ d2lzh_pytorch In bag train_ch3 function ,
"""
num_epochs,lr=5,100.0 # Set the super parameter iteration period to 5, The learning rate is 100.0
d2l.train_ch3(net,train_iter,test_iter,loss,num_epochs,batch_size,params,lr)
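As a rough illustration (a minimal sketch, not the d2lzh_pytorch source), the loop below shows the kind of training train_ch3 performs for this from-scratch model, assuming an sgd-style update of the form param -= lr * grad / batch_size; that division by the batch size is why a learning rate of 100.0 is reasonable here.
for epoch in range(num_epochs):
    for X, y in train_iter:
        l = loss(net(X), y)  # mean cross-entropy over the batch
        for param in params:
            if param.grad is not None:
                param.grad.data.zero_()  # clear gradients left over from the previous step
        l.backward()
        for param in params:
            param.data -= lr * param.grad / batch_size  # assumed sgd-style update
    print('epoch %d done' % (epoch + 1))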

Concise implementation of the multilayer perceptron
import torch
from torch import nn
from torch.nn import init
import numpy as np
import sys
sys.path.append("..")
import d2lzh_pytorch as d2l
# 1. Define the model
# The only difference from softmax regression is that we add one more fully connected layer as the hidden layer
num_inputs, num_outputs, num_hiddens = 784, 10, 256
net = nn.Sequential(
    d2l.FlattenLayer(),  # flatten each image into a vector of length num_inputs
    nn.Linear(num_inputs, num_hiddens),  # hidden (fully connected) layer
    nn.ReLU(),  # activation
    nn.Linear(num_hiddens, num_outputs),  # output layer
)
for params in net.parameters():
    init.normal_(params, mean=0, std=0.01)
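# An alternative sketch (an aside, not from the original post): recent PyTorch releases provide
# nn.Flatten, which can stand in for the book's d2l.FlattenLayer helper; net_alt is purely illustrative.
net_alt = nn.Sequential(
    nn.Flatten(),  # flattens each (1, 28, 28) image into a 784-dimensional vector
    nn.Linear(num_inputs, num_hiddens),
    nn.ReLU(),
    nn.Linear(num_hiddens, num_outputs),
)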
# 2. Read the data and train the model
batch_size = 256 # Batch size 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
loss = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.5)
# Because we use PyTorch's SGD here rather than the sgd function in d2lzh_pytorch, the learning rate no longer looks unusually large
num_epochs = 5
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs,batch_size, None, None, optimizer)
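After training, a quick spot check (an illustrative addition, not from the original post) is to compare the network's predicted class indices with the true labels on one test batch:
X, y = next(iter(test_iter))      # one batch of test images and labels
y_hat = net(X).argmax(dim=1)      # predicted class index for each image
print('predicted:', y_hat[:10].tolist())
print('actual:   ', y[:10].tolist())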