Hands-on deep learning: activation functions and code implementation of the multilayer perceptron

2022-06-12 08:13:00 【Orange acridine 21】

Perceptron

Given an input x (a vector), weights w (a vector), and a bias b (a scalar), the perceptron outputs:

o = σ(⟨w, x⟩ + b), where σ(z) = 1 if z > 0 and 0 otherwise.

The perceptron cannot fit the XOR function; it can only produce a linear decision boundary.

The perceptron is a binary classification model and one of the earliest AI models.

The perceptron's training algorithm is equivalent to gradient descent with a batch size of 1.
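
The link to gradient descent with a batch size of 1 can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes labels in {-1, +1}, a learning rate of 1, and the loss max(0, -y(⟨w, x⟩ + b)). The function name `train_perceptron` and the toy AND-style data are hypothetical.

```python
import numpy as np


def train_perceptron(X, y, epochs=100):
    """Perceptron updates, one example at a time (labels assumed in {-1, +1})."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            # A mistake means y * (<w, x> + b) <= 0. On a mistake, the gradient
            # of max(0, -y * (<w, x> + b)) is (-y * x, -y), so one step of
            # size 1 against the gradient gives the update below.
            if y_i * (np.dot(w, x_i) + b) <= 0:
                w += y_i * x_i
                b += y_i
                mistakes += 1
        if mistakes == 0:  # every example is classified correctly
            break
    return w, b


# Linearly separable toy data: logical AND with labels in {-1, +1}.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([-1.0, -1.0, -1.0, 1.0])
w, b = train_perceptron(X, y)
predictions = np.where(X @ w + b > 0, 1.0, -1.0)
```

Because the data is linearly separable, the loop stops once a full pass makes no mistakes. On XOR-style data no such pass exists, so the loop would simply run out of epochs.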

Multilayer perceptron

1、 Learning XOR
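
A single perceptron cannot represent XOR, but a few perceptrons combined in two layers can. The sketch below shows one possible construction, assuming hand-picked (not learned) weights and a 0/1 step activation. The helper names `step` and `xor_net` are illustrative only.

```python
def step(z):
    """Perceptron activation: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0


def xor_net(x1, x2):
    """XOR built from three perceptrons with hand-picked weights."""
    h_or = step(x1 + x2 - 0.5)       # hidden unit 1: fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)      # hidden unit 2: fires only if both inputs are 1
    return step(h_or - h_and - 0.5)  # output unit: OR but not AND


outputs = [xor_net(x1, x2) for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

Each hidden unit draws one linear boundary, and the output unit combines the two, which is what a single linear boundary cannot do.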

2、 Single hidden layer, single-class classification

Why do we need a nonlinear activation function?

Without an activation function, stacking n fully connected layers is still equivalent to a single linear layer, so the output remains that of the simplest linear model.
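
This collapse can be checked numerically. The snippet below is a small sketch with NumPy, assuming two fully connected layers and arbitrary random weights; all shapes and variable names are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))  # 5 examples, 4 features

# Two fully connected layers with no activation in between.
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=3)
W2, b2 = rng.normal(size=(3, 2)), rng.normal(size=2)
stacked = (X @ W1 + b1) @ W2 + b2

# The same map written as one linear layer:
# (X W1 + b1) W2 + b2 = X (W1 W2) + (b1 W2 + b2).
W, b = W1 @ W2, b1 @ W2 + b2
single = X @ W + b

collapsed = np.allclose(stacked, single)  # True up to floating-point error
```

Inserting a nonlinear function between the two layers breaks this identity, which is what gives the hidden layer extra expressive power.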

3、 Activation function

3.1 sigmoid function

The step function used by the perceptron maps x to one of two values: 1 if x is greater than 0, and 0 if x is less than 0. It is a hard jump function.

The sigmoid function maps its input into the open interval (0, 1): sigmoid(x) = 1 / (1 + exp(-x)). It is a smooth curve, a soft version of the jump.

# sigmoid function

import sys

import matplotlib.pyplot as plt
import torch

sys.path.append("..")  # make the local d2lzh_pytorch package importable
import d2lzh_pytorch as d2l


def xyplot(x_vals, y_vals, name):
    """Plot y_vals against x_vals and label the y-axis as name(x)."""
    d2l.set_figsize(figsize=(5, 2.5))
    d2l.plt.plot(x_vals.detach().numpy(), y_vals.detach().numpy())
    d2l.plt.xlabel('x')
    d2l.plt.ylabel(name + '(x)')


x = torch.arange(-8.0, 8.0, 0.1, requires_grad=True)
y = x.sigmoid()
xyplot(x, y, 'sigmoid')
plt.show()

# Derivative of sigmoid.
# x.grad is None until the first backward pass, so there is nothing to zero here.
y.sum().backward()  # fills x.grad with dy/dx for every element
xyplot(x, x.grad, 'grad of sigmoid')
plt.show()
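
The gradient curve plotted above follows the closed form sigmoid'(x) = sigmoid(x)(1 - sigmoid(x)). As a sanity check, here is a minimal sketch in plain Python that compares this formula with a central finite difference. The helper names and the test point 1.3 are arbitrary choices for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Closed-form derivative: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def numeric_grad(f, x, eps=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


peak = sigmoid_grad(0.0)  # the derivative is largest at x = 0
tail = sigmoid_grad(8.0)  # and close to 0 far away from 0
gap = abs(numeric_grad(sigmoid, 1.3) - sigmoid_grad(1.3))
```

The derivative peaks at 0.25 and shrinks toward 0 as the input moves away from 0 in either direction.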

3.2 tanh function

The tanh (hyperbolic tangent) function maps the value of an element into the range between -1 and 1: tanh(x) = (1 - exp(-2x)) / (1 + exp(-2x)).

import sys

import matplotlib.pyplot as plt
import torch

sys.path.append("..")  # make the local d2lzh_pytorch package importable
import d2lzh_pytorch as d2l


def xyplot(x_vals, y_vals, name):
    """Plot y_vals against x_vals and label the y-axis as name(x)."""
    d2l.set_figsize(figsize=(5, 2.5))
    d2l.plt.plot(x_vals.detach().numpy(), y_vals.detach().numpy())
    d2l.plt.xlabel('x')
    d2l.plt.ylabel(name + '(x)')


# tanh function
x = torch.arange(-8.0, 8.0, 0.1, requires_grad=True)
y = x.tanh()
xyplot(x, y, 'tanh')
plt.show()

# Derivative of tanh.
# x.grad is None until the first backward pass, so there is nothing to zero here.
y.sum().backward()  # fills x.grad with dy/dx for every element
xyplot(x, x.grad, 'grad of tanh')
plt.show()
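
The derivative of tanh has the closed form tanh'(x) = 1 - tanh(x)^2. The following sketch in plain Python checks the exponential definition of tanh against `math.tanh` and evaluates the derivative at 0. The helper names and sample points are illustrative assumptions.

```python
import math


def tanh_from_exp(x):
    """tanh written with exponentials: (1 - exp(-2x)) / (1 + exp(-2x))."""
    return (1.0 - math.exp(-2.0 * x)) / (1.0 + math.exp(-2.0 * x))


def tanh_grad(x):
    """Closed-form derivative: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


xs = [-3.0, -0.5, 0.0, 0.5, 3.0]
max_error = max(abs(tanh_from_exp(x) - math.tanh(x)) for x in xs)
grad_at_zero = tanh_grad(0.0)  # the derivative is largest at x = 0
```

The derivative reaches its maximum of 1 at x = 0 and approaches 0 as the input moves away from 0.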

3.3 ReLU function

The ReLU (rectified linear unit) function provides a very simple nonlinear transformation. Given an element x, the function is defined as:

ReLU(x) = max(x, 0).

import sys

import matplotlib.pyplot as plt
import torch

sys.path.append("..")  # make the local d2lzh_pytorch package importable
import d2lzh_pytorch as d2l


def xyplot(x_vals, y_vals, name):
    """Plot y_vals against x_vals and label the y-axis as name(x)."""
    d2l.set_figsize(figsize=(5, 2.5))
    d2l.plt.plot(x_vals.detach().numpy(), y_vals.detach().numpy())
    d2l.plt.xlabel('x')
    d2l.plt.ylabel(name + '(x)')


# ReLU function: plot it using the relu method provided by the tensor.
x = torch.arange(-8.0, 8.0, 0.1, requires_grad=True)
y = x.relu()
xyplot(x, y, 'relu')
plt.show()

# When the input is negative, the derivative of ReLU is 0; when the input is
# positive, the derivative is 1. ReLU is not differentiable at 0, but we can
# take the derivative there to be 0. Now plot the derivative of ReLU.
# x.grad is None until the first backward pass, so there is nothing to zero here.
y.sum().backward()  # fills x.grad with dy/dx for every element
xyplot(x, x.grad, 'grad of relu')
plt.show()
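
The convention of taking the derivative at 0 to be 0 can be written out explicitly. The snippet below is a minimal NumPy sketch; the function names `relu` and `relu_grad` are chosen here for illustration.

```python
import numpy as np


def relu(x):
    """Element-wise max(x, 0)."""
    return np.maximum(x, 0.0)


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise (including x == 0)."""
    return (x > 0).astype(x.dtype)


x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
values = relu(x)
grads = relu_grad(x)
```

Only the strict comparison `x > 0` encodes the choice at 0; using `x >= 0` would pick 1 there instead.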

4、 Multilayer perceptron

A multilayer perceptron is a neural network that contains at least one hidden layer made up of fully connected layers, and the output of each hidden layer is transformed by an activation function. The number of layers and the number of hidden units in each hidden layer are hyperparameters.

Taking a single hidden layer as an example and reusing the symbols defined earlier in this section, the multilayer perceptron computes its output as follows:

H = φ(X W_h + b_h),
O = H W_o + b_o,

where φ is the activation function. In classification problems, we can apply the softmax operation to the output O and use the cross-entropy loss function from softmax regression. In regression problems, we set the number of outputs of the output layer to 1 and feed the output O directly into the squared loss function used in linear regression.
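
The single-hidden-layer computation and the classification loss can be sketched in a few lines of NumPy. This is an illustration under stated assumptions: ReLU is used as the activation φ, the sizes `n`, `d`, `h`, `q` and the random data are hypothetical, and the helper names are chosen here.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def softmax(o):
    """Row-wise softmax; the row maximum is subtracted for numerical stability."""
    e = np.exp(o - o.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def mlp_forward(X, W_h, b_h, W_o, b_o):
    """H = relu(X W_h + b_h), O = H W_o + b_o."""
    H = relu(X @ W_h + b_h)
    return H @ W_o + b_o


def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true class."""
    return -np.log(probs[np.arange(len(labels)), labels]).mean()


rng = np.random.default_rng(0)
n, d, h, q = 6, 4, 8, 3  # examples, inputs, hidden units, classes (hypothetical)
X = rng.normal(size=(n, d))
W_h, b_h = rng.normal(scale=0.01, size=(d, h)), np.zeros(h)
W_o, b_o = rng.normal(scale=0.01, size=(h, q)), np.zeros(q)
labels = rng.integers(0, q, size=n)

O = mlp_forward(X, W_h, b_h, W_o, b_o)
probs = softmax(O)
loss = cross_entropy(probs, labels)
```

With small random weights the predicted probabilities are nearly uniform, so the loss starts close to log(q). For regression, the same forward pass would use q = 1 and a squared loss on O instead of softmax.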

5、 Summary

The multilayer perceptron uses hidden layers and activation functions to obtain a nonlinear model.
Commonly used activation functions are sigmoid, tanh, and ReLU.
Softmax is used to handle multiclass classification.
The hyperparameters are the number of hidden layers and the size of each hidden layer.

Copyright notice
This article was created by [Orange acridine 21]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/03/202203010550044019.html
