Self-Learning Neural Network Series - 7: Feedforward Neural Network Prerequisites
2022-06-26 09:10:00 【ML_python_get√】
7 Feedforward Neural Network Prerequisites
One: The Perceptron Algorithm
1 Model Form
$z = \sum_{d=1}^{D} w_d x_d + b$
2 Linear Classifier
- Accepts multiple input signals and outputs a single signal.
- Neuron: computes a weighted sum of the input signals; if the weighted sum exceeds a threshold it outputs 1, otherwise 0 (see the sketch below).
- The weights represent the importance of each input signal.
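As a concrete illustration of the model form above, here is a minimal sketch of a single perceptron with a step activation; the function name perceptron and the particular weights are illustrative choices, not from the original post.

import numpy as np

def perceptron(x, w, b):
    '''Weighted sum of the inputs plus a bias, followed by a step function.'''
    z = np.sum(w * x) + b        # z = sum_d w_d * x_d + b
    return 1 if z > 0 else 0     # fire (1) only if the weighted sum clears the threshold

# Example: with these weights and bias the perceptron behaves like an AND gate
w = np.array([0.5, 0.5])
b = -0.7
print(perceptron(np.array([1, 1]), w, b))  # 1
print(perceptron(np.array([1, 0]), w, b))  # 0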
3 Limitations
AND gate: outputs 1 only when both inputs are 1; outputs 0 in every other case.
NAND gate: the inverse of the AND gate; outputs 0 only when both inputs are 1, otherwise outputs 1.
OR gate: outputs 1 as long as at least one input is 1; outputs 0 only when both inputs are 0.
XOR gate: outputs 1 only when exactly one input is 1; if both inputs are 1 (or both are 0), it outputs 0, as shown in Figure 1 and the truth table below.
A single-layer perceptron cannot implement the XOR gate.
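For reference, the truth table of the four gates (the values follow directly from the definitions above); only XOR cannot be produced by a single linear threshold on x1 and x2:

x1  x2 | AND  NAND  OR  XOR
 0   0 |  0    1    0    0
 0   1 |  0    1    1    1
 1   0 |  0    1    1    1
 1   1 |  1    0    1    0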
The idea behind the perceptron algorithm: by adjusting the perceptron's parameters, the same structure can be switched between the different gates; parameter tuning is left to the computer, which learns what kind of gate to realize.

4 Python Implementation
(1) AND gate
def AND(x1, x2):
    '''AND gate implemented with weights and an explicit threshold'''
    w1, w2, theta = 0.5, 0.5, 0.7
    tmp = x1*w1 + x2*w2
    if tmp <= theta:
        return 0
    elif tmp > theta:
        return 1

# Test the function
print(AND(1, 1))  # 1
print(AND(0, 0))  # 0
print(AND(1, 0))  # 0
print(AND(0, 1))  # 0
# Implementation with a bias term and numpy
import numpy as np

def AND(x1, x2):
    x = np.array([x1, x2])
    w = np.array([0.5, 0.5])
    b = -0.7  # bias: controls how easily the neuron is activated
    tmp = np.sum(w*x) + b
    if tmp <= 0:
        return 0
    else:
        return 1

# Test: same results as the plain implementation
print(AND(1, 1))  # 1
print(AND(0, 0))  # 0
print(AND(1, 0))  # 0
print(AND(0, 1))  # 0
(2) NAND gate
- The output is the inverse of the AND gate: the weights and bias simply have opposite signs.
def NAND(x1, x2):
    '''NAND gate'''
    x = np.array([x1, x2])
    w = np.array([-0.5, -0.5])
    b = 0.7
    tmp = np.sum(w*x) + b
    if tmp <= 0:
        return 0
    else:
        return 1

# Test
print(NAND(1, 1))  # 0
print(NAND(1, 0))  # 1
print(NAND(0, 1))  # 1
print(NAND(0, 0))  # 1
(3) OR gate
- The absolute value of the bias only needs to be smaller than the weight 0.5, which makes the neuron easier to activate.
def OR(x1, x2):
    x = np.array([x1, x2])
    w = np.array([0.5, 0.5])  # with these weights, the output is 1 unless both inputs are 0
    b = -0.2
    tmp = np.sum(w*x) + b
    if tmp <= 0:
        return 0
    else:
        return 1

# Test
print(OR(1, 1))  # 1
print(OR(0, 1))  # 1
print(OR(1, 0))  # 1
print(OR(0, 0))  # 0
5 A Multi-Layer Perceptron Solves the XOR Problem
- XOR gate: its outputs cannot be separated by a single straight line (not linearly separable).
- Introduce nonlinearity by stacking perceptron layers.
- Pass the inputs through a NAND gate to get s1 and through an OR gate to get s2; feeding (s1, s2) into an AND gate then produces the XOR output.
- Multi-layer perceptron: several linear classifiers stacked together; combining their outputs (here with an AND gate) realizes a nonlinear fit.
def XOR(x1, x2):
    s1 = NAND(x1, x2)  # NAND gate
    s2 = OR(x1, x2)    # OR gate
    y = AND(s1, s2)    # combine the two intermediate signals with an AND gate
    return y

# Test
print(XOR(1, 1))  # 0
print(XOR(0, 0))  # 0
print(XOR(1, 0))  # 1
print(XOR(0, 1))  # 1
Two: Neural Network Structure
Basic idea: in the previous section, the multi-layer perceptron achieved nonlinear classification by combining several linear functions through logical operations. It is natural to generalize this: apply a nonlinear transformation to linear functions in order to solve classification problems that are not linearly separable. In a neural network, this nonlinear transformation is the activation function.
Neural network: a multi-layer perceptron model equipped with activation functions, i.e. a statistical model that learns nonlinear feature representations. A common fully connected network structure is shown in Figure 2:

1 Common Activation Functions
- Any curve can be approximated using activation functions, much as a polynomial can fit any finite set of points exactly.
- Any curve can be written as a sum of activation functions, similar in spirit to spline estimation.
- The network is organized into layers: every unit in a layer uses the same activation function, and the units differ only in their weights and biases.
- Two ReLU functions can be combined to construct a step function (taking the values 0 and 1) or an approximation of a sigmoid (see the sketch after this list).
- Consequently, to achieve the same effect, roughly twice as many ReLU units (neurons) are needed.
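A minimal sketch (my own illustration, not from the original post) of how the difference of two shifted ReLUs produces an approximate 0-1 step function:

import numpy as np

def relu(x):
    return np.maximum(0, x)

def approx_step(x, eps=0.1):
    '''Difference of two shifted ReLUs: 0 for x <= 0, 1 for x >= eps, linear in between.'''
    return (relu(x) - relu(x - eps)) / eps

x = np.array([-1.0, -0.01, 0.05, 0.2, 3.0])
print(approx_step(x))  # [0.  0.  0.5 1.  1. ]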
(1) Sigmoid activation function
- In machine learning, the sigmoid activation function is the CDF of the logistic distribution and has the form:
$\sigma(x) = \dfrac{1}{1+\exp(-x)}$
The logistic distribution belongs to the exponential family of distributions.
The logistic distribution is often used to model cyclical processes, for example the economic cycle of depression, recovery, boom, and decline: at first the economy grows slowly, then grows rapidly during the recovery, stagnates after the boom, slows down, and finally declines back into depression. Such an S-shaped trajectory matches the logistic curve.
The distribution function takes values between 0 and 1, so its output can be interpreted as a probability and used for classification.
The function responds strongly to intermediate inputs and is suppressed (saturated) at both ends, which matches the behaviour of biological neurons.
The vanishing gradient problem:
- The derivative is close to 0 at both ends.
- The gradient is always less than 1 (at most 0.25), so when the chain rule is applied across many layers, the gradients shrink toward zero (see the sketch below).
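A minimal sketch (my own illustration) of why chains of sigmoids suffer from vanishing gradients: the derivative σ'(x) = σ(x)(1 − σ(x)) is at most 0.25, so a product of many such factors shrinks rapidly.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)

print(sigmoid_grad(0.0))  # 0.25, the maximum possible value
print(sigmoid_grad(5.0))  # ~0.0066, nearly zero in the saturated region
print(0.25 ** 10)         # ~9.5e-07: the best case after ten chained layers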
(2) Tanh activation function
- The Tanh activation function has the form:
$\tanh(x) = \dfrac{\exp(x)-\exp(-x)}{\exp(x)+\exp(-x)}$
- Tanh can be seen as a rescaled sigmoid activation; both belong to the exponential family of functions (the identity is checked numerically below):
$\tanh(x) = 2\sigma(2x) - 1$
- tanh(x) has range (-1, 1), so its outputs are zero-centered, whereas sigmoid(x) has range (0, 1) and its outputs are always positive.
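A quick numerical check (my own sketch) of the identity tanh(x) = 2σ(2x) − 1:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-3, 3, 7)
print(np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1))  # True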
(3) ReLU activation function
- ReLU is the most commonly used activation function in neural networks; its form is:
$\mathrm{ReLU}(x)=\begin{cases} x, & x>0 \\ 0, & x \le 0 \end{cases}$
- One-sided suppression: ReLU is left-saturated. Saturation means that, far out along an axis, the function value no longer changes appreciably; sigmoid and Tanh saturate at both ends, while ReLU saturates only on the left.
- Wide excitation boundary: the activated region is wide; any positive input activates the unit.
- Alleviates vanishing gradients: the derivative is 1 for positive inputs, which mitigates the vanishing gradient problem to some extent.
- The dying ReLU problem: when an input is an outlier, the prediction deviates strongly from the target, and back-propagation (propagating the error backwards) produces a large, biased update to the bias b, possibly driving it far negative. Once the bias is that negative, no sample can make the hidden unit's pre-activation positive, so after the ReLU activation the gradient is 0 and the parameters are never updated again. This phenomenon is called the dying ReLU problem (sketched below).
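A minimal sketch (my own illustration) of a dead ReLU unit: once the pre-activation is negative for every input, both the output and the gradient are 0, so gradient descent can never revive the unit.

import numpy as np

def relu(x):
    return np.maximum(0, x)

def relu_grad(x):
    return (x > 0).astype(float)

w, b = np.array([0.5, 0.5]), -10.0      # a bias that has been driven far negative
X = np.array([[0, 1], [1, 1], [3, 2]])  # ordinary inputs
z = X @ w + b                           # every pre-activation is negative
print(relu(z))       # [0. 0. 0.] -> the unit never fires
print(relu_grad(z))  # [0. 0. 0.] -> no gradient flows, so w and b stay frozen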
(4) Leaky ReLU
- To improve on ReLU and avoid the dying ReLU problem, the Leaky ReLU activation introduces a small slope when the input x is negative:
$\mathrm{LeakyReLU}(x)=\begin{cases} x, & \text{if } x>0 \\ \gamma_i x, & \text{if } x \le 0 \end{cases}$
- where $\gamma_i$ can either be a learned parameter or a fixed constant.
- ELU activation function: replace $\gamma_i x$ with $\gamma_i(\exp(x)-1)$ for the negative part (see the sketch below).
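A minimal sketch (my own illustration) of Leaky ReLU and ELU with a fixed γ; the values 0.01 and 1.0 for γ are illustrative choices:

import numpy as np

def leaky_relu(x, gamma=0.01):
    return np.where(x > 0, x, gamma * x)

def elu(x, gamma=1.0):
    return np.where(x > 0, x, gamma * (np.exp(x) - 1))

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(leaky_relu(x))  # approximately [-0.02 -0.005 0. 1.5]
print(elu(x))         # approximately [-0.865 -0.393 0. 1.5]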
(5) Softplus activation function
- A smooth version of the ReLU function, of the form:
$\mathrm{Softplus}(x) = \log(1+\exp(x))$
- Its derivative is the sigmoid function (checked numerically below), so some gradient attenuation remains, and it does not produce sparse activations.
- One-sided suppression and a wide excitation boundary, like ReLU.
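A quick numerical check (my own sketch) that the derivative of Softplus is the sigmoid, using a central finite difference:

import numpy as np

def softplus(x):
    return np.log(1 + np.exp(x))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x, h = 1.3, 1e-6
numeric_grad = (softplus(x + h) - softplus(x - h)) / (2 * h)
print(numeric_grad, sigmoid(x))  # both are approximately 0.7858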
(6) Other activation functions
Swish function: $\mathrm{swish}(x) = x\,\sigma(\beta x)$
- The sigmoid σ acts as a gating unit that controls how much of x is output.
- It lies between ReLU and the linear function.
GELU function: $\mathrm{GELU}(x) = x\,P(X \le x)$
- The Gaussian CDF acts as the gating unit that controls the output of x.
Maxout unit
- A piecewise linear function.
- Uses all outputs of the previous layer rather than a single one, with several parameter vectors.
- The output is the maximum over the several linear transformations (see the sketches below).
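Minimal sketches (my own illustrations, following the definitions above) of Swish, GELU via the Gaussian CDF, and a Maxout unit with k linear pieces; the parameter shapes and values are illustrative:

import numpy as np
from math import erf, sqrt

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x, beta=1.0):
    return x * sigmoid(beta * x)   # the sigmoid gate controls how much of x passes through

def gelu(x):
    # Gaussian CDF P(X <= x) as the gate
    return x * 0.5 * (1 + np.vectorize(erf)(x / sqrt(2)))

def maxout(x, W, b):
    # W: (k, d), b: (k,) -- k linear transformations of the same input, keep the maximum
    return np.max(W @ x + b)

x = np.array([-1.0, 0.0, 2.0])
print(swish(x))  # approximately [-0.269 0. 1.762]
print(gelu(x))   # approximately [-0.159 0. 1.955]

W = np.array([[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]])
b = np.zeros(3)
print(maxout(np.array([2.0, 1.0]), W, b))  # 1.5 = max(1.0, 1.5, -1.0)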
2 Network Structure
(1) Feedforward neural networks
- Information flows in one direction only; there is no feedback flow of information.
- Fully connected neural networks (a minimal forward pass is sketched at the end of this subsection).

- Convolutional neural networks

- Difference from fully connected networks: neurons in adjacent layers are only partially connected and share weights (the convolution kernel), which reduces the number of parameters to estimate.
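A minimal sketch (my own illustration) of a single forward pass through a small fully connected feedforward network, with ReLU in the hidden layer and a sigmoid output; the layer sizes and random weights are arbitrary:

import numpy as np

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # input (3) -> hidden (4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)  # hidden (4) -> output (1)

x = np.array([0.5, -1.0, 2.0])
h = relu(W1 @ x + b1)      # hidden layer: linear transform followed by the activation
y = sigmoid(W2 @ h + b2)   # output layer: a probability-like value in (0, 1)
print(y)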

(2) Recurrent neural networks
- A neuron can receive information not only from other neurons but also from its own past states (see the sketch below).
- Gated variants combine forgetting of past information with updates from the current input and, in bidirectional versions, from future inputs.
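A minimal sketch (my own illustration, not from the original post) of a vanilla recurrent cell that mixes the current input with its own previous hidden state:

import numpy as np

def rnn_step(x_t, h_prev, Wx, Wh, b):
    '''One step of a simple recurrent cell: h_t = tanh(Wx x_t + Wh h_prev + b).'''
    return np.tanh(Wx @ x_t + Wh @ h_prev + b)

rng = np.random.default_rng(1)
Wx, Wh, b = rng.normal(size=(3, 2)), rng.normal(size=(3, 3)), np.zeros(3)

h = np.zeros(3)                      # initial hidden state
for x_t in [np.array([1.0, 0.0]), np.array([0.0, 1.0])]:
    h = rnn_step(x_t, h, Wx, Wh, b)  # the history is carried forward through h
print(h)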

(3) Graph neural networks
- Model data with a graph structure.
- Embed the nodes and edges as vectors in a vector space.
- Build neural networks for the nodes and for the edges separately (a one-step message-passing sketch follows).
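A minimal sketch (my own illustration) of one message-passing step on a toy graph: each node aggregates its neighbours' feature vectors and passes the result through a small linear layer with a nonlinearity.

import numpy as np

# A toy graph with 3 nodes: adjacency matrix and 2-dimensional node features
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])

rng = np.random.default_rng(2)
W = rng.normal(size=(2, 2))

# One message-passing step: aggregate neighbour features, then transform them
H_new = np.tanh((A @ H) @ W)
print(H_new)  # updated node embeddings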

References:
1. Xipeng Qiu, Neural Networks and Deep Learning.
2. Mu Li, Dive into Deep Learning.