06. Neural Networks
2022-07-25 14:39:00 【A rookie who can't burn to death is called Phoenix】
I. Single-layer perceptron model
(1) Model diagram

(2) Purpose: to build a universal function approximator


The activation functions below are applied between layers — for example, from the hidden layer to the output layer — when producing output.
1. Why use activation functions
1) They are easy to differentiate, which makes back-propagation computable;
2) They map the output Y into the range that the task requires;
3) They introduce nonlinearity into the neurons, so that the neural network can approximate arbitrary nonlinear functions and therefore be applied to many nonlinear models.
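A minimal sketch (not from the original, with illustrative random data) of why point 3 matters: without an activation function, stacking layers adds no expressive power, because two linear layers collapse into one.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))   # a batch of 4 inputs, 3 features each
W1 = rng.normal(size=(3, 5))  # layer-1 weights
W2 = rng.normal(size=(5, 2))  # layer-2 weights

# Two stacked layers WITHOUT an activation function...
no_act = x @ W1 @ W2
# ...are exactly one linear layer with combined weights W1 @ W2:
combined = x @ (W1 @ W2)
print(np.allclose(no_act, combined))  # True — no extra expressive power

# Inserting a nonlinearity (here ReLU) breaks that collapse:
with_act = np.maximum(0, x @ W1) @ W2
print(np.allclose(no_act, with_act))  # False in general
```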
2. Commonly used activation functions and their Python implementations
1) Step function: activates (outputs 1) when the input exceeds the threshold 0, otherwise stays off (0)
Function plot:
Code:
import numpy as np

def threshold_function(x):
    # If x is an array, the comparison is elementwise: entries greater than 0
    # become True, the rest False. e.g. x = [-1, 1, 2] gives [False, True, True].
    y = x > 0
    # astype converts the data type: [False, True, True] becomes [0, 1, 1]
    return y.astype(int)

x = np.array([-1, 1, 2])
threshold_function(x)  # array([0, 1, 1])
2) Sigmoid: squashes values into (0, 1); useful for representing probabilities
Function plot:
Advantages and disadvantages:
Code:
def sigmoid(x):
    # exp is a NumPy function, so it must be called as np.exp, not bare exp
    return 1 / (1 + np.exp(-x))

x = np.array([-1, 1, 2])
sigmoid(x)  # array([0.26894142, 0.73105858, 0.88079708])
3) Tanh: maps values into (-1, 1)
Function plot:

Advantages and disadvantages:
Code:
import numpy as np

def tanh(x):
    # note: write 2 * x, not 2x — Python has no implicit multiplication
    return (1 - np.exp(-2 * x)) / (1 + np.exp(-2 * x))

x = np.array([-1, 1, 2])
tanh(x)
# Output: array([-0.76159416, 0.76159416, 0.96402758])
4) ReLU: outputs 0 below the threshold 0; above it, the output increases linearly with the input
Function plot:
Advantages and disadvantages:
Code:
import numpy as np

def relu(x):
    # values at or below 0 become 0; values above 0 pass through unchanged
    return np.maximum(0, x)

x = np.array([-1, 1, 2])
relu(x)  # array([0, 1, 2])
(3) The neural network propagation process
1. Model diagram

2. Adding the bias

3. The propagation process:
Step 1: Forward propagation: compute the difference between the predicted value and the true value.
Step 2: Back propagation: via gradient descent, update the parameters — the weights and the biases.
Step 3: Forward propagation again: using the updated parameters, run the forward pass again and recompute the difference between the predicted and true values.
Step 4: Back propagation again: via gradient descent, update the weights and biases once more.
……: By iterating back and forth like this, the difference between the predicted and true values is minimized, and the final parameter values are obtained.
The output is normalized with the softmax function so that it sums to one.
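The four steps above can be sketched as a minimal training loop. This is an illustrative example, not the original's code: a one-layer network with sigmoid output, mean-squared-error loss, and an OR gate as a toy (linearly separable) target.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
t = np.array([[0.0], [1.0], [1.0], [1.0]])  # OR gate as a toy target

W = rng.normal(size=(2, 1))  # weights
b = np.zeros(1)              # bias
lr = 0.5                     # learning rate

for step in range(5000):
    # Steps 1 / 3: forward propagation — predicted value and its error
    y = sigmoid(X @ W + b)
    diff = y - t
    # Steps 2 / 4: back propagation — MSE gradient through the sigmoid
    grad = diff * y * (1 - y)
    W -= lr * X.T @ grad     # update the weights
    b -= lr * grad.sum(0)    # update the bias

print(np.round(sigmoid(X @ W + b)).ravel())  # approaches [0. 1. 1. 1.]
```

Each pass through the loop is one forward/backward iteration; the loss shrinks until the predictions match the targets.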
4. Basic concepts involved in the propagation process
1) Error (loss) function: measures the difference between the predicted value and the true value
The first kind: mean squared error:
The second kind: cross-entropy loss:
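A minimal sketch of the two loss functions (the 1/2 factor on the squared error and the small epsilon inside the log are common conventions, assumed here rather than taken from the original):

```python
import numpy as np

def mean_squared_error(y, t):
    # half the sum of squared differences between prediction y and target t
    return 0.5 * np.sum((y - t) ** 2)

def cross_entropy_error(y, t):
    # t is one-hot; a tiny epsilon avoids log(0)
    eps = 1e-7
    return -np.sum(t * np.log(y + eps))

t = np.array([0, 1, 0])            # the true class is index 1
y_good = np.array([0.1, 0.8, 0.1]) # confident, correct prediction
y_bad = np.array([0.8, 0.1, 0.1])  # confident, wrong prediction
print(mean_squared_error(y_good, t) < mean_squared_error(y_bad, t))    # True
print(cross_entropy_error(y_good, t) < cross_entropy_error(y_bad, t))  # True
```

Both losses are smaller for the better prediction; they differ in how strongly they punish confident mistakes.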
2) Output-layer activation function
softmax: normalizes the output so the values sum to one.
Where it is used: mapping the predicted values one-to-one to output probabilities.
Example mapping diagram:
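A sketch of softmax (subtracting the maximum before exponentiating is a standard numerical-stability trick, assumed here; the input scores are illustrative):

```python
import numpy as np

def softmax(a):
    # subtracting the max does not change the result but prevents overflow in exp
    exp_a = np.exp(a - np.max(a))
    return exp_a / np.sum(exp_a)

scores = np.array([0.3, 2.9, 4.0])
probs = softmax(scores)
print(probs)        # e.g. array([0.01821127, 0.24519181, 0.73659691])
print(probs.sum())  # 1.0 — the outputs are normalized to sum to one
```

The largest score keeps the largest probability, so argmax is unchanged; softmax only rescales the scores into a distribution.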
3) Gradient descent: the algorithm that minimizes the error (loss) function
Explanation: Given a set of function parameters, gradient descent starts from an initial set of parameter values (chosen arbitrarily at first) and iteratively moves toward the set of values that minimizes the loss function. This iterative minimization uses calculus: each step moves the parameters in the negative direction of the gradient. As the model iterates, the loss function gradually converges to its minimum.
Computing the partial derivatives
Explanation: Because the parameters form a set — made up of weights and biases — partial derivatives are used.
Purpose: to find the parameter direction along which the function value decreases fastest.
Additional notes:
The sign of the slope at a point (the partial derivative there) indicates which direction to adjust in; its magnitude indicates how large the adjustment should be.
The smaller the slope's magnitude, the closer the point is to the ideal (minimum) point; the larger it is, the larger the adjustment.
Central difference: an alternative way to compute the partial derivative numerically
Purpose: to reduce the influence of the step size h on the result, making the numerical derivative more accurate.
A comparison of the two:
def func(x):
    return x ** 2

def dfunc(f, x):
    h = 1e-4  # an extremely small step
    # central difference: evaluate symmetrically around x
    return (f(x + h) - f(x - h)) / (2 * h)

dfunc(func, 3)
# Forward difference (f(x+h) - f(x)) / h: 6.000100000012054
# Central difference:                     6.000000000012662
4) The learning rate
Explanation: The gradient is multiplied by a learning rate so that the parameters approach the minimum gradually. A good value must be found experimentally; it is usually set to something like 0.01 or 0.1.
Note: Choosing an appropriate learning rate is one of the key factors.
Annotation: the gradient here is the partial derivative of the error (loss) function with respect to the parameters.
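The update rule described above — step against the gradient, scaled by the learning rate — can be sketched as follows (minimizing f(x) = x², whose gradient is 2x, is an illustrative choice, not from the original):

```python
def gradient_descent(grad_f, x0, lr=0.1, steps=100):
    # repeatedly step in the negative gradient direction, scaled by lr
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

# minimize f(x) = x**2; its gradient is 2*x, and the minimum is at x = 0
x_min = gradient_descent(lambda x: 2 * x, x0=3.0, lr=0.1, steps=100)
print(round(x_min, 6))  # 0.0 (each step multiplies x by 0.8, so x shrinks toward 0)
```

With lr=0.1 each update is x ← x − 0.1·2x = 0.8x, so the iterate decays geometrically; a learning rate above 1.0 would make |x| grow instead, which is why the choice of learning rate is critical.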
(4) Shortcoming: a single-layer perceptron cannot solve the XOR-gate problem
Solution: combine perceptrons into a multilayer network to realize the XOR input/output mapping.
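A sketch of that combination, using the classic decomposition XOR = AND(NAND(x1, x2), OR(x1, x2)); the weights below are hand-picked for illustration, not learned:

```python
def perceptron(x1, x2, w1, w2, b):
    # a single perceptron: weighted sum plus bias, then a step function
    return int(w1 * x1 + w2 * x2 + b > 0)

def AND(x1, x2):  return perceptron(x1, x2, 0.5, 0.5, -0.7)
def OR(x1, x2):   return perceptron(x1, x2, 0.5, 0.5, -0.2)
def NAND(x1, x2): return perceptron(x1, x2, -0.5, -0.5, 0.7)

def XOR(x1, x2):
    # two layers: NAND and OR feed into AND — no single perceptron can do this
    return AND(NAND(x1, x2), OR(x1, x2))

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, XOR(a, b))  # prints 0, 1, 1, 0 respectively
```

XOR's outputs are not linearly separable, so one layer of weights cannot draw the needed decision boundary; stacking two layers can.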
V. Supplement
1. Meaning of W: the rows of W correspond to the number of hidden-layer nodes (the number of outputs), and the columns correspond to the dimension of X (the number of inputs).
2. The most basic and commonly used model
Every neuron performs two basic operations:
1) aggregate the inputs; 2) apply the activation to the aggregate.
To get the predicted values for all training samples:
scan the training set horizontally from left to right,
and sweep each node vertically from top to bottom.
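Those two operations, applied to a whole training set at once, can be sketched as below. The shapes follow the convention above — W has one row per hidden node and one column per input dimension — while the actual sizes and the sigmoid activation are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))  # 4 training samples, each a 3-dimensional column
W = rng.normal(size=(2, 3))  # 2 hidden nodes (rows) x 3 input dims (columns)
b = np.zeros((2, 1))         # one bias per hidden node

# operation 1: aggregate the inputs — every node x every sample at once
z = W @ X + b                # shape (2, 4): rows = nodes, columns = samples

# operation 2: activate the aggregate (sigmoid here)
a = 1 / (1 + np.exp(-z))

print(a.shape)  # (2, 4) — one activation per node per training sample
```

The matrix product does the "horizontal" sweep over samples and the "vertical" sweep over nodes in a single operation.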
