Analysis of Neural Networks

2022-07-02 21:33:00 【caiggle】

Preface: The study of neural networks dates back to the 1940s. Today it has grown into a large body of work with a strongly interdisciplinary character.

One 、 Neuron model

Because neural networks are interdisciplinary, they have been defined in different ways. We use the most widely adopted definition, proposed by Kohonen in 1988: neural networks are massively parallel interconnected networks of simple adaptive units, whose organization can simulate the way a biological nervous system interacts with real-world objects.

Let's take a look at the classic M-P neuron model.
[Figure: the M-P neuron model]
To make this easier to follow, here is a brief explanation of the model: the neuron receives inputs x1, x2, ..., xn from n other neurons, each arriving over a connection with weight w1, w2, ..., wn. The weighted sum of the inputs is compared with the neuron's threshold θ, and the result is passed through an activation function f to produce the output: $y = f\left(\sum_{i=1}^{n} w_i x_i - \theta\right)$.
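
As a minimal sketch of the description above, the neuron can be written in a few lines of Python. The function names `mp_neuron` and `step`, and the example numbers, are my own choices for illustration and do not come from any library.

```python
def step(z):
    """Unit step activation: 1 when z >= 0, otherwise 0."""
    return 1 if z >= 0 else 0


def mp_neuron(inputs, weights, theta, activation):
    """Weighted sum of the inputs, minus the threshold, passed through the activation."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return activation(total - theta)


# Three inputs with equal weights; the threshold decides whether the neuron fires.
print(mp_neuron([1, 0, 1], [0.5, 0.5, 0.5], 0.8, step))
print(mp_neuron([1, 0, 1], [0.5, 0.5, 0.5], 1.2, step))
```

With the first threshold the weighted sum (1.0) exceeds θ and the neuron fires; with the second it does not.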

Common activation functions ：

1. Step function

x = linspace(-1, 1, 201);   % sample points on [-1, 1]
y = double(x >= 0);         % unit step: 0 for x < 0, 1 for x >= 0
plot(x, y);

2. Sigmoid function

x = -10:0.1:10;             % a narrower range makes the S-shaped transition visible
y = 1 ./ (1 + exp(-x));     % sigmoid(x) = 1 / (1 + e^(-x))
plot(x, y);

3. Tanh function

There are many other activation functions that are not listed here. Note that the unit step function is the ideal activation function, but because it is discontinuous and not smooth, its mathematical properties are poor. In most cases we therefore use other functions in place of the unit step function.
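
For comparison, here is a small sketch of the three activation functions named above, using only Python's standard `math` module. The function names are my own; the sample points are arbitrary.

```python
import math


def step(x):
    """Unit step: jumps from 0 to 1 at x = 0 (discontinuous)."""
    return 1.0 if x >= 0 else 0.0


def sigmoid(x):
    """Smooth squashing of any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Smooth squashing of any real number into (-1, 1)."""
    return math.tanh(x)


for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={x:5.1f}  step={step(x):.0f}  sigmoid={sigmoid(x):.4f}  tanh={tanh(x):.4f}")
```

The sigmoid and tanh curves are closely related: tanh(x) = 2·sigmoid(2x) − 1, so tanh is a rescaled sigmoid centered at zero.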

Two 、 Perceptrons and multilayer feedforward neural networks

To better understand perceptrons and multilayer networks, we first review the logic operations AND, OR, NOT, XOR (exclusive or), NOR, and NAND.

AND: outputs 1 only when all inputs are 1; outputs 0 as soon as any input is 0.

OR: outputs 1 when at least one input is 1; outputs 0 only when no input is 1.

NOT: the output is the negation of the input, that is, 0 becomes 1 and 1 becomes 0.

XOR: for a two-input gate, outputs 0 when the inputs are the same and 1 when they differ.

NOR: for a two-input gate, outputs 1 when neither input is 1; otherwise outputs 0.

NAND: outputs 0 only when all inputs are 1; otherwise outputs 1.
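
The six definitions can be checked against each other with a short truth-table sketch. The uppercase function names are my own, and the gates are restricted to two inputs (one for NOT) to keep it small.

```python
def AND(a, b):
    return int(a == 1 and b == 1)


def OR(a, b):
    return int(a == 1 or b == 1)


def NOT(a):
    return 1 - a


def XOR(a, b):
    return int(a != b)


def NOR(a, b):
    return NOT(OR(a, b))


def NAND(a, b):
    return NOT(AND(a, b))


print("a b | AND OR XOR NOR NAND")
for a in (0, 1):
    for b in (0, 1):
        print(f"{a} {b} |  {AND(a, b)}   {OR(a, b)}   {XOR(a, b)}   {NOR(a, b)}    {NAND(a, b)}")
```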

1. Perceptron

（1） Definition

A perceptron consists of two layers of neurons: the input layer receives external signals and passes them to the output layer.

（2） How it works

Recall that the output of a neuron is $y = f\left(\sum_i w_i x_i - \theta\right)$.
Assume f is the unit step function. By choosing suitable weights and a suitable threshold, the logical AND, OR, and NOT operations can all be realized. This raises a natural question: how are the weights and the threshold determined?
The answer is "learning". The threshold θ can be treated as a "dummy node" with a fixed input of -1 and a corresponding connection weight $w_{n+1}$, so learning the threshold reduces to learning one more weight. The perceptron follows a "correct the mistake as soon as you see it" learning rule, namely:
$$w_i \leftarrow w_i + \Delta w_i, \qquad \Delta w_i = \eta\,(y - \hat{y})\,x_i$$

where η (0 < η < 1) is called the learning rate, y is the target output, and ŷ is the perceptron's current output. The perceptron adjusts its weights according to the size of its prediction error, and leaves them unchanged when the prediction is correct.
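
Here is a minimal sketch of both ideas: hand-picked weights that realize AND, OR, and NOT, and the learning rule applied to the AND problem. The specific weights, η = 0.5, the zero initialization, and all function names are assumptions I chose for illustration.

```python
def step(z):
    return 1 if z >= 0 else 0


def predict(w, x):
    """w holds the input weights followed by the threshold, which acts as the
    weight of a dummy input fixed at -1."""
    extended = list(x) + [-1]
    return step(sum(wi * xi for wi, xi in zip(w, extended)))


def train_perceptron(data, eta=0.5, epochs=100):
    """Apply w_i <- w_i + eta * (y - y_hat) * x_i until an epoch has no mistakes."""
    w = [0.0] * (len(data[0][0]) + 1)
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            y_hat = predict(w, x)
            if y_hat != y:
                mistakes += 1
                for i, xi in enumerate(list(x) + [-1]):
                    w[i] += eta * (y - y_hat) * xi
        if mistakes == 0:
            break
    return w


# Hand-picked weights (last entry is the threshold).
W_AND = [1, 1, 2]
W_OR = [1, 1, 0.5]
W_NOT = [-0.6, -0.5]

and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
learned = train_perceptron(and_data)
print("learned weights and threshold:", learned)
print("predictions:", [predict(learned, x) for x, _ in and_data])
```

Because AND is linearly separable, the loop stops after a few epochs with weights that classify all four inputs correctly.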

2. Multilayer feedforward neural network

In fact, the perceptron has only one layer of functional neurons: only the output layer applies an activation function. Its learning capacity is therefore limited, and it cannot solve logic problems such as XOR.

First, a word on linear separability and nonlinear separability.
Without giving a rigorous definition: two classes of patterns are linearly separable if a single linear hyperplane can separate them; otherwise they are nonlinearly separable.

For the XOR problem, we have:
[Figure: the XOR problem]
No single linear hyperplane separates the two classes; two are needed. The original perceptron with two layers of neurons therefore has to be extended with more layers and becomes a multilayer neural network. The layers added between the input layer and the output layer are called hidden layers, which gives single-hidden-layer feedforward networks, double-hidden-layer feedforward networks, and so on. If the network topology contains no rings or loops, the network is called a multilayer feedforward neural network.
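
To make the "two hyperplanes" idea concrete, here is a sketch of a network with one hidden layer that computes XOR with step units. The weights are hand-picked by me (one hidden unit acts as OR, the other as NAND, and the output unit as AND); many other choices would work equally well.

```python
def step(z):
    return 1 if z >= 0 else 0


def unit(weights, theta, inputs):
    """One threshold neuron: step(sum(w_i * x_i) - theta)."""
    return step(sum(w * x for w, x in zip(weights, inputs)) - theta)


def xor_net(x1, x2):
    h1 = unit([1, 1], 0.5, [x1, x2])      # first hyperplane: OR
    h2 = unit([-1, -1], -1.5, [x1, x2])   # second hyperplane: NAND
    return unit([1, 1], 1.5, [h1, h2])    # output: AND of the two hidden units


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```

Each hidden unit draws one line in the input plane, and the output unit keeps only the region between the two lines, which is exactly where XOR is 1.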

Three 、 BP Algorithm

Training multilayer neural networks requires a more powerful learning algorithm; the perceptron's "correct the mistake as soon as you see it" rule is too simple. Let's look at the most successful neural network learning algorithm to date: error backpropagation (BP).
The derivation can be done in different ways and from different starting points. Deriving the update for the weights from the hidden layer to the output layer took me about an hour; it is quite involved, and it is easy to get the symbols and subscripts wrong.
Recommended: [https://blog.csdn.net/u010858605/article/details/69857957]
On accumulated BP versus standard BP:
First of all, be clear that the goal of the BP algorithm is to minimize the accumulated error on the training set. The standard BP algorithm, however, updates the weights for only one training sample at a time. Its updates are therefore very frequent, updates for different samples may cancel each other out, and it often needs more iterations and more time. The accumulated BP algorithm directly minimizes the accumulated error and updates the parameters only once per pass over the training set, so it updates far less often. In some cases, though, once the accumulated error has dropped to a certain level, further progress becomes very slow, and standard BP may reach a better solution at that point.
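
The difference between the two variants is easiest to see in code. Below is a minimal sketch of BP for a network with one hidden layer and a single sigmoid output, trained on the squared error E = ½(ŷ − y)². The data layout, the dictionary of parameters, the hyperparameters, and every function name are assumptions of mine, not the derivation from the linked post.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_net(n_in, n_hidden, rng):
    """Small random weights; thresholds (gamma, theta) are stored explicitly."""
    return {
        "v": [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_in)],
        "gamma": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "w": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "theta": rng.uniform(-1, 1),
    }


def forward(net, x):
    n_hidden = len(net["gamma"])
    b = [sigmoid(sum(net["v"][i][h] * x[i] for i in range(len(x))) - net["gamma"][h])
         for h in range(n_hidden)]
    y_hat = sigmoid(sum(net["w"][h] * b[h] for h in range(n_hidden)) - net["theta"])
    return b, y_hat


def deltas(net, x, y):
    """Descent directions (negative gradients of the squared error) for one sample."""
    b, y_hat = forward(net, x)
    g = y_hat * (1 - y_hat) * (y - y_hat)                              # output term
    e = [b[h] * (1 - b[h]) * net["w"][h] * g for h in range(len(b))]   # hidden terms
    return {
        "v": [[e[h] * x[i] for h in range(len(b))] for i in range(len(x))],
        "gamma": [-e[h] for h in range(len(b))],
        "w": [g * b[h] for h in range(len(b))],
        "theta": -g,
    }


def apply_update(net, d, lr):
    for i, row in enumerate(d["v"]):
        for h, val in enumerate(row):
            net["v"][i][h] += lr * val
    for h in range(len(d["w"])):
        net["gamma"][h] += lr * d["gamma"][h]
        net["w"][h] += lr * d["w"][h]
    net["theta"] += lr * d["theta"]


def mean_error(net, data):
    return sum(0.5 * (forward(net, x)[1] - y) ** 2 for x, y in data) / len(data)


def train(data, mode, epochs=2000, lr=0.5, n_hidden=3, seed=0):
    """mode='standard': update after every sample.
    mode='accumulated': one update per epoch using the averaged direction."""
    net = init_net(len(data[0][0]), n_hidden, random.Random(seed))
    initial = mean_error(net, data)
    for _ in range(epochs):
        if mode == "standard":
            for x, y in data:
                apply_update(net, deltas(net, x, y), lr)
        else:
            all_deltas = [deltas(net, x, y) for x, y in data]  # all at the same weights
            for d in all_deltas:
                apply_update(net, d, lr / len(data))
    return initial, mean_error(net, data)


xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
for mode in ("standard", "accumulated"):
    before, after = train(xor_data, mode)
    print(f"{mode:12s} mean error: {before:.4f} -> {after:.4f}")
```

For the same number of epochs, the standard variant performs one update per sample while the accumulated variant performs one update per epoch, which is why the two can end up at quite different error levels.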

Four 、 Solutions to some common problems

1. Overfitting

Because neural networks are so expressive, they often run into overfitting: the model fits the training set very closely, but its error on the test set increases. So what can be done about it?

（1） Early stopping

The data set is divided into a training set and a validation set. If the training error decreases but the validation error increases, training stops, and the connection weights and thresholds with the smallest validation error are returned.
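
A minimal sketch of this procedure is shown below. It assumes training is exposed as a sequence of `(params, validation_error)` pairs, one per epoch, and it adds a `patience` setting (the number of non-improving epochs tolerated before stopping). Both the interface and the patience idea are my own assumptions; the simplest version of the rule corresponds to `patience=1`.

```python
def early_stopping(epochs, patience=3):
    """Consume (params, validation_error) pairs and stop once the validation
    error has failed to improve for `patience` consecutive epochs.
    Returns the parameters with the lowest validation error seen so far."""
    best_params, best_err, bad_epochs = None, float("inf"), 0
    for params, val_err in epochs:
        if val_err < best_err:
            best_params, best_err, bad_epochs = params, val_err, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return best_params, best_err


# Made-up validation errors: they fall, then rise as the model starts to overfit.
history = [("w0", 0.50), ("w1", 0.40), ("w2", 0.30),
           ("w3", 0.35), ("w4", 0.36), ("w5", 0.37), ("w6", 0.20)]
print(early_stopping(iter(history), patience=3))
```

In this made-up run, training stops after three non-improving epochs, so the later improvement at `w6` is never reached; a larger `patience` trades extra training time for a lower chance of stopping too soon.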

（2） Regularization

There are several regularization methods, but the basic idea is the same: add a term that describes the complexity of the network to the error objective function, so that the objective becomes a weighted sum of the empirical error and the network complexity.
Intuitively, regularization penalizes large weights so that the influence of individual features is diluted or effectively ignored, which alleviates overfitting. Some variants, such as L1 penalties, push many parameters to exactly zero and so make the parameter matrix sparse.
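
As a sketch of the weighted sum described above, one common choice measures network complexity by the sum of squared weights, giving E = λ · (mean empirical error) + (1 − λ) · Σ wᵢ² with λ in (0, 1). This particular complexity measure, the value of λ, and the function name are assumptions on my part; other penalties follow the same pattern.

```python
def regularized_error(sample_errors, weights, lam=0.9):
    """lam weights the empirical error; (1 - lam) weights the complexity term."""
    empirical = sum(sample_errors) / len(sample_errors)
    complexity = sum(w * w for w in weights)
    return lam * empirical + (1 - lam) * complexity


# Two networks with the same training error but different weight magnitudes.
errors = [0.2, 0.4]
print(regularized_error(errors, [1.0, -2.0]))
print(regularized_error(errors, [0.1, -0.2]))
```

With equal training error, the network with smaller weights gets the lower objective value, so minimizing this objective prefers it.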

2. Escaping local minima

We want to find a set of parameters that makes the error objective function reach its global minimum; this is a parameter optimization process. A global minimum is always a local minimum, but a local minimum is not necessarily a global minimum. Training may get stuck in a local minimum that is not the global one, and this problem needs to be addressed.
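
The text above does not name a specific remedy, so the following is only an illustration of the problem together with one commonly used strategy: starting gradient descent from several different initial points and keeping the best result. The one-dimensional function f(x) = x⁴ − 3x² + x, the learning rate, and the starting points are all made up for this sketch.

```python
def f(x):
    return x ** 4 - 3 * x ** 2 + x


def grad_f(x):
    return 4 * x ** 3 - 6 * x + 1


def descend(x, lr=0.01, steps=1000):
    """Plain gradient descent from a given starting point."""
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x


def multi_start(starts):
    """Run gradient descent from each start and keep the lowest final value."""
    return min((descend(s) for s in starts), key=f)


right = descend(2.0)    # ends in the shallower minimum on the right
left = descend(-2.0)    # ends in the deeper minimum on the left
print(f"from  2.0: x={right:.4f}, f(x)={f(right):.4f}")
print(f"from -2.0: x={left:.4f}, f(x)={f(left):.4f}")
print(f"best of both starts: x={multi_start([2.0, -2.0]):.4f}")
```

Which minimum gradient descent reaches depends entirely on where it starts, so trying several starting points gives a better chance of landing near the global minimum, though it offers no guarantee.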