PRML Reading Notes: Neural Networks (1)
2022-07-05 12:39:00 【NLP journey】
Feedforward neural networks
In both classification and regression problems, the final model can be expressed as

y(x, w) = f( Σ_{j=1}^{M} w_j φ_j(x) )    (1)

where f(·) is a nonlinear activation function in the classification case and the identity in the regression case. The goal now is to make the basis functions φ_j(x) themselves depend on parameters that are adjusted, along with the coefficients w, during training.
Now let us look at a basic three-layer neural network model. First, we construct M linear combinations of the input variables x_1, ..., x_D:

a_j = Σ_{i=1}^{D} w^{(1)}_{ji} x_i + w^{(1)}_{j0}    (2)
where j = 1, ..., M, and the superscript (1) on w indicates which layer of the network the parameter belongs to. Each a_j is then passed through a differentiable nonlinear function h(·):

z_j = h(a_j)    (3)

Taken together, (2) and (3) have exactly the same form as (1). The z_j are the outputs of the hidden units; they are in turn combined linearly once more:
a_k = Σ_{j=1}^{M} w^{(2)}_{kj} z_j + w^{(2)}_{k0}    (4)
where k = 1, ..., K, and K is the total number of outputs. For regression, y_k = a_k; for classification,

y_k = σ(a_k)    (5)
where

σ(a) = 1 / (1 + exp(−a))    (6)
Writing all of the above stages together:

y_k(x, w) = σ( Σ_{j=1}^{M} w^{(2)}_{kj} h( Σ_{i=1}^{D} w^{(1)}_{ji} x_i + w^{(1)}_{j0} ) + w^{(2)}_{k0} )    (7)

The essence of the neural network model is therefore a nonlinear transformation of the input variables {x_i}, governed by the parameter vector w.
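As a concrete illustration, here is a minimal NumPy sketch of the forward pass in equations (2)-(7), assuming tanh hidden units and sigmoid outputs; the variable names, shapes and random weights are my own choices, not taken from the book:

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid, equation (6): sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, W1, b1, W2, b2):
    """Two-layer network of equation (7), with tanh hidden units.

    x  : input vector, shape (D,)
    W1 : first-layer weights w^(1)_{ji}, shape (M, D);  b1 : biases w^(1)_{j0}, shape (M,)
    W2 : second-layer weights w^(2)_{kj}, shape (K, M); b2 : biases w^(2)_{k0}, shape (K,)
    """
    a_hidden = W1 @ x + b1      # equation (2): first linear combinations
    z = np.tanh(a_hidden)       # equation (3): hidden-unit outputs, h = tanh
    a_out = W2 @ z + b2         # equation (4): second linear combinations
    return sigmoid(a_out)       # equation (5): sigmoid outputs (classification case)

# Tiny example with random weights: D = 3 inputs, M = 4 hidden units, K = 2 outputs.
rng = np.random.default_rng(0)
D, M, K = 3, 4, 2
W1, b1 = rng.normal(size=(M, D)), rng.normal(size=M)
W2, b2 = rng.normal(size=(K, M)), rng.normal(size=K)
print(forward(rng.normal(size=D), W1, b1, W2, b2))
```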
This network can be represented diagrammatically, as in PRML Figure 5.1 (figure not reproduced here).
By introducing an additional input x_0 fixed at the value 1 (and likewise z_0 = 1 for the hidden layer), the bias terms can be absorbed into the sums, and equation (7) becomes:

y_k(x, w) = σ( Σ_{j=0}^{M} w^{(2)}_{kj} h( Σ_{i=0}^{D} w^{(1)}_{ji} x_i ) )    (8)
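A quick numerical check of this bias-absorption trick, sketched with arbitrary random weights: it simply verifies that (7) and (8) give the same outputs once each bias vector is stored as column 0 of the corresponding weight matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
D, M, K = 3, 4, 2
W1, b1 = rng.normal(size=(M, D)), rng.normal(size=M)
W2, b2 = rng.normal(size=(K, M)), rng.normal(size=K)
x = rng.normal(size=D)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

# Equation (7): biases written explicitly.
y_explicit = sigmoid(W2 @ np.tanh(W1 @ x + b1) + b2)

# Equation (8): absorb the biases by prepending x_0 = 1 and z_0 = 1
# and putting each bias vector into column 0 of the weight matrix.
W1_tilde = np.column_stack([b1, W1])                  # shape (M, D + 1)
W2_tilde = np.column_stack([b2, W2])                  # shape (K, M + 1)
z = np.tanh(W1_tilde @ np.concatenate(([1.0], x)))    # the sum now runs from i = 0
y_absorbed = sigmoid(W2_tilde @ np.concatenate(([1.0], z)))

assert np.allclose(y_explicit, y_absorbed)
```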
Because the composition of linear transformations is itself a linear transformation, if all of the hidden units use linear activation functions, an equivalent network with no hidden layer can always be found. Moreover, if the number of hidden units is smaller than the number of input or output units, information is inevitably lost in passing through the hidden layer.
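The sketch below (with made-up dimensions) demonstrates both points: with an identity activation the two-layer network collapses to a single matrix product, and the rank of that product is capped by the hidden-layer width.

```python
import numpy as np

rng = np.random.default_rng(1)
D, M, K = 5, 3, 4          # hidden layer narrower than both input and output layers
W1 = rng.normal(size=(M, D))
W2 = rng.normal(size=(K, M))
x = rng.normal(size=D)

# With an identity "activation" h(a) = a, the two-layer network is just W2 @ W1 @ x,
# i.e. it is equivalent to a single-layer network with weight matrix W2 @ W1.
y_two_layer = W2 @ (W1 @ x)
y_collapsed = (W2 @ W1) @ x
assert np.allclose(y_two_layer, y_collapsed)

# The composite matrix has rank at most M = 3 < min(D, K), so the overall linear map
# is a genuine dimensionality reduction: information is necessarily lost.
print(np.linalg.matrix_rank(W2 @ W1))   # prints 3 (almost surely, for random weights)
```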
The network structure of Figure 5.1 is the most common form of neural network. Several names are used for it: a three-layer network (counting the layers of nodes), a single-hidden-layer network (counting the hidden layers), or a two-layer network (counting the layers of adaptive weights, which is the terminology PRML recommends).
One way to extend the network is to add skip-layer connections, so that the output units are connected not only to the hidden layer but also directly to the input units. Although the book notes that a network with sigmoidal hidden activation functions can achieve the same effect as skip-layer connections by suitable adjustment of the input and output weights, it may be more effective in practice to represent skip-layer connections explicitly.
Provided the network stays feed-forward (that is, it contains no closed directed cycles, so a unit's output can never become its own input), more complex architectures are possible: a unit's inputs may come from all or a subset of the units in earlier layers, or from other units of the current layer that precede it. An example is as follows:

In this example z_1, z_2 and z_3 are the hidden units, and z_1 also serves as an input to z_2 and z_3.
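A small sketch of such a network, with hypothetical weights and wiring chosen only to mirror the example above: z_1 is computed first and then feeds both z_2 and z_3, yet the graph stays acyclic, so everything can still be evaluated in a single forward sweep.

```python
import numpy as np

rng = np.random.default_rng(2)
x1, x2 = rng.normal(size=2)       # two input units
w = rng.normal(size=9)            # hypothetical weights, one per connection

a1 = w[0] * x1 + w[1] * x2        # z1 sees only the inputs
z1 = np.tanh(a1)

a2 = w[2] * x1 + w[3] * z1        # z2 sees an input unit AND z1
z2 = np.tanh(a2)

a3 = w[4] * x2 + w[5] * z1        # z3 likewise takes z1 as one of its inputs
z3 = np.tanh(a3)

y = w[6] * z1 + w[7] * z2 + w[8] * z3   # the output combines all three hidden units
print(y)
```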
The approximation capability of neural networks has been widely studied; they are known to be universal approximators. For example, a two-layer network with linear outputs can approximate any continuous function on a compact input region to arbitrary accuracy, as long as the network has a sufficiently large number of hidden units.
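As an informal illustration (not a proof), the sketch below fits a two-layer tanh network with a linear output to sin(πx) on [−1, 1] using plain gradient descent on a sum-of-squares error; the hidden-layer size, learning rate and step count are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)     # inputs, shape (N, 1)
T = np.sin(np.pi * X)                               # targets

M, lr = 10, 0.05                                    # hidden units, learning rate
W1, b1 = rng.normal(size=(M, 1)), np.zeros(M)
W2, b2 = rng.normal(scale=0.1, size=(1, M)), np.zeros(1)

for step in range(20000):
    A = X @ W1.T + b1                # (N, M) hidden pre-activations
    Z = np.tanh(A)                   # hidden-unit outputs
    Y = Z @ W2.T + b2                # linear outputs, shape (N, 1)
    err = Y - T
    # Backpropagate the mean squared error.
    dW2 = err.T @ Z / len(X); db2 = err.mean(axis=0)
    dA = (err @ W2) * (1.0 - Z ** 2)               # tanh'(a) = 1 - tanh(a)^2
    dW1 = dA.T @ X / len(X);  db1 = dA.mean(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

Y = np.tanh(X @ W1.T + b1) @ W2.T + b2
print("final mean squared error:", float(np.mean((Y - T) ** 2)))   # should be small
```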
Symmetry of weight space
One property of feed-forward neural networks is that many different settings of the weight parameters can produce exactly the same mapping from inputs to outputs.
First consider a two-layer network of the kind shown in Figure 5.1, with M hidden units, full connectivity (each hidden unit is connected to all of the inputs), and tanh activation functions. Take one hidden unit: if we change the sign of all of the weights feeding into it, the sign of its linear combination is flipped, and because tanh is an odd function, tanh(−a) = −tanh(a), the output of the unit flips sign as well. Changing the sign of the weights leading out of that unit then restores the values flowing into the output layer. In other words, simultaneously flipping the signs of a hidden unit's input-side and output-side weights leaves the input-output behaviour of the whole network unchanged. For a hidden layer with M units there are therefore 2^M weight settings that produce the same input-output network (two sign choices per unit).
Furthermore, because of the interchangeability of the hidden units, we can swap all of the input and output weights of any two hidden units without affecting the input-output behaviour of the network. Altogether this gives M! equivalent arrangements (the number of permutations of the M hidden units).
So, for a two-layer neural network with M hidden units, the overall weight-space symmetry factor is M! · 2^M. Moreover, this holds not only for the tanh activation function. A numerical check of both symmetries is sketched below.
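The sketch uses arbitrary random weights and a linear output layer (the choice of output activation does not matter here, since the values entering the output layer are unchanged) to verify both the sign-flip and permutation symmetries.

```python
import numpy as np

rng = np.random.default_rng(3)
D, M, K = 3, 4, 2
W1, b1 = rng.normal(size=(M, D)), rng.normal(size=M)
W2, b2 = rng.normal(size=(K, M)), rng.normal(size=K)

def net(x, W1, b1, W2, b2):
    z = np.tanh(W1 @ x + b1)     # tanh hidden layer
    return W2 @ z + b2           # linear output layer

x = rng.normal(size=D)
y0 = net(x, W1, b1, W2, b2)

# 1) Sign-flip symmetry: negate every weight into and out of hidden unit j.
j = 1
W1f, b1f, W2f = W1.copy(), b1.copy(), W2.copy()
W1f[j, :] *= -1.0; b1f[j] *= -1.0    # incoming weights and bias of unit j
W2f[:, j] *= -1.0                    # outgoing weights of unit j
assert np.allclose(y0, net(x, W1f, b1f, W2f, b2))

# 2) Permutation symmetry: relabel (reorder) the hidden units.
perm = rng.permutation(M)
assert np.allclose(y0, net(x, W1[perm], b1[perm], W2[:, perm], b2))

print("both symmetries verified")
```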