Recurrent Neural Networks (RNN)
2022-07-28 06:14:00 【Jiyu Wangchuan】
1. Recurrent neural networks
A recurrent neural network (RNN) is a class of neural network designed for processing sequence data.
1.1 A brief comparison of CNN and RNN
CNN: features are extracted with convolution kernels and then fed into subsequent layers (for example, fully connected Dense layers) for classification, object detection, and similar tasks. A CNN extracts information along the spatial dimension, and the convolution-kernel parameters are shared across space.
RNN: features are extracted with a recurrent cell and then fed into subsequent layers (for example, fully connected Dense layers) for prediction and similar tasks. An RNN extracts information along the time dimension, and the recurrent-cell parameters are shared across time.
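To make the contrast concrete, here is a minimal sketch in tf.keras; the layer sizes (8 filters, 3 memory units, 5 input features) are chosen arbitrarily for illustration. Because of parameter sharing, the Conv2D parameter count does not depend on the image size, and the SimpleRNN parameter count does not depend on the sequence length.

```python
import tensorflow as tf

# Spatial sharing: the Conv2D parameter count depends only on the kernel
# and channel sizes, not on the image height/width.
for size in (32, 64):
    conv = tf.keras.layers.Conv2D(filters=8, kernel_size=3)
    conv(tf.zeros((1, size, size, 3)))   # build the layer by calling it on a dummy image
    print(size, conv.count_params())     # 8*3*3*3 + 8 = 224 in both cases

# Temporal sharing: the SimpleRNN parameter count depends only on the number
# of memory units and input features, not on the number of time steps.
for steps in (1, 10):
    rnn = tf.keras.layers.SimpleRNN(3)
    rnn(tf.zeros((1, steps, 5)))         # build the layer by calling it on a dummy sequence
    print(steps, rnn.count_params())     # 5*3 + 3*3 + 3 = 27 in both cases
```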
1.2 The recurrent cell
The recurrent cell has memory. Its parameters are shared across time steps, which is what allows it to extract information from a time sequence. Each recurrent cell contains a number of memory units, corresponding to the small cylinders in the figure below.
The memory units store the state information of each time step, $h_t$:

$$h_t = \tanh(x_t w_{xh} + h_{t-1} w_{hh} + b_h)$$

where $w_{xh}$ and $w_{hh}$ are weight matrices, $b_h$ is a bias, $x_t$ is the input feature at the current time step, $h_{t-1}$ is the state stored in the memory units at the previous time step, and $\tanh$ is the activation function.

The output feature of the recurrent cell at the current time step is

$$y_t = \mathrm{softmax}(h_t w_{hy} + b_y)$$

where $w_{hy}$ is a weight matrix, $b_y$ is a bias, and softmax is the activation function; this is effectively a fully connected layer. We can set the number of memory units to change the memory capacity. Once the number of memory units and the dimensions of the input $x_t$ and the output $y_t$ are specified, the dimensions of all trainable parameters are fixed as well.
In forward propagation, the state $h_t$ stored in the memory units is refreshed at every time step, while the three parameter matrices $w_{xh}$, $w_{hh}$, $w_{hy}$ and the two bias terms $b_h$, $b_y$ stay fixed throughout. In back propagation, these three parameter matrices and two bias terms are updated by gradient descent.
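As a sanity check on these two formulas, here is a minimal NumPy sketch of a single recurrent-cell step; the sizes (5 input features, 3 memory units, 5 output classes) and the random weights are made up purely for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Made-up shapes: 5 input features, 3 memory units, 5 output classes.
rng = np.random.default_rng(0)
w_xh = rng.normal(size=(5, 3))   # input-to-hidden weights
w_hh = rng.normal(size=(3, 3))   # hidden-to-hidden weights
w_hy = rng.normal(size=(3, 5))   # hidden-to-output weights
b_h = np.zeros(3)
b_y = np.zeros(5)

x_t = np.array([0., 1., 0., 0., 0.])   # current input (one-hot "b")
h_prev = np.zeros(3)                   # previous state h_{t-1}

h_t = np.tanh(x_t @ w_xh + h_prev @ w_hh + b_h)   # refresh the memory state
y_t = softmax(h_t @ w_hy + b_y)                   # output of the cell
print(h_t, y_t)
```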
1.3 Unrolling the recurrent cell over time steps
Unrolling the recurrent cell over time steps means expanding it along the time axis, which gives the form shown in the figure below. The state information $h_t$ in the memory units is refreshed at every time step, while the parameter matrices and the two bias terms around the memory units stay fixed; those parameter matrices are exactly what training optimizes. After training, the best parameter matrices are used in forward propagation to output the prediction.
This is in fact consistent with how we humans make predictions: the memory in our brains is updated at every moment based on the current input, and the current reasoning is based on the "parameter matrices" accumulated from previous knowledge.
In short, a recurrent neural network uses the recurrent cell to extract temporal features and feeds them into a fully connected network, thereby making predictions on sequential data.
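The sketch below (again with made-up weights and a toy 5-letter sequence) unrolls this process over a few time steps: the state h is refreshed at every step while the weight matrices never change.

```python
import numpy as np

rng = np.random.default_rng(0)
w_xh, w_hh = rng.normal(size=(5, 3)), rng.normal(size=(3, 3))
b_h = np.zeros(3)

xs = np.eye(5)    # toy sequence: the 5 one-hot letters, one per time step
h = np.zeros(3)   # initial memory state

for t, x_t in enumerate(xs):
    # The same w_xh, w_hh, b_h are used at every step; only h is refreshed.
    h = np.tanh(x_t @ w_xh + h @ w_hh + b_h)
    print(f"step {t}: h = {h.round(2)}")
```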
1.4 Recurrent layers: stacking in the output direction
In an RNN, each recurrent cell forms one recurrent layer, and the number of recurrent layers grows in the direction of the output. As shown in the figure below, the network on the left has one recurrent cell and forms one recurrent layer; the network in the middle has two recurrent cells and forms two recurrent layers; the network on the right has three recurrent cells and forms three recurrent layers. The number of memory units in each recurrent cell of the three networks can be set arbitrarily according to our needs.
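A minimal tf.keras sketch of such stacking (the unit counts and the input shape are arbitrary): every recurrent layer except the last one must pass its output at every time step to the next recurrent layer, i.e. return_sequences=True, as explained in section 2 below.

```python
import tensorflow as tf

# Three stacked recurrent layers; the first two return the full sequence of h_t
# so the next recurrent layer still receives a (batch, steps, features) input.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4, 5)),                  # 4 time steps, 5 features per step
    tf.keras.layers.SimpleRNN(8, return_sequences=True),
    tf.keras.layers.SimpleRNN(8, return_sequences=True),
    tf.keras.layers.SimpleRNN(8),                  # last recurrent layer keeps only the final h_t
    tf.keras.layers.Dense(5, activation='softmax')
])
model.summary()
```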
1.5 Training an RNN
After obtaining the forward-propagation result of the RNN, we define a loss function and train the model with back propagation and gradient descent, just as with other neural networks.
The only difference for an RNN is that its node at every time step may produce an output, so the total loss of an RNN is the sum of the losses over all time steps (or over a subset of them).
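A minimal sketch of that per-time-step loss, assuming a toy setup where the network outputs a prediction at every time step (return_sequences=True) and each time step has its own integer label; all shapes here are made up.

```python
import tensorflow as tf

# Toy batch: 2 sequences, 4 time steps, 5 input features; one label per time step.
x = tf.random.normal((2, 4, 5))
y = tf.random.uniform((2, 4), maxval=5, dtype=tf.int32)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4, 5)),
    tf.keras.layers.SimpleRNN(3, return_sequences=True),
    tf.keras.layers.Dense(5, activation='softmax')   # a prediction at every time step
])

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
loss = loss_fn(y, model(x))   # accumulated over all time steps (Keras reports the mean)
print(float(loss))
```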
2. Describing a recurrent layer in TensorFlow
tf.keras.layers.SimpleRNN(units, activation='activation function', return_sequences=whether to pass every h_t to the next layer)
(1) units: the number of neurons, i.e. the number of memory units in the recurrent cell.
(2) return_sequences: whether to return the output h_t of every time step in the sequence or only the output of the last time step. False returns only the last time step; True returns all time steps. It is usually True when the next layer is another RNN layer, and usually False when the next layer is a Dense layer.
(3) Input dimension: a three-dimensional tensor (number of samples fed in, number of time steps over which the recurrent cell is unrolled, number of input features per time step).
As shown in the figure above (Figure 1.2.6: RNN layer input dimensions), the left diagram feeds two groups of data into the RNN layer; each group produces its output after one time step, with three values fed in at each time step, so the input to the recurrent layer has dimension [2, 1, 3].
The right diagram feeds in only one group of data, sent to the recurrent layer over four time steps with two values per time step, so the input to the recurrent layer has dimension [1, 4, 2].
(4) Output dimension:
- return_sequences=True: a three-dimensional tensor (number of samples fed in, number of time steps over which the recurrent cell is unrolled, number of neurons in this layer)
- return_sequences=False: a two-dimensional tensor (number of samples fed in, number of neurons in this layer)
(5) activation: the activation function, given as a string (tanh is used by default if not specified)
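As a quick check of these input and output shapes, here is a minimal sketch; the input shape [1, 4, 2] matches the right-hand example above, and the 3 units are arbitrary.

```python
import tensorflow as tf

x = tf.zeros((1, 4, 2))   # 1 sample, 4 time steps, 2 input features per step

rnn_all = tf.keras.layers.SimpleRNN(3, return_sequences=True)
rnn_last = tf.keras.layers.SimpleRNN(3, return_sequences=False)

print(rnn_all(x).shape)    # (1, 4, 3): h_t for every time step
print(rnn_last(x).shape)   # (1, 3): only the final h_t
```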
3. Example of the recurrent computation process
The most typical application of an RNN is to use historical data to predict what happens at the next moment, that is, to make predictions from the patterns seen so far. Let us walk through a simple letter-prediction example to get a feel for the computation of a recurrent network: input one letter and predict the next one, so inputting a predicts b, b predicts c, c predicts d, d predicts e, and e predicts a. A computer cannot recognize letters and can only process numbers, so the letters need to be encoded. Here we assume one-hot encoding is used (other encodings are possible in practice); the encoding is shown in the table below.
| One-hot code | Letter |
|---|---|
| 10000 | a |
| 01000 | b |
| 00100 | c |
| 00010 | d |
| 00001 | e |
Suppose we use a one-layer RNN with the number of memory units set to 3; the letter-prediction network is shown in the figure below.
Suppose the letter b is input, so the input $x_t$ is $[0,1,0,0,0]$, and the state information $h_{t-1}$ stored from the previous time step is 0. From the formulas above it is straightforward to get:

$$h_t = \tanh(x_t w_{xh} + h_{t-1} w_{hh} + b_h) = \tanh([-2.3, 0.8, 1.1] + 0 + [0.5, 0.3, -0.2]) = \tanh([-1.8, 1.1, 0.9]) = [-0.9, 0.8, 0.7]$$

This step can be understood as the memory in the brain being updated by the current input.
The output $y_t$ is produced by a fully connected layer that turns the extracted temporal information into a prediction; it is the output layer of the whole network.

$$y_t = \mathrm{softmax}(h_t w_{hy} + b_y) = \mathrm{softmax}([-0.7, -0.6, 2.9, 0.7, -0.8] + [0.0, 0.1, 0.4, -0.7, 0.1]) = \mathrm{softmax}([-0.7, -0.5, 3.3, 0.0, -0.7]) = [0.02, 0.02, 0.91, 0.03, 0.02]$$

The model therefore assigns a 91% probability to the letter c, so the recurrent network outputs c as its prediction.
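The final softmax step is easy to verify with a short NumPy sketch; only the vector $h_t w_{hy} + b_y$ from the example above is used, and the full weight matrices are not reproduced here.

```python
import numpy as np

logits = np.array([-0.7, -0.6, 2.9, 0.7, -0.8]) + np.array([0.0, 0.1, 0.4, -0.7, 0.1])
probs = np.exp(logits) / np.exp(logits).sum()
print(probs.round(2))   # [0.02 0.02 0.91 0.03 0.02] -> index 2, i.e. the letter c
```

The full TensorFlow implementation of this letter-prediction example follows.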
```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, SimpleRNN
import matplotlib.pyplot as plt
import os
input_word = "abcde"
w_to_id = {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}  # dictionary mapping each letter to a numeric id
id_to_onehot = {0: [1., 0., 0., 0., 0.], 1: [0., 1., 0., 0., 0.], 2: [0., 0., 1., 0., 0.],
                3: [0., 0., 0., 1., 0.], 4: [0., 0., 0., 0., 1.]}  # id to one-hot encoding
x_train = [id_to_onehot[w_to_id['a']], id_to_onehot[w_to_id['b']], id_to_onehot[w_to_id['c']],
id_to_onehot[w_to_id['d']], id_to_onehot[w_to_id['e']]]
y_train = [w_to_id['b'], w_to_id['c'], w_to_id['d'], w_to_id['e'], w_to_id['a']]
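# Shuffle x_train and y_train with the same random seed so inputs stay paired with their labels.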
np.random.seed(7)
np.random.shuffle(x_train)
np.random.seed(7)
np.random.shuffle(y_train)
tf.random.set_seed(7)
# Reshape x_train to match the SimpleRNN input requirement: [number of samples, time steps, features per step].
# The whole dataset is fed in at once, so the number of samples is len(x_train); one letter is input to produce one result, so the number of time steps is 1; the one-hot code has 5 input features, so the number of features per time step is 5.
x_train = np.reshape(x_train, (len(x_train), 1, 5))
y_train = np.array(y_train)
model = tf.keras.Sequential([
SimpleRNN(3),
Dense(5, activation='softmax')
])
model.compile(optimizer=tf.keras.optimizers.Adam(0.01),
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
metrics=['sparse_categorical_accuracy'])
checkpoint_save_path = "./checkpoint/rnn_onehot_1pre1.ckpt"
if os.path.exists(checkpoint_save_path + '.index'):
    print('-------------load the model-----------------')
    model.load_weights(checkpoint_save_path)

cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_save_path,
                                                 save_weights_only=True,
                                                 save_best_only=True,
                                                 monitor='loss')  # fit() is given no validation set, so no validation accuracy is computed; monitor the loss and save the best model
history = model.fit(x_train, y_train, batch_size=32, epochs=100, callbacks=[cp_callback])
model.summary()
# print(model.trainable_variables)
file = open('./weights.txt', 'w')  # export the trained parameters to a text file
for v in model.trainable_variables:
    file.write(str(v.name) + '\n')
    file.write(str(v.shape) + '\n')
    file.write(str(v.numpy()) + '\n')
file.close()
############################################### show ###############################################
# Plot the training-set acc and loss curves
acc = history.history['sparse_categorical_accuracy']
loss = history.history['loss']
plt.subplot(1, 2, 1)
plt.plot(acc, label='Training Accuracy')
plt.title('Training Accuracy')
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(loss, label='Training Loss')
plt.title('Training Loss')
plt.legend()
plt.show()
############### predict #############
preNum = int(input("input the number of test alphabet:"))
for i in range(preNum):
    alphabet1 = input("input test alphabet:")
    alphabet = [id_to_onehot[w_to_id[alphabet1]]]
    # Reshape alphabet to match the SimpleRNN input requirement: [number of samples, time steps, features per step].
    # One sample is fed in for verification, so the number of samples is 1; one letter is input to produce one result, so the number of time steps is 1; the one-hot code has 5 input features, so the number of features per time step is 5.
    alphabet = np.reshape(alphabet, (1, 1, 5))
    result = model.predict([alphabet])
    pred = tf.argmax(result, axis=1)
    pred = int(pred)
    tf.print(alphabet1 + '->' + input_word[pred])
```
The output of a run is as follows:

```
Epoch 100/100
5/5 [==============================] - 0s 34ms/sample - loss: 0.0400 - sparse_categorical_accuracy: 1.0000
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
simple_rnn (SimpleRNN) multiple 27
_________________________________________________________________
dense (Dense) multiple 20
=================================================================
Total params: 47
Trainable params: 47
Non-trainable params: 0
_________________________________________________________________
input the number of test alphabet:>? 5
input test alphabet:>? a
a->b
```