Recurrent Neural Networks
2022-07-28 06:14:00 【Jiyu Wangchuan】
1. Recurrent Neural Networks
A recurrent neural network (RNN) is a class of neural networks designed to process sequence data.
1.1 A brief comparison of CNNs and RNNs

CNN: a convolution kernel extracts features, which are then fed into the following network (e.g., a fully connected Dense layer) for classification, object detection, and similar tasks. A CNN extracts information along the spatial dimension, and the convolution kernel's parameters are shared across space.

RNN: a recurrent cell extracts features, which are then fed into the following network (e.g., a fully connected Dense layer) for prediction and similar tasks. An RNN extracts information along the time dimension, and the recurrent cell's parameters are shared across time steps.
1.2 The recurrent cell

The recurrent cell has memory. By sharing its parameters across time steps, it extracts information from a time series. Each recurrent cell contains several memory units, corresponding to the small cylinders in the figure below.
The memory units store the state information at each time step, $h_t$:

$$h_t = \tanh(x_t w_{xh} + h_{t-1} w_{hh} + b_h)$$

where $w_{xh}$ and $w_{hh}$ are weight matrices, $b_h$ is a bias, $x_t$ is the input feature at the current time step, $h_{t-1}$ is the state stored in the memory at the previous time step, and $\tanh$ is the activation function.

The output feature of the recurrent cell at the current time step is

$$y_t = \mathrm{softmax}(h_t w_{hy} + b_y)$$

where $w_{hy}$ is a weight matrix, $b_y$ is a bias, and softmax is the activation function; this is in fact equivalent to a fully connected layer. We can set the number of memory units to change the memory capacity. Once the number of memory units is chosen and the dimensions of the input $x_t$ and the output $y_t$ are fixed, the dimensions of these trainable parameters are determined as well.
During forward propagation, the state $h_t$ stored in the memory is refreshed at every time step, while the three parameter matrices $w_{xh}$, $w_{hh}$, $w_{hy}$ and the two biases $b_h$, $b_y$ stay fixed throughout. During backpropagation, the three parameter matrices and the two biases are updated by gradient descent.
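To make the two formulas concrete, here is a minimal NumPy sketch of one forward step of such a cell; the sizes (5 input features, 3 memory units, 5 output classes) and the random weights are assumptions chosen only for illustration:

```python
import numpy as np

def rnn_cell_step(x_t, h_prev, w_xh, w_hh, b_h, w_hy, b_y):
    """One forward step of a simple recurrent cell: refresh the memory, then emit an output."""
    h_t = np.tanh(x_t @ w_xh + h_prev @ w_hh + b_h)   # h_t = tanh(x_t w_xh + h_{t-1} w_hh + b_h)
    logits = h_t @ w_hy + b_y
    y_t = np.exp(logits) / np.sum(np.exp(logits))     # y_t = softmax(h_t w_hy + b_y)
    return h_t, y_t

# Assumed sizes: 5 input features, 3 memory units, 5 output classes
rng = np.random.default_rng(0)
w_xh, w_hh, b_h = rng.normal(size=(5, 3)), rng.normal(size=(3, 3)), np.zeros(3)
w_hy, b_y = rng.normal(size=(3, 5)), np.zeros(5)

h = np.zeros(3)                             # initial memory state
x = np.array([0., 1., 0., 0., 0.])          # one-hot input, e.g. the letter 'b'
h, y = rnn_cell_step(x, h, w_xh, w_hh, b_h, w_hy, b_y)
print(h, y)
```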
1.3 Unrolling the recurrent cell over time steps
Unrolling the recurrent cell over time steps means expanding it along the time axis, which yields the form shown in the figure below. The state $h_t$ in the memory is refreshed at every time step, while the parameter matrices and the two biases around the memory stay fixed; these parameter matrices are exactly what training optimizes. After training, the best parameter matrices are used for forward propagation to output the prediction.

This is in fact consistent with how humans make predictions: the memory in our brain is updated at every moment based on the current input, and the current inference is made by reasoning over the "parameter matrices" accumulated from our previous knowledge.
In short, a recurrent neural network extracts temporal features with the recurrent cell and sends that information to a fully connected network, thereby making predictions on sequential data.
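As a sketch of this unrolling, the loop below simply reapplies the hypothetical rnn_cell_step from the previous example at every time step, carrying the memory state forward while the parameters stay the same:

```python
def rnn_forward(x_seq, h0, params):
    """Unroll the cell along the time axis; the same parameters are reused at every step."""
    w_xh, w_hh, b_h, w_hy, b_y = params
    h, outputs = h0, []
    for x_t in x_seq:                 # one iteration per time step
        h, y_t = rnn_cell_step(x_t, h, w_xh, w_hh, b_h, w_hy, b_y)
        outputs.append(y_t)
    return h, outputs                 # final memory state and the prediction at every step
```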
1.4 Recurrent computing layers: stacking toward the output
In an RNN, each recurrent cell forms one recurrent computing layer, and the number of recurrent computing layers grows in the direction of the output. As shown in the figure below, the network on the left has one recurrent cell and forms one recurrent computing layer; the network in the middle has two recurrent cells and forms two recurrent computing layers; the network on the right has three recurrent cells and forms three recurrent computing layers. The number of memory units in each recurrent cell of the three networks can be chosen freely according to our needs.
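In tf.keras, stacking recurrent computing layers in the output direction means feeding the per-step outputs of one SimpleRNN into the next, so every recurrent layer except the last sets return_sequences=True (explained in Section 2 below). A minimal sketch with arbitrarily chosen memory sizes:

```python
import tensorflow as tf
from tensorflow.keras.layers import SimpleRNN, Dense

# Three stacked recurrent computing layers; the memory sizes (8, 8, 4) are arbitrary
stacked = tf.keras.Sequential([
    SimpleRNN(8, return_sequences=True, input_shape=(4, 5)),  # pass h_t of every step onward
    SimpleRNN(8, return_sequences=True),
    SimpleRNN(4),                               # last recurrent layer keeps only the final h_t
    Dense(5, activation='softmax'),
])
stacked.summary()
```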
1.5 RNN Training
After obtaining the RNN's forward-propagation results, as with other neural networks, we define a loss function and train the model with backpropagation and gradient descent. The only difference for an RNN is that the node at every time step may produce an output, so the total RNN loss is the sum of the losses at all time steps (or at some of them).
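As a minimal sketch of this idea (continuing the NumPy example above and assuming one target label per time step), the total loss is just the sum of the per-step cross-entropy losses:

```python
import numpy as np

def sequence_loss(outputs, targets):
    """Total RNN loss = sum of the cross-entropy losses at every time step."""
    return sum(-np.log(y_t[label]) for y_t, label in zip(outputs, targets))
```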
2. Describing a recurrent computing layer in TensorFlow
tf.keras.layers.SimpleRNN(units, activation='activation function', return_sequences=whether to pass every h_t to the next layer)

(1) units: the number of memory units in the recurrent cell.

(2) return_sequences: whether to return the output $h_t$ of every time step or only of the last one. False returns only the last time step; True returns all time steps. When the next layer is another RNN layer this is usually True; when it is followed by a Dense layer it is usually False.

(3) Input dimension: a three-dimensional tensor (number of samples, number of time steps the cell is unrolled over, number of input features per time step).

As shown in the figure above, the left picture feeds two groups of data into the RNN layer; each group yields its result after one time step and three values are input at each time step, so the input to the recurrent layer has dimension [2, 1, 3] (Figure 1.2.6: RNN layer input dimension). The right picture feeds in only one group of data, sent to the recurrent layer over four time steps with two values input per time step, so the input to the recurrent layer has dimension [1, 4, 2].

(4) Output dimension:

- return_sequences=True: a three-dimensional tensor (number of samples, number of unrolled time steps, number of units in this layer)
- return_sequences=False: a two-dimensional tensor (number of samples, number of units in this layer)

(5) activation: 'activation function' (defaults to tanh if not specified)
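These shape rules can be verified directly. Below is a minimal sketch (the batch size, step count, feature count, and unit count are arbitrary choices for illustration) that feeds a random batch through SimpleRNN with each setting of return_sequences:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import SimpleRNN

x = np.random.rand(2, 4, 3).astype('float32')  # (samples, time steps, features per step)

print(SimpleRNN(6, return_sequences=True)(x).shape)   # (2, 4, 6): h_t of every time step
print(SimpleRNN(6, return_sequences=False)(x).shape)  # (2, 6): only the last h_t
```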
3. An example of the recurrent computation process
The most typical application of an RNN is to use historical data to predict what happens at the next moment, i.e., to make predictions according to patterns seen in the past. Let's walk through the computation of a recurrent network with a simple letter-prediction example: input one letter and predict the next one, i.e., input a to predict b, input b to predict c, input c to predict d, input d to predict e, and input e to predict a. A computer does not recognize letters and can only process numbers, so the letters must first be encoded. Here we assume one-hot encoding is used (other encodings are possible in practice); the result is shown in the table below.
| One-hot code | Letter |
|---|---|
| 10000 | a |
| 01000 | b |
| 00100 | c |
| 00010 | d |
| 00001 | e |
Suppose we use a single-layer RNN with 3 memory units; the letter-prediction network is shown in the figure below.
Suppose the letter b is input, i.e., the input $x_t$ is $[0,1,0,0,0]$, and the state $h_{t-1}$ stored in the memory from the previous time step is 0. From the theory above it is straightforward to compute:

$$h_t = \tanh(x_t w_{xh} + h_{t-1} w_{hh} + b_h) = \tanh([-2.3, 0.8, 1.1] + 0 + [0.5, 0.3, -0.2]) = \tanh([-1.8, 1.1, 0.9]) = [-0.9, 0.8, 0.7]$$

This process can be understood as the memory in the brain being updated by the current input.
The output $y_t$ is obtained by passing the extracted temporal information through a fully connected layer for recognition and prediction; it is the output layer of the whole network.
$$y_t = \mathrm{softmax}(h_t w_{hy} + b_y) = \mathrm{softmax}([-0.7, -0.6, 2.9, 0.7, -0.8] + [0.0, 0.1, 0.4, -0.7, 0.1]) = \mathrm{softmax}([-0.7, -0.5, 3.3, 0.0, -0.7]) = [0.02, 0.02, 0.91, 0.03, 0.02]$$

The model therefore believes there is a 91% chance that the next letter is c, so the recurrent network outputs c as its prediction.
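These numbers can be checked with a couple of lines of NumPy by applying tanh and softmax to the pre-activation vectors quoted above:

```python
import numpy as np

print(np.round(np.tanh([-1.8, 1.1, 0.9]), 1))               # -> [-0.9  0.8  0.7]

y_pre = np.array([-0.7, -0.5, 3.3, 0.0, -0.7])
print(np.round(np.exp(y_pre) / np.sum(np.exp(y_pre)), 2))    # -> [0.02 0.02 0.91 0.03 0.02]
```

The complete TensorFlow implementation of this letter-prediction example follows.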
```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, SimpleRNN
import matplotlib.pyplot as plt
import os
input_word = "abcde"
w_to_id = {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}  # dictionary mapping each letter to a numeric id
id_to_onehot = {0: [1., 0., 0., 0., 0.], 1: [0., 1., 0., 0., 0.], 2: [0., 0., 1., 0., 0.],
                3: [0., 0., 0., 1., 0.], 4: [0., 0., 0., 0., 1.]}  # id encoded as one-hot

x_train = [id_to_onehot[w_to_id['a']], id_to_onehot[w_to_id['b']], id_to_onehot[w_to_id['c']],
           id_to_onehot[w_to_id['d']], id_to_onehot[w_to_id['e']]]
y_train = [w_to_id['b'], w_to_id['c'], w_to_id['d'], w_to_id['e'], w_to_id['a']]
np.random.seed(7)
np.random.shuffle(x_train)
np.random.seed(7)
np.random.shuffle(y_train)
tf.random.set_seed(7)
# Make x_train match the SimpleRNN input requirement: [number of samples, number of unrolled time steps, number of input features per time step].
# The whole dataset is fed in at once, so the number of samples is len(x_train); one letter is input to produce one result, so the number of unrolled time steps is 1; each one-hot code has 5 input features, so the number of input features per time step is 5.
x_train = np.reshape(x_train, (len(x_train), 1, 5))
y_train = np.array(y_train)
model = tf.keras.Sequential([
SimpleRNN(3),
Dense(5, activation='softmax')
])
model.compile(optimizer=tf.keras.optimizers.Adam(0.01),
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
metrics=['sparse_categorical_accuracy'])
checkpoint_save_path = "./checkpoint/rnn_onehot_1pre1.ckpt"
if os.path.exists(checkpoint_save_path + '.index'):
    print('-------------load the model-----------------')
    model.load_weights(checkpoint_save_path)
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_save_path,
                                                 save_weights_only=True,
                                                 save_best_only=True,
                                                 monitor='loss')  # fit is given no validation data, so validation accuracy is not computed; the best model is saved according to loss
history = model.fit(x_train, y_train, batch_size=32, epochs=100, callbacks=[cp_callback])
model.summary()
# print(model.trainable_variables)
file = open('./weights.txt', 'w')  # save the trained parameters to a text file
for v in model.trainable_variables:
    file.write(str(v.name) + '\n')
    file.write(str(v.shape) + '\n')
    file.write(str(v.numpy()) + '\n')
file.close()
############################################### show ###############################################
# Plot the training accuracy and loss curves
acc = history.history['sparse_categorical_accuracy']
loss = history.history['loss']
plt.subplot(1, 2, 1)
plt.plot(acc, label='Training Accuracy')
plt.title('Training Accuracy')
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(loss, label='Training Loss')
plt.title('Training Loss')
plt.legend()
plt.show()
############### predict #############
preNum = int(input("input the number of test alphabet:"))
for i in range(preNum):
    alphabet1 = input("input test alphabet:")
    alphabet = [id_to_onehot[w_to_id[alphabet1]]]
    # Make alphabet match the SimpleRNN input requirement: [number of samples, number of unrolled time steps, number of input features per time step].
    # One sample is fed in for this check, so the number of samples is 1; one letter is input to produce one result, so the number of unrolled time steps is 1; each one-hot code has 5 input features, so the number of input features per time step is 5.
    alphabet = np.reshape(alphabet, (1, 1, 5))
    result = model.predict([alphabet])
    pred = tf.argmax(result, axis=1)
    pred = int(pred)
    tf.print(alphabet1 + '->' + input_word[pred])
```

The program output is as follows:

```
Epoch 100/100
5/5 [==============================] - 0s 34ms/sample - loss: 0.0400 - sparse_categorical_accuracy: 1.0000
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
simple_rnn (SimpleRNN) multiple 27
_________________________________________________________________
dense (Dense) multiple 20
=================================================================
Total params: 47
Trainable params: 47
Non-trainable params: 0
_________________________________________________________________
input the number of test alphabet:>? 5
input test alphabet:>? a
a->b
```
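As a sanity check on the summary above, the parameter counts can be reproduced by hand: SimpleRNN(3) on 5 input features has 5×3 ($w_{xh}$) + 3×3 ($w_{hh}$) + 3 (bias) = 27 parameters, and Dense(5) on the 3-unit output has 3×5 + 5 = 20, which gives the 47 trainable parameters reported.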