[deep learning theory] (6) recurrent neural network RNN
2022-06-26 11:01:00 【Vertical sir】
Hello everyone! Today I'd like to share the basic principles of the recurrent neural network (RNN) for processing sequence data, and how to implement the RNN layer and the RNNCell layer in PyTorch.
In previous blog posts I have applied recurrent neural networks to many practical cases; interested readers can browse my column: https://blog.csdn.net/dgvv4/category_11712004.html
1. Representation of sequences
In a recurrent neural network, sequence data usually has shape [batch, seq_len, feature_len], where seq_len is the sequence length (the number of elements in the sequence) and feature_len is the length of the vector representing each element.
For natural language tasks: take shape=[b, 5, 100] as an example, where 5 means each sentence contains 5 words and 100 means each word is represented by a vector of length 100.
For time-series tasks: take shape=[b, 100, 1] as an example, where 100 means each batch covers 100 days of data and each day has 1 temperature value.
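To make these two conventions concrete, here is a tiny sketch (my own illustration, not from the original post):

import torch

# 4 sentences, 5 words per sentence, each word a vector of length 100
text = torch.randn(4, 5, 100)
# 4 series, 100 days each, 1 temperature value per day
series = torch.randn(4, 100, 1)
print(text.shape, series.shape)  # torch.Size([4, 5, 100]) torch.Size([4, 100, 1])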
Below, taking a language sentiment-analysis task as an example, we first introduce the traditional method of processing sequence data, as shown in the figure below:

Take the sentence "the flower is so beautiful" as input. Through word embedding, each word is represented by a vector of length 100; each word is then fed into its own linear layer to extract features, producing a vector of length 2 per word. Finally, all the word outputs are aggregated and passed through one more linear layer to obtain the classification result.
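As a rough sketch of this traditional pipeline (my own code; the layer sizes are assumptions made only for illustration):

import torch
from torch import nn

seq_len, feature_len = 5, 100
words = torch.randn(1, seq_len, feature_len)  # embedded "the flower is so beautiful"
# one separate linear layer per word position
per_word = [nn.Linear(feature_len, 2) for _ in range(seq_len)]
feats = [layer(words[:, i]) for i, layer in enumerate(per_word)]
merged = torch.cat(feats, dim=1)        # aggregate all per-word outputs
classifier = nn.Linear(seq_len * 2, 2)  # final classification layer
print(classifier(merged).shape)         # torch.Size([1, 2])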
This traditional way of processing sequences has several defects:
(1) The amount of computation is huge. There are a great many words in real life; generating a separate linear layer x @ w + b for each word to extract features and then aggregating the outputs makes the model very complicated, with an extremely large number of parameters.
(2) Context is not considered. The traditional method analyzes each word of a sentence in isolation, with no information flowing between neighboring words. For example, in the sentence "i do not think the flower is beautiful", you cannot conclude that "beautiful" signals a positive review; it must be analyzed together with the preceding "not".
2. RNN Principle analysis
To address the problems of the traditional sequence model, RNN makes two improvements:
(1) Optimized parameter count. Through weight sharing, the per-word weights w1, w2, w3, ... are replaced by a single tensor W, and one RNN layer processes an entire sentence.
(2) Context awareness. A recurrent unit carries contextual information through time: the computation at the current time step takes into account the output of the previous time step.
Let's again take the language sentiment-analysis task as an example to introduce the basic principle of RNN.
The RNN cell is computed as:

$$h_t = \tanh(W_{xh}\, x_t + W_{hh}\, h_{t-1})$$

where:

$x_t$ represents the input features at the current time step;

$h_{t-1}$ represents the output of the previous time step, which is also the contextual information aggregated up to that moment.

Expanding the formula into matrix form:

$$h_t = \tanh(x_t\, W_{xh}^{T} + h_{t-1}\, W_{hh}^{T})$$

where $x_t W_{xh}^{T}$ extracts features from the input at the current time step and $h_{t-1} W_{hh}^{T}$ extracts features from the previous contextual information; applying the tanh activation function to their sum gives $h_t$, the contextual information updated at this time step.
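As a minimal sketch of this formula (my own code, biases omitted; the dimensions match the example used later in this post):

import torch

batch, feature_len, hidden_len = 3, 100, 20
x_t = torch.randn(batch, feature_len)     # input at the current time step
h_prev = torch.randn(batch, hidden_len)   # context from the previous time step
W_xh = torch.randn(hidden_len, feature_len)
W_hh = torch.randn(hidden_len, hidden_len)
# h_t = tanh(x_t @ W_xh^T + h_{t-1} @ W_hh^T)
h_t = torch.tanh(x_t @ W_xh.T + h_prev @ W_hh.T)
print(h_t.shape)  # torch.Size([3, 20])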

3. RNN gradient derivation
Taking a time-series prediction task as an example, let's introduce how RNN gradients are updated, as shown in the figure below.
The last contextual state $h_t$ of the RNN layer is taken as the prediction output. predict represents the value produced by forward propagation, target represents the true value, and the loss function is the mean squared error (MSE) between the predicted and true values.
Forward propagation: $h_t = \tanh(W_{xh}\, x_t + W_{hh}\, h_{t-1})$

Linear transformation: $\text{predict} = W_o\, h_t$

Loss function: $L = (\text{predict} - \text{target})^2$

The contextual state at every time step receives gradient information from the loss value. The backpropagation formula is:

$$\frac{\partial L}{\partial W_{hh}} = \sum_{i=0}^{t} \frac{\partial L}{\partial\, \text{predict}} \cdot \frac{\partial\, \text{predict}}{\partial h_t} \cdot \frac{\partial h_t}{\partial h_i} \cdot \frac{\partial h_i}{\partial W_{hh}}$$

Computing the partial derivative of each factor separately:

$$\frac{\partial L}{\partial\, \text{predict}} = 2\,(\text{predict} - \text{target})$$

$$\frac{\partial\, \text{predict}}{\partial h_t} = W_o$$

$$\frac{\partial h_t}{\partial h_i} = \prod_{k=i}^{t-1} \frac{\partial h_{k+1}}{\partial h_k} = \prod_{k=i}^{t-1} \operatorname{diag}\!\left(1 - h_{k+1}^{2}\right) W_{hh}$$

where $1 - h_{k+1}^{2}$ is the derivative of the tanh activation at step $k+1$.
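To make the chain concrete, here is a minimal sketch (my own code, with toy dimensions chosen for illustration) that runs the forward pass above and lets PyTorch's autograd carry out backpropagation through time:

import torch

torch.manual_seed(0)
feature_len, hidden_len, T = 4, 3, 5
W_xh = torch.randn(hidden_len, feature_len, requires_grad=True)
W_hh = torch.randn(hidden_len, hidden_len, requires_grad=True)
W_o = torch.randn(1, hidden_len, requires_grad=True)

x = torch.randn(T, feature_len)   # one sequence with T time steps
h = torch.zeros(hidden_len)       # initial context h_0
for t in range(T):
    # forward propagation: h_t = tanh(W_xh x_t + W_hh h_{t-1})
    h = torch.tanh(W_xh @ x[t] + W_hh @ h)
predict = W_o @ h                 # linear transformation
target = torch.tensor([1.0])
loss = ((predict - target) ** 2).mean()  # MSE loss

loss.backward()         # autograd applies the BPTT chain above
print(W_hh.grad.shape)  # torch.Size([3, 3])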
4. Model structure
Now let me introduce the structure of an RNN layer and the shape of each input and output tensor.
First, the network input has shape [seq_len, batch, feature_len], where seq_len is the sequence length (number of words), batch is the number of sentences, feature_len is the length of each feature's vector representation, and hidden_len is the number of hidden-layer neurons of an RNN cell.

Taking batch=3, seq_len=10, feature_len=100, hidden_len=20 as an example, let's trace the shape changes of the network's input and output features.
The RNN layer formula is $h_t = \tanh(x_t\, W_{xh}^{T} + h_{t-1}\, W_{hh}^{T})$, so the shapes combine as:

[batch, feature_len] @ [hidden_len, feature_len]^T + [batch, hidden_len] @ [hidden_len, hidden_len]^T

Substituting the concrete values:

[3, 100] @ [20, 100]^T + [3, 20] @ [20, 20]^T = [3, 20] + [3, 20] = [3, 20]
The following PyTorch snippet shows the shapes of a single RNN layer's parameters:
import torch
from torch import nn
# 100 is feature_len: the length of each word's vector representation
# 20 is hidden_len: after the RNN layer, each word is represented by a vector of length 20
rnn = nn.RNN(100, 20)
# inspect the RNN unit's parameters
print(rnn._parameters.keys())
# view the shape of each parameter
print('W_xh:', rnn.weight_ih_l0.shape,
      'bias_xh:', rnn.bias_ih_l0.shape,
      'W_hh:', rnn.weight_hh_l0.shape,
      'bias_hh:', rnn.bias_hh_l0.shape)
'''
Output:
odict_keys(['weight_ih_l0', 'weight_hh_l0', 'bias_ih_l0', 'bias_hh_l0'])
W_xh: torch.Size([20, 100])
bias_xh: torch.Size([20])
W_hh: torch.Size([20, 20])
bias_hh: torch.Size([20])
'''

5. PyTorch code implementation
5.1 Single-layer RNN implementation
First, instantiate an RNN layer:

input_size: the length of the vector representing each word.
hidden_size: the length of each word's vector representation after feature extraction by the RNN layer.
num_layers: the number of stacked RNN layers.

rnn = nn.RNN(input_size, hidden_size, num_layers)

The forward-propagation call:

x: the input sequence, shape = [seq_len, batch, feature_len]
h0: the contextual state at the initial moment, shape = [num_layers, batch, hidden_size]
out: the hidden state at every time step, shape = [seq_len, batch, hidden_size]
h: the contextual state at the final time step for every layer, shape = [num_layers, batch, hidden_size]

out, h = rnn(x, h0)

Taking batch=3, seq_len=10, feature_len=100, hidden_len=20 as an example, the code for a single RNN layer is as follows:
import torch
from torch import nn
# input_size: the length of the vector representing each word
# hidden_size: the length of each word's vector representation after feature extraction
# num_layers: the number of RNN layers
rnn = nn.RNN(input_size=100, hidden_size=20, num_layers=1)  # instantiate a single-layer RNN
# construct the input, shape=[seq_len, batch, feature_len]
x = torch.randn(10, 3, 100)
# construct the initial context, shape=[num_layers, batch, hidden_size]
h0 = torch.randn(1, 3, 20)
# forward propagation returns:
# out: the hidden state h at every time step, shape=[seq_len, batch, hidden_size]
# h: the hidden state at the final time step, shape=[num_layers, batch, hidden_size]
out, h = rnn(x, h0)
print('out:', out.shape, 'h:', h.shape)
'''
Output:
out: torch.Size([10, 3, 20])
h: torch.Size([1, 3, 20])
'''

5.2 Multi-layer RNN implementation
The parameters are the same as above. What to note is that in the forward-propagation output, h holds the contextual state at the final time step for every layer, while out holds the output of the last RNN layer at every time step.
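A small check of my own that makes the distinction visible: the final time step of out equals the top layer's entry in h.

import torch
from torch import nn

rnn = nn.RNN(input_size=100, hidden_size=20, num_layers=4)
out, h = rnn(torch.randn(10, 3, 100))  # h0 defaults to zeros when omitted
# out[-1] is the top layer at the final time step; h[-1] is that same state
print(torch.allclose(out[-1], h[-1]))  # True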
The code for a 4-layer RNN is as follows:
import torch
from torch import nn
# input_size: the length of the vector representing each word
# hidden_size: the length of each word's vector representation after feature extraction
# num_layers: the number of RNN layers
rnn = nn.RNN(input_size=100, hidden_size=20, num_layers=4)  # instantiate a 4-layer RNN
# construct the input, shape=[seq_len, batch, feature_len]
x = torch.randn(10, 3, 100)
# construct the initial context, shape=[num_layers, batch, hidden_size]
h0 = torch.randn(4, 3, 20)
# out: the hidden state h at every time step, shape=[seq_len, batch, hidden_size]
# h: the hidden state at the final time step of every layer, shape=[num_layers, batch, hidden_size]
out, h = rnn(x, h0)
print('out:', out.shape, 'h:', h.shape)
'''
out: torch.Size([10, 3, 20])
h: torch.Size([4, 3, 20])
'''

5.3 Single-layer RNNCell implementation
nn.RNN feeds the entire sequence through the RNN layer in one call, whereas nn.RNNCell must be fed one time step at a time, and the output state at the current time step does not automatically flow into the next time step: you have to pass it along yourself. The structure of a single RNNCell is shown below.

The implementation is as follows:
import torch
from torch import nn
# input_size: the length of the vector representing each word
# hidden_size: the length of each word's vector representation after feature extraction
rnncell = nn.RNNCell(input_size=100, hidden_size=20)  # instantiate a single RNNCell layer
# construct the input, shape=[seq_len, batch, feature_len]
inputs = torch.randn(10, 3, 100)
# construct the initial context, shape=[batch, hidden_size]
h0 = torch.randn(3, 20)
# each step's input has shape=[batch, feature_len]
for x in inputs:
    # h0: the contextual information at the current time step, shape=[batch, hidden_size]
    h0 = rnncell(x, h0)
print('h0:', h0.shape)
'''
h0: torch.Size([3, 20])
'''

5.4 Multi-layer RNNCell implementation
Take a two-layer RNNCell implementation as an example.

The first RNNCell layer changes each word's vector representation length from 100 to 20, and the second RNNCell layer changes it from 20 to 15.
The input of the first RNNCell is the word at the current time step together with the previous contextual state h0; the input of the second RNNCell is the output of the first RNNCell together with its own previous contextual state h1.
The code implementation is as follows :
import torch
from torch import nn
# input_size: the length of the vector representing each word
# hidden_size: the length of each word's vector representation after feature extraction
rnncell1 = nn.RNNCell(input_size=100, hidden_size=20)  # first RNNCell layer
rnncell2 = nn.RNNCell(input_size=20, hidden_size=15)   # second RNNCell layer
# construct the input, shape=[seq_len, batch, feature_len]
inputs = torch.randn(10, 3, 100)
# construct the initial contexts, shape=[batch, hidden_size]
h0 = torch.randn(3, 20)
h1 = torch.randn(3, 15)
# each step's input has shape=[batch, feature_len]
for x in inputs:
    # h0: context of the first layer at the current time step, shape=[batch, 20]
    h0 = rnncell1(x, h0)
    # h1: context of the second layer at the current time step, shape=[batch, 15]
    h1 = rnncell2(h0, h1)
print('h1:', h1.shape)
'''
h1: torch.Size([3, 15])
'''
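Finally, as a sanity check of my own (not part of the original post), stepping an RNNCell manually reproduces nn.RNN once the weights are shared:

import torch
from torch import nn

rnn = nn.RNN(input_size=100, hidden_size=20, num_layers=1)
cell = nn.RNNCell(input_size=100, hidden_size=20)
# copy the RNN layer's weights into the cell so both compute the same function
cell.weight_ih.data = rnn.weight_ih_l0.data
cell.weight_hh.data = rnn.weight_hh_l0.data
cell.bias_ih.data = rnn.bias_ih_l0.data
cell.bias_hh.data = rnn.bias_hh_l0.data

x = torch.randn(10, 3, 100)  # [seq_len, batch, feature_len]
h0 = torch.zeros(1, 3, 20)   # [num_layers, batch, hidden_size]
out, _ = rnn(x, h0)          # feed the whole sequence at once

h = h0[0]
for x_t in x:                # step the cell manually over each time step
    h = cell(x_t, h)
print(torch.allclose(out[-1], h, atol=1e-6))  # should print True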