[deep learning theory] (6) recurrent neural network RNN
2022-06-26 11:01:00 【Vertical sir】
Hello everyone! Today I'd like to share the basic principles of the recurrent neural network (RNN) for processing sequence data, and how to implement the RNN layer and the RNNCell layer in PyTorch.
In previous blog posts I have applied recurrent neural networks to many practical cases; interested readers can browse my column: https://blog.csdn.net/dgvv4/category_11712004.html
1. Representation of sequences
In a recurrent neural network, sequence data usually has shape [batch, seq_len, feature_len], where seq_len is the sequence length (the number of elements in the sequence) and feature_len is the length of the vector representing each element.
For natural language tasks: take shape=[b, 5, 100] as an example, where 5 means each sentence contains 5 words and 100 means each word is represented by a vector of length 100.
For time-series tasks: take shape=[b, 100, 1] as an example, where 100 means each batch covers 100 days of data and each day has 1 temperature value.
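To make these two conventions concrete, here is a tiny sketch (my own illustration, not from the original post):

import torch

# 4 sentences, 5 words per sentence, each word a vector of length 100
text = torch.randn(4, 5, 100)
# 4 series, 100 days each, 1 temperature value per day
series = torch.randn(4, 100, 1)
print(text.shape, series.shape)  # torch.Size([4, 5, 100]) torch.Size([4, 100, 1])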
Below, taking a language sentiment-analysis task as an example, we first introduce the traditional method of processing sequence data, as shown in the figure below:

Take the sentence "the flower is so beautiful" as input. Through word embedding, each word is represented by a vector of length 100; each word is then fed into its own linear layer to extract features, producing a vector of length 2 per word. Finally, all the word outputs are aggregated and passed through one more linear layer to obtain the classification result.
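As a rough sketch of this traditional pipeline (my own code; the layer sizes are assumptions made only for illustration):

import torch
from torch import nn

seq_len, feature_len = 5, 100
words = torch.randn(1, seq_len, feature_len)  # embedded "the flower is so beautiful"
# one separate linear layer per word position
per_word = [nn.Linear(feature_len, 2) for _ in range(seq_len)]
feats = [layer(words[:, i]) for i, layer in enumerate(per_word)]
merged = torch.cat(feats, dim=1)        # aggregate all per-word outputs
classifier = nn.Linear(seq_len * 2, 2)  # final classification layer
print(classifier(merged).shape)         # torch.Size([1, 2])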
This traditional way of processing sequences has several defects:
(1) The amount of computation is huge. There are a great many words in real life; generating a separate linear layer x @ w + b for each word to extract features and then aggregating the outputs makes the model very complicated, with an extremely large number of parameters.
(2) Context is not considered. The traditional method analyzes each word of a sentence in isolation, with no information flowing between neighboring words. For example, in the sentence "i do not think the flower is beautiful", you cannot conclude that "beautiful" signals a positive review; it must be analyzed together with the preceding "not".
2. RNN Principle analysis
To address the problems of the traditional sequence model, RNN makes two improvements:
(1) Optimized parameter count. Through weight sharing, the per-word weights w1, w2, w3, ... are replaced by a single tensor W, and one RNN layer processes an entire sentence.
(2) Context awareness. A recurrent unit carries contextual information through time: the computation at the current time step takes into account the output of the previous time step.
Let's again take the language sentiment-analysis task as an example to introduce the basic principle of RNN.
The RNN cell is computed as:

$$h_t = \tanh(W_{xh}\, x_t + W_{hh}\, h_{t-1})$$

where:

$x_t$ represents the input features at the current time step;

$h_{t-1}$ represents the output of the previous time step, which is also the contextual information aggregated up to that moment.

Expanding the formula into matrix form:

$$h_t = \tanh(x_t\, W_{xh}^{T} + h_{t-1}\, W_{hh}^{T})$$

where $x_t W_{xh}^{T}$ extracts features from the input at the current time step and $h_{t-1} W_{hh}^{T}$ extracts features from the previous contextual information; applying the tanh activation function to their sum gives $h_t$, the contextual information updated at this time step.
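As a minimal sketch of this formula (my own code, biases omitted; the dimensions match the example used later in this post):

import torch

batch, feature_len, hidden_len = 3, 100, 20
x_t = torch.randn(batch, feature_len)     # input at the current time step
h_prev = torch.randn(batch, hidden_len)   # context from the previous time step
W_xh = torch.randn(hidden_len, feature_len)
W_hh = torch.randn(hidden_len, hidden_len)
# h_t = tanh(x_t @ W_xh^T + h_{t-1} @ W_hh^T)
h_t = torch.tanh(x_t @ W_xh.T + h_prev @ W_hh.T)
print(h_t.shape)  # torch.Size([3, 20])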

3. RNN gradient derivation
Taking a time-series prediction task as an example, let's introduce how RNN gradients are updated, as shown in the figure below.
The last contextual state $h_t$ of the RNN layer is taken as the prediction output. predict represents the value produced by forward propagation, target represents the true value, and the loss function is the mean squared error (MSE) between the predicted and true values.
Forward propagation: $h_t = \tanh(W_{xh}\, x_t + W_{hh}\, h_{t-1})$

Linear transformation: $\text{predict} = W_o\, h_t$

Loss function: $L = (\text{predict} - \text{target})^2$

The contextual state at every time step receives gradient information from the loss value. The backpropagation formula is:

$$\frac{\partial L}{\partial W_{hh}} = \sum_{i=0}^{t} \frac{\partial L}{\partial\, \text{predict}} \cdot \frac{\partial\, \text{predict}}{\partial h_t} \cdot \frac{\partial h_t}{\partial h_i} \cdot \frac{\partial h_i}{\partial W_{hh}}$$

Computing the partial derivative of each factor separately:

$$\frac{\partial L}{\partial\, \text{predict}} = 2\,(\text{predict} - \text{target})$$

$$\frac{\partial\, \text{predict}}{\partial h_t} = W_o$$

$$\frac{\partial h_t}{\partial h_i} = \prod_{k=i}^{t-1} \frac{\partial h_{k+1}}{\partial h_k} = \prod_{k=i}^{t-1} \operatorname{diag}\!\left(1 - h_{k+1}^{2}\right) W_{hh}$$

where $1 - h_{k+1}^{2}$ is the derivative of the tanh activation at step $k+1$.
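To make the chain concrete, here is a minimal sketch (my own code, with toy dimensions chosen for illustration) that runs the forward pass above and lets PyTorch's autograd carry out backpropagation through time:

import torch

torch.manual_seed(0)
feature_len, hidden_len, T = 4, 3, 5
W_xh = torch.randn(hidden_len, feature_len, requires_grad=True)
W_hh = torch.randn(hidden_len, hidden_len, requires_grad=True)
W_o = torch.randn(1, hidden_len, requires_grad=True)

x = torch.randn(T, feature_len)   # one sequence with T time steps
h = torch.zeros(hidden_len)       # initial context h_0
for t in range(T):
    # forward propagation: h_t = tanh(W_xh x_t + W_hh h_{t-1})
    h = torch.tanh(W_xh @ x[t] + W_hh @ h)
predict = W_o @ h                 # linear transformation
target = torch.tensor([1.0])
loss = ((predict - target) ** 2).mean()  # MSE loss

loss.backward()         # autograd applies the BPTT chain above
print(W_hh.grad.shape)  # torch.Size([3, 3])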
4. Model structure
Now let me introduce the structure of an RNN layer and the shape of each input and output tensor.
First, the network input has shape [seq_len, batch, feature_len], where seq_len is the sequence length (number of words), batch is the number of sentences, feature_len is the length of each feature's vector representation, and hidden_len is the number of hidden-layer neurons of an RNN cell.

Taking batch=3, seq_len=10, feature_len=100, hidden_len=20 as an example, let's trace the shape changes of the network's input and output features.
The RNN layer formula is $h_t = \tanh(x_t\, W_{xh}^{T} + h_{t-1}\, W_{hh}^{T})$, so the shapes combine as:

[batch, feature_len] @ [hidden_len, feature_len]^T + [batch, hidden_len] @ [hidden_len, hidden_len]^T

Substituting the concrete values:

[3, 100] @ [20, 100]^T + [3, 20] @ [20, 20]^T = [3, 20] + [3, 20] = [3, 20]
The following PyTorch snippet shows the shapes of a single RNN layer's parameters:
import torch
from torch import nn
# 100 is feature_len: the length of each word's vector representation
# 20 is hidden_len: after the RNN layer, each word is represented by a vector of length 20
rnn = nn.RNN(100, 20)
# inspect the RNN unit's parameters
print(rnn._parameters.keys())
# view the shape of each parameter
print('W_xh:', rnn.weight_ih_l0.shape,
      'bias_xh:', rnn.bias_ih_l0.shape,
      'W_hh:', rnn.weight_hh_l0.shape,
      'bias_hh:', rnn.bias_hh_l0.shape)
'''
Output:
odict_keys(['weight_ih_l0', 'weight_hh_l0', 'bias_ih_l0', 'bias_hh_l0'])
W_xh: torch.Size([20, 100])
bias_xh: torch.Size([20])
W_hh: torch.Size([20, 20])
bias_hh: torch.Size([20])
'''

5. PyTorch code implementation
5.1 Single-layer RNN implementation
First, instantiate an RNN layer:

input_size: the length of the vector representing each word.
hidden_size: the length of each word's vector representation after feature extraction by the RNN layer.
num_layers: the number of stacked RNN layers.

rnn = nn.RNN(input_size, hidden_size, num_layers)

The forward-propagation call:

x: the input sequence, shape = [seq_len, batch, feature_len]
h0: the contextual state at the initial moment, shape = [num_layers, batch, hidden_size]
out: the hidden state at every time step, shape = [seq_len, batch, hidden_size]
h: the contextual state at the final time step for every layer, shape = [num_layers, batch, hidden_size]

out, h = rnn(x, h0)

Taking batch=3, seq_len=10, feature_len=100, hidden_len=20 as an example, the code for a single RNN layer is as follows:
import torch
from torch import nn
# input_size: the length of the vector representing each word
# hidden_size: the length of each word's vector representation after feature extraction
# num_layers: the number of RNN layers
rnn = nn.RNN(input_size=100, hidden_size=20, num_layers=1)  # instantiate a single-layer RNN
# construct the input, shape=[seq_len, batch, feature_len]
x = torch.randn(10, 3, 100)
# construct the initial context, shape=[num_layers, batch, hidden_size]
h0 = torch.randn(1, 3, 20)
# forward propagation returns:
# out: the hidden state h at every time step, shape=[seq_len, batch, hidden_size]
# h: the hidden state at the final time step, shape=[num_layers, batch, hidden_size]
out, h = rnn(x, h0)
print('out:', out.shape, 'h:', h.shape)
'''
Output:
out: torch.Size([10, 3, 20])
h: torch.Size([1, 3, 20])
'''

5.2 Multi-layer RNN implementation
The parameters are the same as above. What to note is that in the forward-propagation output, h holds the contextual state at the final time step for every layer, while out holds the output of the last RNN layer at every time step.
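A small check of my own that makes the distinction visible: the final time step of out equals the top layer's entry in h.

import torch
from torch import nn

rnn = nn.RNN(input_size=100, hidden_size=20, num_layers=4)
out, h = rnn(torch.randn(10, 3, 100))  # h0 defaults to zeros when omitted
# out[-1] is the top layer at the final time step; h[-1] is that same state
print(torch.allclose(out[-1], h[-1]))  # True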
The code for a 4-layer RNN is as follows:
import torch
from torch import nn
# input_size: the length of the vector representing each word
# hidden_size: the length of each word's vector representation after feature extraction
# num_layers: the number of RNN layers
rnn = nn.RNN(input_size=100, hidden_size=20, num_layers=4)  # instantiate a 4-layer RNN
# construct the input, shape=[seq_len, batch, feature_len]
x = torch.randn(10, 3, 100)
# construct the initial context, shape=[num_layers, batch, hidden_size]
h0 = torch.randn(4, 3, 20)
# out: the hidden state h at every time step, shape=[seq_len, batch, hidden_size]
# h: the hidden state at the final time step of every layer, shape=[num_layers, batch, hidden_size]
out, h = rnn(x, h0)
print('out:', out.shape, 'h:', h.shape)
'''
out: torch.Size([10, 3, 20])
h: torch.Size([4, 3, 20])
'''

5.3 Single-layer RNNCell implementation
nn.RNN feeds the entire sequence through the RNN layer in one call, whereas nn.RNNCell must be fed one time step at a time, and the output state at the current time step does not automatically flow into the next time step: you have to pass it along yourself. The structure of a single RNNCell is shown below.

The implementation is as follows:
import torch
from torch import nn
# input_size: the length of the vector representing each word
# hidden_size: the length of each word's vector representation after feature extraction
rnncell = nn.RNNCell(input_size=100, hidden_size=20)  # instantiate a single RNNCell layer
# construct the input, shape=[seq_len, batch, feature_len]
inputs = torch.randn(10, 3, 100)
# construct the initial context, shape=[batch, hidden_size]
h0 = torch.randn(3, 20)
# each step's input has shape=[batch, feature_len]
for x in inputs:
    # h0: the contextual information at the current time step, shape=[batch, hidden_size]
    h0 = rnncell(x, h0)
print('h0:', h0.shape)
'''
h0: torch.Size([3, 20])
'''

5.4 Multi-layer RNNCell implementation
Take a two-layer RNNCell implementation as an example.

The first RNNCell layer changes each word's vector representation length from 100 to 20, and the second RNNCell layer changes it from 20 to 15.
The input of the first RNNCell is the word at the current time step together with the previous contextual state h0; the input of the second RNNCell is the output of the first RNNCell together with its own previous contextual state h1.
The code implementation is as follows :
import torch
from torch import nn
# input_size: the length of the vector representing each word
# hidden_size: the length of each word's vector representation after feature extraction
rnncell1 = nn.RNNCell(input_size=100, hidden_size=20)  # first RNNCell layer
rnncell2 = nn.RNNCell(input_size=20, hidden_size=15)   # second RNNCell layer
# construct the input, shape=[seq_len, batch, feature_len]
inputs = torch.randn(10, 3, 100)
# construct the initial contexts, shape=[batch, hidden_size]
h0 = torch.randn(3, 20)
h1 = torch.randn(3, 15)
# each step's input has shape=[batch, feature_len]
for x in inputs:
    # h0: context of the first layer at the current time step, shape=[batch, 20]
    h0 = rnncell1(x, h0)
    # h1: context of the second layer at the current time step, shape=[batch, 15]
    h1 = rnncell2(h0, h1)
print('h1:', h1.shape)
'''
h1: torch.Size([3, 15])
'''
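Finally, as a sanity check of my own (not part of the original post), stepping an RNNCell manually reproduces nn.RNN once the weights are shared:

import torch
from torch import nn

rnn = nn.RNN(input_size=100, hidden_size=20, num_layers=1)
cell = nn.RNNCell(input_size=100, hidden_size=20)
# copy the RNN layer's weights into the cell so both compute the same function
cell.weight_ih.data = rnn.weight_ih_l0.data
cell.weight_hh.data = rnn.weight_hh_l0.data
cell.bias_ih.data = rnn.bias_ih_l0.data
cell.bias_hh.data = rnn.bias_hh_l0.data

x = torch.randn(10, 3, 100)  # [seq_len, batch, feature_len]
h0 = torch.zeros(1, 3, 20)   # [num_layers, batch, hidden_size]
out, _ = rnn(x, h0)          # feed the whole sequence at once

h = h0[0]
for x_t in x:                # step the cell manually over each time step
    h = cell(x_t, h)
print(torch.allclose(out[-1], h, atol=1e-6))  # should print True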