PyTorch learning notes (6): Recurrent neural networks, RNN & LSTM
2022-07-30 13:25:00 【狸狸Arina】
Contents
1. Representing time series
1.1 word embedding
- PyTorch only supports numeric types, not strings, so a string must first be represented numerically; this mapping is called a representation, or word embedding;

1.2 One-hot encoding
- Each word is encoded as a vector whose length equals the vocabulary size, with a 1 at that word's index and 0 everywhere else;
1.3 word2vec
- One-hot encoding is sparse;
- Word embeddings take the similarity between words into account; common methods are word2vec and GloVe (a minimal lookup sketch follows this list);
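As a minimal PyTorch sketch of an embedding lookup (the vocabulary size 10 and dimension 4 are arbitrary toy values, and the vectors here are randomly initialized rather than trained with word2vec/GloVe):

import torch
import torch.nn as nn

embed = nn.Embedding(10, 4)   # toy vocabulary of 10 words, 4-dim vectors
idx = torch.tensor([1, 5])    # indices of two words
vectors = embed(idx)          # look up their embedding vectors
print(vectors.shape)          # torch.Size([2, 4])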

1.4 Batch representation of sequences
- [word num, b, word vec]: organized by time step;
- [b, word num, word vec]: organized by sentence (see the sketch below for converting between the two);
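A small sketch (shapes chosen arbitrarily) showing that the two layouts differ only by a transpose of the first two dimensions:

import torch

x = torch.randn(5, 3, 100)   # [word num=5, b=3, word vec=100], time-step first
x_b = x.transpose(0, 1)      # [b=3, word num=5, word vec=100], sentence first
print(x_b.shape)             # torch.Size([3, 5, 100])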

2. Recurrent neural networks
2.1 The form of an RNN
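In equation form (this is the update rule documented for PyTorch's nn.RNN), the hidden state at each step is computed from the current input and the previous hidden state:

h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{t-1} + b_{hh})

The same weights W_{ih} and W_{hh} are shared across all time steps.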


2.2 RNN Layer

2.2.1 nn.RNN
- nn.RNN(input_size, hidden_size) builds a complete RNN layer; weight_ih_l0 has shape [hidden_size, input_size] and weight_hh_l0 has shape [hidden_size, hidden_size], which matches the printout below:

import torch
import torch.nn as nn

rnn = nn.RNN(50, 10)            # input_size=50, hidden_size=10, one layer
print(rnn._parameters.keys())
print(rnn.weight_hh_l0.shape)   # [hidden_size, hidden_size]
print(rnn.weight_ih_l0.shape)   # [hidden_size, input_size]
print(rnn.bias_hh_l0.shape)     # [hidden_size]
print(rnn.bias_ih_l0.shape)     # [hidden_size]
'''
odict_keys(['weight_ih_l0', 'weight_hh_l0', 'bias_ih_l0', 'bias_hh_l0'])
torch.Size([10, 10])
torch.Size([10, 50])
torch.Size([10])
torch.Size([10])
'''
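For completeness, a minimal sketch of a forward pass through this layer (the sequence length 5 and batch size 3 are arbitrary): by default nn.RNN consumes input of shape [seq_len, batch, input_size] and returns the per-step outputs plus the final hidden state.

import torch
import torch.nn as nn

rnn = nn.RNN(50, 10)          # input_size=50, hidden_size=10
x = torch.randn(5, 3, 50)     # [seq_len=5, batch=3, input_size=50]
h0 = torch.zeros(1, 3, 10)    # [num_layers=1, batch=3, hidden_size=10]
out, h = rnn(x, h0)
print(out.shape)              # torch.Size([5, 3, 10]): output at every time step
print(h.shape)                # torch.Size([1, 3, 10]): hidden state of the last step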
2.2.2 nn.RNNCell
- nn.RNNCell performs one layer's computation for a single time step; it is functionally the same as nn.RNN, except that it neither stacks layers nor collects the outputs of all time steps for you;


import torch
import torch.nn as nn

cell1 = nn.RNNCell(100, 50)   # first layer: input_size=100, hidden_size=50
cell2 = nn.RNNCell(50, 30)    # second layer: input_size=50, hidden_size=30
input = torch.randn(3, 3, 100)   # [seq_len=3, batch=3, input_size=100]
ht1 = torch.zeros(3, 50)
ht2 = torch.zeros(3, 30)
for x_cell in input:             # step through the sequence manually
    ht1 = cell1(x_cell, ht1)
    ht2 = cell2(ht1, ht2)        # feed layer 1's hidden state into layer 2
print(ht1.shape)
print(ht2.shape)
'''
torch.Size([3, 50])
torch.Size([3, 30])
'''
2.3 Time series prediction
import torch
import torch.nn as nn
import numpy as np
from matplotlib import pyplot as plt

def generate_data():
    num_time_steps = 60
    # generate a sample that starts at a random time point
    start = np.random.randint(3, size=1)[0]   # randomly choose a starting point
    time_steps = np.linspace(start, start + 10, num_time_steps)
    data = np.sin(time_steps)
    data = data.reshape(num_time_steps, 1)    # feature dimension is 1
    # x is the sequence, y is the same sequence shifted by one step; add a batch dim of 1
    x = torch.tensor(data[:-1]).float().view(1, num_time_steps - 1, 1)
    y = torch.tensor(data[1:]).float().view(1, num_time_steps - 1, 1)
    return x, y, time_steps

class Net(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
        self.linear = nn.Linear(hidden_size, output_size)

    def forward(self, x, hidden_pre):
        b = x.size(0)
        # out: [b, seq, hidden_size]; hidden_pre: [num_layers, b, hidden_size]
        out, hidden_pre = self.rnn(x, hidden_pre)
        out = out.view(-1, self.hidden_size)      # [b*seq, hidden_size]
        out = self.linear(out)                    # [b*seq, 1]
        out = out.view(b, -1, self.output_size)   # [b, seq, 1]
        return out, hidden_pre

if __name__ == '__main__':
    batch_size = 1
    num_layers = 2
    hidden_size = 10
    model = Net(1, hidden_size, num_layers, 1)
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.5e-2)
    hidden_pre = torch.zeros(num_layers, batch_size, hidden_size)
    for iter in range(6000):
        x, y, _ = generate_data()
        out, hidden_pre = model(x, hidden_pre)
        hidden_pre = hidden_pre.detach()   # cut the graph so backprop stops at this iteration
        loss = criterion(out, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if iter % 100 == 0:
            print('iteration:{} loss:{}'.format(iter, loss.item()))

    # predict step by step, feeding each prediction back in as the next input
    x, y, time_steps = generate_data()
    input = x[:, 0, :].unsqueeze(1)   # first element of every batch's sequence: [b, 1, word_dim]
    h = torch.zeros(num_layers, batch_size, hidden_size)
    predictions = []
    for _ in range(x.shape[1]):       # walk along the sequence
        pred, h = model(input, h)
        input = pred
        predictions.append(pred.detach().numpy().ravel()[0])

    figure = plt.figure(figsize=(20, 20), dpi=80)
    plt.scatter(time_steps[1:], predictions, label='pred')
    plt.scatter(time_steps[1:], y.view(-1).numpy(), label='sin')
    plt.legend()
    plt.show()
'''
iteration:0 loss:0.6097927689552307
iteration:100 loss:0.007763191591948271
iteration:200 loss:0.0011507287854328752
iteration:300 loss:0.00087575992802158
iteration:400 loss:0.0005032330518588424
iteration:500 loss:0.0004986028652638197
iteration:600 loss:0.0009817895479500294
iteration:700 loss:0.00040510689723305404
iteration:800 loss:0.0010686117457225919
'''
2.4 Difficulties in training RNNs
- On long sequences, gradients can explode or vanish (a one-line derivation of why follows);
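This falls out of the standard backpropagation-through-time analysis (stated here for context; a sketch, not from the original post). With h_i = \tanh(W_{ih} x_i + W_{hh} h_{i-1}), the gradient between distant hidden states is a product of per-step Jacobians:

\frac{\partial h_t}{\partial h_k} = \prod_{i=k+1}^{t} \mathrm{diag}\big(\tanh'(\cdot)\big)\, W_{hh} \;\approx\; W_{hh}^{\,t-k}

so it tends to explode when the largest singular value of W_{hh} is above 1 and to vanish when it is below 1.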

2.4.1 Gradient explosion
- Clipping: when the gradient norm exceeds a threshold, rescale the gradient (divide by its own norm, then multiply by the threshold) so its norm is pulled back down to the threshold; see the sketch below;
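A minimal sketch using torch.nn.utils.clip_grad_norm_ (the toy linear model and the threshold of 10 are arbitrary choices for illustration):

import torch
import torch.nn as nn

model = nn.Linear(10, 1)   # toy model, just to have some gradients
x, y = torch.randn(4, 10), torch.randn(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

loss = nn.MSELoss()(model(x), y)
optimizer.zero_grad()
loss.backward()
# rescale all gradients in place so that their global norm is at most 10
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10)
optimizer.step()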


2.4.2 Vanishing gradients

- LSTM mitigates vanishing gradients;
2.5 LSTM
- A plain RNN only retains context near the current word; words further back get forgotten (short-term memory);
- An LSTM can remember very long sequences, hence the name long short-term memory;

- RNN unrolled form

- LSTM unrolled form (the cell equations are given after this list)


- When computing the gradient, the W^k factor no longer appears; the gradient is a sum of several terms, and it is very unlikely that all of them are simultaneously tiny or huge, so the gradient rarely collapses to 0 and vanishing gradients are avoided;
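For reference, the standard LSTM cell equations (\sigma is the sigmoid, \odot the elementwise product); note that the cell-state update is additive, which is what makes the gradient a sum of terms as described above:

f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)          (forget gate)
i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)          (input gate)
\tilde{c}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)   (candidate cell state)
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)          (output gate)
h_t = o_t \odot \tanh(c_t)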

2.6 Using LSTM
2.6.1 nn.LSTM
- nn.LSTM is used like nn.RNN, except that the recurrent state is the pair (h, c), each of shape [num_layers, batch, hidden_size]:

import torch
import torch.nn as nn

lstm = nn.LSTM(100, 20, 4)    # input_size=100, hidden_size=20, num_layers=4
c = torch.zeros(4, 30, 20)    # [num_layers, batch, hidden_size]
h = torch.zeros(4, 30, 20)    # [num_layers, batch, hidden_size]
x = torch.rand(80, 30, 100)   # [seq_len=80, batch=30, input_size=100]
out, (h, c) = lstm(x, (h, c))
print(out.shape)              # last layer's output at every time step
print(h.shape)
print(c.shape)
'''
torch.Size([80, 30, 20])
torch.Size([4, 30, 20])
torch.Size([4, 30, 20])
'''
2.6.2 nn.LSTMCell
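A minimal sketch mirroring the nn.RNNCell example above: two stacked nn.LSTMCells driven manually over the time steps, each carrying an (h, c) pair instead of a single hidden state (shapes chosen to match the earlier example):

import torch
import torch.nn as nn

cell1 = nn.LSTMCell(100, 50)   # first layer: input_size=100, hidden_size=50
cell2 = nn.LSTMCell(50, 30)    # second layer: input_size=50, hidden_size=30
x = torch.randn(3, 3, 100)     # [seq_len=3, batch=3, input_size=100]
h1, c1 = torch.zeros(3, 50), torch.zeros(3, 50)
h2, c2 = torch.zeros(3, 30), torch.zeros(3, 30)
for x_t in x:                  # step through the sequence manually
    h1, c1 = cell1(x_t, (h1, c1))
    h2, c2 = cell2(h1, (h2, c2))
print(h1.shape)                # torch.Size([3, 50])
print(h2.shape)                # torch.Size([3, 30])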



