Tomato learning notes -seq2seq
2022-06-12 06:28:00 【GitTomato】
Contents
- 1. Introduction
- 2. Recurrent neural networks and their variants
- 3. Seq2Seq
1. Introduction
What is Seq2Seq for? Given an input sequence such as X = {x1, x2, ..., xn}, we want to obtain an output sequence Y = {y1, y2, ..., ym}, where n and m need not be equal, as in translation tasks. These notes are based on a translation task, so from here on I will call the input sequence the source sentence, with each xi being a word vector, and the corresponding Y the target sentence.
So what kind of neural network model comes to mind for this? Clearly, traditional fully connected or convolutional networks such as AlexNet will not do, because they are not suited to sequence output. Recurrent neural networks can output sequences, but the number of outputs is usually fixed by hand, which is inconvenient; moreover, the usual RNN produces one output per input, so m <= n, which makes cases such as translation, where the target sentence contains more words than the source sentence, hard to handle.
However, RNNs do give us an idea: run the source sentence through an RNN and take the final result as a feature vector that represents the whole source sentence (if you suspect one vector cannot represent it all, congratulations, you have just thought of the attention model), then combine that feature vector with the target sentence for training.
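As a minimal sketch of this idea (the GRU choice and all sizes here are illustrative, not the model used later in these notes):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy source sentence: 5 word vectors of dimension 8, shaped (seq_len, batch, feature).
src = torch.randn(5, 1, 8)

# A single-layer GRU stands in for the encoder RNN.
encoder = nn.GRU(input_size=8, hidden_size=16)
outputs, last_hidden = encoder(src)

# last_hidden is the fixed-size feature vector summarising the whole
# source sentence; the decoder is conditioned on it.
print(outputs.shape)      # torch.Size([5, 1, 16]) -- one output per word
print(last_hidden.shape)  # torch.Size([1, 1, 16]) -- one vector for the sentence
```

However long the source sentence is, `last_hidden` keeps the same shape, which is exactly what lets the decoder handle n != m.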
2. Recurrent neural networks and their variants
The picture was just one I downloaded from a Baidu image search; in any case, searching for the RNN model will turn it up with no trouble, since it is, after all, the most basic recurrent neural network model.

So let me simply talk about the RNN from the PyTorch side. PyTorch does not hand you this part ready-made; presumably the feeling is that users who need it can write it themselves. But before you have learned how, that is just your bad luck (-_-||)
Here is a piece of pseudocode thrown in directly; of course, this pseudocode cannot be run as-is:
import torch
import torch.nn as nn

def train(rnn, input, hidden):
    hidden = rnn.initHidden()  # can simply be all zeros; used as the initial hidden vector
    output, hidden = rnn(input, hidden)

# Decomposing the inside of the RNN gives:
class RNN(nn.Module):
    def __init__(self):
        ...  # omitted

    def forward(self, input, hidden):
        output = []
        seq_len, _ = input.shape
        for i in range(seq_len):
            temp_output, hidden = RNNCell(input[i], hidden)
            output.append(temp_output)
        return output, hidden

    def initHidden(self):
        return torch.zeros(...)

class RNNCell(nn.Module):
    ''' omitted '''
    def forward(self, input, hidden):
        # Here you can improvise freely: if you really have an idea,
        # you can simply build one yourself, e.g. a fully connected layer
        # (the most basic RNN) or various gates for control (LSTM, GRU, and so on)
        ...
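A runnable counterpart of the pseudocode above, using PyTorch's built-in nn.RNNCell as the per-step cell (all sizes are illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# nn.RNNCell plays the role of the RNNCell class in the pseudocode.
cell = nn.RNNCell(input_size=4, hidden_size=6)

seq = torch.randn(3, 1, 4)   # (seq_len, batch, feature)
hidden = torch.zeros(1, 6)   # initHidden(): start from all zeros

outputs = []
for t in range(seq.shape[0]):
    # One unrolled step; for a vanilla RNN the output is the hidden state itself.
    hidden = cell(seq[t], hidden)
    outputs.append(hidden)

outputs = torch.stack(outputs)
print(outputs.shape)  # torch.Size([3, 1, 6]): one hidden state per time step
```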
3. Seq2Seq
The most common form of Seq2Seq is the Encoder-Decoder model, that is, the model mentioned in the introduction that encodes the source sentence and links the result with the target sentence for training. As pictured:

Both the Encoder part and the Decoder part are RNN models. Of course, you do not have to use RNNs for the Encoder and Decoder, but that sends you straight to Hell difficulty: please turn left for the Transformer model. That model expresses the Encoder and Decoder directly through matrix multiplication, so it does not need to run an RNNCell step by step to get its results; it is not only more efficient, its feature-extraction ability is also greatly improved.
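The core of that "matrix multiplication" remark can be sketched as scaled dot-product attention; all sizes here are illustrative, not taken from any particular Transformer:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# 5 query positions, 7 key/value positions, feature dimension 16.
Q = torch.randn(5, 16)
K = torch.randn(7, 16)
V = torch.randn(7, 16)

scores = Q @ K.T / (16 ** 0.5)       # every query compared with every key at once
weights = F.softmax(scores, dim=-1)  # each row is a distribution over the 7 positions
context = weights @ V                # weighted sum of the values

print(context.shape)  # torch.Size([5, 16])
```

The whole sequence is processed in two matrix multiplications, with no step-by-step loop.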
The Encoder-Decoder code is as follows:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import random
trans_sentences = {'SOS glad to meet you EOS': 'SOS ni hao EOS',
                   'SOS nice to meet you EOS': 'SOS ni hao EOS',
                   'SOS how old are you EOS': 'SOS ni ji sui EOS',
                   'SOS you are handsome EOS': 'SOS ni zhen shuai EOS',
                   'SOS you are so good EOS': 'SOS ni zhen bang EOS',
                   'SOS you are nice EOS': 'SOS ni zhen hao EOS',
                   'SOS I love you EOS': 'SOS wo ai ni EOS',
                   'SOS I dislike you EOS': 'SOS wo bu xi huan ni EOS',
                   'SOS I have an apple EOS': 'SOS wo you yi ge ping guo EOS'}
def getTotalWords(trans_sentences):
    words = []
    for sen in trans_sentences.keys():
        for word in sen.split(' '):
            words.append(word)
    for sen in trans_sentences.values():
        for word in sen.split(' '):
            words.append(word)
    return sorted(list(set(words)))

def sent2vector(sent):
    words = sent.split(' ')
    vec = []
    for word in words:
        vec.append(wordList.index(word))
    return torch.tensor(vec)

def label2word(label):
    words = []
    for i in label:
        words.append(wordList[i])
    return words

def output2word(output):
    output = output.squeeze()
    if len(output.shape) == 1:
        maxv, maxi = torch.topk(output, 1, dim=0)
    elif len(output.shape) == 2:
        maxv, maxi = torch.topk(output, 1, dim=1)
    return label2word(maxi)
wordList = getTotalWords(trans_sentences)
# simple encoder-decoder component
class EncoderLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(EncoderLSTM, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.embed = nn.Embedding(self.input_size, 10)
        self.lstm = nn.LSTM(10, self.hidden_size)
        self.fc = nn.Linear(self.hidden_size, self.output_size)

    def HiddenInit(self):
        return (torch.zeros(1, 1, self.hidden_size),
                torch.zeros(1, 1, self.hidden_size))

    def forward(self, input, hidden):
        x = self.embed(input)
        if len(x.shape) != 3:
            x = x.view(x.shape[0], 1, -1)
        output, hidden = self.lstm(x, hidden)
        output = self.fc(output[-1, :])  # keep only the last step as the sentence feature
        return output.view(1, 1, -1), hidden
class DecoderLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(DecoderLSTM, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.embed = nn.Embedding(self.input_size, 10)
        self.cat2hid = nn.Linear(self.input_size + 10, self.input_size)
        self.lstm = nn.LSTM(self.input_size, self.hidden_size)
        self.fc = nn.Linear(self.hidden_size, self.output_size)

    def HiddenInit(self):
        return (torch.zeros(1, 1, self.hidden_size),
                torch.zeros(1, 1, self.hidden_size))

    def forward(self, input, hidden, encoder_embed):
        x = self.embed(input)
        if len(x.shape) == 2:
            x = x.view(x.shape[0], 1, -1)
        elif len(x.shape) == 1:
            x = x.view(1, 1, -1)
        x = torch.cat((x, encoder_embed), dim=2)  # concatenate the encoder feature onto the input
        x = self.cat2hid(x)
        output, hidden = self.lstm(x, hidden)
        output = self.fc(output)
        return output, hidden
input_size = len(wordList)
hidden_size = len(wordList)
output_size = len(wordList)
encoder = EncoderLSTM(input_size, hidden_size, output_size)
decoder = DecoderLSTM(input_size, hidden_size, output_size)
encoder_opt = optim.Adam(encoder.parameters(), lr=0.01)
decoder_opt = optim.Adam(decoder.parameters(), lr=0.01)
crit = nn.CrossEntropyLoss()
running_loss = 0
for epoch in range(100):
    for input in trans_sentences.keys():
        target = trans_sentences[input]
        target_label = sent2vector(target)
        input_vec = sent2vector(input)
        hidden = encoder.HiddenInit()
        decoder_input = target_label[0:1]
        encoder_opt.zero_grad()
        decoder_opt.zero_grad()
        encoder_embed, decoder_hidden = encoder(input_vec, hidden)
        ForcingLearning = True if random.random() < 0.5 else False  # True means according to the target, False means according to the prediction
        # ForcingLearning = True
        loss = 0
        predict = []
        decoder_hidden = decoder.HiddenInit()
        if ForcingLearning:
            for i in range(len(target_label) - 1):
                # decoder_output gives the next word, decoder_hidden is the hidden vector input
                decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden, encoder_embed)
                predict.append(output2word(decoder_output))
                loss += crit(decoder_output.view(decoder_output.shape[0], -1), target_label[i+1:i+2])
                decoder_input = target_label[i+1:i+2]
        else:
            for i in range(len(target_label) - 1):
                decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden, encoder_embed)
                predict.append(output2word(decoder_output))
                loss += crit(decoder_output.view(decoder_output.shape[0], -1), target_label[i+1:i+2])
                _, predict_i = torch.topk(decoder_output.squeeze(), 1, dim=0)
                decoder_input = predict_i
        loss.backward(retain_graph=True)
        encoder_opt.step()
        decoder_opt.step()
        running_loss += loss.data
    if epoch % 25 == 24:
        print("——————————————————————————————————————")
        print("in epoch ( %d ), the average loss is ( %.5f )" % (epoch, running_loss / 25))
        print("The original sentences is [%s], translation is [%s]" % (input, predict))
        running_loss = 0
def predict(model, original):
    original = "SOS " + original + " EOS"
    input_vec = sent2vector(original)
    encoder, decoder = model
    decoder_input = sent2vector("SOS")
    with torch.no_grad():
        hidden = encoder.HiddenInit()
        encoder_embed, decoder_hidden = encoder(input_vec, hidden)
        predict = ["SOS"]
        while predict[-1] != "EOS" and len(predict) < 20:  # length cap guards against a model that never emits EOS
            decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden, encoder_embed)
            predict.append(output2word(decoder_output)[0])
            _, predict_i = torch.topk(decoder_output.squeeze(), 1, dim=0)
            decoder_input = predict_i
    print(predict[1:-1])

predict((encoder, decoder), "how old are you")  # source sentences are the English keys of trans_sentences
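One detail worth checking in the training loop above: nn.CrossEntropyLoss expects raw logits of shape (N, C) and integer class indices of shape (N,), which is why decoder_output is reshaped with .view(...) before the loss call. A minimal check, with an illustrative vocabulary size:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

crit = nn.CrossEntropyLoss()

vocab = 23
logits = torch.randn(1, 1, vocab)  # decoder output: (seq=1, batch=1, vocab)
target = torch.tensor([5])         # index of the expected next word

# (1, 1, vocab) -> (1, vocab) logits against a (1,) target, as in the loop above.
loss = crit(logits.view(1, -1), target)
print(loss.shape)  # torch.Size([]) -- a scalar loss
```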
# complex encoder-decoder with attention mechanism
class EncoderLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(EncoderLSTM, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.embed = nn.Embedding(self.input_size, 10)
        self.lstm = nn.LSTM(10, self.hidden_size)
        self.fc = nn.Linear(self.hidden_size, self.output_size)

    def HiddenInit(self):
        return (torch.zeros(1, 1, self.hidden_size),
                torch.zeros(1, 1, self.hidden_size))

    def forward(self, input, hidden):
        x = self.embed(input)
        if len(x.shape) != 3:
            x = x.view(x.shape[0], 1, -1)
        output, hidden = self.lstm(x, hidden)
        output = self.fc(output)  # unlike before, keep every step's output for attention
        return output, hidden
class AttentionDecoderLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, seq_size):
        super(AttentionDecoderLSTM, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.seq_size = seq_size
        self.embed = nn.Embedding(self.input_size, 10)
        self.attn = nn.Linear(self.hidden_size * 2 + 10, self.seq_size)
        self.attn_combine = nn.Linear(self.hidden_size + 10, self.input_size)
        self.lstm = nn.LSTM(self.input_size, self.hidden_size)
        self.fc = nn.Linear(self.hidden_size, self.output_size)

    def HiddenInit(self):
        return (torch.zeros(1, 1, self.hidden_size),
                torch.zeros(1, 1, self.hidden_size))

    def forward(self, input, hidden, encoder_embed):
        x = self.embed(input)
        if len(x.shape) == 2:
            x = x.view(x.shape[0], 1, -1)
        elif len(x.shape) == 1:
            x = x.view(1, 1, -1)
        input = x
        x = torch.cat((hidden[0], hidden[1], x), dim=2)
        weight = F.softmax(self.attn(x), dim=2)
        # print(weight[:, :, 0:encoder_embed.shape[0]].shape, encoder_embed.transpose(0, 1).shape)
        attn_applied = torch.bmm(weight[:, :, 0:encoder_embed.shape[0]], encoder_embed.transpose(0, 1))
        x = torch.cat((attn_applied, input), dim=2)
        x = self.attn_combine(x)
        output, hidden = self.lstm(x, hidden)
        output = self.fc(output)
        return output, hidden
input_size = len(wordList)
hidden_size = len(wordList)
output_size = len(wordList)
encoder = EncoderLSTM(input_size, hidden_size, output_size)
decoder = AttentionDecoderLSTM(input_size, hidden_size, output_size, 6)
encoder_opt = optim.Adam(encoder.parameters(), lr=0.01)
decoder_opt = optim.Adam(decoder.parameters(), lr=0.01)
crit = nn.CrossEntropyLoss()
running_loss = 0
for epoch in range(1000):
    for input in trans_sentences.keys():
        target = trans_sentences[input]
        target_label = sent2vector(target)
        input_vec = sent2vector(input)
        hidden = encoder.HiddenInit()
        decoder_input = target_label[0:1]
        encoder_opt.zero_grad()
        decoder_opt.zero_grad()
        encoder_embed, decoder_hidden = encoder(input_vec, hidden)
        # ForcingLearning = True if random.random() < 0.5 else False  # True means according to the target, False means according to the prediction
        ForcingLearning = True
        loss = 0
        predict = []
        if ForcingLearning:
            for i in range(len(target_label) - 1):
                # decoder_output gives the next word, decoder_hidden is the hidden vector input
                decoder_hidden = decoder.HiddenInit()
                decoder_output, _ = decoder(decoder_input, decoder_hidden, encoder_embed)
                predict.append(output2word(decoder_output))
                loss += crit(decoder_output.view(decoder_output.shape[0], -1), target_label[i+1:i+2])
                decoder_input = target_label[i+1:i+2]
        else:
            for i in range(len(target_label) - 1):
                decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden, encoder_embed)
                predict.append(output2word(decoder_output))
                loss += crit(decoder_output.view(decoder_output.shape[0], -1), target_label[i+1:i+2])
                _, predict_i = torch.topk(decoder_output.squeeze(), 1, dim=0)
                decoder_input = predict_i
        loss.backward(retain_graph=True)
        encoder_opt.step()
        decoder_opt.step()
        running_loss += loss.data
    if epoch % 250 == 249:
        print("——————————————————————————————————————")
        print("in epoch ( %d ), the average loss is ( %.5f )" % (epoch, running_loss / 250))
        print("The original sentences is [%s], translation is [%s]" % (input, predict))
        running_loss = 0
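To see why the bmm/transpose pair inside AttentionDecoderLSTM.forward works out, here is a shape walk-through with illustrative sizes (hidden size 8, source length 6, i.e. the seq_size passed above):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

hidden_size, src_len = 8, 6

weight = F.softmax(torch.randn(1, 1, src_len), dim=2)  # (batch, 1, src_len); the row sums to 1
encoder_embed = torch.randn(src_len, 1, hidden_size)   # encoder outputs: (src_len, batch, hidden)

# torch.bmm multiplies (batch, n, m) by (batch, m, p); transpose(0, 1)
# moves the batch dimension of the encoder outputs to the front.
attn_applied = torch.bmm(weight, encoder_embed.transpose(0, 1))
print(attn_applied.shape)  # torch.Size([1, 1, 8]): one attended context vector
```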