Tomato learning notes -seq2seq
2022-06-12 06:28:00 【GitTomato】
Contents
- 1. Introduction
- 2. Recurrent neural networks and their variants
- 3. Seq2Seq
1. Introduction
What is Seq2Seq for? Given an input sequence such as X = {x1, x2, ..., xn}, we want to obtain an output sequence Y = {y1, y2, ..., ym}, where n and m need not be equal, as in translation tasks. These notes are based on a translation task, so from here on I will call the input sequence the source sentence, with each xi being a word vector, and the corresponding Y the target sentence.
So what kind of neural network model comes to mind for this? Clearly, traditional fully connected or convolutional networks such as AlexNet will not do, because they are not suited to sequence output. Recurrent neural networks can output sequences, but the number of outputs is usually fixed by hand, which is inconvenient; moreover, the usual RNN produces one output per input, so m <= n, which makes cases such as translation, where the target sentence contains more words than the source sentence, hard to handle.
However, RNNs do give us an idea: run the source sentence through an RNN and take the final result as a feature vector that represents the whole source sentence (if you suspect one vector cannot represent it all, congratulations, you have just thought of the attention model), then combine that feature vector with the target sentence for training.
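As a minimal sketch of this idea (the GRU choice and all sizes here are illustrative, not the model used later in these notes):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy source sentence: 5 word vectors of dimension 8, shaped (seq_len, batch, feature).
src = torch.randn(5, 1, 8)

# A single-layer GRU stands in for the encoder RNN.
encoder = nn.GRU(input_size=8, hidden_size=16)
outputs, last_hidden = encoder(src)

# last_hidden is the fixed-size feature vector summarising the whole
# source sentence; the decoder is conditioned on it.
print(outputs.shape)      # torch.Size([5, 1, 16]) -- one output per word
print(last_hidden.shape)  # torch.Size([1, 1, 16]) -- one vector for the sentence
```

However long the source sentence is, `last_hidden` keeps the same shape, which is exactly what lets the decoder handle n != m.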
2. Recurrent neural networks and their variants
The picture was just one I downloaded from a Baidu image search; in any case, searching for the RNN model will turn it up with no trouble, since it is, after all, the most basic recurrent neural network model.

So let me simply talk about the RNN from the PyTorch side. PyTorch does not hand you this part ready-made; presumably the feeling is that users who need it can write it themselves. But before you have learned how, that is just your bad luck (-_-||)
Here is a piece of pseudocode thrown in directly; of course, this pseudocode cannot be run as-is:
import torch
import torch.nn as nn

def train(rnn, input, hidden):
    hidden = rnn.initHidden()  # can simply be all zeros; used as the initial hidden vector
    output, hidden = rnn(input, hidden)

# Decomposing the inside of the RNN gives:
class RNN(nn.Module):
    def __init__(self):
        ...  # omitted

    def forward(self, input, hidden):
        output = []
        seq_len, _ = input.shape
        for i in range(seq_len):
            temp_output, hidden = RNNCell(input[i], hidden)
            output.append(temp_output)
        return output, hidden

    def initHidden(self):
        return torch.zeros(...)

class RNNCell(nn.Module):
    ''' omitted '''
    def forward(self, input, hidden):
        # Here you can improvise freely: if you really have an idea,
        # you can simply build one yourself, e.g. a fully connected layer
        # (the most basic RNN) or various gates for control (LSTM, GRU, and so on)
        ...
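A runnable counterpart of the pseudocode above, using PyTorch's built-in nn.RNNCell as the per-step cell (all sizes are illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# nn.RNNCell plays the role of the RNNCell class in the pseudocode.
cell = nn.RNNCell(input_size=4, hidden_size=6)

seq = torch.randn(3, 1, 4)   # (seq_len, batch, feature)
hidden = torch.zeros(1, 6)   # initHidden(): start from all zeros

outputs = []
for t in range(seq.shape[0]):
    # One unrolled step; for a vanilla RNN the output is the hidden state itself.
    hidden = cell(seq[t], hidden)
    outputs.append(hidden)

outputs = torch.stack(outputs)
print(outputs.shape)  # torch.Size([3, 1, 6]): one hidden state per time step
```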
3. Seq2Seq
The most common form of Seq2Seq is the Encoder-Decoder model, that is, the model mentioned in the introduction that encodes the source sentence and links the result with the target sentence for training. As pictured:

Both the Encoder part and the Decoder part are RNN models. Of course, you do not have to use RNNs for the Encoder and Decoder, but that sends you straight to Hell difficulty: please turn left for the Transformer model. That model expresses the Encoder and Decoder directly through matrix multiplication, so it does not need to run an RNNCell step by step to get its results; it is not only more efficient, its feature-extraction ability is also greatly improved.
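The core of that "matrix multiplication" remark can be sketched as scaled dot-product attention; all sizes here are illustrative, not taken from any particular Transformer:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# 5 query positions, 7 key/value positions, feature dimension 16.
Q = torch.randn(5, 16)
K = torch.randn(7, 16)
V = torch.randn(7, 16)

scores = Q @ K.T / (16 ** 0.5)       # every query compared with every key at once
weights = F.softmax(scores, dim=-1)  # each row is a distribution over the 7 positions
context = weights @ V                # weighted sum of the values

print(context.shape)  # torch.Size([5, 16])
```

The whole sequence is processed in two matrix multiplications, with no step-by-step loop.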
The Encoder-Decoder code is as follows:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import random
trans_sentences = {'SOS glad to meet you EOS': 'SOS ni hao EOS',
                   'SOS nice to meet you EOS': 'SOS ni hao EOS',
                   'SOS how old are you EOS': 'SOS ni ji sui EOS',
                   'SOS you are handsome EOS': 'SOS ni zhen shuai EOS',
                   'SOS you are so good EOS': 'SOS ni zhen bang EOS',
                   'SOS you are nice EOS': 'SOS ni zhen hao EOS',
                   'SOS I love you EOS': 'SOS wo ai ni EOS',
                   'SOS I dislike you EOS': 'SOS wo bu xi huan ni EOS',
                   'SOS I have an apple EOS': 'SOS wo you yi ge ping guo EOS'}
def getTotalWords(trans_sentences):
    words = []
    for sen in trans_sentences.keys():
        for word in sen.split(' '):
            words.append(word)
    for sen in trans_sentences.values():
        for word in sen.split(' '):
            words.append(word)
    return sorted(list(set(words)))

def sent2vector(sent):
    words = sent.split(' ')
    vec = []
    for word in words:
        vec.append(wordList.index(word))
    return torch.tensor(vec)

def label2word(label):
    words = []
    for i in label:
        words.append(wordList[i])
    return words

def output2word(output):
    output = output.squeeze()
    if len(output.shape) == 1:
        maxv, maxi = torch.topk(output, 1, dim=0)
    elif len(output.shape) == 2:
        maxv, maxi = torch.topk(output, 1, dim=1)
    return label2word(maxi)
wordList = getTotalWords(trans_sentences)
# simple encoder-decoder component
class EncoderLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(EncoderLSTM, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.embed = nn.Embedding(self.input_size, 10)
        self.lstm = nn.LSTM(10, self.hidden_size)
        self.fc = nn.Linear(self.hidden_size, self.output_size)

    def HiddenInit(self):
        return (torch.zeros(1, 1, self.hidden_size),
                torch.zeros(1, 1, self.hidden_size))

    def forward(self, input, hidden):
        x = self.embed(input)
        if len(x.shape) != 3:
            x = x.view(x.shape[0], 1, -1)
        output, hidden = self.lstm(x, hidden)
        output = self.fc(output[-1, :])  # keep only the last step as the sentence feature
        return output.view(1, 1, -1), hidden
class DecoderLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(DecoderLSTM, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.embed = nn.Embedding(self.input_size, 10)
        self.cat2hid = nn.Linear(self.input_size + 10, self.input_size)
        self.lstm = nn.LSTM(self.input_size, self.hidden_size)
        self.fc = nn.Linear(self.hidden_size, self.output_size)

    def HiddenInit(self):
        return (torch.zeros(1, 1, self.hidden_size),
                torch.zeros(1, 1, self.hidden_size))

    def forward(self, input, hidden, encoder_embed):
        x = self.embed(input)
        if len(x.shape) == 2:
            x = x.view(x.shape[0], 1, -1)
        elif len(x.shape) == 1:
            x = x.view(1, 1, -1)
        x = torch.cat((x, encoder_embed), dim=2)  # concatenate the encoder feature onto the input
        x = self.cat2hid(x)
        output, hidden = self.lstm(x, hidden)
        output = self.fc(output)
        return output, hidden
input_size = len(wordList)
hidden_size = len(wordList)
output_size = len(wordList)
encoder = EncoderLSTM(input_size, hidden_size, output_size)
decoder = DecoderLSTM(input_size, hidden_size, output_size)
encoder_opt = optim.Adam(encoder.parameters(), lr=0.01)
decoder_opt = optim.Adam(decoder.parameters(), lr=0.01)
crit = nn.CrossEntropyLoss()
running_loss = 0
for epoch in range(100):
    for input in trans_sentences.keys():
        target = trans_sentences[input]
        target_label = sent2vector(target)
        input_vec = sent2vector(input)
        hidden = encoder.HiddenInit()
        decoder_input = target_label[0:1]
        encoder_opt.zero_grad()
        decoder_opt.zero_grad()
        encoder_embed, decoder_hidden = encoder(input_vec, hidden)
        ForcingLearning = True if random.random() < 0.5 else False  # True means according to the target, False means according to the prediction
        # ForcingLearning = True
        loss = 0
        predict = []
        decoder_hidden = decoder.HiddenInit()
        if ForcingLearning:
            for i in range(len(target_label) - 1):
                # decoder_output gives the next word, decoder_hidden is the hidden vector input
                decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden, encoder_embed)
                predict.append(output2word(decoder_output))
                loss += crit(decoder_output.view(decoder_output.shape[0], -1), target_label[i+1:i+2])
                decoder_input = target_label[i+1:i+2]
        else:
            for i in range(len(target_label) - 1):
                decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden, encoder_embed)
                predict.append(output2word(decoder_output))
                loss += crit(decoder_output.view(decoder_output.shape[0], -1), target_label[i+1:i+2])
                _, predict_i = torch.topk(decoder_output.squeeze(), 1, dim=0)
                decoder_input = predict_i
        loss.backward(retain_graph=True)
        encoder_opt.step()
        decoder_opt.step()
        running_loss += loss.data
    if epoch % 25 == 24:
        print("——————————————————————————————————————")
        print("in epoch ( %d ), the average loss is ( %.5f )" % (epoch, running_loss / 25))
        print("The original sentences is [%s], translation is [%s]" % (input, predict))
        running_loss = 0
def predict(model, original):
    original = "SOS " + original + " EOS"
    input_vec = sent2vector(original)
    encoder, decoder = model
    decoder_input = sent2vector("SOS")
    with torch.no_grad():
        hidden = encoder.HiddenInit()
        encoder_embed, decoder_hidden = encoder(input_vec, hidden)
        predict = ["SOS"]
        while predict[-1] != "EOS" and len(predict) < 20:  # length cap guards against a model that never emits EOS
            decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden, encoder_embed)
            predict.append(output2word(decoder_output)[0])
            _, predict_i = torch.topk(decoder_output.squeeze(), 1, dim=0)
            decoder_input = predict_i
    print(predict[1:-1])

predict((encoder, decoder), "how old are you")  # source sentences are the English keys of trans_sentences
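One detail worth checking in the training loop above: nn.CrossEntropyLoss expects raw logits of shape (N, C) and integer class indices of shape (N,), which is why decoder_output is reshaped with .view(...) before the loss call. A minimal check, with an illustrative vocabulary size:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

crit = nn.CrossEntropyLoss()

vocab = 23
logits = torch.randn(1, 1, vocab)  # decoder output: (seq=1, batch=1, vocab)
target = torch.tensor([5])         # index of the expected next word

# (1, 1, vocab) -> (1, vocab) logits against a (1,) target, as in the loop above.
loss = crit(logits.view(1, -1), target)
print(loss.shape)  # torch.Size([]) -- a scalar loss
```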
# complex encoder-decoder with attention mechanism
class EncoderLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(EncoderLSTM, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.embed = nn.Embedding(self.input_size, 10)
        self.lstm = nn.LSTM(10, self.hidden_size)
        self.fc = nn.Linear(self.hidden_size, self.output_size)

    def HiddenInit(self):
        return (torch.zeros(1, 1, self.hidden_size),
                torch.zeros(1, 1, self.hidden_size))

    def forward(self, input, hidden):
        x = self.embed(input)
        if len(x.shape) != 3:
            x = x.view(x.shape[0], 1, -1)
        output, hidden = self.lstm(x, hidden)
        output = self.fc(output)  # unlike before, keep every step's output for attention
        return output, hidden
class AttentionDecoderLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, seq_size):
        super(AttentionDecoderLSTM, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.seq_size = seq_size
        self.embed = nn.Embedding(self.input_size, 10)
        self.attn = nn.Linear(self.hidden_size * 2 + 10, self.seq_size)
        self.attn_combine = nn.Linear(self.hidden_size + 10, self.input_size)
        self.lstm = nn.LSTM(self.input_size, self.hidden_size)
        self.fc = nn.Linear(self.hidden_size, self.output_size)

    def HiddenInit(self):
        return (torch.zeros(1, 1, self.hidden_size),
                torch.zeros(1, 1, self.hidden_size))

    def forward(self, input, hidden, encoder_embed):
        x = self.embed(input)
        if len(x.shape) == 2:
            x = x.view(x.shape[0], 1, -1)
        elif len(x.shape) == 1:
            x = x.view(1, 1, -1)
        input = x
        x = torch.cat((hidden[0], hidden[1], x), dim=2)
        weight = F.softmax(self.attn(x), dim=2)
        # print(weight[:, :, 0:encoder_embed.shape[0]].shape, encoder_embed.transpose(0, 1).shape)
        attn_applied = torch.bmm(weight[:, :, 0:encoder_embed.shape[0]], encoder_embed.transpose(0, 1))
        x = torch.cat((attn_applied, input), dim=2)
        x = self.attn_combine(x)
        output, hidden = self.lstm(x, hidden)
        output = self.fc(output)
        return output, hidden
input_size = len(wordList)
hidden_size = len(wordList)
output_size = len(wordList)
encoder = EncoderLSTM(input_size, hidden_size, output_size)
decoder = AttentionDecoderLSTM(input_size, hidden_size, output_size, 6)
encoder_opt = optim.Adam(encoder.parameters(), lr=0.01)
decoder_opt = optim.Adam(decoder.parameters(), lr=0.01)
crit = nn.CrossEntropyLoss()
running_loss = 0
for epoch in range(1000):
    for input in trans_sentences.keys():
        target = trans_sentences[input]
        target_label = sent2vector(target)
        input_vec = sent2vector(input)
        hidden = encoder.HiddenInit()
        decoder_input = target_label[0:1]
        encoder_opt.zero_grad()
        decoder_opt.zero_grad()
        encoder_embed, decoder_hidden = encoder(input_vec, hidden)
        # ForcingLearning = True if random.random() < 0.5 else False  # True means according to the target, False means according to the prediction
        ForcingLearning = True
        loss = 0
        predict = []
        if ForcingLearning:
            for i in range(len(target_label) - 1):
                # decoder_output gives the next word, decoder_hidden is the hidden vector input
                decoder_hidden = decoder.HiddenInit()
                decoder_output, _ = decoder(decoder_input, decoder_hidden, encoder_embed)
                predict.append(output2word(decoder_output))
                loss += crit(decoder_output.view(decoder_output.shape[0], -1), target_label[i+1:i+2])
                decoder_input = target_label[i+1:i+2]
        else:
            for i in range(len(target_label) - 1):
                decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden, encoder_embed)
                predict.append(output2word(decoder_output))
                loss += crit(decoder_output.view(decoder_output.shape[0], -1), target_label[i+1:i+2])
                _, predict_i = torch.topk(decoder_output.squeeze(), 1, dim=0)
                decoder_input = predict_i
        loss.backward(retain_graph=True)
        encoder_opt.step()
        decoder_opt.step()
        running_loss += loss.data
    if epoch % 250 == 249:
        print("——————————————————————————————————————")
        print("in epoch ( %d ), the average loss is ( %.5f )" % (epoch, running_loss / 250))
        print("The original sentences is [%s], translation is [%s]" % (input, predict))
        running_loss = 0
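To see why the bmm/transpose pair inside AttentionDecoderLSTM.forward works out, here is a shape walk-through with illustrative sizes (hidden size 8, source length 6, i.e. the seq_size passed above):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

hidden_size, src_len = 8, 6

weight = F.softmax(torch.randn(1, 1, src_len), dim=2)  # (batch, 1, src_len); the row sums to 1
encoder_embed = torch.randn(src_len, 1, hidden_size)   # encoder outputs: (src_len, batch, hidden)

# torch.bmm multiplies (batch, n, m) by (batch, m, p); transpose(0, 1)
# moves the batch dimension of the encoder outputs to the front.
attn_applied = torch.bmm(weight, encoder_embed.transpose(0, 1))
print(attn_applied.shape)  # torch.Size([1, 1, 8]): one attended context vector
```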