RNN and its improved version (with 2 code cases attached)
2022-06-09 06:21:00 【GodGump】
Thank you for reading
- A brief introduction to RNN
- RNN example: name classification
- Attention mechanism
- RNN case study: seq2seq English-to-French translation
- Notes on debugging Python on a server
A brief introduction to RNN
An RNN (Recurrent Neural Network) generally takes sequence data as input, captures the relationships between the elements of the sequence through the design of its internal structure, and usually produces output in sequence form as well.
Traditional RNN
Internal structure demonstration

The two black dots enter the blue region together (they are first concatenated into one whole).
Internal calculation formula

RNN Output

Activation function tanh

To help regulate the values flowing through the network, the tanh function squashes them into the range between -1 and 1.
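A one-line sketch (assuming PyTorch is available) of how tanh squashes values:
import torch

x = torch.tensor([-10.0, -1.0, 0.0, 1.0, 10.0])
print(torch.tanh(x))  # every value lands in (-1, 1): tensor([-1.0000, -0.7616, 0.0000, 0.7616, 1.0000])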
Building a traditional RNN with PyTorch
import torch
import torch.nn as nn

def dm_run_for_hiddennum():
    # nn.RNN arguments: input_size (dimension of the input tensor x),
    #                   hidden_size (dimension of the hidden state, i.e. number of hidden neurons),
    #                   num_layers (number of stacked hidden layers)
    rnn = nn.RNN(5, 6, 2)   # A: num_layers goes from 1 to 2 -- does the code below need to change?
    # input shape: (sequence_length, batch_size, input_size)
    input = torch.randn(1, 3, 5)   # B
    # h0 shape: (num_layers * num_directions, batch_size, hidden_size)
    h0 = torch.randn(2, 3, 6)      # C
    output, hn = rnn(input, h0)
    print('output-->', output.shape, output)
    print('hn-->', hn.shape, hn)
    print('rnn model --->', rnn)   # rnn model ---> RNN(5, 6, num_layers=2)

# Conclusion: with a single hidden layer (and seq_len=1 here), output equals hn
# Conclusion: with 2 hidden layers, hn holds the final state of both layers, and output matches the state of the last (top) layer
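A minimal sketch (not from the original post, continuing with the torch / nn imports above) that verifies the two conclusions with a longer sequence:
rnn1 = nn.RNN(5, 6, 1)               # single layer
inp = torch.randn(4, 3, 5)           # seq_len=4, batch=3, input_size=5
out, hn = rnn1(inp, torch.zeros(1, 3, 6))
# the last time step of output equals the (single) layer state in hn
print(torch.allclose(out[-1], hn[0]))     # True

rnn2 = nn.RNN(5, 6, 2)               # two layers
out2, hn2 = rnn2(inp, torch.zeros(2, 3, 6))
# hn2 holds both layer states; the top layer's state matches the last output step
print(torch.allclose(out2[-1], hn2[-1]))  # True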
Gradient calculation

Introduction to LSTM
Advantage: LSTM's gate structure can effectively mitigate the gradient vanishing or explosion that may occur on long sequences. Although the phenomenon cannot be eliminated entirely, LSTM performs better than a traditional RNN on longer sequences.
Disadvantage: because its internal structure is relatively complex, its training efficiency is much lower than that of a traditional RNN under the same computing power.
Forget gate analysis:
The computation is very similar to the internal structure of a traditional RNN. First, the current input x(t) is concatenated with the previous hidden state h(t-1) to form [x(t), h(t-1)]; this is transformed by a fully connected layer and then passed through a sigmoid activation to obtain f(t). We can treat f(t) as a gate value, like how wide a door is opened: the gate value acts on the tensor that passes through the gate. The forget gate value acts on the previous cell state and represents how much past information is forgotten. Since f(t) is computed from x(t) and h(t-1), the whole formula means: based on the current input and the previous hidden state h(t-1), decide how much of the information carried by the previous cell state to forget.
Input gate analysis:
The input gate involves two formulas. The first produces the input gate value; it is almost identical to the forget gate formula, differing only in what the result is later applied to. It expresses how much of the input information should be let through. The second formula is the same as the internal computation of a traditional RNN, but for LSTM its result is the candidate cell state, not the hidden state as in a classic RNN.
Cell state update analysis:
The cell state update is easy to understand: there is no fully connected layer. The forget gate value just obtained is multiplied with the previous cell state C(t-1), and the product of the input gate value and the current candidate cell state is added to it. The updated C(t) then becomes part of the input to the next time step. The whole cell state update is simply the application of the forget gate and the input gate.
Output gate analysis:
The output gate also involves two formulas. The first computes the output gate value, in the same way as the forget gate and input gate. The second uses this gate value to produce the hidden state h(t): the gate value is applied to the updated cell state C(t) after a tanh activation, and the resulting h(t) becomes part of the input to the next time step. The whole purpose of the output gate is to produce the hidden state h(t).
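For reference, the gate computations just described can be written compactly in the standard LSTM formulation (W and b are learned weights and biases, σ is the sigmoid function, ⊙ is element-wise multiplication; this mirrors the figures rather than reproducing them):

$$
\begin{aligned}
f_t &= \sigma(W_f\,[x_t, h_{t-1}] + b_f) \\
i_t &= \sigma(W_i\,[x_t, h_{t-1}] + b_i) \\
\tilde{C}_t &= \tanh(W_C\,[x_t, h_{t-1}] + b_C) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma(W_o\,[x_t, h_{t-1}] + b_o) \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
$$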
Diagram

C denotes the memory cell (cell state) at a given time step
h denotes the hidden state at a given time step
f: forget gate
i: input gate
o: output gate
h(t) flows to the next time step and is also the direct output
The gradient formula

A real-life example to strengthen understanding
Take final exams as an example: the first exam is advanced mathematics, the second is linear algebra. In the figure, x(t) is taking the linear algebra exam, h(t-1) is the state coming out of the advanced mathematics exam, C(t-1) is the memory of advanced mathematics, and h(t) contains both the state at the end of the linear algebra exam and the score. The score is emitted upward as the output, and the state is passed on to the next exam (say, English). Why forget? Because linear algebra does not need everything from advanced mathematics; that is the job of the forget gate. Why keep some earlier knowledge? Because, for example, the computational skills practised in advanced mathematics are something we need to keep.
Similarly, a traditional RNN simply remembers everything, which makes it less efficient.
Code example
# nn.LSTM argument meaning: (input_size, hidden_size, num_layers)
# input tensor argument meaning: (sequence_length, batch_size, input_size)
# the initial hidden state tensor and the initial cell state tensor both have the shape:
# (num_layers * num_directions, batch_size, hidden_size)
>>> import torch.nn as nn
>>> import torch
>>> rnn = nn.LSTM(5, 6, 2)
>>> input = torch.randn(1, 3, 5)
>>> h0 = torch.randn(2, 3, 6)
>>> c0 = torch.randn(2, 3, 6)
>>> output, (hn, cn) = rnn(input, (h0, c0))
>>> output
tensor([[[ 0.0447, -0.0335, 0.1454, 0.0438, 0.0865, 0.0416],
[ 0.0105, 0.1923, 0.5507, -0.1742, 0.1569, -0.0548],
[-0.1186, 0.1835, -0.0022, -0.1388, -0.0877, -0.4007]]],
grad_fn=<StackBackward>)
>>> hn
tensor([[[ 0.4647, -0.2364, 0.0645, -0.3996, -0.0500, -0.0152],
[ 0.3852, 0.0704, 0.2103, -0.2524, 0.0243, 0.0477],
[ 0.2571, 0.0608, 0.2322, 0.1815, -0.0513, -0.0291]],
[[ 0.0447, -0.0335, 0.1454, 0.0438, 0.0865, 0.0416],
[ 0.0105, 0.1923, 0.5507, -0.1742, 0.1569, -0.0548],
[-0.1186, 0.1835, -0.0022, -0.1388, -0.0877, -0.4007]]],
grad_fn=<StackBackward>)
>>> cn
tensor([[[ 0.8083, -0.5500, 0.1009, -0.5806, -0.0668, -0.1161],
[ 0.7438, 0.0957, 0.5509, -0.7725, 0.0824, 0.0626],
[ 0.3131, 0.0920, 0.8359, 0.9187, -0.4826, -0.0717]],
[[ 0.1240, -0.0526, 0.3035, 0.1099, 0.5915, 0.0828],
[ 0.0203, 0.8367, 0.9832, -0.4454, 0.3917, -0.1983],
[-0.2976, 0.7764, -0.0074, -0.1965, -0.1343, -0.6683]]],
grad_fn=<StackBackward>)
Introduction to GRU
GRU (Gated Recurrent Unit), also known as the gated recurrent unit structure, is another variant of the traditional RNN. Like LSTM, it can effectively capture semantic associations across long sequences and mitigate gradient vanishing or explosion.
Diagram

My personal understanding of GRU
I'm no expert; if anything below is mistaken, please bear with me and correct me. Thanks in advance.
Personally, I feel GRU is an improved version of LSTM: it merges the memory cell and the hidden state h into one. Its core is the last formula in the figure above, which determines the strength of remembering versus forgetting.
I will continue with the learning example above. h(t-1) is everything we learned at school, and now we have to do our graduation project (take software engineering as an example, since I don't know other majors well). x(t) is the graduation project we need to finish. h(t) is the knowledge we can bring to the project, such as the Python language, software architecture, introduction to software engineering, and so on. What is r(t)? It weighs how much each subject contributes to the project: for example, Python contributes 70% and software architecture 10%; some subjects may contribute 0, such as an elective on foreign history. h with a tilde over it is what still needs to be learned: for example, if the project requires building a reverse proxy server and the school never taught us, we have to learn it by ourselves.
Two advantages over LSTM
1. With fewer parameters, training is faster, and interpretability is slightly better.
2. The simplified computation also reduces the risk of overfitting.
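For symmetry with the LSTM snippet above, here is a minimal nn.GRU sketch with the same sizes; note that GRU has no separate cell state, so only h0 is passed in:
import torch
import torch.nn as nn

# nn.GRU argument meaning: (input_size, hidden_size, num_layers)
rnn = nn.GRU(5, 6, 2)
# input: (sequence_length, batch_size, input_size)
input = torch.randn(1, 3, 5)
# h0: (num_layers * num_directions, batch_size, hidden_size)
h0 = torch.randn(2, 3, 6)
output, hn = rnn(input, h0)
print(output.shape)  # torch.Size([1, 3, 6])
print(hn.shape)      # torch.Size([2, 3, 6])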
RNN example: name classification
Case introduction
Given a person's name as input, the model determines which country the name most likely comes from. This is useful in the business of some international companies: during user registration, the likely country or region can be suggested directly from the name the user enters, along with the corresponding national flag, restrictions on phone-number length, and so on.
Dataset download and description
Available on GitHub as name_decalre
Click me to download
Data format: the first field on each line is the person's name and the second field is the country name, separated by a tab (\t).
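For illustration only, two hypothetical lines in this format (not copied from the actual file) would look like:
Zhang	Chinese
Dubois	French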
Imports
# Import the torch toolkit
import torch
# Import nn to build the model
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
# Import the torch Dataset / DataLoader tools
from torch.utils.data import Dataset, DataLoader
# Used to get the common letters and for character normalization
import string
# Import the time toolkit
import time
# Import the plotting toolkit
import matplotlib.pyplot as plt
# Import the open function from io
from io import open
Check the number of common characters
def data_process():
    # Get all common characters: letters plus common punctuation
    all_letters = string.ascii_letters + " .,;'"
    # Get the number of common characters
    n_letters = len(all_letters)
    return n_letters

def main():
    print(data_process())
    return 0

if __name__ == '__main__':
    main()
Build the list of country names and get the number of countries
def get_country():
    # Country names
    categorys = ['Italian', 'English', 'Arabic', 'Spanish', 'Scottish', 'Irish', 'Chinese', 'Vietnamese', 'Japanese',
                 'French', 'Greek', 'Dutch', 'Korean', 'Polish', 'Portuguese', 'Russian', 'Czech', 'German']
    # Number of countries
    categorynum = len(categorys)
    return categorys, categorynum

def main():
    # print(data_process())
    print(get_country())
    return 0
Read data to memory
def read_data(filename):
    """
    :param filename:
    :return: my_list_x, my_list_y
    # Thought analysis
    # 1 Open the data file: open(filename, mode='r', encoding='utf-8')
    # 2 Read the file line by line and extract sample x and sample y: line.strip().split('\t')
    # 3 Return the list of x samples and the list of y samples: my_list_x, my_list_y
    """
    my_list_x, my_list_y = [], []
    # Open the file
    with open(filename, mode='r', encoding='utf-8') as f:
        # Read the data line by line
        for line in f.readlines():
            if len(line) <= 5:
                continue
            # Extract sample x and sample y from the line
            (x, y) = line.strip().split('\t')
            my_list_x.append(x)
            my_list_y.append(y)
    # Return the list of x samples and the list of y samples
    return my_list_x, my_list_y
Build the data source and iterate
class NameClassDataset(Dataset):
    def __init__(self, my_list_x, my_list_y):
        # sample x
        self.my_list_x = my_list_x
        # sample y
        self.my_list_y = my_list_y
        # number of samples
        self.sample_len = len(my_list_x)

    # Get the number of samples
    def __len__(self):
        return self.sample_len

    # Get the sample at the given index
    def __getitem__(self, index):
        # Clamp abnormal index values into [0, self.sample_len-1]
        index = min(max(index, 0), self.sample_len - 1)
        # Get data samples x and y by index
        x = self.my_list_x[index]
        y = self.my_list_y[index]
        # print(x, y)
        # Convert sample x into a one-hot tensor
        tensor_x = torch.zeros(len(x), n_letters)
        # Traverse every letter of the name and one-hot encode it
        for li, letter in enumerate(x):
            # letter2index: all_letters.find(letter) gives the position of the letter in all_letters
            # assign the one-hot value
            tensor_x[li][all_letters.find(letter)] = 1
        # Convert sample y into a tensor
        tensor_y = torch.tensor(categorys.index(y), dtype=torch.long)
        # Return the result
        return tensor_x, tensor_y

def dm_test_NameClassDataset():
    # 1 Get the data
    myfilename = '../data/name_classfication.txt'
    my_list_x, my_list_y = read_data(myfilename)
    # 2 Instantiate the dataset object
    nameclassdataset = NameClassDataset(my_list_x, my_list_y)
    # 3 Instantiate the dataloader
    mydataloader = DataLoader(dataset=nameclassdataset, batch_size=1, shuffle=True)
    for i, (x, y) in enumerate(mydataloader):
        print('x.shape', x.shape, x)
        print('y.shape', y.shape, y)
        break
Improved handling of abnormal indices
Python supports negative indices, so our Dataset index should support them as well. The program gets slightly more complex: replace the __getitem__ of NameClassDataset with the code below.
def __getitem__(self, index):
    # Clamp abnormal index values into [-self.sample_len, self.sample_len-1]
    if index < 0:
        index = max(-self.sample_len, index)
    else:
        index = min(self.sample_len - 1, index)
    # Get data samples x and y by index
    x = self.my_list_x[index]
    y = self.my_list_y[index]
    # print(x, y)
    # Convert sample x into a one-hot tensor
    tensor_x = torch.zeros(len(x), n_letters)
    # Traverse every letter of the name and one-hot encode it
    for li, letter in enumerate(x):
        # all_letters.find(letter) gives the position of the letter in all_letters
        # assign the one-hot value
        tensor_x[li][all_letters.find(letter)] = 1
    # Convert sample y into a tensor
    tensor_y = torch.tensor(categorys.index(y), dtype=torch.long)
    # Return the result
    return tensor_x, tensor_y
Building the three RNN models
Building a traditional RNN
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers=1):
        super(RNN, self).__init__()
        # 1 The init function prepares three layers: self.rnn, self.linear, self.softmax = nn.LogSoftmax(dim=-1)
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.num_layers = num_layers
        # Define the rnn layer
        self.rnn = nn.RNN(self.input_size, self.hidden_size, self.num_layers)
        # Define the linear layer (fully connected layer)
        self.linear = nn.Linear(self.hidden_size, self.output_size)
        # Define the softmax layer
        self.softmax = nn.LogSoftmax(dim=-1)

    def forward(self, input, hidden):
        # Pass the data through the three layers and return the softmax result and hn
        # Data shape [6,57] -> [6,1,57]
        input = input.unsqueeze(1)
        # Feed the data to the model to extract features
        # Data shape rnn([seqlen,1,57], [1,1,128]) -> [seqlen,1,128], [1,1,128]
        rr, hn = self.rnn(input, hidden)
        # Data shape [seqlen,1,128] -> take the last time step [1,128]
        tmprr = rr[-1]
        tmprr = self.linear(tmprr)
        return self.softmax(tmprr), hn

    def inithidden(self):
        # Initialize the hidden layer input tensor
        return torch.zeros(self.num_layers, 1, self.hidden_size)
Building an LSTM
class LSTM(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers=1):
        super(LSTM, self).__init__()
        # 1 The init function prepares three layers: self.rnn, self.linear, self.softmax = nn.LogSoftmax(dim=-1)
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.num_layers = num_layers
        # Define the rnn (LSTM) layer
        self.rnn = nn.LSTM(self.input_size, self.hidden_size, self.num_layers)
        # Define the linear layer
        self.linear = nn.Linear(self.hidden_size, self.output_size)
        # Define the softmax layer
        self.softmax = nn.LogSoftmax(dim=-1)

    def forward(self, input, hidden, c):
        # Pass the data through the three layers and return the softmax result, hn and c
        # Data shape [6,57] -> [6,1,57]
        input = input.unsqueeze(1)
        # Feed the data to the model to extract features
        # Data shape lstm([seqlen,1,57], [1,1,128], [1,1,128]) -> [seqlen,1,128], [1,1,128], [1,1,128]
        rr, (hn, c) = self.rnn(input, (hidden, c))
        # Data shape [seqlen,1,128] -> take the last time step [1,128]
        tmprr = rr[-1]
        tmprr = self.linear(tmprr)
        return self.softmax(tmprr), hn, c

    def inithidden(self):
        # Initialize the hidden state and cell state tensors
        hidden = c = torch.zeros(self.num_layers, 1, self.hidden_size)
        return hidden, c
Building a GRU
class GRU(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers=1):
        super(GRU, self).__init__()
        # 1 The init function prepares three layers: self.rnn, self.linear, self.softmax = nn.LogSoftmax(dim=-1)
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.num_layers = num_layers
        # Define the rnn (GRU) layer
        self.rnn = nn.GRU(self.input_size, self.hidden_size, self.num_layers)
        # Define the linear layer
        self.linear = nn.Linear(self.hidden_size, self.output_size)
        # Define the softmax layer
        self.softmax = nn.LogSoftmax(dim=-1)

    def forward(self, input, hidden):
        # Pass the data through the three layers and return the softmax result and hn
        # Data shape [6,57] -> [6,1,57]
        input = input.unsqueeze(1)
        # Feed the data to the model to extract features
        # Data shape gru([seqlen,1,57], [1,1,128]) -> [seqlen,1,128], [1,1,128]
        rr, hn = self.rnn(input, hidden)
        # Data shape [seqlen,1,128] -> take the last time step [1,128]
        tmprr = rr[-1]
        tmprr = self.linear(tmprr)
        # Multi-class classification: softmax; binary classification: sigmoid
        return self.softmax(tmprr), hn

    def inithidden(self):
        # Initialize the hidden layer input tensor
        return torch.zeros(self.num_layers, 1, self.hidden_size)
Test and train the three models
Testing
def dm_test_rnn_lstm_gru():
    # The one-hot feature size is 57 (n_letters), which is also the RNN input dimension
    input_size = 57
    # The dimension of the hidden layer
    n_hidden = 128
    # The output size is the total number of language categories n_categories: each name is classified into one of 18 categories
    output_size = 18
    # 1 Get the data
    myfilename = '../data/name_classfication.txt'
    my_list_x, my_list_y = read_data(myfilename)
    # 2 Instantiate the dataset object
    nameclassdataset = NameClassDataset(my_list_x, my_list_y)
    # 3 Instantiate the dataloader
    mydataloader = DataLoader(dataset=nameclassdataset, batch_size=1, shuffle=True)
    my_rnn = RNN(n_letters, n_hidden, categorynum)
    my_lstm = LSTM(n_letters, n_hidden, categorynum)
    my_gru = GRU(n_letters, n_hidden, categorynum)
    for i, (x, y) in enumerate(mydataloader):
        # print('x.shape', x.shape, x)
        # print('y.shape', y.shape, y)
        # Initialize a 3D zero tensor for the hidden layer (for LSTM this would also be the initial cell state)
        output, hidden = my_rnn(x[0], my_rnn.inithidden())
        print("rnn output.shape--->:", output.shape, output)
        if (i == 0):
            break
    for i, (x, y) in enumerate(mydataloader):
        # print('x.shape', x.shape, x)
        # print('y.shape', y.shape, y)
        hidden, c = my_lstm.inithidden()
        output, hidden, c = my_lstm(x[0], hidden, c)
        print("lstm output.shape--->:", output.shape, output)
        if (i == 0):
            break
    for i, (x, y) in enumerate(mydataloader):
        # print('x.shape', x.shape, x)
        # print('y.shape', y.shape, y)
        output, hidden = my_gru(x[0], my_gru.inithidden())
        print("gru output.shape--->:", output.shape, output)
        if (i == 0):
            break
Training
Traditional RNN
# Model training parameters
mylr = 1e-3
epochs = 1

def my_train_rnn():
    # Get the data
    myfilename = './data/name_classfication.txt'
    my_list_x, my_list_y = read_data(myfilename)
    # Instantiate the dataset object
    nameclassdataset = NameClassDataset(my_list_x, my_list_y)
    # Instantiate the model
    input_size = 57
    n_hidden = 128
    output_size = 18
    my_rnn = RNN(input_size, n_hidden, output_size)
    print('my_rnn model --->', my_rnn)
    # Instantiate the loss function and the Adam optimizer
    mycrossentropyloss = nn.NLLLoss()
    myadam = optim.Adam(my_rnn.parameters(), lr=mylr)
    # Define the training bookkeeping variables
    starttime = time.time()
    total_iter_num = 0      # number of samples trained so far
    total_loss = 0.0        # accumulated training loss
    total_loss_list = []    # average loss recorded every 100 samples
    total_acc_num = 0       # number of correctly predicted samples
    total_acc_list = []     # average accuracy recorded every 100 samples
    # Outer for loop: controls the number of epochs
    for epoch_idx in range(epochs):
        # Instantiate the dataloader
        mydataloader = DataLoader(dataset=nameclassdataset, batch_size=1, shuffle=True)
        # Inner for loop: controls the number of iterations
        for i, (x, y) in enumerate(mydataloader):
            # Feed the data to the model
            output, hidden = my_rnn(x[0], my_rnn.inithidden())
            # Compute the loss
            myloss = mycrossentropyloss(output, y)
            # Zero the gradients
            myadam.zero_grad()
            # Back propagation
            myloss.backward()
            # Update the parameters
            myadam.step()
            # Accumulate the total loss
            total_iter_num = total_iter_num + 1
            total_loss = total_loss + myloss.item()
            # Accumulate the number of correct predictions
            i_predit_tag = (1 if torch.argmax(output).item() == y.item() else 0)
            total_acc_num = total_acc_num + i_predit_tag
            # Every 100 iterations record the average loss and average accuracy
            if (total_iter_num % 100 == 0):
                tmploss = total_loss / total_iter_num
                total_loss_list.append(tmploss)
                tmpacc = total_acc_num / total_iter_num
                total_acc_list.append(tmpacc)
            # Every 2000 iterations print a log line
            if (total_iter_num % 2000 == 0):
                tmploss = total_loss / total_iter_num
                print('Epoch:%d, Loss:%.6f, Time:%d, Accuracy:%.3f' % (epoch_idx+1, tmploss, time.time() - starttime, tmpacc))
        # Save the model after each epoch
        torch.save(my_rnn.state_dict(), './my_rnn_model_%d.bin' % (epoch_idx + 1))
    # Compute the total time
    total_time = int(time.time() - starttime)
    return total_loss_list, total_time, total_acc_list
LSTM
def my_train_lstm():
    # Get the data
    myfilename = './data/name_classfication.txt'
    my_list_x, my_list_y = read_data(myfilename)
    # Instantiate the dataset object
    nameclassdataset = NameClassDataset(my_list_x, my_list_y)
    # Instantiate the model
    input_size = 57
    n_hidden = 128
    output_size = 18
    my_lstm = LSTM(input_size, n_hidden, output_size)
    print('my_lstm model --->', my_lstm)
    # Instantiate the loss function and the Adam optimizer
    mycrossentropyloss = nn.NLLLoss()
    myadam = optim.Adam(my_lstm.parameters(), lr=mylr)
    # Define the training bookkeeping variables
    starttime = time.time()
    total_iter_num = 0      # number of samples trained so far
    total_loss = 0.0        # accumulated training loss
    total_loss_list = []    # average loss recorded every 100 samples
    total_acc_num = 0       # number of correctly predicted samples
    total_acc_list = []     # average accuracy recorded every 100 samples
    # Outer for loop: controls the number of epochs
    for epoch_idx in range(epochs):
        # Instantiate the dataloader
        mydataloader = DataLoader(dataset=nameclassdataset, batch_size=1, shuffle=True)
        # Inner for loop: controls the number of iterations
        for i, (x, y) in enumerate(mydataloader):
            # Feed the data to the model
            hidden, c = my_lstm.inithidden()
            output, hidden, c = my_lstm(x[0], hidden, c)
            # Compute the loss
            myloss = mycrossentropyloss(output, y)
            # Zero the gradients
            myadam.zero_grad()
            # Back propagation
            myloss.backward()
            # Update the parameters
            myadam.step()
            # Accumulate the total loss
            total_iter_num = total_iter_num + 1
            total_loss = total_loss + myloss.item()
            # Accumulate the number of correct predictions
            i_predit_tag = (1 if torch.argmax(output).item() == y.item() else 0)
            total_acc_num = total_acc_num + i_predit_tag
            # Every 100 iterations record the average loss and average accuracy
            if (total_iter_num % 100 == 0):
                tmploss = total_loss / total_iter_num
                total_loss_list.append(tmploss)
                tmpacc = total_acc_num / total_iter_num
                total_acc_list.append(tmpacc)
            # Every 2000 iterations print a log line
            if (total_iter_num % 2000 == 0):
                tmploss = total_loss / total_iter_num
                print('Epoch:%d, Loss:%.6f, Time:%d, Accuracy:%.3f' % (epoch_idx+1, tmploss, time.time() - starttime, tmpacc))
        # Save the model after each epoch
        torch.save(my_lstm.state_dict(), './my_lstm_model_%d.bin' % (epoch_idx + 1))
    # Compute the total time
    total_time = int(time.time() - starttime)
    return total_loss_list, total_time, total_acc_list
GRU
def my_train_gru():
    # Get the data
    myfilename = './data/name_classfication.txt'
    my_list_x, my_list_y = read_data(myfilename)
    # Instantiate the dataset object
    nameclassdataset = NameClassDataset(my_list_x, my_list_y)
    # Instantiate the model
    input_size = 57
    n_hidden = 128
    output_size = 18
    my_gru = GRU(input_size, n_hidden, output_size)
    print('my_gru model --->', my_gru)
    # Instantiate the loss function and the Adam optimizer
    mycrossentropyloss = nn.NLLLoss()
    myadam = optim.Adam(my_gru.parameters(), lr=mylr)
    # Define the training bookkeeping variables
    starttime = time.time()
    total_iter_num = 0      # number of samples trained so far
    total_loss = 0.0        # accumulated training loss
    total_loss_list = []    # average loss recorded every 100 samples
    total_acc_num = 0       # number of correctly predicted samples
    total_acc_list = []     # average accuracy recorded every 100 samples
    # Outer for loop: controls the number of epochs
    for epoch_idx in range(epochs):
        # Instantiate the dataloader
        mydataloader = DataLoader(dataset=nameclassdataset, batch_size=1, shuffle=True)
        # Inner for loop: controls the number of iterations
        for i, (x, y) in enumerate(mydataloader):
            # Feed the data to the model
            output, hidden = my_gru(x[0], my_gru.inithidden())
            # Compute the loss
            myloss = mycrossentropyloss(output, y)
            # Zero the gradients
            myadam.zero_grad()
            # Back propagation
            myloss.backward()
            # Update the parameters
            myadam.step()
            # Accumulate the total loss
            total_iter_num = total_iter_num + 1
            total_loss = total_loss + myloss.item()
            # Accumulate the number of correct predictions
            i_predit_tag = (1 if torch.argmax(output).item() == y.item() else 0)
            total_acc_num = total_acc_num + i_predit_tag
            # Every 100 iterations record the average loss and average accuracy
            if (total_iter_num % 100 == 0):
                tmploss = total_loss / total_iter_num
                total_loss_list.append(tmploss)
                tmpacc = total_acc_num / total_iter_num
                total_acc_list.append(tmpacc)
            # Every 2000 iterations print a log line
            if (total_iter_num % 2000 == 0):
                tmploss = total_loss / total_iter_num
                print('Epoch:%d, Loss:%.6f, Time:%d, Accuracy:%.3f' % (epoch_idx+1, tmploss, time.time() - starttime, tmpacc))
        # Save the model after each epoch
        torch.save(my_gru.state_dict(), './my_gru_model_%d.bin' % (epoch_idx + 1))
    # Compute the total time
    total_time = int(time.time() - starttime)
    return total_loss_list, total_time, total_acc_list
Making predictions
Building the prediction functions
# Paths to the trained models
my_path_rnn = './model/my_rnn_model_1.bin'
my_path_lstm = './model/my_lstm_model_1.bin'
my_path_gru = './model/my_gru_model_1.bin'

# Convert a person's name into a one-hot tensor
# e.g. 'bai' --> [3,57]
def lineToTensor(x):
    # Convert the text x into a tensor
    tensor_x = torch.zeros(len(x), n_letters)
    # Traverse every character index and character in the name
    for li, letter in enumerate(x):
        # The position of letter in the string all_letters is the index of the 1 in the one-hot tensor
        # The position is obtained with the string find() method
        tensor_x[li][all_letters.find(letter)] = 1
    return tensor_x
Traditional RNN
# Thought analysis
# 1 Convert the input text into a one-hot tensor
# 2 Instantiate the model and load the trained parameters: m.load_state_dict(torch.load(my_path_rnn))
# 3 Predict with the model inside with torch.no_grad()
# 4 Extract the top 3 predictions and print them: output.topk(3, 1, True)
#   category_idx = topi[0][i]; category = categorys[category_idx]
# Build the rnn prediction function
def my_predict_rnn(x):
    n_letters = 57
    n_hidden = 128
    n_categories = 18
    # Convert the input text into a one-hot tensor
    x_tensor = lineToTensor(x)
    # Instantiate the model and load the trained parameters
    my_rnn = RNN(n_letters, n_hidden, n_categories)
    my_rnn.load_state_dict(torch.load(my_path_rnn))
    with torch.no_grad():
        # Model prediction
        output, hidden = my_rnn(x_tensor, my_rnn.inithidden())
        # Extract the top 3 predictions
        # 3 means take the top 3, 1 is the dimension to sort along, True means return the largest elements
        topv, topi = output.topk(3, 1, True)
        print('rnn =>', x)
        for i in range(3):
            value = topv[0][i]
            category_idx = topi[0][i]
            category = categorys[category_idx]
            print('\t value:%d category:%s' % (value, category))
LSTM
# Build the LSTM prediction function
def my_predict_lstm(x):
    n_letters = 57
    n_hidden = 128
    n_categories = 18
    # Convert the input text into a one-hot tensor
    x_tensor = lineToTensor(x)
    # Instantiate the model and load the trained parameters
    my_lstm = LSTM(n_letters, n_hidden, n_categories)
    my_lstm.load_state_dict(torch.load(my_path_lstm))
    with torch.no_grad():
        # Model prediction
        hidden, c = my_lstm.inithidden()
        output, hidden, c = my_lstm(x_tensor, hidden, c)
        # Extract the top 3 predictions
        # 3 means take the top 3, 1 is the dimension to sort along, True means return the largest elements
        topv, topi = output.topk(3, 1, True)
        print('lstm =>', x)
        for i in range(3):
            value = topv[0][i]
            category_idx = topi[0][i]
            category = categorys[category_idx]
            print('\t value:%d category:%s' % (value, category))
GRU
# Build the GRU prediction function
def my_predict_gru(x):
    n_letters = 57
    n_hidden = 128
    n_categories = 18
    # Convert the input text into a one-hot tensor
    x_tensor = lineToTensor(x)
    # Instantiate the model and load the trained parameters
    my_gru = GRU(n_letters, n_hidden, n_categories)
    my_gru.load_state_dict(torch.load(my_path_gru))
    with torch.no_grad():
        # Model prediction
        output, hidden = my_gru(x_tensor, my_gru.inithidden())
        # Extract the top 3 predictions
        # 3 means take the top 3, 1 is the dimension to sort along, True means return the largest elements
        topv, topi = output.topk(3, 1, True)
        print('gru =>', x)
        for i in range(3):
            value = topv[0][i]
            category_idx = topi[0][i]
            category = categorys[category_idx]
            print('\t value:%d category:%s' % (value, category))
Calling the prediction functions
def dm_test_predic_rnn_lstm_gru():
    # Put the three function entry points into a list and test them with the same input
    for func in [my_predict_rnn, my_predict_lstm, my_predict_gru]:
        func('zhang')
Attention mechanism
Introduction to the attention mechanism
The concept of attention
When we look at something, we can judge what it is quickly (the judgment may of course be wrong) because our brain quickly focuses on the most recognizable parts of the object, instead of scanning it from beginning to end before making a judgment. The attention mechanism is built on this idea.
Attention calculation rules
It requires three specified inputs, Q (query), K (key) and V (value). A calculation formula then produces the attention result, which represents how the query attends over the keys and values. When Q = K = V, the rule is called self-attention.
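The post does not spell the formula out here, but one widely used calculation rule (the one behind self-attention in Transformers) is scaled dot-product attention; note that the MyAtt class implemented later in this post uses a different, linear-layer-based rule:

$$ \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right) V $$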
Role
Attention on the decoder side: according to the model's target, it effectively focuses on the encoder's output and improves the result when that output is fed into the decoder. It remedies the problem that the encoder's output used to be a single fixed-length tensor that cannot hold enough information.
Attention on the encoder side: it mainly solves the representation problem; it is equivalent to a feature extraction process and produces the representation of the input. Self-attention is generally used here.
A real-life scenario to aid understanding
When we do English reading comprehension, only 1 of the 4 options is correct and the rest are distractors. The attention mechanism is like focusing on picking out the correct option.
Introduction to the bmm operation
# If argument 1 has shape (b × n × m) and argument 2 has shape (b × m × p), the output has shape (b × n × p)
>>> input = torch.randn(10, 3, 4)
>>> mat2 = torch.randn(10, 4, 5)
>>> res = torch.bmm(input, mat2)
>>> res.size()
torch.Size([10, 3, 5])
Code implementation
import torch
import torch.nn as nn
import torch.nn.functional as F

# MyAtt class: implementation thought analysis
# 1 init function (self, query_size, key_size, value_size1, value_size2, output_size)
#   prepares 2 linear layers: self.attn for the attention weight distribution,
#   and self.attn_combine to output the attention result with the specified dimension
# 2 forward(self, Q, K, V):
#   compute the attention weight distribution of the query tensor q: attn_weights [1,32]
#   compute the attention result of the query tensor q with a bmm operation: attn_applied [1,1,64]
#   fuse q with attn_applied, then output with the specified dimension: output [1,1,32]
#   return the attention result output [1,1,32] and the attention weight distribution attn_weights [1,32]
class MyAtt(nn.Module):
    #                  32          32        32           64           32
    def __init__(self, query_size, key_size, value_size1, value_size2, output_size):
        super(MyAtt, self).__init__()
        self.query_size = query_size
        self.key_size = key_size
        self.value_size1 = value_size1
        self.value_size2 = value_size2
        self.output_size = output_size
        # Linear layer 1: attention weight distribution
        self.attn = nn.Linear(self.query_size + self.key_size, self.value_size1)
        # Linear layer 2: output the attention result with the specified dimension
        self.attn_combine = nn.Linear(self.query_size + self.value_size2, output_size)

    def forward(self, Q, K, V):
        # 1 Compute the attention weight distribution of the query tensor q: attn_weights [1,32]
        # [1,1,32],[1,1,32] --> [1,32],[1,32] -> [1,64]
        # [1,64] --> [1,32]
        # tmp1 = torch.cat((Q[0], K[0]), dim=1)
        # tmp2 = self.attn(tmp1)
        # tmp3 = F.softmax(tmp2, dim=1)
        attn_weights = F.softmax(self.attn(torch.cat((Q[0], K[0]), dim=1)), dim=1)
        # 2 Compute the attention result of q with a bmm operation: attn_applied [1,1,64]
        # [1,1,32] * [1,32,64] ---> [1,1,64]
        attn_applied = torch.bmm(attn_weights.unsqueeze(0), V)
        # 3 Fuse q with attn_applied, then output with the specified dimension: output [1,1,32]
        # 3-1 concatenate q with the attention result: [1,32],[1,64] ---> [1,96]
        output = torch.cat((Q[0], attn_applied[0]), dim=1)
        # 3-2 shape [1,96] ---> [1,32]
        output = self.attn_combine(output).unsqueeze(0)
        # 4 Return the attention result output [1,1,32] and the attention weight distribution attn_weights [1,32]
        return output, attn_weights

if __name__ == '__main__':
    # Why introduce the attention mechanism:
    # rnn-style recurrent networks forget the features of earlier words as the number of time steps grows,
    #   so sentence features are not extracted adequately
    # rnn-style recurrent networks extract sentence features time step by time step, which is inefficient
    # Can we extract the features of all 32 words at the same time, in parallel? That is the attention mechanism!
    # Task description
    # v is the content, e.g. 32 words with 64 features each; k is the index of the 32 words; q is the query tensor
    # Our task: given the query tensor q, compute through the attention mechanism:
    # 1. The attention weight distribution of q: the relevance (similarity) between q and the other 32 words
    # 2. The attention result of q: upgrade an ordinary q into a more expressive q by doing a bmm with v
    # 3. Note: whatever the query tensor q is used to look up, it is that thing's query tensor.
    #    e.g. if q is used to look up the word "I", then q is the query tensor of "I"
    query_size = 32
    key_size = 32
    value_size1 = 32   # 32 words
    value_size2 = 64   # 64 features
    output_size = 32
    Q = torch.randn(1, 1, 32)
    K = torch.randn(1, 1, 32)
    V = torch.randn(1, 32, 64)
    # V = torch.randn(1, value_size1, value_size2)
    # 1 Instantiate the attention class object
    myattobj = MyAtt(query_size, key_size, value_size1, value_size2, output_size)
    # 2 Feed Q, K, V to the attention object to get the attention result and the attention weight distribution
    (output, attn_weights) = myattobj(Q, K, V)
    print('attention result of the query tensor q, output--->', output.shape, output)
    print('attention weight distribution of the query tensor q, attn_weights--->', attn_weights.shape, attn_weights)
RNN case study: seq2seq English-to-French translation
Introduction to seq2seq
seq2seq model architecture

Model explanation
The seq2seq architecture consists of three parts: the encoder, the decoder, and the intermediate semantic tensor c. Both the encoder and the decoder are implemented internally with GRU models.
The figure shows a Chinese-to-English translation: "欢迎 来 北京" → "welcome to BeiJing". The encoder first processes the Chinese input and obtains the output tensor of every time step through the GRU model; these are finally concatenated into the intermediate semantic tensor c. The decoder then uses this intermediate semantic tensor c together with the hidden tensor of each time step to generate the translated words one by one.
Dataset Download
Imports and text cleaning
# For regular expressions
import re
# torch toolkit for building the network structure and functions
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
# Predefined optimization methods in torch
import torch.optim as optim
import time
# Used to generate random numbers
import random
import matplotlib.pyplot as plt

# Device selection: we can run the code on cuda or cpu
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Start-of-sentence token
SOS_token = 0
# End-of-sentence token
EOS_token = 1
# The maximum sentence length must not exceed 10 tokens (including punctuation)
MAX_LENGTH = 10
# Data file path
data_path = '../data/eng-fra-v2.txt'

# Text cleaning helper function
def normalizeString(s):
    """String normalization; the parameter s is the input string"""
    s = s.lower().strip()
    # Add a space before . ! ? ; \1 refers to the first captured group (the regex \num backreference)
    s = re.sub(r"([.!?])", r" \1", s)
    # s = re.sub(r"([.!?])", r" ", s)
    # Replace everything in the string that is not a letter or . ! ? with a space
    s = re.sub(r"[^a-zA-Z.!?]+", r" ", s)
    return s
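A quick sketch of what this cleaning does, using a made-up input (the output shown is what the two substitutions above produce: punctuation gets a leading space, and any character outside a-z/A-Z/.!? -- including accented letters -- collapses into a single space):
print(normalizeString("Hello!!  Je suis   étudiant."))
# -> 'hello ! ! je suis tudiant .'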
Thought analysis
my_getdata() -- cleaning the text and building the dictionaries:
1 Read the file line by line: open().read().strip().split('\n') -> my_lines
2 Clean the text line by line and build the language pairs my_pairs
  2-1 Format: [['english', 'french'], ['english', 'french'], ['english', 'french'], ...]
  2-2 Call the text cleaning helper normalizeString(s)
3 Traverse the language pairs and build the English word dictionary and the French word dictionary
  3-1 english_word2index, english_word_n, french_word2index, french_word_n
      where english_word2index = {"SOS": 0, "EOS": 1} and english_word_n = 2
  3-2 english_index2word, french_index2word
4 Return the 7 results:
  english_word2index, english_index2word, english_word_n,
  french_word2index, french_index2word, french_word_n, my_pairs
Data preprocessing
def my_getdata():
    # 1 Read the file line by line: open().read().strip().split('\n')
    my_lines = open(data_path, encoding='utf-8').read().strip().split('\n')
    # 2 Clean the text line by line and build the language pairs my_pairs
    my_pairs = [[normalizeString(s) for s in l.split('\t')] for l in my_lines]
    # 3 Traverse the language pairs and build the English and French word dictionaries
    # 3-1 english_word2index english_word_n french_word2index french_word_n
    english_word2index = {"SOS": 0, "EOS": 1}
    english_word_n = 2
    french_word2index = {"SOS": 0, "EOS": 1}
    french_word_n = 2
    # Traverse the language pairs to fill the English and French word dictionaries
    for pair in my_pairs:
        for word in pair[0].split(' '):
            if word not in english_word2index:
                english_word2index[word] = english_word_n
                # print(english_word2index)
                english_word_n += 1
        for word in pair[1].split(' '):
            if word not in french_word2index:
                french_word2index[word] = french_word_n
                french_word_n += 1
    # 3-2 english_index2word french_index2word
    english_index2word = {v: k for k, v in english_word2index.items()}
    french_index2word = {v: k for k, v in french_word2index.items()}
    return english_word2index, english_index2word, english_word_n, \
           french_word2index, french_index2word, french_word_n, my_pairs

def main():
    # Global call: get the English word dictionary, the French word dictionary and the list of language pairs my_pairs
    english_word2index, english_index2word, english_word_n, \
    french_word2index, french_index2word, french_word_n, \
    my_pairs = my_getdata()
    return 0
Build the data source object and test it
class MyPairsDataset(Dataset):
    def __init__(self, my_pairs):
        # sample pairs
        self.my_pairs = my_pairs
        # number of samples
        self.sample_len = len(my_pairs)

    # Get the number of samples
    def __len__(self):
        return self.sample_len

    # Get the sample at the given index
    def __getitem__(self, index):
        # Clamp abnormal index values into [-self.sample_len, self.sample_len-1]
        if index < 0:
            index = max(-self.sample_len, index)
        else:
            index = min(self.sample_len - 1, index)
        # Get data samples x and y by index
        x = self.my_pairs[index][0]
        y = self.my_pairs[index][1]
        # Convert sample x from text to numbers
        x = [english_word2index[word] for word in x.split(' ')]
        # print(x)
        x.append(EOS_token)
        # print("x2: ", x)
        tensor_x = torch.tensor(x, dtype=torch.long, device=device)
        # Convert sample y from text to numbers
        y = [french_word2index[word] for word in y.split(' ')]
        y.append(EOS_token)
        tensor_y = torch.tensor(y, dtype=torch.long, device=device)
        # Note: tensor_x and tensor_y are 1-D tensors; after the DataLoader they become 2-D
        # print('tensor_y.shape===>', tensor_y.shape, tensor_y)
        # Return the result
        return tensor_x, tensor_y

def dm_test_MyPairsDataset(my_pairs):
    # 1 Instantiate the dataset object
    mypairsdataset = MyPairsDataset(my_pairs)
    # 2 Instantiate the dataloader
    mydataloader = DataLoader(dataset=mypairsdataset, batch_size=1, shuffle=True)
    for i, (x, y) in enumerate(mydataloader):
        print('x.shape', x.shape, x)
        print('y.shape', y.shape, y)
        if i == 1:
            break

# Global call: get the English word dictionary, the French word dictionary and the list of language pairs my_pairs
english_word2index, english_index2word, english_word_n, \
french_word2index, french_index2word, french_word_n, \
my_pairs = my_getdata()

def main():
    dm_test_MyPairsDataset(my_pairs)
    return 0
Encoder and decoder
Building the GRU-based encoder
Thought analysis
EncoderRNN class, implementation ideas:
1 The init function defines 2 layers: self.embedding and self.gru (batch_first=True)
  def __init__(self, input_size, hidden_size):  # 2803 256
2 forward(input, hidden) returns output, hidden
  The data passes through the embedding layer: shape [1,6] --> [1,6,256]
  The data passes through the gru layer: gru([1,6,256],[1,1,256]) --> [1,6,256], [1,1,256]
3 inithidden() initializes the hidden layer input:
  shape torch.zeros(1, 1, self.hidden_size, device=device)
Code implementation
class EncoderRNN(nn.Module):
    def __init__(self, input_size, hidden_size):
        # input_size: vocabulary size of the encoder embedding layer, e.g. 2803
        # hidden_size: number of features per word in the embedding layer, e.g. 256
        super(EncoderRNN, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        # Instantiate the nn.Embedding layer
        self.embedding = nn.Embedding(input_size, hidden_size)
        # Instantiate the nn.GRU layer; note the parameter batch_first=True
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)

    def forward(self, input, hidden):
        # The data passes through the embedding layer: shape [1,6] --> [1,6,256]
        output = self.embedding(input)
        # The data passes through the gru layer: gru([1,6,256],[1,1,256]) --> [1,6,256], [1,1,256]
        output, hidden = self.gru(output, hidden)
        return output, hidden

    def inithidden(self):
        # Initialize the hidden state tensor with shape 1 x 1 x self.hidden_size
        return torch.zeros(1, 1, self.hidden_size, device=device)
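A quick shape check of the encoder (a sketch, reusing the imports and device defined above; the sizes 2803/256 follow the comments, and the 6 random word ids are made up):
my_encoderrnn = EncoderRNN(input_size=2803, hidden_size=256)
x = torch.randint(0, 2803, (1, 6), device=device)   # [1, 6]: one sentence of 6 word ids
hidden = my_encoderrnn.inithidden()                  # [1, 1, 256]
output, hidden = my_encoderrnn(x, hidden)
print(output.shape, hidden.shape)                    # torch.Size([1, 6, 256]) torch.Size([1, 1, 256])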
Building the GRU-based decoder with attention
Diagram

Code implementation
class AttnDecoderRNN(nn.Module):
    def __init__(self, output_size, hidden_size, dropout_p=0.1, max_length=MAX_LENGTH):
        # output_size: vocabulary size of the decoder embedding layer, e.g. 4345
        # hidden_size: number of features per word in the embedding layer, e.g. 256
        # dropout_p: dropout ratio, default 0.1
        # max_length: maximum sentence length, 10
        super(AttnDecoderRNN, self).__init__()
        self.output_size = output_size
        self.hidden_size = hidden_size
        self.dropout_p = dropout_p
        self.max_length = max_length
        # Define the nn.Embedding layer: nn.Embedding(4345, 256)
        self.embedding = nn.Embedding(self.output_size, self.hidden_size)
        # Define linear layer 1: compute the attention weight distribution of q
        self.attn = nn.Linear(self.hidden_size * 2, self.max_length)
        # Define linear layer 2: after fusing q with the attention result, output with the specified dimension
        self.attn_combine = nn.Linear(self.hidden_size * 2, self.hidden_size)
        # Define the dropout layer
        self.dropout = nn.Dropout(self.dropout_p)
        # Define the gru layer
        self.gru = nn.GRU(self.hidden_size, self.hidden_size, batch_first=True)
        # Define the out layer: the decoder outputs over the vocabulary classes (256, 4345)
        self.out = nn.Linear(self.hidden_size, self.output_size)
        # Instantiate the softmax layer: normalize the values for classification
        self.softmax = nn.LogSoftmax(dim=-1)

    def forward(self, input, hidden, encoder_outputs):
        # input is q, a 2-D tensor [1,1]; hidden is k, [1,1,256]; encoder_outputs is v, [10,256]
        # The data passes through the embedding layer
        # shape [1,1] --> [1,1,256]
        embedded = self.embedding(input)
        # Apply dropout to prevent overfitting
        embedded = self.dropout(embedded)
        # 1 Compute the attention weight distribution of the query tensor q: attn_weights [1,10]
        attn_weights = F.softmax(
            self.attn(torch.cat((embedded[0], hidden[0]), 1)), dim=1)
        # 2 Compute the attention result of q with a bmm operation: attn_applied [1,1,256]
        # [1,1,10],[1,10,256] ---> [1,1,256]
        attn_applied = torch.bmm(attn_weights.unsqueeze(0), encoder_outputs.unsqueeze(0))
        # 3 Fuse q with attn_applied, then output with the specified dimension: output [1,1,256]
        output = torch.cat((embedded[0], attn_applied[0]), 1)
        output = self.attn_combine(output).unsqueeze(0)
        # Apply relu to the attention result of q
        output = F.relu(output)
        # Pass the query tensor through gru and softmax to produce the classification result
        # shape [1,1,256],[1,1,256] --> [1,1,256], [1,1,256]
        output, hidden = self.gru(output, hidden)
        # shape [1,1,256] -> [1,256] -> [1,4345]
        output = self.softmax(self.out(output[0]))
        # Return the decoder classification output [1,4345], the final hidden tensor hidden [1,1,256]
        # and the attention weight tensor attn_weights [1,10]
        return output, hidden, attn_weights

    def inithidden(self):
        # Initialize the hidden state tensor with shape 1 x 1 x self.hidden_size
        return torch.zeros(1, 1, self.hidden_size, device=device)
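A matching sketch for a single decoder step (again with made-up inputs; the shapes follow the comments in forward):
my_attndecoderrnn = AttnDecoderRNN(output_size=4345, hidden_size=256)
input_y = torch.tensor([[SOS_token]], device=device)             # [1, 1] start token
decode_hidden = my_attndecoderrnn.inithidden()                    # [1, 1, 256]
encoder_outputs_c = torch.zeros(MAX_LENGTH, 256, device=device)   # [10, 256] padded encoder outputs
output_y, decode_hidden, attn_weights = my_attndecoderrnn(input_y, decode_hidden, encoder_outputs_c)
print(output_y.shape, decode_hidden.shape, attn_weights.shape)
# torch.Size([1, 4345]) torch.Size([1, 1, 256]) torch.Size([1, 10])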
Training models
teacher_forcing
teacher_forcing is a training technique for sequence generation tasks. In a seq2seq architecture, according to recurrent network theory the decoder should use its previous output as part of its next input. During training, however, once one step goes wrong the error accumulates and training becomes ineffective. We therefore need a mechanism that corrects the previous step's error, and this is where teacher forcing comes in: with some probability, the ground-truth token is fed in as the next input instead of the model's own prediction.
Internal iterative training function
Set parameters
# Model training parameters
mylr = 1e-4
epochs = 2
# Set the teacher_forcing ratio to 0.5
teacher_forcing_ratio = 0.5
print_interval_num = 1000
plot_interval_num = 100
Code implementation
def Train_Iters(x, y, my_encoderrnn, my_attndecoderrnn, myadam_encode, myadam_decode, mycrossentropyloss):
    # 1 Encode: encode_output, encode_hidden = my_encoderrnn(x, encode_hidden)
    encode_hidden = my_encoderrnn.inithidden()
    encode_output, encode_hidden = my_encoderrnn(x, encode_hidden)  # feed the whole sentence at once
    # [1,6],[1,1,256] --> [1,6,256],[1,1,256]
    # 2 Prepare the decoding parameters and decode
    # Decoding parameter 1: encode_output_c [10,256]
    encode_output_c = torch.zeros(MAX_LENGTH, my_encoderrnn.hidden_size, device=device)
    for idx in range(x.shape[1]):
        encode_output_c[idx] = encode_output[0, idx]
    # Decoding parameter 2
    decode_hidden = encode_hidden
    # Decoding parameter 3
    input_y = torch.tensor([[SOS_token]], device=device)

    myloss = 0.0
    y_len = y.shape[1]
    use_teacher_forcing = True if random.random() < teacher_forcing_ratio else False
    if use_teacher_forcing:
        for idx in range(y_len):
            # Data shape [1,1],[1,1,256],[10,256] ---> [1,4345],[1,1,256],[1,10]
            output_y, decode_hidden, attn_weight = my_attndecoderrnn(input_y, decode_hidden, encode_output_c)
            target_y = y[0][idx].view(1)
            myloss = myloss + mycrossentropyloss(output_y, target_y)
            # Teacher forcing: the ground-truth token becomes the next input
            input_y = y[0][idx].view(1, -1)
    else:
        for idx in range(y_len):
            # Data shape [1,1],[1,1,256],[10,256] ---> [1,4345],[1,1,256],[1,10]
            output_y, decode_hidden, attn_weight = my_attndecoderrnn(input_y, decode_hidden, encode_output_c)
            target_y = y[0][idx].view(1)
            myloss = myloss + mycrossentropyloss(output_y, target_y)
            topv, topi = output_y.topk(1)
            if topi.squeeze().item() == EOS_token:
                break
            # The model's own prediction becomes the next input
            input_y = topi.detach()

    # Zero the gradients
    myadam_encode.zero_grad()
    myadam_decode.zero_grad()
    # Back propagation
    myloss.backward()
    # Update the parameters
    myadam_encode.step()
    myadam_decode.step()
    # Return the average loss per target token: myloss.item() / y_len
    return myloss.item() / y_len
Training
def Train_seq2seq():
    # Instantiate the mypairsdataset object and the mydataloader
    mypairsdataset = MyPairsDataset(my_pairs)
    mydataloader = DataLoader(dataset=mypairsdataset, batch_size=1, shuffle=True)
    # Instantiate the encoder my_encoderrnn and the decoder my_attndecoderrnn
    my_encoderrnn = EncoderRNN(2803, 256)
    my_attndecoderrnn = AttnDecoderRNN(output_size=4345, hidden_size=256, dropout_p=0.1, max_length=10)
    # Instantiate the encoder optimizer myadam_encode and the decoder optimizer myadam_decode
    myadam_encode = optim.Adam(my_encoderrnn.parameters(), lr=mylr)
    myadam_decode = optim.Adam(my_attndecoderrnn.parameters(), lr=mylr)
    # Instantiate the loss function
    mycrossentropyloss = nn.NLLLoss()
    # Define the bookkeeping variables for model training
    plot_loss_list = []
    # Outer for loop: controls the number of epochs
    for epoch_idx in range(1, 1+epochs):
        print_loss_total, plot_loss_total = 0.0, 0.0
        starttime = time.time()
        # Inner for loop: controls the number of iterations
        for item, (x, y) in enumerate(mydataloader, start=1):
            # Call the internal iterative training function
            myloss = Train_Iters(x, y, my_encoderrnn, my_attndecoderrnn, myadam_encode, myadam_decode, mycrossentropyloss)
            print_loss_total += myloss
            plot_loss_total += myloss
            # Compute the print-interval loss: every 1000 iterations
            if item % print_interval_num == 0:
                print_loss_avg = print_loss_total / print_interval_num
                # Reset the accumulated loss to 0
                print_loss_total = 0
                # Print the log: epoch, average loss, elapsed time
                print('Epoch %d, Loss %.6f, Time: %d' % (epoch_idx, print_loss_avg, time.time() - starttime))
            # Compute the plot-interval loss: every 100 iterations
            if item % plot_interval_num == 0:
                # Average loss over the interval
                plot_loss_avg = plot_loss_total / plot_interval_num
                # Append the average loss to the plot_loss_list
                plot_loss_list.append(plot_loss_avg)
                # Reset the accumulated loss to 0
                plot_loss_total = 0
        # Save the model after each epoch
        torch.save(my_encoderrnn.state_dict(), './my_encoderrnn_%d.pth' % epoch_idx)
        torch.save(my_attndecoderrnn.state_dict(), './my_attndecoderrnn_%d.pth' % epoch_idx)
    # After all epochs, plot the loss curve
    plt.figure()
    plt.plot(plot_loss_list)
    plt.savefig('./s2sq_loss.png')
    plt.show()
    return plot_loss_list
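A minimal entry point for training might look like the following sketch (Train_seq2seq above already saves the model files and the loss curve):
if __name__ == '__main__':
    plot_loss_list = Train_seq2seq()
    print('number of recorded loss points --->', len(plot_loss_list))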
Model evaluation and testing
Writing the evaluation function
def Seq2Seq_Evaluate(x, my_encoderrnn, my_attndecoderrnn):
    """
    The evaluation code is similar to the prediction code. Note the use of with torch.no_grad().
    The first time step uses SOS_token as input; subsequent time steps use the previous prediction
    as input, i.e. an autoregressive mechanism.
    """
    with torch.no_grad():
        # 1 Encode: feed the whole sentence at once
        encode_hidden = my_encoderrnn.inithidden()
        encode_output, encode_hidden = my_encoderrnn(x, encode_hidden)
        # 2 Prepare the decoding parameters
        # Decoding parameter 1: the fixed-length intermediate semantic tensor c
        encoder_outputs_c = torch.zeros(MAX_LENGTH, my_encoderrnn.hidden_size, device=device)
        x_len = x.shape[1]
        for idx in range(x_len):
            encoder_outputs_c[idx] = encode_output[0, idx]
        # Decoding parameter 2: the encoder's last hidden state is the decoder's first hidden input
        decode_hidden = encode_hidden
        # Decoding parameter 3: the start token for the decoder's first time step
        input_y = torch.tensor([[SOS_token]], device=device)
        # 3 Autoregressive decoding
        # Initialize the list of predicted words
        decoded_words = []
        # Initialize the attention tensor
        decoder_attentions = torch.zeros(MAX_LENGTH, MAX_LENGTH)
        for idx in range(MAX_LENGTH):   # note: MAX_LENGTH = 10
            output_y, decode_hidden, attn_weights = my_attndecoderrnn(input_y, decode_hidden, encoder_outputs_c)
            # The prediction becomes the input of the next time step
            topv, topi = output_y.topk(1)
            decoder_attentions[idx] = attn_weights
            # If the output is the end token, stop the loop
            if topi.squeeze().item() == EOS_token:
                decoded_words.append('<EOS>')
                break
            else:
                decoded_words.append(french_index2word[topi.item()])
            # Feed the predicted index back as input_y for the next time step
            input_y = topi.detach()
        # Return decoded_words and the attention weight table (trimmed to the used part)
        return decoded_words, decoder_attentions[:idx + 1]
Evaluation
# Load the models
PATH1 = './gpumodel/my_encoderrnn.pth'
PATH2 = './gpumodel/my_attndecoderrnn.pth'

def dm_test_Seq2Seq_Evaluate():
    # Instantiate the dataset object
    mypairsdataset = MyPairsDataset(my_pairs)
    # Instantiate the dataloader
    mydataloader = DataLoader(dataset=mypairsdataset, batch_size=1, shuffle=True)
    # Instantiate the encoder
    input_size = english_word_n
    hidden_size = 256   # to inspect the data you could also use 8
    my_encoderrnn = EncoderRNN(input_size, hidden_size)
    # my_encoderrnn.load_state_dict(torch.load(PATH1))
    my_encoderrnn.load_state_dict(torch.load(PATH1, map_location=lambda storage, loc: storage), False)
    print('my_encoderrnn model structure --->', my_encoderrnn)
    # Instantiate the decoder
    input_size = french_word_n
    hidden_size = 256   # to inspect the data you could also use 8
    my_attndecoderrnn = AttnDecoderRNN(input_size, hidden_size)
    # my_attndecoderrnn.load_state_dict(torch.load(PATH2))
    my_attndecoderrnn.load_state_dict(torch.load(PATH2, map_location=lambda storage, loc: storage), False)
    print('my_decoderrnn model structure --->', my_attndecoderrnn)

    my_samplepairs = [['i m impressed with your french .', 'je suis impressionne par votre francais .'],
                      ['i m more than a friend .', 'je suis plus qu une amie .'],
                      ['she is beautiful like her mother .', 'vous gagnez n est ce pas ?']]
    print('my_samplepairs--->', len(my_samplepairs))
    for index, pair in enumerate(my_samplepairs):
        x = pair[0]
        y = pair[1]
        # Convert sample x from text to numbers
        tmpx = [english_word2index[word] for word in x.split(' ')]
        tmpx.append(EOS_token)
        tensor_x = torch.tensor(tmpx, dtype=torch.long, device=device).view(1, -1)
        # Model prediction
        decoded_words, attentions = Seq2Seq_Evaluate(tensor_x, my_encoderrnn, my_attndecoderrnn)
        # print('decoded_words->', decoded_words)
        output_sentence = ' '.join(decoded_words)
        print('\n')
        print('>', x)
        print('=', y)
        print('<', output_sentence)
Notes on debugging Python on a server
Using my own code as an example (the .py file was deliberately given a Chinese name, rendered here as translate.py). Don't flame me: I figured someone might run into problems from naming things in Chinese, so I tried it on purpose and solve the problem along the way. Do not name things in Chinese! Do not name things in Chinese! Do not name things in Chinese!
Things to know before starting
nohup
The nohup command runs the command specified by the Command argument (with any related Arg arguments) while ignoring all hangup (SIGHUP) signals. Use nohup to keep a program running in the background after you log out. To run it in the background, append & (the "and" symbol) to the end of the command.
Parameter description:
Command: the command to execute.
Arg: optional arguments; you can specify the output file.
&: run the command in the background; it keeps running after the terminal exits.
tail
The tail command is used to view the contents of a file; its common parameter -f is often used to follow a log file that is still being written.
Parameters:
-f                      read in a loop (follow the file)
-q                      do not display processing information
-v                      display detailed processing information
-c <number>             number of bytes to display
-n <number>             display the last n lines of the file
--pid=PID               used with -f: stop following after the process with the given PID dies
-q, --quiet, --silent   never output headers giving the file name
-s, --sleep-interval=S  used with -f: sleep about S seconds between iterations
First, cd into the directory of the Python file and run it

Send it to the background
nohup python -u translate.py > run.log 2>&1 &
Open another SSH connection to watch the training output
Note: generating the model takes quite a while; please wait patiently if nothing shows up yet.
tail -f run.log
Execution results
