RNN and its improved version (with 2 code cases attached)
2022-06-09 06:21:00 【GodGump】
Thank you for reading
- A brief introduction to RNN
- RNN example: name classification
- Attention mechanism
- RNN case study: seq2seq English-to-French translation
- Notes on debugging Python on a server
A brief introduction to RNN
An RNN (Recurrent Neural Network) generally takes sequence data as input, captures the relationships between the elements of the sequence through the design of its internal structure, and usually produces output in sequence form as well.
Traditional RNN
Internal structure demonstration

The two black dots enter the blue region together (they are first concatenated into one whole).
Internal calculation formula

RNN Output

Activation function tanh

To help regulate the values flowing through the network, the tanh function squashes them into the range between -1 and 1.
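A one-line sketch (assuming PyTorch is available) of how tanh squashes values:
import torch

x = torch.tensor([-10.0, -1.0, 0.0, 1.0, 10.0])
print(torch.tanh(x))  # every value lands in (-1, 1): tensor([-1.0000, -0.7616, 0.0000, 0.7616, 1.0000])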
Building a traditional RNN with PyTorch
import torch
import torch.nn as nn

def dm_run_for_hiddennum():
    # nn.RNN arguments: input_size (dimension of the input tensor x),
    #                   hidden_size (dimension of the hidden state, i.e. number of hidden neurons),
    #                   num_layers (number of stacked hidden layers)
    rnn = nn.RNN(5, 6, 2)   # A: num_layers goes from 1 to 2 -- does the code below need to change?
    # input shape: (sequence_length, batch_size, input_size)
    input = torch.randn(1, 3, 5)   # B
    # h0 shape: (num_layers * num_directions, batch_size, hidden_size)
    h0 = torch.randn(2, 3, 6)      # C
    output, hn = rnn(input, h0)
    print('output-->', output.shape, output)
    print('hn-->', hn.shape, hn)
    print('rnn model --->', rnn)   # rnn model ---> RNN(5, 6, num_layers=2)

# Conclusion: with a single hidden layer (and seq_len=1 here), output equals hn
# Conclusion: with 2 hidden layers, hn holds the final state of both layers, and output matches the state of the last (top) layer
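A minimal sketch (not from the original post, continuing with the torch / nn imports above) that verifies the two conclusions with a longer sequence:
rnn1 = nn.RNN(5, 6, 1)               # single layer
inp = torch.randn(4, 3, 5)           # seq_len=4, batch=3, input_size=5
out, hn = rnn1(inp, torch.zeros(1, 3, 6))
# the last time step of output equals the (single) layer state in hn
print(torch.allclose(out[-1], hn[0]))     # True

rnn2 = nn.RNN(5, 6, 2)               # two layers
out2, hn2 = rnn2(inp, torch.zeros(2, 3, 6))
# hn2 holds both layer states; the top layer's state matches the last output step
print(torch.allclose(out2[-1], hn2[-1]))  # True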
Gradient calculation

Introduction to LSTM
Advantage: LSTM's gate structure can effectively mitigate the gradient vanishing or explosion that may occur on long sequences. Although the phenomenon cannot be eliminated entirely, LSTM performs better than a traditional RNN on longer sequences.
Disadvantage: because its internal structure is relatively complex, its training efficiency is much lower than that of a traditional RNN under the same computing power.
Forget gate analysis:
The computation is very similar to the internal structure of a traditional RNN. First, the current input x(t) is concatenated with the previous hidden state h(t-1) to form [x(t), h(t-1)]; this is transformed by a fully connected layer and then passed through a sigmoid activation to obtain f(t). We can treat f(t) as a gate value, like how wide a door is opened: the gate value acts on the tensor that passes through the gate. The forget gate value acts on the previous cell state and represents how much past information is forgotten. Since f(t) is computed from x(t) and h(t-1), the whole formula means: based on the current input and the previous hidden state h(t-1), decide how much of the information carried by the previous cell state to forget.
Input gate analysis:
The input gate involves two formulas. The first produces the input gate value; it is almost identical to the forget gate formula, differing only in what the result is later applied to. It expresses how much of the input information should be let through. The second formula is the same as the internal computation of a traditional RNN, but for LSTM its result is the candidate cell state, not the hidden state as in a classic RNN.
Cell state update analysis:
The cell state update is easy to understand: there is no fully connected layer. The forget gate value just obtained is multiplied with the previous cell state C(t-1), and the product of the input gate value and the current candidate cell state is added to it. The updated C(t) then becomes part of the input to the next time step. The whole cell state update is simply the application of the forget gate and the input gate.
Output gate analysis:
The output gate also involves two formulas. The first computes the output gate value, in the same way as the forget gate and input gate. The second uses this gate value to produce the hidden state h(t): the gate value is applied to the updated cell state C(t) after a tanh activation, and the resulting h(t) becomes part of the input to the next time step. The whole purpose of the output gate is to produce the hidden state h(t).
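For reference, the gate computations just described can be written compactly in the standard LSTM formulation (W and b are learned weights and biases, σ is the sigmoid function, ⊙ is element-wise multiplication; this mirrors the figures rather than reproducing them):

$$
\begin{aligned}
f_t &= \sigma(W_f\,[x_t, h_{t-1}] + b_f) \\
i_t &= \sigma(W_i\,[x_t, h_{t-1}] + b_i) \\
\tilde{C}_t &= \tanh(W_C\,[x_t, h_{t-1}] + b_C) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma(W_o\,[x_t, h_{t-1}] + b_o) \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
$$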
Diagram

C denotes the memory cell (cell state) at a given time step
h denotes the hidden state at a given time step
f: forget gate
i: input gate
o: output gate
h(t) flows to the next time step and is also the direct output
The gradient formula

A real-life example to strengthen understanding
Take final exams as an example: the first exam is advanced mathematics, the second is linear algebra. In the figure, x(t) is taking the linear algebra exam, h(t-1) is the state coming out of the advanced mathematics exam, C(t-1) is the memory of advanced mathematics, and h(t) contains both the state at the end of the linear algebra exam and the score. The score is emitted upward as the output, and the state is passed on to the next exam (say, English). Why forget? Because linear algebra does not need everything from advanced mathematics; that is the job of the forget gate. Why keep some earlier knowledge? Because, for example, the computational skills practised in advanced mathematics are something we need to keep.
Similarly, a traditional RNN simply remembers everything, which makes it less efficient.
Code example
# nn.LSTM argument meaning: (input_size, hidden_size, num_layers)
# input tensor argument meaning: (sequence_length, batch_size, input_size)
# the initial hidden state tensor and the initial cell state tensor both have the shape:
# (num_layers * num_directions, batch_size, hidden_size)
>>> import torch.nn as nn
>>> import torch
>>> rnn = nn.LSTM(5, 6, 2)
>>> input = torch.randn(1, 3, 5)
>>> h0 = torch.randn(2, 3, 6)
>>> c0 = torch.randn(2, 3, 6)
>>> output, (hn, cn) = rnn(input, (h0, c0))
>>> output
tensor([[[ 0.0447, -0.0335, 0.1454, 0.0438, 0.0865, 0.0416],
[ 0.0105, 0.1923, 0.5507, -0.1742, 0.1569, -0.0548],
[-0.1186, 0.1835, -0.0022, -0.1388, -0.0877, -0.4007]]],
grad_fn=<StackBackward>)
>>> hn
tensor([[[ 0.4647, -0.2364, 0.0645, -0.3996, -0.0500, -0.0152],
[ 0.3852, 0.0704, 0.2103, -0.2524, 0.0243, 0.0477],
[ 0.2571, 0.0608, 0.2322, 0.1815, -0.0513, -0.0291]],
[[ 0.0447, -0.0335, 0.1454, 0.0438, 0.0865, 0.0416],
[ 0.0105, 0.1923, 0.5507, -0.1742, 0.1569, -0.0548],
[-0.1186, 0.1835, -0.0022, -0.1388, -0.0877, -0.4007]]],
grad_fn=<StackBackward>)
>>> cn
tensor([[[ 0.8083, -0.5500, 0.1009, -0.5806, -0.0668, -0.1161],
[ 0.7438, 0.0957, 0.5509, -0.7725, 0.0824, 0.0626],
[ 0.3131, 0.0920, 0.8359, 0.9187, -0.4826, -0.0717]],
[[ 0.1240, -0.0526, 0.3035, 0.1099, 0.5915, 0.0828],
[ 0.0203, 0.8367, 0.9832, -0.4454, 0.3917, -0.1983],
[-0.2976, 0.7764, -0.0074, -0.1965, -0.1343, -0.6683]]],
grad_fn=<StackBackward>)
Introduction to GRU
GRU (Gated Recurrent Unit), also known as the gated recurrent unit structure, is another variant of the traditional RNN. Like LSTM, it can effectively capture semantic associations across long sequences and mitigate gradient vanishing or explosion.
Diagram

My personal understanding of GRU
I'm no expert; if anything below is mistaken, please bear with me and correct me. Thanks in advance.
Personally, I feel GRU is an improved version of LSTM: it merges the memory cell and the hidden state h into one. Its core is the last formula in the figure above, which determines the strength of remembering versus forgetting.
I will continue with the learning example above. h(t-1) is everything we learned at school, and now we have to do our graduation project (take software engineering as an example, since I don't know other majors well). x(t) is the graduation project we need to finish. h(t) is the knowledge we can bring to the project, such as the Python language, software architecture, introduction to software engineering, and so on. What is r(t)? It weighs how much each subject contributes to the project: for example, Python contributes 70% and software architecture 10%; some subjects may contribute 0, such as an elective on foreign history. h with a tilde over it is what still needs to be learned: for example, if the project requires building a reverse proxy server and the school never taught us, we have to learn it by ourselves.
Two advantages over LSTM
1. With fewer parameters, training is faster, and interpretability is slightly better.
2. The simplified computation also reduces the risk of overfitting.
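For symmetry with the LSTM snippet above, here is a minimal nn.GRU sketch with the same sizes; note that GRU has no separate cell state, so only h0 is passed in:
import torch
import torch.nn as nn

# nn.GRU argument meaning: (input_size, hidden_size, num_layers)
rnn = nn.GRU(5, 6, 2)
# input: (sequence_length, batch_size, input_size)
input = torch.randn(1, 3, 5)
# h0: (num_layers * num_directions, batch_size, hidden_size)
h0 = torch.randn(2, 3, 6)
output, hn = rnn(input, h0)
print(output.shape)  # torch.Size([1, 3, 6])
print(hn.shape)      # torch.Size([2, 3, 6])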
RNN example: name classification
Case introduction
Given a person's name as input, the model determines which country the name most likely comes from. This is useful in the business of some international companies: during user registration, the likely country or region can be suggested directly from the name the user enters, along with the corresponding national flag, restrictions on phone-number length, and so on.
Dataset download and description
Available on GitHub as name_decalre
Click me to download
Data format: the first field on each line is the person's name and the second field is the country name, separated by a tab (\t).
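For illustration only, two hypothetical lines in this format (not copied from the actual file) would look like:
Zhang	Chinese
Dubois	French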
Imports
# Import the torch toolkit
import torch
# Import nn to build the model
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
# Import the torch Dataset / DataLoader tools
from torch.utils.data import Dataset, DataLoader
# Used to get the common letters and for character normalization
import string
# Import the time toolkit
import time
# Import the plotting toolkit
import matplotlib.pyplot as plt
# Import the open function from io
from io import open
Check the number of common characters
def data_process():
    # Get all common characters: letters plus common punctuation
    all_letters = string.ascii_letters + " .,;'"
    # Get the number of common characters
    n_letters = len(all_letters)
    return n_letters

def main():
    print(data_process())
    return 0

if __name__ == '__main__':
    main()
Build the list of country names and get the number of countries
def get_country():
    # Country names
    categorys = ['Italian', 'English', 'Arabic', 'Spanish', 'Scottish', 'Irish', 'Chinese', 'Vietnamese', 'Japanese',
                 'French', 'Greek', 'Dutch', 'Korean', 'Polish', 'Portuguese', 'Russian', 'Czech', 'German']
    # Number of countries
    categorynum = len(categorys)
    return categorys, categorynum

def main():
    # print(data_process())
    print(get_country())
    return 0
Read data to memory
def read_data(filename):
    """
    :param filename:
    :return: my_list_x, my_list_y
    # Thought analysis
    # 1 Open the data file: open(filename, mode='r', encoding='utf-8')
    # 2 Read the file line by line and extract sample x and sample y: line.strip().split('\t')
    # 3 Return the list of x samples and the list of y samples: my_list_x, my_list_y
    """
    my_list_x, my_list_y = [], []
    # Open the file
    with open(filename, mode='r', encoding='utf-8') as f:
        # Read the data line by line
        for line in f.readlines():
            if len(line) <= 5:
                continue
            # Extract sample x and sample y from the line
            (x, y) = line.strip().split('\t')
            my_list_x.append(x)
            my_list_y.append(y)
    # Return the list of x samples and the list of y samples
    return my_list_x, my_list_y
Build the data source and iterate
class NameClassDataset(Dataset):
    def __init__(self, my_list_x, my_list_y):
        # sample x
        self.my_list_x = my_list_x
        # sample y
        self.my_list_y = my_list_y
        # number of samples
        self.sample_len = len(my_list_x)

    # Get the number of samples
    def __len__(self):
        return self.sample_len

    # Get the sample at the given index
    def __getitem__(self, index):
        # Clamp abnormal index values into [0, self.sample_len-1]
        index = min(max(index, 0), self.sample_len - 1)
        # Get data samples x and y by index
        x = self.my_list_x[index]
        y = self.my_list_y[index]
        # print(x, y)
        # Convert sample x into a one-hot tensor
        tensor_x = torch.zeros(len(x), n_letters)
        # Traverse every letter of the name and one-hot encode it
        for li, letter in enumerate(x):
            # letter2index: all_letters.find(letter) gives the position of the letter in all_letters
            # assign the one-hot value
            tensor_x[li][all_letters.find(letter)] = 1
        # Convert sample y into a tensor
        tensor_y = torch.tensor(categorys.index(y), dtype=torch.long)
        # Return the result
        return tensor_x, tensor_y

def dm_test_NameClassDataset():
    # 1 Get the data
    myfilename = '../data/name_classfication.txt'
    my_list_x, my_list_y = read_data(myfilename)
    # 2 Instantiate the dataset object
    nameclassdataset = NameClassDataset(my_list_x, my_list_y)
    # 3 Instantiate the dataloader
    mydataloader = DataLoader(dataset=nameclassdataset, batch_size=1, shuffle=True)
    for i, (x, y) in enumerate(mydataloader):
        print('x.shape', x.shape, x)
        print('y.shape', y.shape, y)
        break
Improved handling of abnormal indices
Python supports negative indices, so our Dataset index should support them as well. The program gets slightly more complex: replace the __getitem__ of NameClassDataset with the code below.
def __getitem__(self, index):
    # Clamp abnormal index values into [-self.sample_len, self.sample_len-1]
    if index < 0:
        index = max(-self.sample_len, index)
    else:
        index = min(self.sample_len - 1, index)
    # Get data samples x and y by index
    x = self.my_list_x[index]
    y = self.my_list_y[index]
    # print(x, y)
    # Convert sample x into a one-hot tensor
    tensor_x = torch.zeros(len(x), n_letters)
    # Traverse every letter of the name and one-hot encode it
    for li, letter in enumerate(x):
        # all_letters.find(letter) gives the position of the letter in all_letters
        # assign the one-hot value
        tensor_x[li][all_letters.find(letter)] = 1
    # Convert sample y into a tensor
    tensor_y = torch.tensor(categorys.index(y), dtype=torch.long)
    # Return the result
    return tensor_x, tensor_y
Building the three RNN models
Building a traditional RNN
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers=1):
        super(RNN, self).__init__()
        # 1 The init function prepares three layers: self.rnn, self.linear, self.softmax = nn.LogSoftmax(dim=-1)
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.num_layers = num_layers
        # Define the rnn layer
        self.rnn = nn.RNN(self.input_size, self.hidden_size, self.num_layers)
        # Define the linear layer (fully connected layer)
        self.linear = nn.Linear(self.hidden_size, self.output_size)
        # Define the softmax layer
        self.softmax = nn.LogSoftmax(dim=-1)

    def forward(self, input, hidden):
        # Pass the data through the three layers and return the softmax result and hn
        # Data shape [6,57] -> [6,1,57]
        input = input.unsqueeze(1)
        # Feed the data to the model to extract features
        # Data shape rnn([seqlen,1,57], [1,1,128]) -> [seqlen,1,128], [1,1,128]
        rr, hn = self.rnn(input, hidden)
        # Data shape [seqlen,1,128] -> take the last time step [1,128]
        tmprr = rr[-1]
        tmprr = self.linear(tmprr)
        return self.softmax(tmprr), hn

    def inithidden(self):
        # Initialize the hidden layer input tensor
        return torch.zeros(self.num_layers, 1, self.hidden_size)
Building an LSTM
class LSTM(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers=1):
        super(LSTM, self).__init__()
        # 1 The init function prepares three layers: self.rnn, self.linear, self.softmax = nn.LogSoftmax(dim=-1)
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.num_layers = num_layers
        # Define the rnn (LSTM) layer
        self.rnn = nn.LSTM(self.input_size, self.hidden_size, self.num_layers)
        # Define the linear layer
        self.linear = nn.Linear(self.hidden_size, self.output_size)
        # Define the softmax layer
        self.softmax = nn.LogSoftmax(dim=-1)

    def forward(self, input, hidden, c):
        # Pass the data through the three layers and return the softmax result, hn and c
        # Data shape [6,57] -> [6,1,57]
        input = input.unsqueeze(1)
        # Feed the data to the model to extract features
        # Data shape lstm([seqlen,1,57], [1,1,128], [1,1,128]) -> [seqlen,1,128], [1,1,128], [1,1,128]
        rr, (hn, c) = self.rnn(input, (hidden, c))
        # Data shape [seqlen,1,128] -> take the last time step [1,128]
        tmprr = rr[-1]
        tmprr = self.linear(tmprr)
        return self.softmax(tmprr), hn, c

    def inithidden(self):
        # Initialize the hidden state and cell state tensors
        hidden = c = torch.zeros(self.num_layers, 1, self.hidden_size)
        return hidden, c
Building a GRU
class GRU(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers=1):
        super(GRU, self).__init__()
        # 1 The init function prepares three layers: self.rnn, self.linear, self.softmax = nn.LogSoftmax(dim=-1)
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.num_layers = num_layers
        # Define the rnn (GRU) layer
        self.rnn = nn.GRU(self.input_size, self.hidden_size, self.num_layers)
        # Define the linear layer
        self.linear = nn.Linear(self.hidden_size, self.output_size)
        # Define the softmax layer
        self.softmax = nn.LogSoftmax(dim=-1)

    def forward(self, input, hidden):
        # Pass the data through the three layers and return the softmax result and hn
        # Data shape [6,57] -> [6,1,57]
        input = input.unsqueeze(1)
        # Feed the data to the model to extract features
        # Data shape gru([seqlen,1,57], [1,1,128]) -> [seqlen,1,128], [1,1,128]
        rr, hn = self.rnn(input, hidden)
        # Data shape [seqlen,1,128] -> take the last time step [1,128]
        tmprr = rr[-1]
        tmprr = self.linear(tmprr)
        # Multi-class classification: softmax; binary classification: sigmoid
        return self.softmax(tmprr), hn

    def inithidden(self):
        # Initialize the hidden layer input tensor
        return torch.zeros(self.num_layers, 1, self.hidden_size)
Test and train the three models
Testing
def dm_test_rnn_lstm_gru():
    # The one-hot feature size is 57 (n_letters), which is also the RNN input dimension
    input_size = 57
    # The dimension of the hidden layer
    n_hidden = 128
    # The output size is the total number of language categories n_categories: each name is classified into one of 18 categories
    output_size = 18
    # 1 Get the data
    myfilename = '../data/name_classfication.txt'
    my_list_x, my_list_y = read_data(myfilename)
    # 2 Instantiate the dataset object
    nameclassdataset = NameClassDataset(my_list_x, my_list_y)
    # 3 Instantiate the dataloader
    mydataloader = DataLoader(dataset=nameclassdataset, batch_size=1, shuffle=True)
    my_rnn = RNN(n_letters, n_hidden, categorynum)
    my_lstm = LSTM(n_letters, n_hidden, categorynum)
    my_gru = GRU(n_letters, n_hidden, categorynum)
    for i, (x, y) in enumerate(mydataloader):
        # print('x.shape', x.shape, x)
        # print('y.shape', y.shape, y)
        # Initialize a 3D zero tensor for the hidden layer (for LSTM this would also be the initial cell state)
        output, hidden = my_rnn(x[0], my_rnn.inithidden())
        print("rnn output.shape--->:", output.shape, output)
        if (i == 0):
            break
    for i, (x, y) in enumerate(mydataloader):
        # print('x.shape', x.shape, x)
        # print('y.shape', y.shape, y)
        hidden, c = my_lstm.inithidden()
        output, hidden, c = my_lstm(x[0], hidden, c)
        print("lstm output.shape--->:", output.shape, output)
        if (i == 0):
            break
    for i, (x, y) in enumerate(mydataloader):
        # print('x.shape', x.shape, x)
        # print('y.shape', y.shape, y)
        output, hidden = my_gru(x[0], my_gru.inithidden())
        print("gru output.shape--->:", output.shape, output)
        if (i == 0):
            break
Training
Traditional RNN
# Model training parameters
mylr = 1e-3
epochs = 1

def my_train_rnn():
    # Get the data
    myfilename = './data/name_classfication.txt'
    my_list_x, my_list_y = read_data(myfilename)
    # Instantiate the dataset object
    nameclassdataset = NameClassDataset(my_list_x, my_list_y)
    # Instantiate the model
    input_size = 57
    n_hidden = 128
    output_size = 18
    my_rnn = RNN(input_size, n_hidden, output_size)
    print('my_rnn model --->', my_rnn)
    # Instantiate the loss function and the Adam optimizer
    mycrossentropyloss = nn.NLLLoss()
    myadam = optim.Adam(my_rnn.parameters(), lr=mylr)
    # Define the training bookkeeping variables
    starttime = time.time()
    total_iter_num = 0      # number of samples trained so far
    total_loss = 0.0        # accumulated training loss
    total_loss_list = []    # average loss recorded every 100 samples
    total_acc_num = 0       # number of correctly predicted samples
    total_acc_list = []     # average accuracy recorded every 100 samples
    # Outer for loop: controls the number of epochs
    for epoch_idx in range(epochs):
        # Instantiate the dataloader
        mydataloader = DataLoader(dataset=nameclassdataset, batch_size=1, shuffle=True)
        # Inner for loop: controls the number of iterations
        for i, (x, y) in enumerate(mydataloader):
            # Feed the data to the model
            output, hidden = my_rnn(x[0], my_rnn.inithidden())
            # Compute the loss
            myloss = mycrossentropyloss(output, y)
            # Zero the gradients
            myadam.zero_grad()
            # Back propagation
            myloss.backward()
            # Update the parameters
            myadam.step()
            # Accumulate the total loss
            total_iter_num = total_iter_num + 1
            total_loss = total_loss + myloss.item()
            # Accumulate the number of correct predictions
            i_predit_tag = (1 if torch.argmax(output).item() == y.item() else 0)
            total_acc_num = total_acc_num + i_predit_tag
            # Every 100 iterations record the average loss and average accuracy
            if (total_iter_num % 100 == 0):
                tmploss = total_loss / total_iter_num
                total_loss_list.append(tmploss)
                tmpacc = total_acc_num / total_iter_num
                total_acc_list.append(tmpacc)
            # Every 2000 iterations print a log line
            if (total_iter_num % 2000 == 0):
                tmploss = total_loss / total_iter_num
                print('Epoch:%d, Loss:%.6f, Time:%d, Accuracy:%.3f' % (epoch_idx+1, tmploss, time.time() - starttime, tmpacc))
        # Save the model after each epoch
        torch.save(my_rnn.state_dict(), './my_rnn_model_%d.bin' % (epoch_idx + 1))
    # Compute the total time
    total_time = int(time.time() - starttime)
    return total_loss_list, total_time, total_acc_list
LSTM
def my_train_lstm():
    # Get the data
    myfilename = './data/name_classfication.txt'
    my_list_x, my_list_y = read_data(myfilename)
    # Instantiate the dataset object
    nameclassdataset = NameClassDataset(my_list_x, my_list_y)
    # Instantiate the model
    input_size = 57
    n_hidden = 128
    output_size = 18
    my_lstm = LSTM(input_size, n_hidden, output_size)
    print('my_lstm model --->', my_lstm)
    # Instantiate the loss function and the Adam optimizer
    mycrossentropyloss = nn.NLLLoss()
    myadam = optim.Adam(my_lstm.parameters(), lr=mylr)
    # Define the training bookkeeping variables
    starttime = time.time()
    total_iter_num = 0      # number of samples trained so far
    total_loss = 0.0        # accumulated training loss
    total_loss_list = []    # average loss recorded every 100 samples
    total_acc_num = 0       # number of correctly predicted samples
    total_acc_list = []     # average accuracy recorded every 100 samples
    # Outer for loop: controls the number of epochs
    for epoch_idx in range(epochs):
        # Instantiate the dataloader
        mydataloader = DataLoader(dataset=nameclassdataset, batch_size=1, shuffle=True)
        # Inner for loop: controls the number of iterations
        for i, (x, y) in enumerate(mydataloader):
            # Feed the data to the model
            hidden, c = my_lstm.inithidden()
            output, hidden, c = my_lstm(x[0], hidden, c)
            # Compute the loss
            myloss = mycrossentropyloss(output, y)
            # Zero the gradients
            myadam.zero_grad()
            # Back propagation
            myloss.backward()
            # Update the parameters
            myadam.step()
            # Accumulate the total loss
            total_iter_num = total_iter_num + 1
            total_loss = total_loss + myloss.item()
            # Accumulate the number of correct predictions
            i_predit_tag = (1 if torch.argmax(output).item() == y.item() else 0)
            total_acc_num = total_acc_num + i_predit_tag
            # Every 100 iterations record the average loss and average accuracy
            if (total_iter_num % 100 == 0):
                tmploss = total_loss / total_iter_num
                total_loss_list.append(tmploss)
                tmpacc = total_acc_num / total_iter_num
                total_acc_list.append(tmpacc)
            # Every 2000 iterations print a log line
            if (total_iter_num % 2000 == 0):
                tmploss = total_loss / total_iter_num
                print('Epoch:%d, Loss:%.6f, Time:%d, Accuracy:%.3f' % (epoch_idx+1, tmploss, time.time() - starttime, tmpacc))
        # Save the model after each epoch
        torch.save(my_lstm.state_dict(), './my_lstm_model_%d.bin' % (epoch_idx + 1))
    # Compute the total time
    total_time = int(time.time() - starttime)
    return total_loss_list, total_time, total_acc_list
GRU
def my_train_gru():
    # Get the data
    myfilename = './data/name_classfication.txt'
    my_list_x, my_list_y = read_data(myfilename)
    # Instantiate the dataset object
    nameclassdataset = NameClassDataset(my_list_x, my_list_y)
    # Instantiate the model
    input_size = 57
    n_hidden = 128
    output_size = 18
    my_gru = GRU(input_size, n_hidden, output_size)
    print('my_gru model --->', my_gru)
    # Instantiate the loss function and the Adam optimizer
    mycrossentropyloss = nn.NLLLoss()
    myadam = optim.Adam(my_gru.parameters(), lr=mylr)
    # Define the training bookkeeping variables
    starttime = time.time()
    total_iter_num = 0      # number of samples trained so far
    total_loss = 0.0        # accumulated training loss
    total_loss_list = []    # average loss recorded every 100 samples
    total_acc_num = 0       # number of correctly predicted samples
    total_acc_list = []     # average accuracy recorded every 100 samples
    # Outer for loop: controls the number of epochs
    for epoch_idx in range(epochs):
        # Instantiate the dataloader
        mydataloader = DataLoader(dataset=nameclassdataset, batch_size=1, shuffle=True)
        # Inner for loop: controls the number of iterations
        for i, (x, y) in enumerate(mydataloader):
            # Feed the data to the model
            output, hidden = my_gru(x[0], my_gru.inithidden())
            # Compute the loss
            myloss = mycrossentropyloss(output, y)
            # Zero the gradients
            myadam.zero_grad()
            # Back propagation
            myloss.backward()
            # Update the parameters
            myadam.step()
            # Accumulate the total loss
            total_iter_num = total_iter_num + 1
            total_loss = total_loss + myloss.item()
            # Accumulate the number of correct predictions
            i_predit_tag = (1 if torch.argmax(output).item() == y.item() else 0)
            total_acc_num = total_acc_num + i_predit_tag
            # Every 100 iterations record the average loss and average accuracy
            if (total_iter_num % 100 == 0):
                tmploss = total_loss / total_iter_num
                total_loss_list.append(tmploss)
                tmpacc = total_acc_num / total_iter_num
                total_acc_list.append(tmpacc)
            # Every 2000 iterations print a log line
            if (total_iter_num % 2000 == 0):
                tmploss = total_loss / total_iter_num
                print('Epoch:%d, Loss:%.6f, Time:%d, Accuracy:%.3f' % (epoch_idx+1, tmploss, time.time() - starttime, tmpacc))
        # Save the model after each epoch
        torch.save(my_gru.state_dict(), './my_gru_model_%d.bin' % (epoch_idx + 1))
    # Compute the total time
    total_time = int(time.time() - starttime)
    return total_loss_list, total_time, total_acc_list
Making predictions
Building the prediction functions
# Paths to the trained models
my_path_rnn = './model/my_rnn_model_1.bin'
my_path_lstm = './model/my_lstm_model_1.bin'
my_path_gru = './model/my_gru_model_1.bin'

# Convert a person's name into a one-hot tensor
# e.g. 'bai' --> [3,57]
def lineToTensor(x):
    # Convert the text x into a tensor
    tensor_x = torch.zeros(len(x), n_letters)
    # Traverse every character index and character in the name
    for li, letter in enumerate(x):
        # The position of letter in the string all_letters is the index of the 1 in the one-hot tensor
        # The position is obtained with the string find() method
        tensor_x[li][all_letters.find(letter)] = 1
    return tensor_x
Traditional RNN
# Thought analysis
# 1 Convert the input text into a one-hot tensor
# 2 Instantiate the model and load the trained parameters: m.load_state_dict(torch.load(my_path_rnn))
# 3 Predict with the model inside with torch.no_grad()
# 4 Extract the top 3 predictions and print them: output.topk(3, 1, True)
#   category_idx = topi[0][i]; category = categorys[category_idx]
# Build the rnn prediction function
def my_predict_rnn(x):
    n_letters = 57
    n_hidden = 128
    n_categories = 18
    # Convert the input text into a one-hot tensor
    x_tensor = lineToTensor(x)
    # Instantiate the model and load the trained parameters
    my_rnn = RNN(n_letters, n_hidden, n_categories)
    my_rnn.load_state_dict(torch.load(my_path_rnn))
    with torch.no_grad():
        # Model prediction
        output, hidden = my_rnn(x_tensor, my_rnn.inithidden())
        # Extract the top 3 predictions
        # 3 means take the top 3, 1 is the dimension to sort along, True means return the largest elements
        topv, topi = output.topk(3, 1, True)
        print('rnn =>', x)
        for i in range(3):
            value = topv[0][i]
            category_idx = topi[0][i]
            category = categorys[category_idx]
            print('\t value:%d category:%s' % (value, category))
LSTM
# Build the LSTM prediction function
def my_predict_lstm(x):
    n_letters = 57
    n_hidden = 128
    n_categories = 18
    # Convert the input text into a one-hot tensor
    x_tensor = lineToTensor(x)
    # Instantiate the model and load the trained parameters
    my_lstm = LSTM(n_letters, n_hidden, n_categories)
    my_lstm.load_state_dict(torch.load(my_path_lstm))
    with torch.no_grad():
        # Model prediction
        hidden, c = my_lstm.inithidden()
        output, hidden, c = my_lstm(x_tensor, hidden, c)
        # Extract the top 3 predictions
        # 3 means take the top 3, 1 is the dimension to sort along, True means return the largest elements
        topv, topi = output.topk(3, 1, True)
        print('lstm =>', x)
        for i in range(3):
            value = topv[0][i]
            category_idx = topi[0][i]
            category = categorys[category_idx]
            print('\t value:%d category:%s' % (value, category))
GRU
# Build the GRU prediction function
def my_predict_gru(x):
    n_letters = 57
    n_hidden = 128
    n_categories = 18
    # Convert the input text into a one-hot tensor
    x_tensor = lineToTensor(x)
    # Instantiate the model and load the trained parameters
    my_gru = GRU(n_letters, n_hidden, n_categories)
    my_gru.load_state_dict(torch.load(my_path_gru))
    with torch.no_grad():
        # Model prediction
        output, hidden = my_gru(x_tensor, my_gru.inithidden())
        # Extract the top 3 predictions
        # 3 means take the top 3, 1 is the dimension to sort along, True means return the largest elements
        topv, topi = output.topk(3, 1, True)
        print('gru =>', x)
        for i in range(3):
            value = topv[0][i]
            category_idx = topi[0][i]
            category = categorys[category_idx]
            print('\t value:%d category:%s' % (value, category))
Calling the prediction functions
def dm_test_predic_rnn_lstm_gru():
    # Put the three function entry points into a list and test them with the same input
    for func in [my_predict_rnn, my_predict_lstm, my_predict_gru]:
        func('zhang')
Attention mechanism
Introduction to the attention mechanism
The concept of attention
When we look at something, we can judge what it is quickly (the judgment may of course be wrong) because our brain quickly focuses on the most recognizable parts of the object, instead of scanning it from beginning to end before making a judgment. The attention mechanism is built on this idea.
Attention calculation rules
It requires three specified inputs, Q (query), K (key) and V (value). A calculation formula then produces the attention result, which represents how the query attends over the keys and values. When Q = K = V, the rule is called self-attention.
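The post does not spell the formula out here, but one widely used calculation rule (the one behind self-attention in Transformers) is scaled dot-product attention; note that the MyAtt class implemented later in this post uses a different, linear-layer-based rule:

$$ \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right) V $$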
Role
Attention on the decoder side: according to the model's target, it effectively focuses on the encoder's output and improves the result when that output is fed into the decoder. It remedies the problem that the encoder's output used to be a single fixed-length tensor that cannot hold enough information.
Attention on the encoder side: it mainly solves the representation problem; it is equivalent to a feature extraction process and produces the representation of the input. Self-attention is generally used here.
A real-life scenario to aid understanding
When we do English reading comprehension, only 1 of the 4 options is correct and the rest are distractors. The attention mechanism is like focusing on picking out the correct option.
Introduction to the bmm operation
# If argument 1 has shape (b × n × m) and argument 2 has shape (b × m × p), the output has shape (b × n × p)
>>> input = torch.randn(10, 3, 4)
>>> mat2 = torch.randn(10, 4, 5)
>>> res = torch.bmm(input, mat2)
>>> res.size()
torch.Size([10, 3, 5])
Code implementation
import torch
import torch.nn as nn
import torch.nn.functional as F

# MyAtt class: implementation thought analysis
# 1 init function (self, query_size, key_size, value_size1, value_size2, output_size)
#   prepares 2 linear layers: self.attn for the attention weight distribution,
#   and self.attn_combine to output the attention result with the specified dimension
# 2 forward(self, Q, K, V):
#   compute the attention weight distribution of the query tensor q: attn_weights [1,32]
#   compute the attention result of the query tensor q with a bmm operation: attn_applied [1,1,64]
#   fuse q with attn_applied, then output with the specified dimension: output [1,1,32]
#   return the attention result output [1,1,32] and the attention weight distribution attn_weights [1,32]
class MyAtt(nn.Module):
    #                  32          32        32           64           32
    def __init__(self, query_size, key_size, value_size1, value_size2, output_size):
        super(MyAtt, self).__init__()
        self.query_size = query_size
        self.key_size = key_size
        self.value_size1 = value_size1
        self.value_size2 = value_size2
        self.output_size = output_size
        # Linear layer 1: attention weight distribution
        self.attn = nn.Linear(self.query_size + self.key_size, self.value_size1)
        # Linear layer 2: output the attention result with the specified dimension
        self.attn_combine = nn.Linear(self.query_size + self.value_size2, output_size)

    def forward(self, Q, K, V):
        # 1 Compute the attention weight distribution of the query tensor q: attn_weights [1,32]
        # [1,1,32],[1,1,32] --> [1,32],[1,32] -> [1,64]
        # [1,64] --> [1,32]
        # tmp1 = torch.cat((Q[0], K[0]), dim=1)
        # tmp2 = self.attn(tmp1)
        # tmp3 = F.softmax(tmp2, dim=1)
        attn_weights = F.softmax(self.attn(torch.cat((Q[0], K[0]), dim=1)), dim=1)
        # 2 Compute the attention result of q with a bmm operation: attn_applied [1,1,64]
        # [1,1,32] * [1,32,64] ---> [1,1,64]
        attn_applied = torch.bmm(attn_weights.unsqueeze(0), V)
        # 3 Fuse q with attn_applied, then output with the specified dimension: output [1,1,32]
        # 3-1 concatenate q with the attention result: [1,32],[1,64] ---> [1,96]
        output = torch.cat((Q[0], attn_applied[0]), dim=1)
        # 3-2 shape [1,96] ---> [1,32]
        output = self.attn_combine(output).unsqueeze(0)
        # 4 Return the attention result output [1,1,32] and the attention weight distribution attn_weights [1,32]
        return output, attn_weights

if __name__ == '__main__':
    # Why introduce the attention mechanism:
    # rnn-style recurrent networks forget the features of earlier words as the number of time steps grows,
    #   so sentence features are not extracted adequately
    # rnn-style recurrent networks extract sentence features time step by time step, which is inefficient
    # Can we extract the features of all 32 words at the same time, in parallel? That is the attention mechanism!
    # Task description
    # v is the content, e.g. 32 words with 64 features each; k is the index of the 32 words; q is the query tensor
    # Our task: given the query tensor q, compute through the attention mechanism:
    # 1. The attention weight distribution of q: the relevance (similarity) between q and the other 32 words
    # 2. The attention result of q: upgrade an ordinary q into a more expressive q by doing a bmm with v
    # 3. Note: whatever the query tensor q is used to look up, it is that thing's query tensor.
    #    e.g. if q is used to look up the word "I", then q is the query tensor of "I"
    query_size = 32
    key_size = 32
    value_size1 = 32   # 32 words
    value_size2 = 64   # 64 features
    output_size = 32
    Q = torch.randn(1, 1, 32)
    K = torch.randn(1, 1, 32)
    V = torch.randn(1, 32, 64)
    # V = torch.randn(1, value_size1, value_size2)
    # 1 Instantiate the attention class object
    myattobj = MyAtt(query_size, key_size, value_size1, value_size2, output_size)
    # 2 Feed Q, K, V to the attention object to get the attention result and the attention weight distribution
    (output, attn_weights) = myattobj(Q, K, V)
    print('attention result of the query tensor q, output--->', output.shape, output)
    print('attention weight distribution of the query tensor q, attn_weights--->', attn_weights.shape, attn_weights)
RNN case study: seq2seq English-to-French translation
Introduction to seq2seq
seq2seq model architecture

Model explanation
The seq2seq architecture consists of three parts: the encoder, the decoder, and the intermediate semantic tensor c. Both the encoder and the decoder are implemented internally with GRU models.
The figure shows a Chinese-to-English translation: "欢迎 来 北京" → "welcome to BeiJing". The encoder first processes the Chinese input and obtains the output tensor of every time step through the GRU model; these are finally concatenated into the intermediate semantic tensor c. The decoder then uses this intermediate semantic tensor c together with the hidden tensor of each time step to generate the translated words one by one.
Dataset Download
Imports and text cleaning
# For regular expressions
import re
# torch toolkit for building the network structure and functions
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
# Predefined optimization methods in torch
import torch.optim as optim
import time
# Used to generate random numbers
import random
import matplotlib.pyplot as plt

# Device selection: we can run the code on cuda or cpu
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Start-of-sentence token
SOS_token = 0
# End-of-sentence token
EOS_token = 1
# The maximum sentence length must not exceed 10 tokens (including punctuation)
MAX_LENGTH = 10
# Data file path
data_path = '../data/eng-fra-v2.txt'

# Text cleaning helper function
def normalizeString(s):
    """String normalization; the parameter s is the input string"""
    s = s.lower().strip()
    # Add a space before . ! ? ; \1 refers to the first captured group (the regex \num backreference)
    s = re.sub(r"([.!?])", r" \1", s)
    # s = re.sub(r"([.!?])", r" ", s)
    # Replace everything in the string that is not a letter or . ! ? with a space
    s = re.sub(r"[^a-zA-Z.!?]+", r" ", s)
    return s
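A quick sketch of what this cleaning does, using a made-up input (the output shown is what the two substitutions above produce: punctuation gets a leading space, and any character outside a-z/A-Z/.!? -- including accented letters -- collapses into a single space):
print(normalizeString("Hello!!  Je suis   étudiant."))
# -> 'hello ! ! je suis tudiant .'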
Thought analysis
my_getdata() -- cleaning the text and building the dictionaries:
1 Read the file line by line: open().read().strip().split('\n') -> my_lines
2 Clean the text line by line and build the language pairs my_pairs
  2-1 Format: [['english', 'french'], ['english', 'french'], ['english', 'french'], ...]
  2-2 Call the text cleaning helper normalizeString(s)
3 Traverse the language pairs and build the English word dictionary and the French word dictionary
  3-1 english_word2index, english_word_n, french_word2index, french_word_n
      where english_word2index = {"SOS": 0, "EOS": 1} and english_word_n = 2
  3-2 english_index2word, french_index2word
4 Return the 7 results:
  english_word2index, english_index2word, english_word_n,
  french_word2index, french_index2word, french_word_n, my_pairs
Data preprocessing
def my_getdata():
    # 1 Read the file line by line: open().read().strip().split('\n')
    my_lines = open(data_path, encoding='utf-8').read().strip().split('\n')
    # 2 Clean the text line by line and build the language pairs my_pairs
    my_pairs = [[normalizeString(s) for s in l.split('\t')] for l in my_lines]
    # 3 Traverse the language pairs and build the English and French word dictionaries
    # 3-1 english_word2index english_word_n french_word2index french_word_n
    english_word2index = {"SOS": 0, "EOS": 1}
    english_word_n = 2
    french_word2index = {"SOS": 0, "EOS": 1}
    french_word_n = 2
    # Traverse the language pairs to fill the English and French word dictionaries
    for pair in my_pairs:
        for word in pair[0].split(' '):
            if word not in english_word2index:
                english_word2index[word] = english_word_n
                # print(english_word2index)
                english_word_n += 1
        for word in pair[1].split(' '):
            if word not in french_word2index:
                french_word2index[word] = french_word_n
                french_word_n += 1
    # 3-2 english_index2word french_index2word
    english_index2word = {v: k for k, v in english_word2index.items()}
    french_index2word = {v: k for k, v in french_word2index.items()}
    return english_word2index, english_index2word, english_word_n, \
           french_word2index, french_index2word, french_word_n, my_pairs

def main():
    # Global call: get the English word dictionary, the French word dictionary and the list of language pairs my_pairs
    english_word2index, english_index2word, english_word_n, \
    french_word2index, french_index2word, french_word_n, \
    my_pairs = my_getdata()
    return 0
Build the data source object and test it
class MyPairsDataset(Dataset):
    def __init__(self, my_pairs):
        # sample pairs
        self.my_pairs = my_pairs
        # number of samples
        self.sample_len = len(my_pairs)

    # Get the number of samples
    def __len__(self):
        return self.sample_len

    # Get the sample at the given index
    def __getitem__(self, index):
        # Clamp abnormal index values into [-self.sample_len, self.sample_len-1]
        if index < 0:
            index = max(-self.sample_len, index)
        else:
            index = min(self.sample_len - 1, index)
        # Get data samples x and y by index
        x = self.my_pairs[index][0]
        y = self.my_pairs[index][1]
        # Convert sample x from text to numbers
        x = [english_word2index[word] for word in x.split(' ')]
        # print(x)
        x.append(EOS_token)
        # print("x2: ", x)
        tensor_x = torch.tensor(x, dtype=torch.long, device=device)
        # Convert sample y from text to numbers
        y = [french_word2index[word] for word in y.split(' ')]
        y.append(EOS_token)
        tensor_y = torch.tensor(y, dtype=torch.long, device=device)
        # Note: tensor_x and tensor_y are 1-D tensors; after the DataLoader they become 2-D
        # print('tensor_y.shape===>', tensor_y.shape, tensor_y)
        # Return the result
        return tensor_x, tensor_y

def dm_test_MyPairsDataset(my_pairs):
    # 1 Instantiate the dataset object
    mypairsdataset = MyPairsDataset(my_pairs)
    # 2 Instantiate the dataloader
    mydataloader = DataLoader(dataset=mypairsdataset, batch_size=1, shuffle=True)
    for i, (x, y) in enumerate(mydataloader):
        print('x.shape', x.shape, x)
        print('y.shape', y.shape, y)
        if i == 1:
            break

# Global call: get the English word dictionary, the French word dictionary and the list of language pairs my_pairs
english_word2index, english_index2word, english_word_n, \
french_word2index, french_index2word, french_word_n, \
my_pairs = my_getdata()

def main():
    dm_test_MyPairsDataset(my_pairs)
    return 0
Encoder and decoder
Building the GRU-based encoder
Thought analysis
EncoderRNN class, implementation ideas:
1 The init function defines 2 layers: self.embedding and self.gru (batch_first=True)
  def __init__(self, input_size, hidden_size):  # 2803 256
2 forward(input, hidden) returns output, hidden
  The data passes through the embedding layer: shape [1,6] --> [1,6,256]
  The data passes through the gru layer: gru([1,6,256],[1,1,256]) --> [1,6,256], [1,1,256]
3 inithidden() initializes the hidden layer input:
  shape torch.zeros(1, 1, self.hidden_size, device=device)
Code implementation
class EncoderRNN(nn.Module):
    def __init__(self, input_size, hidden_size):
        # input_size: vocabulary size of the encoder embedding layer, e.g. 2803
        # hidden_size: number of features per word in the embedding layer, e.g. 256
        super(EncoderRNN, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        # Instantiate the nn.Embedding layer
        self.embedding = nn.Embedding(input_size, hidden_size)
        # Instantiate the nn.GRU layer; note the parameter batch_first=True
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)

    def forward(self, input, hidden):
        # The data passes through the embedding layer: shape [1,6] --> [1,6,256]
        output = self.embedding(input)
        # The data passes through the gru layer: gru([1,6,256],[1,1,256]) --> [1,6,256], [1,1,256]
        output, hidden = self.gru(output, hidden)
        return output, hidden

    def inithidden(self):
        # Initialize the hidden state tensor with shape 1 x 1 x self.hidden_size
        return torch.zeros(1, 1, self.hidden_size, device=device)
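A quick shape check of the encoder (a sketch, reusing the imports and device defined above; the sizes 2803/256 follow the comments, and the 6 random word ids are made up):
my_encoderrnn = EncoderRNN(input_size=2803, hidden_size=256)
x = torch.randint(0, 2803, (1, 6), device=device)   # [1, 6]: one sentence of 6 word ids
hidden = my_encoderrnn.inithidden()                  # [1, 1, 256]
output, hidden = my_encoderrnn(x, hidden)
print(output.shape, hidden.shape)                    # torch.Size([1, 6, 256]) torch.Size([1, 1, 256])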
Building the GRU-based decoder with attention
Diagram

Code implementation
class AttnDecoderRNN(nn.Module):
    def __init__(self, output_size, hidden_size, dropout_p=0.1, max_length=MAX_LENGTH):
        # output_size: vocabulary size of the decoder embedding layer, e.g. 4345
        # hidden_size: number of features per word in the embedding layer, e.g. 256
        # dropout_p: dropout ratio, default 0.1
        # max_length: maximum sentence length, 10
        super(AttnDecoderRNN, self).__init__()
        self.output_size = output_size
        self.hidden_size = hidden_size
        self.dropout_p = dropout_p
        self.max_length = max_length
        # Define the nn.Embedding layer: nn.Embedding(4345, 256)
        self.embedding = nn.Embedding(self.output_size, self.hidden_size)
        # Define linear layer 1: compute the attention weight distribution of q
        self.attn = nn.Linear(self.hidden_size * 2, self.max_length)
        # Define linear layer 2: after fusing q with the attention result, output with the specified dimension
        self.attn_combine = nn.Linear(self.hidden_size * 2, self.hidden_size)
        # Define the dropout layer
        self.dropout = nn.Dropout(self.dropout_p)
        # Define the gru layer
        self.gru = nn.GRU(self.hidden_size, self.hidden_size, batch_first=True)
        # Define the out layer: the decoder outputs over the vocabulary classes (256, 4345)
        self.out = nn.Linear(self.hidden_size, self.output_size)
        # Instantiate the softmax layer: normalize the values for classification
        self.softmax = nn.LogSoftmax(dim=-1)

    def forward(self, input, hidden, encoder_outputs):
        # input is q, a 2-D tensor [1,1]; hidden is k, [1,1,256]; encoder_outputs is v, [10,256]
        # The data passes through the embedding layer
        # shape [1,1] --> [1,1,256]
        embedded = self.embedding(input)
        # Apply dropout to prevent overfitting
        embedded = self.dropout(embedded)
        # 1 Compute the attention weight distribution of the query tensor q: attn_weights [1,10]
        attn_weights = F.softmax(
            self.attn(torch.cat((embedded[0], hidden[0]), 1)), dim=1)
        # 2 Compute the attention result of q with a bmm operation: attn_applied [1,1,256]
        # [1,1,10],[1,10,256] ---> [1,1,256]
        attn_applied = torch.bmm(attn_weights.unsqueeze(0), encoder_outputs.unsqueeze(0))
        # 3 Fuse q with attn_applied, then output with the specified dimension: output [1,1,256]
        output = torch.cat((embedded[0], attn_applied[0]), 1)
        output = self.attn_combine(output).unsqueeze(0)
        # Apply relu to the attention result of q
        output = F.relu(output)
        # Pass the query tensor through gru and softmax to produce the classification result
        # shape [1,1,256],[1,1,256] --> [1,1,256], [1,1,256]
        output, hidden = self.gru(output, hidden)
        # shape [1,1,256] -> [1,256] -> [1,4345]
        output = self.softmax(self.out(output[0]))
        # Return the decoder classification output [1,4345], the final hidden tensor hidden [1,1,256]
        # and the attention weight tensor attn_weights [1,10]
        return output, hidden, attn_weights

    def inithidden(self):
        # Initialize the hidden state tensor with shape 1 x 1 x self.hidden_size
        return torch.zeros(1, 1, self.hidden_size, device=device)
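A matching sketch for a single decoder step (again with made-up inputs; the shapes follow the comments in forward):
my_attndecoderrnn = AttnDecoderRNN(output_size=4345, hidden_size=256)
input_y = torch.tensor([[SOS_token]], device=device)             # [1, 1] start token
decode_hidden = my_attndecoderrnn.inithidden()                    # [1, 1, 256]
encoder_outputs_c = torch.zeros(MAX_LENGTH, 256, device=device)   # [10, 256] padded encoder outputs
output_y, decode_hidden, attn_weights = my_attndecoderrnn(input_y, decode_hidden, encoder_outputs_c)
print(output_y.shape, decode_hidden.shape, attn_weights.shape)
# torch.Size([1, 4345]) torch.Size([1, 1, 256]) torch.Size([1, 10])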
Training models
teacher_forcing
teacher_forcing is a training technique for sequence generation tasks. In a seq2seq architecture, according to recurrent network theory the decoder should use its previous output as part of its next input. During training, however, once one step goes wrong the error accumulates and training becomes ineffective. We therefore need a mechanism that corrects the previous step's error, and this is where teacher forcing comes in: with some probability, the ground-truth token is fed in as the next input instead of the model's own prediction.
Internal iterative training function
Set parameters
# Model training parameters
mylr = 1e-4
epochs = 2
# Set the teacher_forcing ratio to 0.5
teacher_forcing_ratio = 0.5
print_interval_num = 1000
plot_interval_num = 100
Code implementation
def Train_Iters(x, y, my_encoderrnn, my_attndecoderrnn, myadam_encode, myadam_decode, mycrossentropyloss):
    # 1 Encode: encode_output, encode_hidden = my_encoderrnn(x, encode_hidden)
    encode_hidden = my_encoderrnn.inithidden()
    encode_output, encode_hidden = my_encoderrnn(x, encode_hidden)  # feed the whole sentence at once
    # [1,6],[1,1,256] --> [1,6,256],[1,1,256]
    # 2 Prepare the decoding parameters and decode
    # Decoding parameter 1: encode_output_c [10,256]
    encode_output_c = torch.zeros(MAX_LENGTH, my_encoderrnn.hidden_size, device=device)
    for idx in range(x.shape[1]):
        encode_output_c[idx] = encode_output[0, idx]
    # Decoding parameter 2
    decode_hidden = encode_hidden
    # Decoding parameter 3
    input_y = torch.tensor([[SOS_token]], device=device)

    myloss = 0.0
    y_len = y.shape[1]
    use_teacher_forcing = True if random.random() < teacher_forcing_ratio else False
    if use_teacher_forcing:
        for idx in range(y_len):
            # Data shape [1,1],[1,1,256],[10,256] ---> [1,4345],[1,1,256],[1,10]
            output_y, decode_hidden, attn_weight = my_attndecoderrnn(input_y, decode_hidden, encode_output_c)
            target_y = y[0][idx].view(1)
            myloss = myloss + mycrossentropyloss(output_y, target_y)
            # Teacher forcing: the ground-truth token becomes the next input
            input_y = y[0][idx].view(1, -1)
    else:
        for idx in range(y_len):
            # Data shape [1,1],[1,1,256],[10,256] ---> [1,4345],[1,1,256],[1,10]
            output_y, decode_hidden, attn_weight = my_attndecoderrnn(input_y, decode_hidden, encode_output_c)
            target_y = y[0][idx].view(1)
            myloss = myloss + mycrossentropyloss(output_y, target_y)
            topv, topi = output_y.topk(1)
            if topi.squeeze().item() == EOS_token:
                break
            # The model's own prediction becomes the next input
            input_y = topi.detach()

    # Zero the gradients
    myadam_encode.zero_grad()
    myadam_decode.zero_grad()
    # Back propagation
    myloss.backward()
    # Update the parameters
    myadam_encode.step()
    myadam_decode.step()
    # Return the average loss per target token: myloss.item() / y_len
    return myloss.item() / y_len
Training
def Train_seq2seq():
    # Instantiate the mypairsdataset object and the mydataloader
    mypairsdataset = MyPairsDataset(my_pairs)
    mydataloader = DataLoader(dataset=mypairsdataset, batch_size=1, shuffle=True)
    # Instantiate the encoder my_encoderrnn and the decoder my_attndecoderrnn
    my_encoderrnn = EncoderRNN(2803, 256)
    my_attndecoderrnn = AttnDecoderRNN(output_size=4345, hidden_size=256, dropout_p=0.1, max_length=10)
    # Instantiate the encoder optimizer myadam_encode and the decoder optimizer myadam_decode
    myadam_encode = optim.Adam(my_encoderrnn.parameters(), lr=mylr)
    myadam_decode = optim.Adam(my_attndecoderrnn.parameters(), lr=mylr)
    # Instantiate the loss function
    mycrossentropyloss = nn.NLLLoss()
    # Define the bookkeeping variables for model training
    plot_loss_list = []
    # Outer for loop: controls the number of epochs
    for epoch_idx in range(1, 1+epochs):
        print_loss_total, plot_loss_total = 0.0, 0.0
        starttime = time.time()
        # Inner for loop: controls the number of iterations
        for item, (x, y) in enumerate(mydataloader, start=1):
            # Call the internal iterative training function
            myloss = Train_Iters(x, y, my_encoderrnn, my_attndecoderrnn, myadam_encode, myadam_decode, mycrossentropyloss)
            print_loss_total += myloss
            plot_loss_total += myloss
            # Compute the print-interval loss: every 1000 iterations
            if item % print_interval_num == 0:
                print_loss_avg = print_loss_total / print_interval_num
                # Reset the accumulated loss to 0
                print_loss_total = 0
                # Print the log: epoch, average loss, elapsed time
                print('Epoch %d, Loss %.6f, Time: %d' % (epoch_idx, print_loss_avg, time.time() - starttime))
            # Compute the plot-interval loss: every 100 iterations
            if item % plot_interval_num == 0:
                # Average loss over the interval
                plot_loss_avg = plot_loss_total / plot_interval_num
                # Append the average loss to the plot_loss_list
                plot_loss_list.append(plot_loss_avg)
                # Reset the accumulated loss to 0
                plot_loss_total = 0
        # Save the model after each epoch
        torch.save(my_encoderrnn.state_dict(), './my_encoderrnn_%d.pth' % epoch_idx)
        torch.save(my_attndecoderrnn.state_dict(), './my_attndecoderrnn_%d.pth' % epoch_idx)
    # After all epochs, plot the loss curve
    plt.figure()
    plt.plot(plot_loss_list)
    plt.savefig('./s2sq_loss.png')
    plt.show()
    return plot_loss_list
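A minimal entry point for training might look like the following sketch (Train_seq2seq above already saves the model files and the loss curve):
if __name__ == '__main__':
    plot_loss_list = Train_seq2seq()
    print('number of recorded loss points --->', len(plot_loss_list))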
Model evaluation and testing
Writing the evaluation function
def Seq2Seq_Evaluate(x, my_encoderrnn, my_attndecoderrnn):
    """
    The evaluation code is similar to the prediction code. Note the use of with torch.no_grad().
    The first time step uses SOS_token as input; subsequent time steps use the previous prediction
    as input, i.e. an autoregressive mechanism.
    """
    with torch.no_grad():
        # 1 Encode: feed the whole sentence at once
        encode_hidden = my_encoderrnn.inithidden()
        encode_output, encode_hidden = my_encoderrnn(x, encode_hidden)
        # 2 Prepare the decoding parameters
        # Decoding parameter 1: the fixed-length intermediate semantic tensor c
        encoder_outputs_c = torch.zeros(MAX_LENGTH, my_encoderrnn.hidden_size, device=device)
        x_len = x.shape[1]
        for idx in range(x_len):
            encoder_outputs_c[idx] = encode_output[0, idx]
        # Decoding parameter 2: the encoder's last hidden state is the decoder's first hidden input
        decode_hidden = encode_hidden
        # Decoding parameter 3: the start token for the decoder's first time step
        input_y = torch.tensor([[SOS_token]], device=device)
        # 3 Autoregressive decoding
        # Initialize the list of predicted words
        decoded_words = []
        # Initialize the attention tensor
        decoder_attentions = torch.zeros(MAX_LENGTH, MAX_LENGTH)
        for idx in range(MAX_LENGTH):   # note: MAX_LENGTH = 10
            output_y, decode_hidden, attn_weights = my_attndecoderrnn(input_y, decode_hidden, encoder_outputs_c)
            # The prediction becomes the input of the next time step
            topv, topi = output_y.topk(1)
            decoder_attentions[idx] = attn_weights
            # If the output is the end token, stop the loop
            if topi.squeeze().item() == EOS_token:
                decoded_words.append('<EOS>')
                break
            else:
                decoded_words.append(french_index2word[topi.item()])
            # Feed the predicted index back as input_y for the next time step
            input_y = topi.detach()
        # Return decoded_words and the attention weight table (trimmed to the used part)
        return decoded_words, decoder_attentions[:idx + 1]
Evaluation
# Load the models
PATH1 = './gpumodel/my_encoderrnn.pth'
PATH2 = './gpumodel/my_attndecoderrnn.pth'

def dm_test_Seq2Seq_Evaluate():
    # Instantiate the dataset object
    mypairsdataset = MyPairsDataset(my_pairs)
    # Instantiate the dataloader
    mydataloader = DataLoader(dataset=mypairsdataset, batch_size=1, shuffle=True)
    # Instantiate the encoder
    input_size = english_word_n
    hidden_size = 256   # to inspect the data you could also use 8
    my_encoderrnn = EncoderRNN(input_size, hidden_size)
    # my_encoderrnn.load_state_dict(torch.load(PATH1))
    my_encoderrnn.load_state_dict(torch.load(PATH1, map_location=lambda storage, loc: storage), False)
    print('my_encoderrnn model structure --->', my_encoderrnn)
    # Instantiate the decoder
    input_size = french_word_n
    hidden_size = 256   # to inspect the data you could also use 8
    my_attndecoderrnn = AttnDecoderRNN(input_size, hidden_size)
    # my_attndecoderrnn.load_state_dict(torch.load(PATH2))
    my_attndecoderrnn.load_state_dict(torch.load(PATH2, map_location=lambda storage, loc: storage), False)
    print('my_decoderrnn model structure --->', my_attndecoderrnn)

    my_samplepairs = [['i m impressed with your french .', 'je suis impressionne par votre francais .'],
                      ['i m more than a friend .', 'je suis plus qu une amie .'],
                      ['she is beautiful like her mother .', 'vous gagnez n est ce pas ?']]
    print('my_samplepairs--->', len(my_samplepairs))
    for index, pair in enumerate(my_samplepairs):
        x = pair[0]
        y = pair[1]
        # Convert sample x from text to numbers
        tmpx = [english_word2index[word] for word in x.split(' ')]
        tmpx.append(EOS_token)
        tensor_x = torch.tensor(tmpx, dtype=torch.long, device=device).view(1, -1)
        # Model prediction
        decoded_words, attentions = Seq2Seq_Evaluate(tensor_x, my_encoderrnn, my_attndecoderrnn)
        # print('decoded_words->', decoded_words)
        output_sentence = ' '.join(decoded_words)
        print('\n')
        print('>', x)
        print('=', y)
        print('<', output_sentence)
Notes on debugging Python on a server
Using my own code as an example (the .py file was deliberately given a Chinese name, rendered here as translate.py). Don't flame me: I figured someone might run into problems from naming things in Chinese, so I tried it on purpose and solve the problem along the way. Do not name things in Chinese! Do not name things in Chinese! Do not name things in Chinese!
Things to know before starting
nohup
The nohup command runs the command specified by the Command argument (with any related Arg arguments) while ignoring all hangup (SIGHUP) signals. Use nohup to keep a program running in the background after you log out. To run it in the background, append & (the "and" symbol) to the end of the command.
Parameter description:
Command: the command to execute.
Arg: optional arguments; you can specify the output file.
&: run the command in the background; it keeps running after the terminal exits.
tail
The tail command is used to view the contents of a file; its common parameter -f is often used to follow a log file that is still being written.
Parameters:
-f                      read in a loop (follow the file)
-q                      do not display processing information
-v                      display detailed processing information
-c <number>             number of bytes to display
-n <number>             display the last n lines of the file
--pid=PID               used with -f: stop following after the process with the given PID dies
-q, --quiet, --silent   never output headers giving the file name
-s, --sleep-interval=S  used with -f: sleep about S seconds between iterations
First, cd into the directory of the Python file and run it

Send it to the background
nohup python -u translate.py > run.log 2>&1 &
Open another SSH connection to watch the training output
Note: generating the model takes quite a while; please wait patiently if nothing shows up yet.
tail -f run.log
Execution results
