Understanding the torch.nn.Parameter() function
2022-07-29 06:11:00 【Quinn-ntmy】
When we train a neural network with PyTorch, we are essentially training a function: input data is passed through this function to produce a prediction. Once the structure of the function is fixed (convolutions, fully connected layers, and so on), what the network actually learns are the parameters of that function.

You can therefore think of torch.nn.Parameter() as a type-conversion function: it turns a non-trainable Tensor into a trainable parameter and registers that parameter with the enclosing module. After the conversion, an attribute such as self.v becomes part of the model, i.e. a parameter that is updated during training.

The purpose of torch.nn.Parameter() is to let certain variables keep adjusting their own values during learning, so that the model can be optimized.

In practice, we design a loss function and use gradient descent so that the learned network performs the prediction task more accurately.
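To make the registration behaviour concrete, here is a minimal sketch (not from the original post; the module name ToyModule and its attribute names are made up for illustration). It contrasts a plain tensor attribute with an nn.Parameter attribute: only the latter shows up in the module's parameters() and is therefore updated by an optimizer.

```python
import torch
import torch.nn as nn

class ToyModule(nn.Module):
    def __init__(self):
        super().__init__()
        # Wrapped in nn.Parameter: registered with the module, trainable.
        self.v = nn.Parameter(torch.randn(3))
        # Plain tensor attribute: not registered, optimizers ignore it.
        self.w = torch.randn(3)

m = ToyModule()
print([name for name, _ in m.named_parameters()])  # ['v']
```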
Classic application scenarios include:
- the weight parameters in an attention mechanism (the most common case)

An example follows (excerpted from a text classification task):
```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


# Define the Attention module.
# Its inputs are the sentence hidden states (sent_hiddens) and masks (sent_masks).
class Attention(nn.Module):
    def __init__(self, hidden_size):
        super(Attention, self).__init__()
        # The attention weight matrix is declared as a learnable parameter.
        self.weight = nn.Parameter(torch.Tensor(hidden_size, hidden_size))
        # Initialize it from a normal distribution with the given mean and std.
        # We want to modify the tensor without autograd recording the operation,
        # hence tensor.data (or tensor.detach()).
        self.weight.data.normal_(mean=0.0, std=0.05)

        # The bias is learnable as well.
        self.bias = nn.Parameter(torch.Tensor(hidden_size))
        b = np.zeros(hidden_size, dtype=np.float32)
        # torch.from_numpy(b) converts the array b to a tensor sharing its memory.
        self.bias.data.copy_(torch.from_numpy(b))

        self.query = nn.Parameter(torch.Tensor(hidden_size))
        self.query.data.normal_(mean=0.0, std=0.05)

    def forward(self, batch_hidden, batch_masks):
        # linear
        # 1. Pass the input hidden states through a linear transform to get key;
        #    the shape stays (batch_size, doc_len, hidden_size), e.g. 512 here.
        key = torch.matmul(batch_hidden, self.weight) + self.bias  # b * len * hidden

        # compute attention
        # 2. Multiply key by query to get the raw scores, i.e. the weight assigned
        #    to each sentence. The similarity between query and key is computed
        #    with a plain dot product (matmul).
        outputs = torch.matmul(key, self.query)  # batch_size * doc_len

        # Fill positions without words (mask == 0) with a very negative value
        # (-1e32): this is the padding mask.
        masked_outputs = outputs.masked_fill((1 - batch_masks).bool(), float(-1e32))

        # 3. Apply softmax to obtain the attention weight matrix.
        attn_scores = F.softmax(masked_outputs, dim=1)  # b * len

        # 4. Use sent_masks again to reset the weights of padded positions to 0,
        #    giving masked_attn_scores.
        masked_attn_scores = attn_scores.masked_fill((1 - batch_masks).bool(), 0.0)

        # sum weighted sources
        # 5. Multiply masked_attn_scores by key to get batch_outputs of shape
        #    (batch_size, hidden_size): a weighted sum over the sentences.
        batch_outputs = torch.bmm(masked_attn_scores.unsqueeze(1), key).squeeze(1)  # b * hidden

        return batch_outputs, attn_scores
```
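As a hedged usage sketch (not part of the original snippet; batch_size, doc_len and the dummy tensors below are illustrative assumptions), the module can be exercised like this:

```python
batch_size, doc_len, hidden_size = 2, 4, 512

attn = Attention(hidden_size)
sent_hiddens = torch.randn(batch_size, doc_len, hidden_size)
# 1 marks a real sentence, 0 marks padding.
sent_masks = torch.tensor([[1, 1, 1, 0],
                           [1, 1, 0, 0]])

doc_vecs, attn_scores = attn(sent_hiddens, sent_masks)
print(doc_vecs.shape)     # torch.Size([2, 512])
print(attn_scores.shape)  # torch.Size([2, 4])

# weight, bias and query were all created with nn.Parameter, so they appear
# in parameters() and will be updated by an optimizer during training.
print([name for name, _ in attn.named_parameters()])  # ['weight', 'bias', 'query']
```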