Understanding the torch.nn.Parameter() function
2022-07-29 06:11:00 【Quinn-ntmy】
When training a neural network with PyTorch, we are essentially training a function: input data passes through this function and comes out as a prediction. Once the structure of the function has been specified (convolutions, fully connected layers, and so on), its parameters can be learned.
So torch.nn.Parameter() can be thought of as a type-conversion function: it turns a non-trainable Tensor into a trainable parameter and binds that parameter to the module. After the conversion, an attribute such as self.v becomes part of the model, i.e. a model parameter that is updated during training.
The point of using torch.nn.Parameter() is to let certain variables keep updating their own values during learning, so that the model can be optimized.
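A minimal sketch of this registration behaviour (the module name and tensor sizes below are made up for illustration): a tensor wrapped in nn.Parameter gets requires_grad=True and shows up in the module's parameter list, whereas a plain tensor attribute does not.

    import torch
    import torch.nn as nn

    class Demo(nn.Module):
        def __init__(self):
            super(Demo, self).__init__()
            # Registered: appears in self.parameters() and is updated by optimizers.
            self.v = nn.Parameter(torch.randn(3))
            # Not registered: just an ordinary attribute, invisible to optimizers.
            self.w = torch.randn(3)

    m = Demo()
    print([name for name, p in m.named_parameters()])  # ['v']
    print(m.v.requires_grad)                            # True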
In practice, we design a loss function and use gradient descent so that the learned network completes the prediction task more accurately.
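As a toy illustration of that optimization process (the data, loss, and learning rate here are made up; the only point is that the nn.Parameter's value changes as training proceeds):

    import torch
    import torch.nn as nn

    # A single learnable scalar; the "network" is just y = w * x.
    w = nn.Parameter(torch.tensor(0.0))
    optimizer = torch.optim.SGD([w], lr=0.1)

    x, target = torch.tensor(2.0), torch.tensor(6.0)  # the true w would be 3
    for _ in range(50):
        loss = (w * x - target) ** 2   # squared-error loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()               # w's value is modified in place

    print(w.item())  # close to 3.0 after training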
Classic application scenarios include:
- The weight parameters in an attention mechanism (the most common case)
For example (excerpted from a text-classification task):
# Define the Attention module.
# Here, the inputs to Attention are sent_hiddens and sent_masks.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


class Attention(nn.Module):
    def __init__(self, hidden_size):
        super(Attention, self).__init__()
        # The attention weight is declared as a learnable parameter.
        self.weight = nn.Parameter(torch.Tensor(hidden_size, hidden_size))
        # Fill the tensor with numbers drawn from independent normal distributions
        # with the given mean and standard deviation std.
        # To operate on a tensor without autograd recording it, use tensor.data or tensor.detach().
        self.weight.data.normal_(mean=0.0, std=0.05)
        # The bias is learnable as well.
        self.bias = nn.Parameter(torch.Tensor(hidden_size))
        b = np.zeros(hidden_size, dtype=np.float32)
        # torch.from_numpy(b) converts the array b into a tensor that shares its memory.
        self.bias.data.copy_(torch.from_numpy(b))
        self.query = nn.Parameter(torch.Tensor(hidden_size))
        self.query.data.normal_(mean=0.0, std=0.05)

    def forward(self, batch_hidden, batch_masks):
        # linear
        # 1. The input sent_hiddens first goes through a linear transformation to produce key;
        #    the shape stays (batch_size, doc_len, 512).
        key = torch.matmul(batch_hidden, self.weight) + self.bias  # b * len * hidden

        # compute attention
        # 2. Multiply key by query to get outputs: this is the attention we want,
        #    i.e. the weight assigned to each sentence. The relevance between query and key
        #    is computed with a plain vector product (matmul).
        outputs = torch.matmul(key, self.query)  # batch_size * doc_len

        # Positions without words (mask == 0) are replaced with a very small value (e.g. -1e32),
        # i.e. padding masking.
        masked_outputs = outputs.masked_fill((1 - batch_masks).bool(), float(-1e32))

        # 3. Apply softmax to obtain the attention weight matrix.
        attn_scores = F.softmax(masked_outputs, dim=1)  # b * len

        # 4. Use the other input, sent_masks, to reset the weights of word-less sentences to 0,
        #    giving masked_attn_scores.
        masked_attn_scores = attn_scores.masked_fill((1 - batch_masks).bool(), 0.0)

        # sum weighted sources
        # 5. Multiply masked_attn_scores by key to obtain batch_outputs of shape (batch_size, 512),
        #    i.e. a weighted sum.
        batch_outputs = torch.bmm(masked_attn_scores.unsqueeze(1), key).squeeze(1)  # b * hidden

        return batch_outputs, attn_scores
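A minimal usage sketch of the module above (the batch size, document length, and padding pattern are made-up values; hidden_size = 512 matches the shapes mentioned in the comments):

    import torch

    hidden_size, batch_size, doc_len = 512, 4, 10
    attn = Attention(hidden_size)

    # sent_hiddens: one hidden vector per sentence; sent_masks: 1 for real sentences, 0 for padding.
    sent_hiddens = torch.randn(batch_size, doc_len, hidden_size)
    sent_masks = torch.ones(batch_size, doc_len)
    sent_masks[:, 7:] = 0   # pretend the last few positions are padding

    batch_outputs, attn_scores = attn(sent_hiddens, sent_masks)
    print(batch_outputs.shape)  # torch.Size([4, 512])
    print(attn_scores.shape)    # torch.Size([4, 10])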