Understanding the torch.nn.Parameter() function
2022-07-29 06:11:00 【Quinn-ntmy】
When training a neural network with PyTorch, we are essentially training a function: input data passes through this function and comes out as a prediction. Once the structure of the function is fixed (convolutions, fully connected layers, and so on), what remains to be learned are the parameters of that function.
Therefore, torch.nn.Parameter() can be viewed as a type-conversion function: it turns a non-trainable Tensor into a trainable Parameter and binds that Parameter to the module. After the conversion, an attribute such as self.v becomes part of the model, i.e. a parameter that is updated during training.
The purpose of using torch.nn.Parameter() is to let certain variables keep adjusting their own values during learning, so that they end up optimized.
In practice, we design a loss function and use gradient descent so that the learned network performs the prediction task more accurately.
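As a minimal sketch of this effect (the Demo module and the w_plain attribute below are invented for illustration; self.v is the attribute mentioned above), only the tensor wrapped in nn.Parameter is registered with the module and therefore trained:

    import torch
    import torch.nn as nn

    class Demo(nn.Module):
        def __init__(self):
            super(Demo, self).__init__()
            self.w_plain = torch.randn(3)          # plain tensor attribute: not registered, never trained
            self.v = nn.Parameter(torch.randn(3))  # Parameter: registered with the module, updated by training

    m = Demo()
    print([name for name, _ in m.named_parameters()])  # ['v'] -- only the Parameter appears

Because self.v appears in m.parameters(), an optimizer such as torch.optim.SGD(m.parameters(), lr=0.1) will update it by gradient descent, which is exactly the behaviour described above.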
Classic application scenarios include:
- The weight parameters in an attention mechanism (this covers most cases)
An example follows (excerpted from a text classification task):
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

# Define the Attention module.
# Here, Attention takes sent_hiddens and sent_masks as input.
class Attention(nn.Module):
    def __init__(self, hidden_size):
        super(Attention, self).__init__()
        # the attention weight matrix is declared as a learnable parameter
        self.weight = nn.Parameter(torch.Tensor(hidden_size, hidden_size))
        # fill it with random numbers drawn from a normal distribution with the given mean and std;
        # to modify the tensor without autograd recording the operation, use tensor.data or tensor.detach()
        self.weight.data.normal_(mean=0.0, std=0.05)
        self.bias = nn.Parameter(torch.Tensor(hidden_size))   # learnable bias
        b = np.zeros(hidden_size, dtype=np.float32)
        # torch.from_numpy(b) converts the array b into a tensor that shares memory with it
        self.bias.data.copy_(torch.from_numpy(b))
        self.query = nn.Parameter(torch.Tensor(hidden_size))  # learnable query vector
        self.query.data.normal_(mean=0.0, std=0.05)

    def forward(self, batch_hidden, batch_masks):
        # linear
        # 1. the input sent_hiddens is linearly transformed into key; the shape stays (batch_size, doc_len, 512)
        key = torch.matmul(batch_hidden, self.weight) + self.bias   # b * len * hidden

        # compute attention
        # 2. multiply key by query to get outputs -- the attention scores, i.e. the weight assigned to each sentence;
        #    the similarity between query and key is computed with a dot product (matmul)
        outputs = torch.matmul(key, self.query)                     # batch_size * doc_len
        # positions without words (mask == 0) are filled with a very negative value (-1e32), i.e. a padding mask
        masked_outputs = outputs.masked_fill((1 - batch_masks).bool(), float(-1e32))

        # 3. apply softmax to obtain the attention weight matrix
        attn_scores = F.softmax(masked_outputs, dim=1)              # b * len

        # 4. use the other input, sent_masks, to reset the weights of positions without words to 0, giving masked_attn_scores
        masked_attn_scores = attn_scores.masked_fill((1 - batch_masks).bool(), 0.0)

        # sum weighted sources
        # 5. multiply masked_attn_scores by key to get batch_outputs of shape (batch_size, 512), i.e. a weighted sum
        batch_outputs = torch.bmm(masked_attn_scores.unsqueeze(1), key).squeeze(1)   # b * hidden

        return batch_outputs, attn_scores
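A short usage sketch of this Attention module (the batch size, document length, and the 0/1 mask values below are arbitrary, chosen only to show the shapes):

    batch_size, doc_len, hidden_size = 2, 4, 512
    attn = Attention(hidden_size)

    sent_hiddens = torch.randn(batch_size, doc_len, hidden_size)   # sentence representations
    sent_masks = torch.tensor([[1., 1., 1., 0.],
                               [1., 1., 0., 0.]])                  # 1 = real sentence, 0 = padding

    batch_outputs, attn_scores = attn(sent_hiddens, sent_masks)
    print(batch_outputs.shape)   # torch.Size([2, 512])
    print(attn_scores.shape)     # torch.Size([2, 4])
    print([name for name, _ in attn.named_parameters()])  # ['weight', 'bias', 'query']

All three tensors created with nn.Parameter (weight, bias, query) show up in attn.named_parameters(), so a normal optimizer will learn them together with the rest of the model.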