Understanding the torch.nn.Parameter() function
2022-07-29 06:11:00 【Quinn-ntmy】
When we train a neural network with PyTorch, we are essentially training a function: input data is passed through this function to produce a prediction. Once the structure of the function is fixed (convolutions, fully connected layers, and so on), what the network actually learns are the parameters of that function.

You can therefore think of torch.nn.Parameter() as a type-conversion function: it turns a non-trainable Tensor into a trainable parameter and registers that parameter with the enclosing module. After the conversion, an attribute such as self.v becomes part of the model, i.e. a parameter that is updated during training.

The purpose of torch.nn.Parameter() is to let certain variables keep adjusting their own values during learning, so that the model can be optimized.

In practice, we design a loss function and use gradient descent so that the learned network performs the prediction task more accurately.
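To make the registration behaviour concrete, here is a minimal sketch (not from the original post; the module name ToyModule and its attribute names are made up for illustration). It contrasts a plain tensor attribute with an nn.Parameter attribute: only the latter shows up in the module's parameters() and is therefore updated by an optimizer.

```python
import torch
import torch.nn as nn

class ToyModule(nn.Module):
    def __init__(self):
        super().__init__()
        # Wrapped in nn.Parameter: registered with the module, trainable.
        self.v = nn.Parameter(torch.randn(3))
        # Plain tensor attribute: not registered, optimizers ignore it.
        self.w = torch.randn(3)

m = ToyModule()
print([name for name, _ in m.named_parameters()])  # ['v']
```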
Classic application scenarios include:
- the weight parameters in an attention mechanism (the most common case)

An example follows (excerpted from a text classification task):
```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


# Define the Attention module.
# Its inputs are the sentence hidden states (sent_hiddens) and masks (sent_masks).
class Attention(nn.Module):
    def __init__(self, hidden_size):
        super(Attention, self).__init__()
        # The attention weight matrix is declared as a learnable parameter.
        self.weight = nn.Parameter(torch.Tensor(hidden_size, hidden_size))
        # Initialize it from a normal distribution with the given mean and std.
        # We want to modify the tensor without autograd recording the operation,
        # hence tensor.data (or tensor.detach()).
        self.weight.data.normal_(mean=0.0, std=0.05)

        # The bias is learnable as well.
        self.bias = nn.Parameter(torch.Tensor(hidden_size))
        b = np.zeros(hidden_size, dtype=np.float32)
        # torch.from_numpy(b) converts the array b to a tensor sharing its memory.
        self.bias.data.copy_(torch.from_numpy(b))

        self.query = nn.Parameter(torch.Tensor(hidden_size))
        self.query.data.normal_(mean=0.0, std=0.05)

    def forward(self, batch_hidden, batch_masks):
        # linear
        # 1. Pass the input hidden states through a linear transform to get key;
        #    the shape stays (batch_size, doc_len, hidden_size), e.g. 512 here.
        key = torch.matmul(batch_hidden, self.weight) + self.bias  # b * len * hidden

        # compute attention
        # 2. Multiply key by query to get the raw scores, i.e. the weight assigned
        #    to each sentence. The similarity between query and key is computed
        #    with a plain dot product (matmul).
        outputs = torch.matmul(key, self.query)  # batch_size * doc_len

        # Fill positions without words (mask == 0) with a very negative value
        # (-1e32): this is the padding mask.
        masked_outputs = outputs.masked_fill((1 - batch_masks).bool(), float(-1e32))

        # 3. Apply softmax to obtain the attention weight matrix.
        attn_scores = F.softmax(masked_outputs, dim=1)  # b * len

        # 4. Use sent_masks again to reset the weights of padded positions to 0,
        #    giving masked_attn_scores.
        masked_attn_scores = attn_scores.masked_fill((1 - batch_masks).bool(), 0.0)

        # sum weighted sources
        # 5. Multiply masked_attn_scores by key to get batch_outputs of shape
        #    (batch_size, hidden_size): a weighted sum over the sentences.
        batch_outputs = torch.bmm(masked_attn_scores.unsqueeze(1), key).squeeze(1)  # b * hidden

        return batch_outputs, attn_scores
```
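As a hedged usage sketch (not part of the original snippet; batch_size, doc_len and the dummy tensors below are illustrative assumptions), the module can be exercised like this:

```python
batch_size, doc_len, hidden_size = 2, 4, 512

attn = Attention(hidden_size)
sent_hiddens = torch.randn(batch_size, doc_len, hidden_size)
# 1 marks a real sentence, 0 marks padding.
sent_masks = torch.tensor([[1, 1, 1, 0],
                           [1, 1, 0, 0]])

doc_vecs, attn_scores = attn(sent_hiddens, sent_masks)
print(doc_vecs.shape)     # torch.Size([2, 512])
print(attn_scores.shape)  # torch.Size([2, 4])

# weight, bias and query were all created with nn.Parameter, so they appear
# in parameters() and will be updated by an optimizer during training.
print([name for name, _ in attn.named_parameters()])  # ['weight', 'bias', 'query']
```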