Understanding the torch.nn.Parameter() function
2022-07-29 06:11:00 【Quinn-ntmy】
When we train a neural network with PyTorch, we are essentially fitting a function: input data passes through the function and comes out as a prediction. Once the structure of that function is fixed (convolutions, fully connected layers, and so on), what remains to be learned are its parameters.
You can therefore think of torch.nn.Parameter() as a type-conversion function: it turns a non-trainable Tensor into a trainable parameter and binds that parameter to the module. After the conversion, an attribute such as self.v becomes part of the model, i.e. a parameter that is updated during training.
The purpose of torch.nn.Parameter() is to let certain variables keep adjusting their own values during learning until they are optimized.
In practice, we design a loss function and use gradient descent so that the learned network completes the prediction task more accurately.
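As a minimal sketch of this registration behaviour (the module name Demo and the attribute names v and w below are made up for illustration), a tensor wrapped in nn.Parameter shows up in the module's named_parameters() and will be updated by the optimizer, whereas a plain tensor attribute will not:

import torch
import torch.nn as nn

class Demo(nn.Module):
    def __init__(self):
        super().__init__()
        # Wrapped in nn.Parameter: registered as a trainable parameter of the module.
        self.v = nn.Parameter(torch.randn(3))
        # Plain tensor attribute: not registered, so an optimizer built from
        # Demo().parameters() would never update it.
        self.w = torch.randn(3)

m = Demo()
print([name for name, _ in m.named_parameters()])  # ['v']
print(m.v.requires_grad)                            # True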
Classic application scenarios include:
- the weight parameters in attention mechanisms (the most common case)

An example follows (excerpted from a text classification task):
# Define the Attention module.
# Here, Attention takes sent_hiddens and sent_masks as input.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

class Attention(nn.Module):
    def __init__(self, hidden_size):
        super(Attention, self).__init__()
        # The attention weight matrix is declared as a learnable parameter.
        self.weight = nn.Parameter(torch.Tensor(hidden_size, hidden_size))
        # In-place initialization from a normal distribution with the given mean and std.
        # We want to modify the tensor without autograd recording the operation,
        # hence tensor.data (tensor.detach() would also work).
        self.weight.data.normal_(mean=0.0, std=0.05)

        # The bias is learnable as well.
        self.bias = nn.Parameter(torch.Tensor(hidden_size))
        b = np.zeros(hidden_size, dtype=np.float32)
        # torch.from_numpy(b) converts the array b into a tensor that shares its memory.
        self.bias.data.copy_(torch.from_numpy(b))

        self.query = nn.Parameter(torch.Tensor(hidden_size))
        self.query.data.normal_(mean=0.0, std=0.05)

    def forward(self, batch_hidden, batch_masks):
        # linear
        # 1. Apply a linear transformation to the input sent_hiddens to get key;
        #    the shape stays (batch_size, doc_len, hidden_size).
        key = torch.matmul(batch_hidden, self.weight) + self.bias  # b * len * hidden

        # compute attention
        # 2. Multiply key by query to get outputs, the attention logits: the score
        #    assigned to each sentence. The similarity between query and key is
        #    computed with a simple dot product (matmul).
        outputs = torch.matmul(key, self.query)  # batch_size * doc_len

        # Positions that contain no words (mask == 0) are filled with a very small
        # value (-1e32); this is the padding mask.
        masked_outputs = outputs.masked_fill((1 - batch_masks).bool(), float(-1e32))

        # 3. Apply softmax to obtain the attention weight matrix.
        attn_scores = F.softmax(masked_outputs, dim=1)  # b * len

        # 4. Use sent_masks again to reset the weights of padded positions to 0,
        #    giving masked_attn_scores.
        masked_attn_scores = attn_scores.masked_fill((1 - batch_masks).bool(), 0.0)

        # sum weighted sources
        # 5. Multiply masked_attn_scores by key to get batch_outputs of shape
        #    (batch_size, hidden_size): the attention-weighted sum.
        batch_outputs = torch.bmm(masked_attn_scores.unsqueeze(1), key).squeeze(1)  # b * hidden

        return batch_outputs, attn_scores
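A quick smoke test of the module above (purely illustrative: hidden_size=512, the batch and document lengths, and the 0/1 mask convention are assumptions taken from the comments in the code):

hidden_size, batch_size, doc_len = 512, 4, 6
attn = Attention(hidden_size)

sent_hiddens = torch.randn(batch_size, doc_len, hidden_size)
sent_masks = torch.ones(batch_size, doc_len)  # 1 = real sentence, 0 = padding
sent_masks[:, -2:] = 0                        # pretend the last two positions are padding

batch_outputs, attn_scores = attn(sent_hiddens, sent_masks)
print(batch_outputs.shape)  # torch.Size([4, 512])
print(attn_scores.shape)    # torch.Size([4, 6])

# weight, bias and query were all created with nn.Parameter,
# so they are registered parameters of the module:
print([name for name, _ in attn.named_parameters()])  # ['weight', 'bias', 'query']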