Model Building in PyTorch
2022-07-29 06:12:00 【Quinn-ntmy】
I. Two elements of building a model
- Build the submodules: in the __init__() method of your model class (which inherits from nn.Module);
- Splice the submodules together: in the model's forward() method.
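To make the two elements concrete, here is a minimal sketch (a hypothetical two-layer network, not part of the article's project):

import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super(TinyNet, self).__init__()
        # Element 1: build the submodules in __init__()
        self.conv = nn.Conv2d(1, 6, kernel_size=5)
        self.fc = nn.Linear(6 * 24 * 24, 10)

    def forward(self, x):
        # Element 2: splice the submodules together in forward()
        x = torch.relu(self.conv(x))  # (N, 1, 28, 28) -> (N, 6, 24, 24)
        x = x.flatten(start_dim=1)    # flatten everything but the batch dimension
        return self.fc(x)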
II. The nn.Module class
All of our models, and all network layers, inherit from this class. torch.nn includes (1) nn.Parameter, (2) nn.functional, (3) nn.Module, and (4) nn.init; these submodules work together.
1. nn.Parameter
A tensor subclass that represents a learnable parameter, such as a weight or bias.
Model parameters need to be trained by the optimizer, so a parameter is usually a tensor with requires_grad=True. At the same time, a model can contain a great many parameters, and managing them by hand is impractical. Parameters are therefore usually expressed as nn.Parameter, and nn.Module is used to manage all the parameters under its structure.
Code example
For instance, the learnable parameters in an Attention submodule:
if score_function == 'mlp':
    self.weight = nn.Parameter(torch.Tensor(hidden_dim * 2))
elif score_function == 'bi_linear':
    self.weight = nn.Parameter(torch.Tensor(hidden_dim, hidden_dim))
else:  # dot_product / scaled_dot_product need no extra weight
    self.register_parameter('weight', None)
self.reset_parameters()
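Because the assignment happens inside an nn.Module, the nn.Parameter is registered automatically and shows up when iterating the module's parameters. A quick check (the Toy class below is a hypothetical illustration, not from the project):

import torch
import torch.nn as nn

class Toy(nn.Module):
    def __init__(self, hidden_dim=4):
        super(Toy, self).__init__()
        self.weight = nn.Parameter(torch.Tensor(hidden_dim * 2))

toy = Toy()
for name, p in toy.named_parameters():
    print(name, tuple(p.shape), p.requires_grad)  # weight (8,) True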
In practice, module classes are usually built by inheriting from nn.Module, and every component that contains learnable parameters is created in the constructor.
class AEN_BERT(nn.Module):
    def __init__(self, bert, opt):
        super(AEN_BERT, self).__init__()
        self.opt = opt
        self.bert = bert
        self.squeeze_embedding = SqueezeEmbedding()
        self.dropout = nn.Dropout(opt.dropout)
        self.attn_k = Attention(opt.bert_dim, out_dim=opt.hidden_dim, n_head=8, score_function='mlp', dropout=opt.dropout)
        self.attn_q = Attention(opt.bert_dim, out_dim=opt.hidden_dim, n_head=8, score_function='mlp', dropout=opt.dropout)
        self.ffn_c = PositionwiseFeedForward(opt.hidden_dim, dropout=opt.dropout)
        self.ffn_t = PositionwiseFeedForward(opt.hidden_dim, dropout=opt.dropout)
        self.attn_s1 = Attention(opt.hidden_dim, n_head=8, score_function='mlp', dropout=opt.dropout)
        self.dense = nn.Linear(opt.hidden_dim * 3, opt.polarities_dim)

    def forward(self, inputs):
        context, target = inputs[0], inputs[1]
        context_len = torch.sum(context != 0, dim=-1)
        target_len = torch.sum(target != 0, dim=-1)
        context = self.squeeze_embedding(context, context_len)
        context, _ = self.bert(context, return_dict=False)
        context = self.dropout(context)
        target = self.squeeze_embedding(target, target_len)
        target, _ = self.bert(target, return_dict=False)
        target = self.dropout(target)
        hc, _ = self.attn_k(context, context)  # introspective context word modeling
        hc = self.ffn_c(hc)                    # point-wise convolution transform
        ht, _ = self.attn_q(context, target)   # context-aware target word modeling
        ht = self.ffn_t(ht)                    # point-wise convolution transform
        s1, _ = self.attn_s1(hc, ht)           # target-specific context representation
        # average pooling (as in the paper) yields the final representations
        hc_mean = torch.div(torch.sum(hc, dim=1), context_len.unsqueeze(1).float())
        ht_mean = torch.div(torch.sum(ht, dim=1), target_len.unsqueeze(1).float())
        s1_mean = torch.div(torch.sum(s1, dim=1), context_len.unsqueeze(1).float())
        # torch.div(a, b): element-wise division of tensor a by scalar b,
        # or element-wise division between two broadcastable tensors a and b
        x = torch.cat((hc_mean, s1_mean, ht_mean), dim=-1)  # concatenate the three representations
        out = self.dense(x)  # final fully connected layer (nn.Linear)
        return out
You can see that the module class AEN_BERT includes the submodule Attention, and that every part of the model containing learnable parameters is placed in the constructor (as a submodule).
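A hypothetical usage sketch follows; it assumes the Attention, SqueezeEmbedding and PositionwiseFeedForward classes from the same project are importable, and the opt fields shown are placeholders rather than the project's actual training configuration:

from types import SimpleNamespace
from transformers import BertModel

# placeholder hyperparameters; the real values come from the training script
opt = SimpleNamespace(dropout=0.1, bert_dim=768, hidden_dim=300, polarities_dim=3)
bert = BertModel.from_pretrained('bert-base-uncased')
model = AEN_BERT(bert, opt)
# forward() expects inputs = [context, target], two tensors of token ids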
2. nn.functional
nn.functional provides the concrete functional implementations, for example:
(1) activation functions (F.relu, F.sigmoid, F.tanh, F.softmax)
(2) model layers (F.linear, F.conv2d, F.max_pool2d, F.dropout2d, F.embedding)
(3) loss functions (F.binary_cross_entropy, F.mse_loss, F.cross_entropy)
To make parameter management easier, these are usually converted into class form by inheriting from nn.Module and packaged directly under the nn module:
(1) the activation functions become (nn.ReLU, nn.Sigmoid, nn.Tanh, nn.Softmax)
(2) the model layers become (nn.Linear, nn.Conv2d, nn.MaxPool2d, nn.Dropout2d, nn.Embedding)
(3) the loss functions become (nn.BCELoss, nn.MSELoss, nn.CrossEntropyLoss)
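The two forms compute the same thing; the class form simply wraps the functional call in an nn.Module so that any state is managed automatically. A small sketch of the correspondence:

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(2, 3)

# Functional form: a plain stateless function call
y1 = F.relu(x)

# Class form: an nn.Module wrapping the same computation
y2 = nn.ReLU()(x)
print(torch.equal(y1, y2))  # True

# For layers with learnable parameters the class form is preferred:
# nn.Linear registers weight and bias for you, while the functional
# form requires passing them in by hand.
linear = nn.Linear(3, 5)
y3 = linear(x)  # same as F.linear(x, linear.weight, linear.bias)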
3. nn.Module
The base class of all network layers; it manages the network's attributes.
nn.Module has 8 important attributes used to manage the whole model, all of which are ordered dictionaries:
self._parameters: Dict[str, Optional[Parameter]] = OrderedDict()
self._buffers: Dict[str, Optional[Tensor]] = OrderedDict()
self._backward_hooks: Dict[int, Callable] = OrderedDict()
self._forward_hooks: Dict[int, Callable] = OrderedDict()
self._forward_pre_hooks: Dict[int, Callable] = OrderedDict()
self._state_dict_hooks: Dict[int, Callable] = OrderedDict()
self._load_state_dict_pre_hooks: Dict[int, Callable] = OrderedDict()
self._modules: Dict[str, Optional['Module']] = OrderedDict()
(1) _parameters: stores and manages attributes of the nn.Parameter class, e.g. the weight and bias parameters;
(2) _modules: stores and manages attributes of the nn.Module class (submodules);
(3) _buffers: stores and manages buffer attributes; for example, a BN layer's running_mean and running_var live here;
(4) *_hooks: stores and manages hook functions (the five hook-related dictionaries).

How nn.Module builds these attributes: first there is a top-level Module that inherits from the nn.Module base class, such as AEN_BERT above. This top-level Module can contain many submodules, which also inherit from nn.Module. In the __init__ method of each of these Modules, the parent class's initializer is called first, which initializes the 8 attributes above.
Then, as each submodule is built, there are two steps: the submodule is initialized first, and then the __setattr__ method inspects the type of the assigned value, saves it into the corresponding attribute dictionary, and binds it to the corresponding member. The submodules are built one by one until the whole top-level Module is complete.
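This routing by __setattr__ can be observed directly. In the hypothetical Demo module below, the nn.Parameter assignment lands in _parameters while the nn.Linear assignment lands in _modules:

import torch
import torch.nn as nn

class Demo(nn.Module):
    def __init__(self):
        super(Demo, self).__init__()              # initializes the 8 ordered dicts
        self.scale = nn.Parameter(torch.ones(1))  # __setattr__ routes this into _parameters
        self.fc = nn.Linear(2, 2)                 # __setattr__ routes this into _modules

demo = Demo()
print(list(demo._parameters.keys()))  # ['scale']
print(list(demo._modules.keys()))     # ['fc']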
Summary:
- A Module can contain multiple child modules;
- A Module is equivalent to an operation and must implement the forward() function;
- Every Module has 8 dictionaries that manage its attributes (the most commonly used are _parameters and _modules).
In general, we rarely use nn.Parameter directly to define the parameters of a model; instead we build models by assembling common model layers. These layers also inherit from nn.Module, themselves contain parameters, and become submodules of the module we define.
nn.Module provides methods for managing these submodules:
- children(): returns a generator over the module's direct submodules;
- named_children(): returns a generator over the module's direct submodules together with their names;
- modules(): returns a generator over all modules at every level under the module, including the module itself;
- named_modules(): returns a generator over all modules at every level under the module together with their names, including the module itself.

children() and named_children() are the ones used most often; modules() and named_modules() are used less, since their functionality can be reproduced by nesting multiple named_children() calls.
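A small sketch of the difference between the two families of methods, using a hypothetical nested Sequential:

import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 8),
                    nn.Sequential(nn.ReLU(), nn.Linear(8, 2)))

# children(): direct submodules only
for m in net.children():
    print(type(m).__name__)  # Linear, Sequential

# named_modules(): every module at every level, including net itself
for name, m in net.named_modules():
    print(name or '(self)', type(m).__name__)
    # (self) Sequential / 0 Linear / 1 Sequential / 1.0 ReLU / 1.1 Linear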
4. nn.init
nn.init provides the parameter initialization methods.
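For example, the Attention excerpt above ends its constructor with self.reset_parameters(); a simplified, hypothetical implementation (not the project's actual code) uses one of the nn.init routines to fill the parameter in place:

import math
import torch
import torch.nn as nn

class Attention(nn.Module):
    def __init__(self, hidden_dim):
        super(Attention, self).__init__()
        self.hidden_dim = hidden_dim
        self.weight = nn.Parameter(torch.Tensor(hidden_dim * 2))
        self.reset_parameters()

    def reset_parameters(self):
        # Uniform initialization scaled by the hidden size;
        # nn.init.uniform_ writes into the tensor in place.
        stdv = 1.0 / math.sqrt(self.hidden_dim)
        nn.init.uniform_(self.weight, -stdv, stdv)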