How to use Bert
2022-07-28 06:07:00 【Alan and fish】
1. Importing the BERT library
When writing code, I often see BERT imported in the following way:
from pytorch_pretrained_bert import BertTokenizer,BertModel
Others import it from the transformers package, so I was sometimes unsure which way to import it:
from transformers import BertTokenizer,BertConfig,BertModel
According to this blog post, https://blog.csdn.net/qq_43391414/article/details/118252012, the transformers package is also known as pytorch-transformers or pytorch-pretrained-bert.
As far as I can tell, transformers is actually the latest version of the library (it was formerly named pytorch-transformers and, before that, pytorch-pretrained-bert).
It improves on the functions and methods of the earlier two packages, and some features are only available in transformers, so using transformers is more convenient.
It offers implementations of a range of SOTA (state-of-the-art) models, including BERT, XLNet, RoBERTa, and others.
So the recommended way to import the BERT model is:
from transformers import BertTokenizer,BertModel
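For example, loading a pretrained checkpoint usually looks like the sketch below; "bert-base-chinese" is only an example checkpoint name, substitute whichever checkpoint (or local path) you actually use.
from transformers import BertTokenizer, BertModel

# "bert-base-chinese" is an example checkpoint name; replace it with your own.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")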
2. What data format should be fed into the BERT model
What I ran into while writing code is that some people's code simply splits sentences into tokens, converts them to ids, and feeds them straight into the BERT model, while others preprocess the data into input_ids, attention_mask, token_type_ids, and all sorts of other formats. Every person's preprocessing code looks different, and as a beginner entering deep learning I found this very unfriendly: I really didn't know whose version to trust.
So I looked at the source code of the BERT model and found its forward function:
def forward(
self,
input_ids: Optional[torch.Tensor] = None,
attention_mask: Optional[torch.Tensor] = None,
token_type_ids: Optional[torch.Tensor] = None,
position_ids: Optional[torch.Tensor] = None,
head_mask: Optional[torch.Tensor] = None,
inputs_embeds: Optional[torch.Tensor] = None,
encoder_hidden_states: Optional[torch.Tensor] = None,
encoder_attention_mask: Optional[torch.Tensor] = None,
past_key_values: Optional[List[torch.FloatTensor]] = None,
use_cache: Optional[bool] = None,
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
return_dict: Optional[bool] = None,
) -> Union[Tuple[torch.Tensor], BaseModelOutputWithPoolingAndCrossAttentions]:
These are the input formats the BERT model accepts (a short example of producing the main ones with the tokenizer follows the list):
1. input_ids:
- data type: torch.Tensor
- meaning: the list of vocabulary indices of the subwords produced by the tokenizer
2. attention_mask:
- data type: torch.Tensor
- meaning: the mask used in self-attention to distinguish real subwords from padding; real tokens are marked with 1 and padding positions with 0
3. token_type_ids:
- data type: torch.Tensor
- meaning: marks which sentence each subword belongs to (first sentence / second sentence / padding); if there is only one sentence it is filled with 0
4. position_ids:
- data type: torch.Tensor
- meaning: the position index of each token within the sequence, used for the position embeddings; usually you do not need to pass this yourself
5. head_mask:
- data type: torch.Tensor
- meaning: used to disable (mask out) selected attention heads in selected layers
6. inputs_embeds:
- data type: torch.Tensor
- meaning: if provided, input_ids is not needed; the embedding lookup is skipped and these embeddings go directly into the encoder
7. encoder_hidden_states:
- data type: torch.Tensor
- meaning: only takes effect when BertModel is configured as a decoder; cross-attention is then performed over these encoder states in addition to self-attention
8. encoder_attention_mask:
- data type: torch.Tensor
- meaning: the attention mask over encoder_hidden_states, used to ignore the padding positions of the encoder input during cross-attention
9. past_key_values:
- data type: List[torch.FloatTensor]
- meaning: pre-computed key and value states from previous decoding steps, passed in to avoid recomputing them and thus reduce the cost of decoding
10. use_cache:
- data type: bool
- meaning: if True, the key/value states are saved and returned (as past_key_values) to speed up decoding
11. output_attentions:
- data type: bool
- meaning: whether to return the attention weights of every intermediate layer
12. output_hidden_states:
- data type: bool
- meaning: whether to return the output of every intermediate layer
13. return_dict:
- data type: bool
- meaning: whether to return the output as a ModelOutput object (key-value form, which can also be used as a tuple); defaults to True
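In practice you rarely build these tensors by hand: the tokenizer produces input_ids, attention_mask and token_type_ids for you. A minimal sketch, assuming the tokenizer loaded in section 1 (the sentences and max_length below are only examples):
# `tokenizer` is the BertTokenizer loaded in section 1.
encoded = tokenizer(
    ["今天天气很好", "我想去打球"],   # example sentences
    padding="max_length",           # pad every sentence to max_length
    truncation=True,
    max_length=32,
    return_tensors="pt",            # return PyTorch tensors
)

print(encoded["input_ids"].shape)       # (batch_size, max_length)
print(encoded["attention_mask"].shape)  # 1 for real tokens, 0 for padding
print(encoded["token_type_ids"].shape)  # all 0 when there is only one sentence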
====================================================
You can use a Dataset to preprocess the data into this format first, then wrap the Dataset in a DataLoader, set the batch size, and load the data batch by batch.
For details on how to use Dataset and DataLoader, see my other notes; a rough sketch is shown below.
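The sketch below only illustrates the idea; the class name and fields are hypothetical, not the exact code from my notes.
from torch.utils.data import Dataset, DataLoader

class TextDataset(Dataset):
    """Wraps tokenized texts so that DataLoader can batch them."""
    def __init__(self, texts, tokenizer, max_length=32):
        # Tokenize everything up front into input_ids / attention_mask / token_type_ids.
        self.encodings = tokenizer(
            texts, padding="max_length", truncation=True,
            max_length=max_length, return_tensors="pt",
        )

    def __len__(self):
        return self.encodings["input_ids"].size(0)

    def __getitem__(self, idx):
        # One sample: a dict with the three tensors BERT needs.
        return {key: val[idx] for key, val in self.encodings.items()}

# dataset = TextDataset(texts, tokenizer)
# loader = DataLoader(dataset, batch_size=16, shuffle=True)  # yields batched dicts of tensors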
3. BERT model output
To run the BERT model you only need to feed in input_ids, attention_mask, and token_type_ids. The following is a fragment of my code:
out = self.bert(x['input_ids'], x['attention_mask'], x['token_type_ids'])
The output out contains the following four pieces of data (a short example of reading them follows the list):
- last_hidden_state:
  torch.FloatTensor, the sequence of hidden states from the last layer. Its size is (batch_size, sequence_length, hidden_size), where sequence_length is the length we truncate/pad the sentences to and hidden_size is 768.
- pooler_output:
  torch.FloatTensor, the output corresponding to the [CLS] token, with size (batch_size, hidden_size).
- hidden_states:
  tuple(torch.FloatTensor), an optional output. To get it you need to set config.output_hidden_states=True. It is a tuple whose first element is the embedding output and whose remaining elements are the outputs of each layer; each element has shape (batch_size, sequence_length, hidden_size).
- attentions:
  also an optional output. To get it you need to set config.output_attentions=True. It is a tuple whose elements are the attention weights of each layer, used to compute the weighted average in the self-attention heads.
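A short sketch of reading these outputs, assuming the model and the encoded inputs from the earlier snippets; the hidden size of 768 and the 13 entries in hidden_states apply to a bert-base checkpoint.
import torch

with torch.no_grad():
    out = model(
        input_ids=encoded["input_ids"],
        attention_mask=encoded["attention_mask"],
        token_type_ids=encoded["token_type_ids"],
        output_hidden_states=True,   # also return the per-layer hidden states
    )

print(out.last_hidden_state.shape)  # (batch_size, sequence_length, 768)
print(out.pooler_output.shape)      # (batch_size, 768)
print(len(out.hidden_states))       # 13: embedding output + 12 encoder layers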