How to use Bert
2022-07-28 06:07:00 【Alan and fish】
1. Importing the BERT library
When writing code, I often see BERT imported in the following way:
from pytorch_pretrained_bert import BertTokenizer,BertModel
Others import it from the transformers package, so I was sometimes unsure which way to import it:
from transformers import BertTokenizer,BertConfig,BertModel
According to this blog post, https://blog.csdn.net/qq_43391414/article/details/118252012, the transformers package is also known as pytorch-transformers or pytorch-pretrained-bert.
As far as I can tell, transformers is actually the latest version of the library (it was formerly named pytorch-transformers and, before that, pytorch-pretrained-bert).
It improves on the functions and methods of the earlier two packages, and some features are only available in transformers, so using transformers is more convenient.
It offers implementations of a range of SOTA (state-of-the-art) models, including BERT, XLNet, RoBERTa, and others.
So the recommended way to import the BERT model is:
from transformers import BertTokenizer,BertModel
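For example, loading a pretrained checkpoint usually looks like the sketch below; "bert-base-chinese" is only an example checkpoint name, substitute whichever checkpoint (or local path) you actually use.
from transformers import BertTokenizer, BertModel

# "bert-base-chinese" is an example checkpoint name; replace it with your own.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")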
2. What data format should be fed into the BERT model
What I ran into while writing code is that some people's code simply splits sentences into tokens, converts them to ids, and feeds them straight into the BERT model, while others preprocess the data into input_ids, attention_mask, token_type_ids, and all sorts of other formats. Every person's preprocessing code looks different, and as a beginner entering deep learning I found this very unfriendly: I really didn't know whose version to trust.
So I looked at the source code of the BERT model and found its forward function:
def forward(
self,
input_ids: Optional[torch.Tensor] = None,
attention_mask: Optional[torch.Tensor] = None,
token_type_ids: Optional[torch.Tensor] = None,
position_ids: Optional[torch.Tensor] = None,
head_mask: Optional[torch.Tensor] = None,
inputs_embeds: Optional[torch.Tensor] = None,
encoder_hidden_states: Optional[torch.Tensor] = None,
encoder_attention_mask: Optional[torch.Tensor] = None,
past_key_values: Optional[List[torch.FloatTensor]] = None,
use_cache: Optional[bool] = None,
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
return_dict: Optional[bool] = None,
) -> Union[Tuple[torch.Tensor], BaseModelOutputWithPoolingAndCrossAttentions]:
These are the input formats the BERT model accepts (a short example of producing the main ones with the tokenizer follows the list):
1. input_ids:
- data type: torch.Tensor
- meaning: the list of vocabulary indices of the subwords produced by the tokenizer
2. attention_mask:
- data type: torch.Tensor
- meaning: the mask used in self-attention to distinguish real subwords from padding; real tokens are marked with 1 and padding positions with 0
3. token_type_ids:
- data type: torch.Tensor
- meaning: marks which sentence each subword belongs to (first sentence / second sentence / padding); if there is only one sentence it is filled with 0
4. position_ids:
- data type: torch.Tensor
- meaning: the position index of each token within the sequence, used for the position embeddings; usually you do not need to pass this yourself
5. head_mask:
- data type: torch.Tensor
- meaning: used to disable (mask out) selected attention heads in selected layers
6. inputs_embeds:
- data type: torch.Tensor
- meaning: if provided, input_ids is not needed; the embedding lookup is skipped and these embeddings go directly into the encoder
7. encoder_hidden_states:
- data type: torch.Tensor
- meaning: only takes effect when BertModel is configured as a decoder; cross-attention is then performed over these encoder states in addition to self-attention
8. encoder_attention_mask:
- data type: torch.Tensor
- meaning: the attention mask over encoder_hidden_states, used to ignore the padding positions of the encoder input during cross-attention
9. past_key_values:
- data type: List[torch.FloatTensor]
- meaning: pre-computed key and value states from previous decoding steps, passed in to avoid recomputing them and thus reduce the cost of decoding
10. use_cache:
- data type: bool
- meaning: if True, the key/value states are saved and returned (as past_key_values) to speed up decoding
11. output_attentions:
- data type: bool
- meaning: whether to return the attention weights of every intermediate layer
12. output_hidden_states:
- data type: bool
- meaning: whether to return the output of every intermediate layer
13. return_dict:
- data type: bool
- meaning: whether to return the output as a ModelOutput object (key-value form, which can also be used as a tuple); defaults to True
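In practice you rarely build these tensors by hand: the tokenizer produces input_ids, attention_mask and token_type_ids for you. A minimal sketch, assuming the tokenizer loaded in section 1 (the sentences and max_length below are only examples):
# `tokenizer` is the BertTokenizer loaded in section 1.
encoded = tokenizer(
    ["今天天气很好", "我想去打球"],   # example sentences
    padding="max_length",           # pad every sentence to max_length
    truncation=True,
    max_length=32,
    return_tensors="pt",            # return PyTorch tensors
)

print(encoded["input_ids"].shape)       # (batch_size, max_length)
print(encoded["attention_mask"].shape)  # 1 for real tokens, 0 for padding
print(encoded["token_type_ids"].shape)  # all 0 when there is only one sentence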
====================================================
You can use a Dataset to preprocess the data into this format first, then wrap the Dataset in a DataLoader, set the batch size, and load the data batch by batch.
For details on how to use Dataset and DataLoader, see my other notes; a rough sketch is shown below.
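The sketch below only illustrates the idea; the class name and fields are hypothetical, not the exact code from my notes.
from torch.utils.data import Dataset, DataLoader

class TextDataset(Dataset):
    """Wraps tokenized texts so that DataLoader can batch them."""
    def __init__(self, texts, tokenizer, max_length=32):
        # Tokenize everything up front into input_ids / attention_mask / token_type_ids.
        self.encodings = tokenizer(
            texts, padding="max_length", truncation=True,
            max_length=max_length, return_tensors="pt",
        )

    def __len__(self):
        return self.encodings["input_ids"].size(0)

    def __getitem__(self, idx):
        # One sample: a dict with the three tensors BERT needs.
        return {key: val[idx] for key, val in self.encodings.items()}

# dataset = TextDataset(texts, tokenizer)
# loader = DataLoader(dataset, batch_size=16, shuffle=True)  # yields batched dicts of tensors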
3. BERT model output
To run the BERT model you only need to feed in input_ids, attention_mask, and token_type_ids. The following is a fragment of my code:
out = self.bert(x['input_ids'], x['attention_mask'], x['token_type_ids'])
The output out contains the following four pieces of data (a short example of reading them follows the list):
- last_hidden_state:
  torch.FloatTensor, the sequence of hidden states from the last layer. Its size is (batch_size, sequence_length, hidden_size), where sequence_length is the length we truncate/pad the sentences to and hidden_size is 768.
- pooler_output:
  torch.FloatTensor, the output corresponding to the [CLS] token, with size (batch_size, hidden_size).
- hidden_states:
  tuple(torch.FloatTensor), an optional output. To get it you need to set config.output_hidden_states=True. It is a tuple whose first element is the embedding output and whose remaining elements are the outputs of each layer; each element has shape (batch_size, sequence_length, hidden_size).
- attentions:
  also an optional output. To get it you need to set config.output_attentions=True. It is a tuple whose elements are the attention weights of each layer, used to compute the weighted average in the self-attention heads.
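A short sketch of reading these outputs, assuming the model and the encoded inputs from the earlier snippets; the hidden size of 768 and the 13 entries in hidden_states apply to a bert-base checkpoint.
import torch

with torch.no_grad():
    out = model(
        input_ids=encoded["input_ids"],
        attention_mask=encoded["attention_mask"],
        token_type_ids=encoded["token_type_ids"],
        output_hidden_states=True,   # also return the per-layer hidden states
    )

print(out.last_hidden_state.shape)  # (batch_size, sequence_length, 768)
print(out.pooler_output.shape)      # (batch_size, 768)
print(len(out.hidden_states))       # 13: embedding output + 12 encoder layers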