
ACL 2022 | Few-Shot NER for Sequence Labeling: A Dual-Tower BERT Model that Incorporates Label Semantics

2022-07-07 12:46:00 PaperWeekly


Author | SinGaln

This paper is from ACL 2022. Its core idea builds on meta-learning: a dual-tower BERT encodes the text tokens and their corresponding labels separately, and a dot product between the two encodings produces the output used for classification. The paper is not complicated overall and involves few formulas, so the authors' ideas are easy to follow. It is a good approach for sequence-labeling NER.


Paper title:

Label Semantics for Few Shot Named Entity Recognition

Paper link:

https://arxiv.org/pdf/2203.08985.pdf


Model


1.1 Framework


▲ Figure 1. The overall framework of the model

As the figure above shows, the authors use a dual-tower BERT to encode the text tokens and each token's corresponding label separately. The reasoning behind this choice is simple: because this is a few-shot task, there is not enough data, so the authors argue that each token's label can provide additional semantic information for that token.

The authors' meta-learning follows a metric-based approach. Intuitively, the model first computes a vector representation for every token of a sample, then measures the similarity between these token representations and the pre-computed label representations; the dot product expresses this directly. The resulting similarity matrix ([batch_size, sequence_length, num_labels]) is normalized with softmax, and argmax over the last dimension picks the index with the largest value, which is then mapped through the tag list to obtain the label of the current token. A minimal sketch of this scoring step is given below.
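The following is a minimal sketch of this metric-based scoring step, assuming token representations of shape [batch_size, seq_len, hidden] and one encoded vector per verbalized label; the function and variable names are illustrative, not the authors' released code:

import torch
import torch.nn.functional as F

def metric_based_predict(token_reps, label_reps, id2label):
    """token_reps: [batch_size, seq_len, hidden], e.g. BERT last_hidden_state.
    label_reps: [num_labels, hidden], one vector per verbalized label."""
    # Dot product of every token vector with every label vector
    # -> similarity matrix of shape [batch_size, seq_len, num_labels]
    scores = torch.einsum("bth,ch->btc", token_reps, label_reps)
    probs = F.softmax(scores, dim=-1)   # normalize over the label dimension
    pred_ids = probs.argmax(dim=-1)     # [batch_size, seq_len]
    return [[id2label[int(i)] for i in seq] for seq in pred_ids]

if __name__ == "__main__":
    token_reps = torch.randn(2, 8, 768)
    label_reps = torch.randn(5, 768)
    id2label = {0: "other", 1: "begin person", 2: "inside person",
                3: "begin location", 4: "inside location"}
    print(metric_based_predict(token_reps, label_reps, id2label))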

1.2 Details

In addition, when representing the labels, the authors also preprocess every label. In general this is done in three steps (a small example mapping follows the list):

1. Expand the abbreviated labels into natural-language words, for example PER-->person, ORG-->organization, LOC-->location, and so on;

2. Convert the begin/inside position markers into natural language as well; for BIO-style tagging, for example, they become begin, inside, other, and so on. Other tagging schemes are handled analogously.

3. Combine the results of the previous two steps, for example B-PER-->begin person, I-PER-->inside person.
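A minimal illustration of such a mapping for a BIO tag set with PER/ORG/LOC types (the exact wording of the verbalized labels is a design choice; the dictionary below is only an example):

# Hypothetical verbalization table: BIO tags -> natural-language label names
label_to_text = {
    "O": "other",
    "B-PER": "begin person",       "I-PER": "inside person",
    "B-ORG": "begin organization", "I-ORG": "inside organization",
    "B-LOC": "begin location",     "I-LOC": "inside location",
}
# The values (not the raw tags) are what gets fed to the label-side BERT encoder.
label_texts = list(label_to_text.values())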

Because the task is few-shot NER, the authors train the model on multiple source datasets and then evaluate it on multiple unseen few-shot target datasets, comparing the model's performance with and without fine-tuning.

When encoding the tokens, each token is passed through the BERT model to obtain its corresponding vector.


Note that the last_hidden_state output of the BERT model is used as the token vectors.
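For reference, a minimal snippet showing which output each tower reads from a standard transformers BertModel (the model path is a placeholder, so the model calls are left commented):

from transformers import BertModel, BertTokenizer

# tokenizer = BertTokenizer.from_pretrained("../bert_model")
# bert = BertModel.from_pretrained("../bert_model")
# out = bert(**tokenizer("John lives in Paris", return_tensors="pt"))
# token_vectors = out.last_hidden_state  # [1, seq_len, hidden] -> text tower
# pooled_cls = out.pooler_output         # [1, hidden]          -> label tower (see below)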

When encoding the labels, every label in the label set is encoded: for each complete label, a pooled part of the BERT output (the [CLS] representation; the code below uses pooler_output) is taken as its encoding vector, and the encodings of all labels form a set of vectors. Finally, the dot product between each token vector and each label vector is computed, roughly as written below.

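The equation image from the original post is not reproduced here; based on the description above, the scoring can be written roughly as (a hedged reconstruction, not copied verbatim from the paper):

p(y_i = c \mid x) = \operatorname{softmax}_c\left( \mathbf{h}_i \cdot \mathbf{l}_c \right)

where \mathbf{h}_i is the BERT vector of token x_i and \mathbf{l}_c is the encoding vector of the verbalized label c.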

Because labels are represented by encoding their names, when new data and new labels appear there is, unlike in other NER methods, no need to initialize and train a new top-level classifier, which is exactly what makes the few-shot setting workable. A sketch of this is given below.
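A hedged sketch of what this buys in practice: to support a label never seen during training, one only needs to encode its natural-language name with the label tower and append the resulting vector to the label matrix; no new classifier head is initialized or trained. The helper below is illustrative and assumes a standard transformers BertModel as the label encoder:

import torch
from transformers import BertModel, BertTokenizer

def encode_label(label_text, tokenizer, label_bert):
    """Encode a verbalized label name (e.g. "begin product") into a single vector."""
    enc = tokenizer(label_text, return_tensors="pt")
    with torch.no_grad():
        return label_bert(**enc).pooler_output.squeeze(0)   # [hidden]

# Hypothetical usage together with the objects from the earlier sketch:
# tokenizer = BertTokenizer.from_pretrained("../bert_model")
# label_bert = BertModel.from_pretrained("../bert_model")
# new_vec = encode_label("begin product", tokenizer, label_bert)
# label_reps = torch.cat([label_reps, new_vec.unsqueeze(0)], dim=0)  # [num_labels + 1, hidden]
# id2label[len(id2label)] = "begin product"
# The same dot-product scoring now covers the new label; nothing is retrained.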

1.3 Label Transfer

In the paper, the authors also list the label transfer table for the experimental datasets; part of it is shown below:


▲ Figure 2. Label transfer for the experimental datasets

1.4 Support Set Sampling Algorithm

The sampling pseudocode is as follows :


▲ Figure 3. Sampling pseudocode
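The pseudocode itself is only available as an image in the original post. As a rough, hedged illustration of the kind of greedy support-set sampling commonly used in few-shot NER (keep drawing sentences until every entity type appears at least K times; this generic sketch is not guaranteed to match the paper's algorithm exactly):

import random
from collections import Counter

def greedy_support_sample(sentences, k, seed=42):
    """sentences: list of (tokens, labels) pairs with BIO labels.
    Returns a support set in which every entity type occurs at least k times
    (when the pool allows it)."""
    pool = list(sentences)
    random.Random(seed).shuffle(pool)
    entity_types = {lab.split("-")[-1] for _, labels in pool for lab in labels if lab != "O"}
    counts, support = Counter(), []
    for tokens, labels in pool:
        types_here = [lab.split("-")[-1] for lab in labels if lab != "O"]
        # Take the sentence only if it still contributes an under-represented type.
        if any(counts[t] < k for t in types_here):
            support.append((tokens, labels))
            counts.update(types_here)
        if entity_types and all(counts[t] >= k for t in entity_types):
            break
    return support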


Experimental Results


▲ Figure 4. Some experimental results

From the experimental results, the method clearly performs well in the few-shot regime: in the 1-50 shot settings the model beats the other models, which demonstrates the usefulness of label semantics. On full data, however, the method loses some of its edge, suggesting that the more data is available, the less the model depends on label semantics. The author of this post also offers a view: with full data, injecting label semantics in this way may slightly shift the original text semantics. Of course, the same shift also exists in the few-shot setting, but there it is a beneficial shift that strengthens the model's generalization, whereas with full data the shift becomes a little excessive.

Dual-tower BERT code implementation (the metric-based method is not included):

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# @Time    : 2022/5/23 13:49
# @Author  : SinGaln

import torch
import torch.nn as nn
from transformers import BertModel, BertPreTrainedModel


class SinusoidalPositionEmbedding(nn.Module):
    """ Definition Sin-Cos Location Embedding
    """

    def __init__(
            self, output_dim, merge_mode='add'):
        super(SinusoidalPositionEmbedding, self).__init__()
        self.output_dim = output_dim
        self.merge_mode = merge_mode

    def forward(self, inputs):
        # inputs: [batch_size, seq_len, output_dim]
        input_shape = inputs.shape
        batch_size, seq_len = input_shape[0], input_shape[1]
        position_ids = torch.arange(seq_len, dtype=torch.float)[None]      # [1, seq_len]
        indices = torch.arange(self.output_dim // 2, dtype=torch.float)
        indices = torch.pow(10000.0, -2 * indices / self.output_dim)       # frequency terms
        embeddings = torch.einsum('bn,d->bnd', position_ids, indices)      # [1, seq_len, output_dim // 2]
        # Interleave sin/cos, tile over the batch, then flatten to [batch_size, seq_len, output_dim]
        embeddings = torch.stack([torch.sin(embeddings), torch.cos(embeddings)], dim=-1)
        embeddings = embeddings.repeat((batch_size, *([1] * len(embeddings.shape))))
        embeddings = torch.reshape(embeddings, (batch_size, seq_len, self.output_dim))
        if self.merge_mode == 'add':
            return inputs + embeddings.to(inputs.device)
        elif self.merge_mode == 'mul':
            return inputs * (embeddings + 1.0).to(inputs.device)
        elif self.merge_mode == 'zero':
            return embeddings.to(inputs.device)


class DoubleTownNER(BertPreTrainedModel):
    def __init__(self, config, num_labels, position=False):
        super(DoubleTownNER, self).__init__(config)
        self.position = position
        self.num_labels = num_labels
        self.bert = BertModel(config=config)
        self.fc = nn.Linear(config.hidden_size, self.num_labels)

        if self.position:
            self.sinposembed = SinusoidalPositionEmbedding(config.hidden_size, "add")

    def forward(self, sequence_input_ids, sequence_attention_mask, sequence_token_type_ids, label_input_ids,
                label_attention_mask, label_token_type_ids):
        # Encode the text and the labels separately with the shared BERT
        # sequence_outputs: [batch_size, sequence_length, embed_dim]
        sequence_outputs = self.bert(input_ids=sequence_input_ids, attention_mask=sequence_attention_mask,
                                     token_type_ids=sequence_token_type_ids).last_hidden_state
        # [batch_size, embed_dim]
        label_outputs = self.bert(input_ids=label_input_ids, attention_mask=label_attention_mask,
                                  token_type_ids=label_token_type_ids).pooler_output
        label_outputs = label_outputs.unsqueeze(1)

        # Optionally add sinusoidal position information to the token encodings
        if self.position:
            sequence_outputs = self.sinposembed(sequence_outputs)
        # Element-wise interaction between token and label encodings (broadcast over the sequence)
        interactive_output = sequence_outputs * label_outputs
        # Fully connected classification layer
        outputs = self.fc(interactive_output)
        return outputs

if __name__=="__main__":
    pretrain_path = "../bert_model"
    from transformers import BertConfig

    token_input_ids = torch.randint(1, 100, (32, 128))
    token_attention_mask = torch.ones_like(token_input_ids)
    token_token_type_ids = torch.zeros_like(token_input_ids)

    label_input_ids = torch.randint(1, 10, (1, 10))
    label_attention_mask = torch.ones_like(label_input_ids)
    label_token_type_ids = torch.zeros_like(label_input_ids)
    config = BertConfig.from_pretrained(pretrain_path)
    model = DoubleTownNER.from_pretrained(pretrain_path, config=config, num_labels=10, position=True)

    outs = model(sequence_input_ids=token_input_ids, sequence_attention_mask=token_attention_mask, sequence_token_type_ids=token_token_type_ids, label_input_ids=label_input_ids,
                label_attention_mask=label_attention_mask, label_token_type_ids=label_token_type_ids)
    print(outs, outs.size())

