Wonderful! MarkBERT
2022-07-01 11:10:00 【kaiyuan_sjtu】

Author | Prince Changqin
Edited by | NewBeeNLP
Hello everyone, this is NewBeeNLP. Today let's look at a joint paper from Tencent and Fudan University: MarkBERT: Marking Word Boundaries Improves Chinese BERT [1].
One-sentence summary: insert boundary markers for the words you care about into the token sequence.
MarkBERT is not a word-based BERT; it is still character-based, but it cleverly integrates 「word boundary marker」 information into the model. This way any word can be handled uniformly, whether or not it is OOV. In addition, MarkBERT brings two extra benefits:
First, it is convenient to attach word-level learning objectives to the boundary markers, complementing the traditional character- and sentence-level pre-training tasks;
Second, generic markers can be replaced with POS-tag-specific markers to easily incorporate richer semantics.
It achieves a 2-point improvement on NER tasks, and better accuracy on text classification, keyword recognition, and semantic similarity tasks as well.
This simple but effective Chinese pre-trained model, MarkBERT, takes word information into account without suffering from the OOV problem. It has the following advantages:
Common words and low-frequency words are handled uniformly; there is no OOV problem.
The introduction of markers makes it possible to design word-level pre-training tasks, complementing character-level MLM and sentence-level NSP.
It is easy to extend with richer word semantics (part of speech, morphology, etc.).
There are two tasks in the pre-training stage:
MLM: the markers themselves are also MASKed, so that the model learns boundary knowledge.
Replaced word detection: a word is artificially replaced, and the model must judge whether the word preceding the marker is correct.
MarkBERT Pre-training
MarkBERT
As shown in the figure below:

The text is first segmented into words, and special markers are inserted between the words. These markers are treated as ordinary characters: they occupy positions and can also be MASKed, so the encoder must pay attention to word boundaries when encoding rather than simply filling in blanks, and the MASK prediction task becomes more challenging (prediction requires a better understanding of word boundaries). The model thus remains character-level, but it knows where the word boundaries are (because word information is given explicitly).
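To make this concrete, here is a minimal sketch of marker insertion. This is not the authors' code: the segmentation is hard-coded for illustration, and the marker symbol `[S]` is a placeholder for whatever special token the vocabulary actually reserves.

```python
def insert_markers(words, marker="[S]"):
    """Flatten segmented words into a character-level token sequence,
    inserting a boundary marker between adjacent words."""
    tokens = []
    for i, word in enumerate(words):
        tokens.extend(list(word))      # character-level tokens
        if i < len(words) - 1:
            tokens.append(marker)      # explicit word-boundary marker
    return tokens

# Example: "北京欢迎你" segmented as ["北京", "欢迎", "你"]
print(insert_markers(["北京", "欢迎", "你"]))
# ['北', '京', '[S]', '欢', '迎', '[S]', '你']
```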
Replaced word detection
Specifically, when a word has been replaced with a confusion word, the marker should predict 「replaced」, i.e., the label False; otherwise True.
This loss is added to the MLM loss, forming a multi-task training objective. Confusion words come from synonyms or from words with similar pronunciation; through this task, the markers become more sensitive to word spans in their context. The model that uses POS markers is called MarkBERT-POS.
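A hedged sketch of the combined objective follows. Names such as `encoder`, `mlm_head`, `marker_head`, and `marker_mask` are illustrative assumptions, not the paper's modules; the binary replaced/not-replaced head matches the description above (the pre-training section below refines this into multiple confusion labels).

```python
import torch.nn.functional as F

def markbert_loss(encoder, mlm_head, marker_head,
                  input_ids, mlm_labels, marker_mask, replaced_labels):
    """MLM cross-entropy plus a binary 'was the preceding word
    replaced?' classification at marker positions."""
    hidden = encoder(input_ids)                      # (batch, seq, dim)

    # Standard MLM loss; unmasked positions carry the ignore label -100.
    mlm_loss = F.cross_entropy(
        mlm_head(hidden).transpose(1, 2),            # (batch, vocab, seq)
        mlm_labels, ignore_index=-100)

    # Gather marker positions and classify replaced vs. not replaced.
    marker_hidden = hidden[marker_mask]              # (n_markers, dim)
    rwd_loss = F.binary_cross_entropy_with_logits(
        marker_head(marker_hidden).squeeze(-1),      # (n_markers,)
        replaced_labels.float())

    # The two losses are simply added, as described above.
    return mlm_loss + rwd_loss
```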
Pre-training details
The MASK proportion is still 15%. 30% of the time no markers are inserted (reducing to the original BERT); 50% of the time the whole-word masking (WWM) prediction task is performed; the rest of the time the ordinary MLM prediction task is used.
When markers are inserted, 30% of the time words are replaced with pronunciation-based or synonym-based confusion words, and the markers predict the pronunciation-confusion or synonym-confusion label; the rest of the time the markers predict the normal-word label. To avoid label imbalance, the loss is computed on only 15% of the normal markers.
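Here is a minimal sketch of this sampling schedule, assuming the proportions above apply per training example; the strategy and label names are illustrative, not from the paper.

```python
import random

def choose_masking_strategy():
    """Sample which masking task to apply to an example
    (proportions follow the post; names are illustrative)."""
    r = random.random()
    if r < 0.30:
        return "char_mlm_no_markers"     # original BERT behavior
    elif r < 0.80:                       # next 50%
        return "whole_word_masking"      # WWM prediction task
    else:                                # remaining 20%
        return "char_mlm_with_markers"   # ordinary MLM

def choose_marker_label():
    """When markers are inserted: 30% of words get a confusion
    replacement; normal markers contribute loss only 15% of the time."""
    if random.random() < 0.30:
        return random.choice(["phonetic_confusion", "synonym_confusion"])
    return "normal" if random.random() < 0.15 else "normal_no_loss"
```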
Experiments
The results on NER tasks are shown in the table below:

As can be seen, the improvement is clear.
Ablation experiments compared three variants:
MarkBERT-MLM: only the MLM task.
MarkBERT-rwd: in replaced word detection, remove the homophone confusions or the synonym confusions, respectively.
MarkBERT-w/o: remove the markers when fine-tuning on downstream tasks (used the same way as the original BERT).
The results are shown in the table below:

The conclusions are as follows:
MarkBERT-MLM achieves a significant improvement on NER tasks, showing that word boundary information is very important in fine-grained tasks.
Even without inserting markers, MarkBERT-w/o reaches performance similar to the baseline, showing that MarkBERT can be used exactly like BERT.
For NER tasks, inserting the markers still matters, indicating that the MarkBERT structure is effective at learning word boundaries for tasks that require this fine-grained representation.
Discussion
Existing Chinese BERTs have two strategies for integrating word information:
Use word information in the pre-training stage but character sequences in downstream tasks, e.g., Chinese-BERT-WWM, Lattice-BERT.
Use word information when applying the pre-trained model to downstream tasks, e.g., WoBERT, AmBERT, Lichee.
In addition, the idea of inserting markers has been discussed for entity-related NLU tasks, especially relation classification. Given a subject entity and an object entity, existing work injects untyped markers or entity-specific markers to better predict the relation between the entities.

This paper was a real pleasure to read. The method is simple but ingenious: it solves in one stroke the problem of handling 「words」 in Chinese pre-trained models, and it makes it very convenient to introduce word-level tasks and richer word semantics. In fact, we could even add markers only for 「certain words of interest」 and process the rest character by character.
Resources for this article
[1] MarkBERT: Marking Word Boundaries Improves Chinese BERT: https://arxiv.org/abs/2203.06378
