Interpretation of the ERNIE 1.0 and ERNIE 2.0 Papers
2022-07-02 07:22:00 【lwgkzl】
Summary
This article introduces Baidu's ERNIE 1.0 and ERNIE 2.0 models.
1. ERNIE 1.0: Enhanced Representation through Knowledge Integration
Motivation
The random-mask prediction used by BERT ignores entity information in a sentence and the relations between entities (i.e., external knowledge).
Model
Training is divided into three stages, each using a different masking strategy: random masking of individual words, masking of whole phrases in a sentence, and masking of whole entities. During pre-training, the model is made to predict the masked-out words, phrases, and entities, and thereby learns more comprehensive semantic information about the sentence.
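As a minimal sketch of the phrase/entity-level masking idea (not the paper's actual implementation), the snippet below masks whole spans instead of single tokens; `mask_spans`, the span format, and the 15% rate are all illustrative assumptions, and the spans are assumed to come from an external chunker or NER tagger.

```python
import random

MASK = "[MASK]"

def mask_spans(tokens, spans, mask_prob=0.15):
    """Mask whole spans (phrases or entities) rather than single tokens.

    tokens: list of word-piece strings.
    spans:  list of (start, end) index pairs marking phrase/entity
            boundaries, assumed to come from an external tagger.
    """
    tokens = list(tokens)
    for start, end in spans:
        if random.random() < mask_prob:
            # Mask every token in the span, so the model must use the
            # surrounding context to recover the whole unit at once.
            for i in range(start, end):
                tokens[i] = MASK
    return tokens
```

Because the entire span disappears together, the model cannot recover "Potter" just from seeing "Harry"; it must rely on sentence-level context, which is exactly the extra signal entity-level masking is meant to inject.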
2. ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding
Motivation
The pre-training tasks of previous pre-trained models only model co-occurrence relations between words, so they cannot learn complete sentence-level, syntactic, and semantic information. Many more pre-training tasks can be mined to model this information, such as the order of the sentences in a paragraph, or entities with special significance (person names, place names, etc.).
With a large number of pre-training tasks, plain multi-task learning cannot add new tasks dynamically and is therefore inflexible, while continual learning, which can add new tasks dynamically, suffers from forgetting earlier tasks as each new one is trained.
This paper proposes a framework that addresses both problems, and on top of it several pre-training tasks that mine the lexical, syntactic, and semantic information of sentences.
Model

The focus of the whole framework is the four-tier pyramid in the lower-right corner, which can be read as follows: first train task 1 on a quarter of its data; then train task 1 on another quarter of its data together with task 2 on a third of its data; in the third round train task 1 on 1/4, task 2 on 1/3, and task 3 on 1/2 of their data; and so on, until all the data has been trained. By growing the task set iteratively in this way, new tasks can be added dynamically, the old tasks are not forgotten, and the total amount of computation does not increase.
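The schedule described above has a simple closed form: task i enters at stage i, and its remaining data is split evenly over the stages that are left. The sketch below (an illustration of the pyramid reading above, not code from the paper) computes that allocation; `continual_schedule` is a hypothetical helper name.

```python
from fractions import Fraction

def continual_schedule(n_tasks):
    """Fraction of each task's data trained at each stage.

    Task i is introduced at stage i and its data is split evenly over
    the remaining stages, so each chunk is 1 / (n_tasks - i + 1) of it.
    Returns a list: schedule[stage-1][task] -> Fraction (0 before the
    task is introduced).
    """
    schedule = []
    for stage in range(1, n_tasks + 1):
        row = {}
        for task in range(1, n_tasks + 1):
            if task <= stage:
                # Task already introduced: train one equal-sized chunk.
                row[task] = Fraction(1, n_tasks - task + 1)
            else:
                # Task not yet introduced at this stage.
                row[task] = Fraction(0)
        schedule.append(row)
    return schedule
```

For four tasks this reproduces the pyramid: stage 1 trains 1/4 of task 1; stage 3 trains 1/4 of task 1, 1/3 of task 2, and 1/2 of task 3; and each task's chunks sum to its full dataset, so nothing is dropped and nothing is recomputed.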
Based on this framework, the paper explores several new pre-training tasks at three levels: word-aware (lexical), structure-aware (syntactic), and semantic-aware.
The word-aware tasks share essentially the same training objectives as ERNIE 1.0, plus a capitalization-prediction task (predict whether a word is capitalized), since capitalized words usually carry special meaning.
The structure-aware tasks are: 1. predict the relative order of the sentences in a paragraph (sentence reordering); 2. judge whether two sentences come from the same document (sentence distance).
The semantic-aware tasks are: 1. judge the discourse relation between two sentences (coarsely annotated with automatic tools); 2. use search-engine click data as weak supervision to learn query-document relevance (a clicked document is roughly counted as relevant).
P.S.
The framework proposed by ERNIE 2.0 is of real practical value in industry: it scales well, supports continual learning, and makes it possible to keep mining more weakly supervised or self-supervised signals.
BTW, the abstract of ERNIE 2.0 is rather hard to read, ummm.