Interpretation of the ERNIE 1.0 and ERNIE 2.0 Papers
2022-07-02 07:22:00 【lwgkzl】
Summary
This article introduces Baidu's ERNIE 1.0 and ERNIE 2.0 models.
1. ERNIE 1.0: Enhanced Representation through Knowledge Integration
Motivation
The random-mask prediction used by BERT ignores entity information in a sentence and the relations between entities (i.e., external knowledge).
Model
Training is divided into three stages, each using a different masking strategy: random masking of individual words, masking of whole phrases in a sentence, and masking of whole entities. During pre-training, the model is made to predict the masked-out words, phrases, and entities, and thereby learns more comprehensive semantic information about the sentence.
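As a minimal sketch of the phrase/entity-level masking idea (not the paper's actual implementation), the snippet below masks whole spans instead of single tokens; `mask_spans`, the span format, and the 15% rate are all illustrative assumptions, and the spans are assumed to come from an external chunker or NER tagger.

```python
import random

MASK = "[MASK]"

def mask_spans(tokens, spans, mask_prob=0.15):
    """Mask whole spans (phrases or entities) rather than single tokens.

    tokens: list of word-piece strings.
    spans:  list of (start, end) index pairs marking phrase/entity
            boundaries, assumed to come from an external tagger.
    """
    tokens = list(tokens)
    for start, end in spans:
        if random.random() < mask_prob:
            # Mask every token in the span, so the model must use the
            # surrounding context to recover the whole unit at once.
            for i in range(start, end):
                tokens[i] = MASK
    return tokens
```

Because the entire span disappears together, the model cannot recover "Potter" just from seeing "Harry"; it must rely on sentence-level context, which is exactly the extra signal entity-level masking is meant to inject.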
2. ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding
Motivation
The pre-training tasks of previous pre-trained models only model co-occurrence relations between words, so they cannot learn complete sentence-level, syntactic, and semantic information. Many more pre-training tasks can be mined to model this information, such as the order of the sentences in a paragraph, or entities with special significance (person names, place names, etc.).
With a large number of pre-training tasks, plain multi-task learning cannot add new tasks dynamically and is therefore inflexible, while continual learning, which can add new tasks dynamically, suffers from forgetting earlier tasks as each new one is trained.
This paper proposes a framework that addresses both problems, and on top of it several pre-training tasks that mine the lexical, syntactic, and semantic information of sentences.
Model

The focus of the whole framework is the four-tier pyramid in the lower-right corner, which can be read as follows: first train task 1 on a quarter of its data; then train task 1 on another quarter of its data together with task 2 on a third of its data; in the third round train task 1 on 1/4, task 2 on 1/3, and task 3 on 1/2 of their data; and so on, until all the data has been trained. By growing the task set iteratively in this way, new tasks can be added dynamically, the old tasks are not forgotten, and the total amount of computation does not increase.
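The schedule described above has a simple closed form: task i enters at stage i, and its remaining data is split evenly over the stages that are left. The sketch below (an illustration of the pyramid reading above, not code from the paper) computes that allocation; `continual_schedule` is a hypothetical helper name.

```python
from fractions import Fraction

def continual_schedule(n_tasks):
    """Fraction of each task's data trained at each stage.

    Task i is introduced at stage i and its data is split evenly over
    the remaining stages, so each chunk is 1 / (n_tasks - i + 1) of it.
    Returns a list: schedule[stage-1][task] -> Fraction (0 before the
    task is introduced).
    """
    schedule = []
    for stage in range(1, n_tasks + 1):
        row = {}
        for task in range(1, n_tasks + 1):
            if task <= stage:
                # Task already introduced: train one equal-sized chunk.
                row[task] = Fraction(1, n_tasks - task + 1)
            else:
                # Task not yet introduced at this stage.
                row[task] = Fraction(0)
        schedule.append(row)
    return schedule
```

For four tasks this reproduces the pyramid: stage 1 trains 1/4 of task 1; stage 3 trains 1/4 of task 1, 1/3 of task 2, and 1/2 of task 3; and each task's chunks sum to its full dataset, so nothing is dropped and nothing is recomputed.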
Based on this framework, the paper explores several new pre-training tasks at three levels: word-aware (lexical), structure-aware (syntactic), and semantic-aware.
The word-aware tasks share essentially the same training objectives as ERNIE 1.0, plus a capitalization-prediction task (predict whether a word is capitalized), since capitalized words usually carry special meaning.
The structure-aware tasks are: 1. predict the relative order of the sentences in a paragraph (sentence reordering); 2. judge whether two sentences come from the same document (sentence distance).
The semantic-aware tasks are: 1. judge the discourse relation between two sentences (coarsely annotated with automatic tools); 2. use search-engine click data as weak supervision to learn query-document relevance (a clicked document is roughly counted as relevant).
P.S.
The framework proposed by ERNIE 2.0 is of real practical value in industry: it scales well, supports continual learning, and makes it possible to keep mining more weakly supervised or self-supervised signals.
BTW, the abstract of ERNIE 2.0 is rather hard to read, ummm.