当前位置:网站首页>Thesis reading_ Tsinghua Ernie
Thesis reading_ Tsinghua Ernie
2022-07-03 04:43:00 【xieyan0811】
English title :ERNIE: Enhanced Language Representation with Informative Entities
Chinese title :ERNIE: Use information entities to enhance language representation
Address of thesis :https://arxiv.org/pdf/1905.07129v3/n
field : natural language processing
Time of publication :2019
author :Zhengyan Zhang, Tsinghua University
Source :ACL
Quantity cited :37
Code and data :https://github.com/thunlp/ERNIE
Reading time :2002.06.25
Journal entry
2019 Around the year, Tsinghua and Baidu both proposed the name ERNIE Model of , The same name , The method is different . Tsinghua's ERNIE hold Knowledge map is integrated into the vector of text Express , Also called KEPLM, The idea is more interesting , Model improvement effect : When using a small amount of data to train the model ,ERNIE Better than other models . From a technical point of view , It demonstrates Methods of integrating heterogeneous data .
Introduce
In this paper, ERNIE, It is a pre training language model combining knowledge map and large-scale data . The introduction of knowledge maps faces two important challenges :
- How to extract and represent the structure in knowledge graph in text representation
- Integrate heterogeneous data : Map the pre training model representation and knowledge graph representation to the same vector space
ERNIE The solution is as follows :
- Identify named entities mentioned in the text , And then The entity is aligned with the corresponding entity in the knowledge graph , Using text semantics as the entity embedding of knowledge graph , Reuse TransE Methods learn the structure of the graph .
- In terms of pre training language model , Also use similar BERT Of MLM Method , At the same time, use the alignment method , look for Mask the entities in the knowledge map ; Aggregate context and knowledge graph to jointly predict token And entities .
Method
Defining symbols
token( The smallest unit of operation : Usually words or words ) Use {w1,…,wn} Express , Aligned Entity use {e1,…, em} Express . We need to pay attention to m And n Generally, the number is different , An entity may contain more than one word . Definition V To include all token The vocabulary of , All entities in the knowledge graph use E Express . Use functions f(w)=e Indicates the alignment function , The first of entities is used in this paper token alignment .
Model structure
The structure of the model is shown in the figure -2 Shown :

The model structure consists of two ,T-Encoder For extraction token Relevant text information ;K-Encoder Integrated extended graph information , Transform heterogeneous data into a unified space .
First , Will make use of token {w1,…, wn} Words embedded in 、 Segment embedding 、 Position insertion , Plug in T-Encoder layer , Calculate its semantic features :

T-Encoder Similar to ordinary BERT, It consists of N individual Transformer layers , In bold {e1,…, em} Said by TransE Pre trained graph embedding , Bold w and e Plug in K-Encoder, Integrate heterogeneous data , Generate output wo and eo:

wo and eo Will be used for downstream tasks .
Knowledge coding
From the picture -2 You can see the right half of ,K-Encoder Generally including M layer , By the end of i Layer as an example , The input is number i-1 Layer of w and e, Use two multi headed self-attention.

about token:wj And with it alignment Entity of :ek=f(wj), Use the following methods to fuse data :

there hj Is the inner hidden layer , It is a combination of token And entity representation ,σ It's a nonlinear activation function , Use here GELU. For those who cannot find the corresponding entity token, No need to merge :

The first i The simplified representation of the layer is as follows :

Use the pre training model to inject knowledge
In pre training , Random Mask aligned token-entity, Let the model predict the corresponding multiple token. This process is similar to self encoder dEA. Knowledge map may contain many entities , do softmax The amount of calculation is very large , And we only focus on the entities needed by the system , To reduce the amount of calculation . In the given token Sequence and entity sequence , Define alignment distribution calculation :

It counts in w Under the condition of , Align entities to ej Probability , type (7) Used to calculate the cross entropy loss function .
stay 5% Under the circumstances , Replace the entity with another entity , Correct with training model token Alignment error with entity ; stay 15% Under the circumstances , shelter token Alignment with entities , Correct the alignment failure with the training model ; Keep alignment in other cases , Study token Relationship with entities .
The loss function of training synthesizes dEA( Self coding ),MLM( shelter ) and NSP( Sentence order ) The loss of .
Fine tune the model for specific tasks
Pictured -3 Shown :

For general tasks , Embed the encoded words into the downstream model . For knowledge driven tasks , For example, relationship classification , Or predict the entity type , Use the following methods to fine tune .
For the problem of relationship classification , The most direct method is to add a pool layer after the output entity vector , Concatenate entity pairs , Then send it to the classifier . The method proposed in this paper is shown in Figure -3 Shown , It labels the front and back of the head entity and the tail entity respectively , The effect of tags is similar to the position embedding in traditional relationship classification , Still use CLS To mark categories .
The prediction entity type is a simplified version of the relationship classification , Also used ENT Tags to guide the model to combine context information and entity information .
experiment
Tsinghua's ERNIE It is a model for English training , Experimental proof , Additional knowledge can help the model make full use of small training data , This is very useful for many tasks with limited data .
边栏推荐
- 论文阅读_ICD编码_MSMN
- Auman Galaxy new year of the tiger appreciation meeting was held in Beijing - won the double certification of "intelligent safety" and "efficient performance" of China Automotive Research Institute
- 雇佣收银员(差分约束)
- Know that Chuangyu cloud monitoring - scanv Max update: Ecology OA unauthorized server request forgery and other two vulnerabilities can be detected
- 怎么用Kotlin去提高生产力:Kotlin Tips
- SSM based campus part-time platform for College Students
- Career planning of counter attacking College Students
- Games101 Lesson 9 shading 3 Notes
- Shell script -- condition judgment
- What functions need to be set after the mall system is built
猜你喜欢

data2vec! New milestone of unified mode

Php+mysql registration landing page development complete code

Career planning of counter attacking College Students

After reviewing MySQL for a month, I was stunned when the interviewer of Alibaba asked me

Leetcode simple question: the key with the longest key duration

逆袭大学生的职业规划

String matching: find a substring in a string
![[luatos sensor] 1 light sensing bh1750](/img/70/07f29e072c0b8630f92ec837fc12d5.jpg)
[luatos sensor] 1 light sensing bh1750

"Niuke brush Verilog" part II Verilog advanced challenge

Smart contract security audit company selection analysis and audit report resources download - domestic article
随机推荐
I've seen a piece of code in the past. I don't know what I'm doing. I can review it when I have time
第十九届浙江省 I. Barbecue
JVM原理简介
并发操作-内存交互操作
[Thesis Writing] how to write the overall design of JSP tourism network
[set theory] binary relationship (special relationship type | empty relationship | identity relationship | global relationship | divisive relationship | size relationship)
After reviewing MySQL for a month, I was stunned when the interviewer of Alibaba asked me
Web - Information Collection
[XSS bypass - protection strategy] understand the protection strategy and better bypass
Human resource management system based on JSP
How to choose cross-border e-commerce multi merchant system
Games101 Lesson 9 shading 3 Notes
JS multidimensional array to one-dimensional array
Handling record of electric skateboard detained by traffic police
7. Integrated learning
[PCL self study: filtering] introduction and use of various filters in PCL (continuously updated)
2022 registration of G2 utility boiler stoker examination and G2 utility boiler stoker reexamination examination
【工具跑SQL盲注】
Internationalization and localization, dark mode and dark mode in compose
How to retrieve the password for opening word files