当前位置:网站首页>Thesis reading_ Tsinghua Ernie
Thesis reading_ Tsinghua Ernie
2022-07-03 04:43:00 【xieyan0811】
English title :ERNIE: Enhanced Language Representation with Informative Entities
Chinese title :ERNIE: Use information entities to enhance language representation
Address of thesis :https://arxiv.org/pdf/1905.07129v3/n
field : natural language processing
Time of publication :2019
author :Zhengyan Zhang, Tsinghua University
Source :ACL
Quantity cited :37
Code and data :https://github.com/thunlp/ERNIE
Reading time :2002.06.25
Journal entry
2019 Around the year, Tsinghua and Baidu both proposed the name ERNIE Model of , The same name , The method is different . Tsinghua's ERNIE hold Knowledge map is integrated into the vector of text Express , Also called KEPLM, The idea is more interesting , Model improvement effect : When using a small amount of data to train the model ,ERNIE Better than other models . From a technical point of view , It demonstrates Methods of integrating heterogeneous data .
Introduce
In this paper, ERNIE, It is a pre training language model combining knowledge map and large-scale data . The introduction of knowledge maps faces two important challenges :
- How to extract and represent the structure in knowledge graph in text representation
- Integrate heterogeneous data : Map the pre training model representation and knowledge graph representation to the same vector space
ERNIE The solution is as follows :
- Identify named entities mentioned in the text , And then The entity is aligned with the corresponding entity in the knowledge graph , Using text semantics as the entity embedding of knowledge graph , Reuse TransE Methods learn the structure of the graph .
- In terms of pre training language model , Also use similar BERT Of MLM Method , At the same time, use the alignment method , look for Mask the entities in the knowledge map ; Aggregate context and knowledge graph to jointly predict token And entities .
Method
Defining symbols
token( The smallest unit of operation : Usually words or words ) Use {w1,…,wn} Express , Aligned Entity use {e1,…, em} Express . We need to pay attention to m And n Generally, the number is different , An entity may contain more than one word . Definition V To include all token The vocabulary of , All entities in the knowledge graph use E Express . Use functions f(w)=e Indicates the alignment function , The first of entities is used in this paper token alignment .
Model structure
The structure of the model is shown in the figure -2 Shown :

The model structure consists of two ,T-Encoder For extraction token Relevant text information ;K-Encoder Integrated extended graph information , Transform heterogeneous data into a unified space .
First , Will make use of token {w1,…, wn} Words embedded in 、 Segment embedding 、 Position insertion , Plug in T-Encoder layer , Calculate its semantic features :

T-Encoder Similar to ordinary BERT, It consists of N individual Transformer layers , In bold {e1,…, em} Said by TransE Pre trained graph embedding , Bold w and e Plug in K-Encoder, Integrate heterogeneous data , Generate output wo and eo:

wo and eo Will be used for downstream tasks .
Knowledge coding
From the picture -2 You can see the right half of ,K-Encoder Generally including M layer , By the end of i Layer as an example , The input is number i-1 Layer of w and e, Use two multi headed self-attention.

about token:wj And with it alignment Entity of :ek=f(wj), Use the following methods to fuse data :

there hj Is the inner hidden layer , It is a combination of token And entity representation ,σ It's a nonlinear activation function , Use here GELU. For those who cannot find the corresponding entity token, No need to merge :

The first i The simplified representation of the layer is as follows :

Use the pre training model to inject knowledge
In pre training , Random Mask aligned token-entity, Let the model predict the corresponding multiple token. This process is similar to self encoder dEA. Knowledge map may contain many entities , do softmax The amount of calculation is very large , And we only focus on the entities needed by the system , To reduce the amount of calculation . In the given token Sequence and entity sequence , Define alignment distribution calculation :

It counts in w Under the condition of , Align entities to ej Probability , type (7) Used to calculate the cross entropy loss function .
stay 5% Under the circumstances , Replace the entity with another entity , Correct with training model token Alignment error with entity ; stay 15% Under the circumstances , shelter token Alignment with entities , Correct the alignment failure with the training model ; Keep alignment in other cases , Study token Relationship with entities .
The loss function of training synthesizes dEA( Self coding ),MLM( shelter ) and NSP( Sentence order ) The loss of .
Fine tune the model for specific tasks
Pictured -3 Shown :

For general tasks , Embed the encoded words into the downstream model . For knowledge driven tasks , For example, relationship classification , Or predict the entity type , Use the following methods to fine tune .
For the problem of relationship classification , The most direct method is to add a pool layer after the output entity vector , Concatenate entity pairs , Then send it to the classifier . The method proposed in this paper is shown in Figure -3 Shown , It labels the front and back of the head entity and the tail entity respectively , The effect of tags is similar to the position embedding in traditional relationship classification , Still use CLS To mark categories .
The prediction entity type is a simplified version of the relationship classification , Also used ENT Tags to guide the model to combine context information and entity information .
experiment
Tsinghua's ERNIE It is a model for English training , Experimental proof , Additional knowledge can help the model make full use of small training data , This is very useful for many tasks with limited data .
边栏推荐
- Prefix and (continuously updated)
- RSRS index timing and large and small disc rotation
- [set theory] binary relationship (binary relationship notation | binary relationship from a to B | number of binary relationships | example of binary relationship)
- 论文阅读_ICD编码_MSMN
- JVM原理简介
- Symbol of array element product of leetcode simple problem
- The least operation of leetcode simple problem makes the array increment
- 移动端——uniapp开发记录(公共请求request封装)
- Dive Into Deep Learning——2.1数据操作&&练习
- UiPath实战(08) - 选取器(Selector)
猜你喜欢

2022 new examination questions for the main principals of hazardous chemical business units and examination skills for the main principals of hazardous chemical business units

Auman Galaxy new year of the tiger appreciation meeting was held in Beijing - won the double certification of "intelligent safety" and "efficient performance" of China Automotive Research Institute
![[USACO 2009 Dec S]Music Notes](/img/e6/282a8820becdd24d63dcff1b81fcaf.jpg)
[USACO 2009 Dec S]Music Notes

关于开学的准备与专业认知

MC Layer Target

FISCO bcos zero knowledge proof Fiat Shamir instance source code

移动端——uniapp开发记录(公共请求request封装)

Introduction to JVM principle

论文阅读_中文医疗模型_ eHealth

Smart contract security audit company selection analysis and audit report resources download - domestic article
随机推荐
2.14 summary
Market status and development prospect prediction of the global fire hose industry in 2022
Network security textual research recommendation
Wine travel Jianghu War: Ctrip is strong, meituan is strong, and Tiktok is fighting
[set theory] binary relationship (definition field | value field | inverse operation | inverse synthesis operation | restriction | image | single root | single value | nature of synthesis operation)
Mount NFS in kubesphere
Number of uniform strings of leetcode simple problem
Kubernetes source code analysis (I)
关于开学的准备与专业认知
FISCO bcos zero knowledge proof Fiat Shamir instance source code
Ffmpeg tanscoding transcoding
【PHP漏洞-弱类型】基础知识、php弱相等、报错绕过
2022-02-13 (347. Top k high frequency elements)
JS multidimensional array to one-dimensional array
Shell script Basics - basic grammar knowledge
[SQL injection] joint query (the simplest injection method)
4 years of experience to interview test development, 10 minutes to end, ask too
The least operation of leetcode simple problem makes the array increment
The reason why the entity class in the database is changed into hump naming
Youdao cloud notes