[2020 Survey] A Survey of Link Prediction Based on Knowledge Graph Embedding
2022-07-01 04:40:00 【Necther】
Source | Expertise

Abstract
Knowledge graphs (KGs) have many applications in industry and academia, which in turn has driven a great deal of research on extracting information from a variety of sources at scale. Despite these efforts, it is well known that even state-of-the-art KGs are incomplete. Link Prediction (LP), the task of predicting missing facts based on the entities already present in a KG, is a promising and widely studied way to address KG incompleteness. Among recent LP techniques, those based on KG embeddings have achieved good performance on several benchmarks. Although the literature in this field is growing rapidly, the influence of the various design choices in these methods has not received enough attention. Moreover, the standard practice in this area is to report accuracy aggregated over a large number of test facts, in which some entities are over-represented; this allows LP methods to show good performance by fitting only the structural properties involving those entities, while ignoring the majority of the KG. This survey provides a comprehensive comparison of embedding-based LP methods, extending the dimensions of analysis beyond what is common in the literature. We experimentally compare the effectiveness and efficiency of 16 state-of-the-art methods, consider a rule-based baseline, and report a detailed analysis of the most popular benchmarks in the literature.
Introduction
Knowledge graphs (KGs) are structured representations of real-world information. In a KG, nodes represent entities, such as people and places; labels are the types of the relations that connect them; edges are specific facts connecting two entities with a relation. Because KGs can structure and model complex data in a machine-readable way, they are widely used across domains, from question answering to information retrieval and content-based recommendation systems, and they are important for any Semantic Web project. Well-known public KGs include FreeBase, WikiData, DBPedia, and Yago; industrial KGs include the Google KG, Satori, and Facebook Graph Search. These huge KGs can contain millions of entities and billions of facts. Despite such efforts, it is well known that even state-of-the-art KGs suffer from incompleteness. For example, FreeBase, one of the largest KGs and among the most widely used for research purposes, is missing the place of birth for more than 70% of its person entities and the nationality for over 99% of them. This has led researchers to propose a variety of techniques to correct errors and to add missing facts to KGs, a task often called knowledge graph completion or knowledge graph augmentation. An existing KG can be grown either by extracting new facts from external sources (such as web corpora) or by inferring missing facts from those already in the KG. The latter approach, called Link Prediction (LP), is the focus of our analysis. LP has been an increasingly active research field, and has recently benefited from the explosive growth of machine learning and deep learning techniques. Currently, the vast majority of LP models use the original KG elements to learn low-dimensional representations, called knowledge graph embeddings, and then use them to infer new facts.
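As a minimal illustration of this data model (the specific facts below are chosen for this example only and are not drawn from any particular KG discussed in the survey), a KG can be viewed as a set of ⟨head, relation, tail⟩ triples, and link prediction asks which entity best completes a partial triple:

```python
# A KG as a set of <head, relation, tail> triples.
kg = {
    ("Barack_Obama", "born_in", "Honolulu"),
    ("Honolulu", "located_in", "Hawaii"),
    ("Barack_Obama", "nationality", "USA"),
}

def known_tails(kg, head, relation):
    """Return the tail entities already observed for (head, relation).

    Link prediction goes one step further: it ranks *unobserved*
    candidate tails for queries such as ("Barack_Obama", "born_in", ?).
    """
    return {t for h, r, t in kg if h == head and r == relation}

print(known_tails(kg, "Barack_Obama", "born_in"))  # {'Honolulu'}
```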
In just a few years, starting from seminal models such as RESCAL and TransE, researchers have developed dozens of new models based on different architectures. Most papers in this field share a common problem: the results they report are aggregated over a large number of test facts, in which some entities are over-represented. Therefore, LP methods can perform well on these benchmarks just by modeling these entities well, while ignoring the others. Moreover, the limitations of current best practices may make it difficult to understand how the papers in this literature fit together, and to identify research directions worth pursuing. In addition, the advantages, shortcomings, and limitations of current techniques remain largely unknown; that is, few studies analyze what allows a model to perform better. Roughly speaking, we still do not know what makes a fact easy or difficult to learn and predict. In order to alleviate the above problems, we extensively compare and analyze a representative group of embedding-based LP models. We prioritize state-of-the-art systems, and consider works belonging to a broad range of architectures. We train and tune these systems from scratch, and by proposing new, informative evaluation practices, we provide experimental results beyond those in the original papers. Specifically:
- We consider 16 models belonging to different machine learning and deep learning architectures; we also use an additional state-of-the-art rule-mining LP model as a baseline. We provide a detailed description of the methods included in the experimental comparison, a summary of the related literature, and an educational taxonomy of knowledge graph embedding techniques.
- We consider the 5 most commonly used datasets and the most popular metrics currently used for benchmarking, and we analyze their characteristics in detail.
- For each model, we provide quantitative results on efficiency and effectiveness for each dataset.
- We propose a set of structural features of the training data, and measure how they affect the predictive performance of each model on each test fact.
Overview of Methods
In this section, we describe and discuss the main LP methods based on latent features. As described in Section 2, LP models can rely on very different approaches and architectures, depending on how they model the optimization problem and on the techniques they implement to solve it.
In order to outline their highly diverse characteristics, we propose a new taxonomy, shown in Figure 1. We identify three main families of models and further divide them into smaller groups, marked with unique colors. For each group we include the most effective representative models, prioritizing those that achieve state-of-the-art performance and, whenever possible, those with publicly available implementations. The result is a set of 16 models based on extremely diverse architectures; these are the models we use later in the experimental comparison. For each model, we also report the year of publication and its relationships to other models. We believe this taxonomy helps in understanding both the models and the experiments carried out in our work. Table 1 reports further information about the included models, such as their loss functions and space complexity. We identify three families of models: 1) tensor decomposition models; 2) geometric models; 3) deep learning models.


Tensor decomposition models
Models in this family interpret the LP task as a tensor decomposition. They implicitly consider the KG as a three-dimensional adjacency matrix (that is, a 3D tensor), which is only partially observable due to the KG's incompleteness. The tensor is decomposed into a combination of low-dimensional vectors (for example, via a multilinear product): these vectors are used as the embeddings of entities and relations. The core idea of tensor decomposition is that, as long as the model does not overfit the training set, the learned embeddings should generalize, associating high values with the unobserved true facts in the graph adjacency matrix. In practice, the score of each fact is computed by combining the specific embeddings involved in that fact; the embeddings are learned, as usual, by optimizing the scoring function over all training facts. These models tend to use few or no shared parameters, which makes them particularly easy to train.
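As a concrete sketch of such a scoring function (this example is illustrative and not taken from the survey; the entity names and random embeddings stand in for learned ones), a DistMult-style model scores a fact ⟨h, r, t⟩ with the multilinear product of the three embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50  # embedding dimensionality (illustrative choice)

# One vector per entity and per relation; in a real system these
# are learned by optimizing the scores of the training facts.
entities = {e: rng.normal(size=dim) for e in ["Obama", "Honolulu", "Paris"]}
relations = {r: rng.normal(size=dim) for r in ["born_in"]}

def distmult_score(h, r, t):
    """Multilinear product: sum_i h_i * r_i * t_i (DistMult scoring)."""
    return float(np.sum(entities[h] * relations[r] * entities[t]))

# Rank candidate tails for the query (Obama, born_in, ?) by score.
candidates = ["Honolulu", "Paris"]
ranked = sorted(candidates,
                key=lambda t: distmult_score("Obama", "born_in", t),
                reverse=True)
print(ranked)
```

Note that this particular factorization is symmetric in head and tail (swapping them leaves the score unchanged), one of the modeling trade-offs that distinguishes members of this family.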
Geometric models
Geometric models interpret relations as geometric transformations in the latent space. For a given fact, the head entity embedding undergoes a spatial transformation τ that uses the relation embedding as its parameters. The score of the fact is the distance between the resulting vector and the tail embedding, computed with a distance function δ (for example, the L1 or L2 norm).
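TransE, the archetypal geometric model, instantiates τ as a translation: it models a fact as h + r ≈ t and scores it by the distance δ between h + r and t. A minimal sketch, using random stand-in embeddings purely to illustrate the scoring (not a trained model):

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 50

entities = {e: rng.normal(size=dim) for e in ["Obama", "Honolulu", "Paris"]}
relations = {r: rng.normal(size=dim) for r in ["born_in"]}

def transe_score(h, r, t, p=2):
    """TransE: distance between (h + r) and t under the Lp norm.

    Lower distance means a more plausible fact; training pushes
    h + r close to t for true facts and far from it for false ones.
    """
    return float(np.linalg.norm(
        entities[h] + relations[r] - entities[t], ord=p))

# A perfectly modeled fact would satisfy h + r == t, i.e. distance 0.
entities["Honolulu"] = entities["Obama"] + relations["born_in"]
print(transe_score("Obama", "born_in", "Honolulu"))  # 0.0
```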

Deep learning models
Deep learning models use deep neural networks to perform the LP task. Neural networks learn parameters, such as weights and biases, which they combine with the input data to identify significant patterns. Deep neural networks usually organize their parameters into separate layers, generally interleaved with non-linear activation functions.
Over time, many different types of layers have been developed, applying different operations to the input data. For example, a fully connected layer combines the input data X with weights W and adds a bias B: W X + B. For simplicity, in the following formulas we keep the bias implicit and do not mention it. More advanced layers perform more complex operations, such as convolutional layers (which learn convolution kernels to apply to the input data) or recurrent layers (which process sequential inputs recursively).
In LP tasks, the KG embeddings are usually learned jointly with the weights and biases of the layers; these shared parameters make the models more expressive, but may lead to larger numbers of parameters, harder training, and a greater tendency to overfit.
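As a hedged sketch of this idea (a generic feed-forward scorer invented for illustration, not any specific published model such as ConvE), the head, relation, and tail embeddings can be concatenated and passed through fully connected layers whose weights are shared across all facts:

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 50      # embedding size (illustrative)
hidden = 64   # hidden layer size (illustrative)

entities = {e: rng.normal(size=dim) for e in ["Obama", "Honolulu"]}
relations = {r: rng.normal(size=dim) for r in ["born_in"]}

# Layer parameters shared by every fact; bias kept explicit here.
W1 = rng.normal(size=(hidden, 3 * dim)) * 0.1
b1 = np.zeros(hidden)
W2 = rng.normal(size=(1, hidden)) * 0.1

def mlp_score(h, r, t):
    """Score a fact with a two-layer feed-forward network:
    x -> ReLU(W1 @ x + b1) -> W2 @ (...)."""
    x = np.concatenate([entities[h], relations[r], entities[t]])
    hid = np.maximum(0.0, W1 @ x + b1)  # ReLU activation
    return float(W2 @ hid)

print(mlp_score("Obama", "born_in", "Honolulu"))
```

Because W1 and W2 are reused for every fact, the network can capture interactions beyond a fixed algebraic form, at the cost of the extra parameters and overfitting risk noted above.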









