[2020 Survey] A Survey of Link Prediction Based on Knowledge Graph Embedding
2022-07-01 04:40:00 【Necther】
source | Expertise

Abstract
Knowledge graphs (KGs) have many applications in industry and academia, which in turn has driven a great deal of research toward extracting information from a variety of sources at scale. Despite these efforts, it is well known that even state-of-the-art KGs are incomplete. Link Prediction (LP), the task of predicting missing facts based on the entities already in a KG, is a promising and widely studied way to address KG incompleteness. Among recent LP techniques, those based on KG embeddings have achieved strong performance on several benchmarks. Although the literature in this field is growing rapidly, the impact of the various design choices in these methods has not received enough attention. Moreover, the standard practice in this field is to report accuracy aggregated over a large number of test facts, in which some entities are over-represented; this allows LP methods to show good performance by attending only to the structural properties involving these entities, while ignoring the majority of the KG. This survey provides a comprehensive comparison of embedding-based LP methods, extending the dimensions of analysis beyond what is common in the literature. We experimentally compare the effectiveness and efficiency of 16 state-of-the-art methods, include a rule-based baseline, and report a detailed analysis of the most popular benchmarks in the literature.
Introduction
Knowledge graphs (KGs) are structured representations of real-world information. In a KG, nodes represent entities, such as people and places; labels are the types of relations that connect them; and edges are specific facts connecting two entities with a relation. Because KGs can structure and model complex data in a machine-readable way, they are widely used across domains, from question answering to information retrieval and content-based recommender systems, and they are vital to any Semantic Web project. Well-known open KGs include FreeBase, WikiData, DBPedia, and Yago; industrial KGs include the Google KG, Satori, and Facebook Graph Search. These huge KGs can contain millions of entities and billions of facts. Despite such efforts, it is well known that even state-of-the-art KGs suffer from incompleteness. For example, FreeBase, one of the largest KGs and among the most widely used for research purposes, has been observed to lack a place of birth for more than 70% of its person entities and a nationality for more than 99% of them. This has led researchers to propose a variety of techniques for correcting errors and adding missing facts to KGs, a task commonly called knowledge graph completion or knowledge graph augmentation. An existing KG can be grown either by extracting new facts from external sources (such as Web corpora) or by inferring missing facts from those already in the KG. The latter approach, called Link Prediction (LP), is the focus of our analysis. LP has been an increasingly active research field, and has recently benefited from the explosive growth of machine learning and deep learning techniques. The vast majority of current LP models use the original KG elements to learn low-dimensional representations, known as knowledge graph embeddings, and then use them to infer new facts.
In just a few years, following pioneering models such as RESCAL and TransE, researchers have developed dozens of new models based on very different architectures. Most papers in this field share a common problem: the results they report are aggregated over a large number of test facts, in which a few entities are over-represented. As a consequence, LP methods can perform well on these benchmarks by attending only to those entities while ignoring the others. Moreover, the limitations of current best practices can make it difficult to understand how the papers in this literature fit together and to identify research directions worth pursuing. Beyond that, the strengths, weaknesses, and limitations of current techniques remain largely unknown; that is, little work has investigated what allows a model to perform better. Roughly speaking, we still do not know what makes a fact easy or hard to learn and predict. To alleviate these problems, we perform an extensive comparison and analysis of a representative set of embedding-based LP models. We prioritize state-of-the-art systems and consider works spanning a broad range of architectures. We train and tune these systems from scratch and, by proposing new and informative evaluation practices, provide experimental results that go beyond the original papers. Specifically:
- We consider 16 models belonging to different machine learning and deep learning architectures, plus an additional state-of-the-art rule-mining LP model as a baseline. We provide a detailed description of the methods included in the experimental comparison, a summary of the related literature, and an educational taxonomy of knowledge graph embedding techniques.
- We consider the 5 most commonly used datasets and the most popular metrics currently used for benchmarking, and analyze their characteristics in detail.
- For each model, we provide quantitative results on efficiency and effectiveness on every dataset.
- We define a set of structural features of the training data and measure how they affect the predictive performance of each model on each test fact.
Methods Overview
In this section, we describe and discuss the main latent-feature-based LP methods. As described in Section 2, LP models can rely on a variety of approaches and architectures, depending on how they model the optimization problem and on the techniques they implement to tackle it.
To outline their highly diverse characteristics, we propose a novel taxonomy, shown in Figure 1. We identify three main families of models and further divide them into smaller groups, each marked with a unique color. For each group, we include the most effective representative models, prioritizing those that achieve state-of-the-art performance and, whenever possible, those with publicly available implementations. The result is a set of 16 models based on extremely diverse architectures; these are the models we later use in the experimental part of our comparative analysis. For each model, we also report the year of publication and the models that influenced it. We believe this taxonomy helps in understanding both the models and the experiments carried out in our work. Table 1 reports further information on the included models, such as their loss functions and space complexity. We identify three families of models: 1) tensor decomposition models; 2) geometric models; 3) deep learning models.


Tensor decomposition models
Models in this family interpret the LP task as tensor decomposition. They implicitly regard the KG as a three-dimensional adjacency matrix (that is, a 3-way tensor) which, owing to KG incompleteness, is only partially observable. The tensor is decomposed into a combination of low-dimensional vectors (for example, via a multilinear product): these vectors serve as the embeddings of entities and relations. The core idea of tensor decomposition is that, as long as the model does not overfit the training set, the learned embeddings should generalize, associating high values with the unobserved true facts in the graph adjacency matrix. In practice, the score of each fact is computed by combining the embeddings involved in that fact; as usual, the embeddings are learned by optimizing the scoring function over all training facts. These models tend to use few or no shared parameters, which makes them particularly easy to train.
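As an illustration of such a multilinear product, a minimal NumPy sketch of DistMult-style scoring follows (a well-known tensor decomposition model; the embedding values here are toy numbers for illustration only, not learned from any KG):

```python
import numpy as np

def distmult_score(h, r, t):
    """DistMult-style scoring: the trilinear product sum_i h_i * r_i * t_i.
    Each relation acts as a diagonal matrix; a higher score means the
    fact <head, relation, tail> is considered more plausible."""
    return float(np.sum(h * r * t))

# Toy 3-dimensional embeddings (illustrative values only).
h = np.array([1.0, 0.5, -0.2])   # head entity embedding
r = np.array([0.8, 1.0, 0.1])    # relation embedding
t = np.array([0.9, 0.4, -0.3])   # tail entity embedding

score = distmult_score(h, r, t)  # 0.72 + 0.20 + 0.006 = 0.926
```

In a trained model, such scores are computed for every candidate tail entity and the candidates are ranked by score.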
Geometric models
Geometric models interpret relations as geometric transformations in the latent space. For a given fact, the head entity embedding undergoes a spatial transformation τ, parameterized by the values of the relation embedding. The fact's score is the distance between the resulting vector and the tail embedding, computed with a distance function δ (for example, the L1 or L2 norm).
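The simplest instance is TransE, mentioned in the introduction, where τ is a translation and δ is an Lp norm. A minimal sketch with toy embedding values (illustrative only):

```python
import numpy as np

def transe_score(h, r, t, p=1):
    """TransE scoring: the distance ||h + r - t|| under the Lp norm.
    A lower score (smaller distance) means a more plausible fact."""
    return float(np.linalg.norm(h + r - t, ord=p))

# Toy 4-dimensional embeddings chosen so that t is close to h + r.
h = np.array([0.1, 0.3, -0.2, 0.5])
r = np.array([0.2, -0.1, 0.4, 0.0])
t = np.array([0.3, 0.2, 0.2, 0.5])

plausible = transe_score(h, r, t)     # t ~ h + r, so distance is near zero
implausible = transe_score(h, r, -t)  # an unrelated tail yields a larger distance
```

Other geometric models replace the translation with rotations or more elaborate transformations, but the distance-based scoring scheme is the same.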

Deep learning models
Deep learning models use deep neural networks to perform the LP task. Neural networks learn parameters, such as weights and biases, which they combine with the input data to identify significant patterns. Deep neural networks usually organize their parameters into separate layers, generally interspersed with non-linear activation functions.
Over time, many different types of layers have been developed, applying different operations to the input data. For example, a fully connected layer combines the input data X with weights W and adds a bias B: W X + B. For simplicity, we keep the bias implicit and do not mention it in the following formulas. More advanced layers perform more complex operations, such as convolutional layers (which learn convolution kernels to apply to the input data) or recurrent layers (which process sequential inputs recursively).
In LP tasks, the KG embeddings are usually learned jointly with the weights and biases of the layers; these shared parameters make the models more expressive, but they may lead to larger parameter counts, harder training, and a greater tendency to overfit.
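To make the fully connected layer concrete, here is a toy one-layer scorer (a hedged illustration of the W X + B scheme above, not any specific published model): the head and relation embeddings are concatenated, passed through shared weights W and bias b with a ReLU activation, and the result is matched against the tail embedding.

```python
import numpy as np

rng = np.random.default_rng(42)
dim = 4

# Layer parameters (weights W, bias b) are shared across all triples;
# in a real model they would be trained jointly with the embeddings.
W = rng.normal(size=(2 * dim, dim))
b = np.zeros(dim)

def mlp_score(h, r, t, W, b):
    """One fully connected layer scorer:
    x = concat(h, r); hidden = relu(x W + b); score = hidden . t."""
    x = np.concatenate([h, r])
    hidden = np.maximum(x @ W + b, 0.0)   # non-linear activation (ReLU)
    return float(hidden @ t)

# Toy embeddings for a single triple (random, illustrative only).
h = rng.normal(size=dim)
r = rng.normal(size=dim)
t = rng.normal(size=dim)
score = mlp_score(h, r, t, W, b)
```

Because W and b are shared by every triple, gradients from all training facts update the same layer parameters, which is precisely what makes these models more expressive but harder to train than tensor decomposition models.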









