The whole process of knowledge graph construction

2022-07-07 03:00:00 【AI Zeng Xiaojian】

1. An introduction to knowledge graphs

A knowledge graph is a structured semantic knowledge base used to quickly describe the concepts of the physical world and the relationships between them. With a knowledge graph, the information, data, and link relationships on the Web can be aggregated into knowledge, which makes information resources easier to compute, understand, and evaluate, and allows fast response to queries and reasoning over knowledge.

1.1 Wide use across fields

Knowledge graphs are now widely used in industry: Google Search and Baidu Search in the search field, the LinkedIn Economic Graph in the social field, the Tianyancha enterprise graph in the enterprise information field, the Taobao product graph in e-commerce, the Meituan Knowledge Brain in O2O, the DXY (Ding Xiang Yuan) knowledge graph in medicine, and knowledge graphs for industrial manufacturing.

[Figure 6-2: Knowledge graphs are widely used in industry]

In the early days of knowledge graph technology, many enterprises and research institutions built their basic knowledge bases top-down, Freebase being one example. As automatic knowledge extraction and processing technology has matured, most current knowledge graphs are built bottom-up, such as Google's Knowledge Vault and Microsoft's Satori knowledge base.

1.2 Classification of construction techniques

Knowledge graphs are built in two main ways: top-down and bottom-up.

Top-down construction: using structured data sources such as encyclopedia websites, extract ontology and schema information from high-quality data and add it to the knowledge base.
Bottom-up construction: using certain technical means, extract resource patterns from publicly collected data, select the information with high confidence, and add it to the knowledge base.

[Figure 6-3: Knowledge graph construction | top-down → bottom-up]

1.3 "Entity - Relationship - Entity" triples

The figure below is a typical example of a knowledge graph. As you can see, the graph contains many nodes. If two nodes are related, they are connected by an undirected edge. A node is called an entity (Entity), and the edge between two nodes is called a relationship (Relationship).

[Figure 6-4: Example of a knowledge graph (Knowledge Graph)]

The basic unit of a knowledge graph is the triple "Entity - Relationship - Entity". This triple is also the core of the knowledge graph.
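
As a minimal sketch of the triple idea, the snippet below stores a few triples as Python tuples and groups them by their first entity. The sample facts are for illustration only, and the names `Triple` and `index_by_head` are hypothetical helpers, not part of any library. Storing a head and a tail is a modeling choice made here for convenience.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    head: str       # first entity
    relation: str   # relationship (edge label)
    tail: str       # second entity


# Illustrative facts only; any (entity, relationship, entity) data works.
triples = [
    Triple("Wu Jing", "wife", "Xie Nan"),
    Triple("Wu Jing", "director", "Wolf Warrior 2"),
]


def index_by_head(items):
    """Group triples by head entity so that a node's edges are easy to look up."""
    index = defaultdict(list)
    for t in items:
        index[t.head].append((t.relation, t.tail))
    return dict(index)


graph = index_by_head(triples)
print(graph["Wu Jing"])
```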

2. Data types and storage methods

The raw data for a knowledge graph generally comes in three types (these are also the three types of raw data on the Internet):

Structured data (Structured Data), such as relational databases and linked data
Semi-structured data (Semi-Structured Data), such as XML, JSON, and encyclopedias
Unstructured data (Unstructured Data), such as images, audio, and video

[Figure 6-5: Knowledge graph | 3 data types & 2 storage methods]

A typical example of semi-structured data is shown below:

[Figure 6-6: Example data types for knowledge graphs | semi-structured data]

How should these three types of data be stored?

There are generally two options. The first is a standard storage format such as RDF (Resource Description Framework); a commonly used tool is Jena.

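<?xml version="1.0"?>
<!-- The ex: namespace is a placeholder for illustration, not a published vocabulary. -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/terms#">
    <rdf:Description rdf:about="https://www.w3.org/RDF/">
        <ex:author>HanXinzi</ex:author>
        <ex:homepage>http://www.showmeai.tech</ex:homepage>
    </rdf:Description>
</rdf:RDF>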

The second option is a graph database; a commonly used one is Neo4j.

[Figure 6-7: Example of knowledge graph storage | graph database]

At this point a knowledge graph looks like little more than a pile of triples, so could it simply be stored in a relational database?

Yes. Technically, storing a knowledge graph in a relational database is no problem at all, especially when the graph has a simple structure. But once the graph becomes complex, queries against traditional relational storage are significantly slower than against a graph database. In some scenarios involving 2- or 3-degree relationship queries, a graph database can improve query efficiency by a factor of thousands or even millions.

Graph-based storage is also very flexible in design: a change usually requires only local modifications. When your data is large in scale, storing it directly in a graph database is recommended.
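
To make the multi-degree query concrete, here is a rough sketch, not a description of how Neo4j or any other graph database works internally. When each node keeps its own neighbor list, a 2- or 3-degree query is a bounded breadth-first traversal, whereas a plain triple table in a relational database typically needs one self-join per hop. The adjacency data and the function name `within_k_hops` are made up for illustration.

```python
from collections import deque

# Hypothetical adjacency list: entity -> directly related entities.
edges = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": [],
    "F": [],
}


def within_k_hops(graph, start, k):
    """Return all entities reachable from `start` in at most k hops (breadth-first)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    seen.discard(start)
    return seen


print(sorted(within_k_hops(edges, "A", 2)))
```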

3. The architecture of a knowledge graph

The architecture of a knowledge graph can be divided into:

Logical architecture
Technical architecture

[Figure 6-8: Architecture of a knowledge graph | logical architecture & technical architecture]

3.1 Logical architecture

Logically, a knowledge graph is usually divided into two levels: the data layer and the schema layer (also called the pattern layer).

Schema layer: sits above the data layer and is the core of the knowledge graph. It stores refined knowledge and is usually managed through an ontology library. An ontology library can be understood as something like "classes" in object-oriented programming: it stores the classes of the knowledge graph.
Data layer: stores the actual data.

Take a look at this example:
Schema layer: Entity - Relationship - Entity, Entity - Attribute - Attribute value
Data layer: Wu Jing - Wife - Xie Nan, Wu Jing - Director - Wolf Warrior 2
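
The split between the two layers can be sketched as follows, assuming a very small schema in which each relationship declares the entity types it may connect. The type names, the `schema` dictionary, and the `conforms` helper are all hypothetical; real ontology libraries are far richer.

```python
# Schema layer (hypothetical): which relation may connect which entity types.
schema = {
    "wife": ("Person", "Person"),
    "director": ("Person", "Film"),
}

# Data layer: entity types plus concrete facts.
entity_types = {"Wu Jing": "Person", "Xie Nan": "Person", "Wolf Warrior 2": "Film"}
facts = [
    ("Wu Jing", "wife", "Xie Nan"),
    ("Wu Jing", "director", "Wolf Warrior 2"),
    ("Wolf Warrior 2", "wife", "Wu Jing"),  # deliberately invalid
]


def conforms(fact):
    """Check a data-layer fact against the schema-layer signature of its relation."""
    head, relation, tail = fact
    if relation not in schema:
        return False
    head_type, tail_type = schema[relation]
    return entity_types.get(head) == head_type and entity_types.get(tail) == tail_type


valid = [f for f in facts if conforms(f)]
print(valid)
```

The schema layer acts as a set of constraints, and the data layer holds the instances that have to satisfy them.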

3.2 Technical architecture

The overall architecture of a knowledge graph is shown in the figure. The part inside the dotted box is the construction process of the knowledge graph, and it is also the process by which the graph is updated. Let's follow the figure step by step.

First, we have a large amount of data, which may be structured, unstructured, or semi-structured.
Then we build the knowledge graph from this data. This step mainly uses a series of automatic or semi-automatic techniques to extract knowledge elements (that is, a collection of entities and relationships) from the raw data and store them in the schema layer and data layer of the knowledge base.

4. Construction techniques

As noted earlier, a knowledge graph can be built top-down or bottom-up. The techniques described here are mainly those for bottom-up construction.

Building a knowledge graph is an iterative process. Following the logic of knowledge acquisition, each iteration consists of three phases:

Information extraction: extract entities, attributes, and the relationships between entities from various types of data sources, and form an ontological representation of knowledge on that basis.
Knowledge fusion: newly acquired knowledge must be integrated to eliminate contradictions and ambiguities. For example, one entity may have several names, and one name may refer to several different entities.
Knowledge processing: the fused knowledge goes through quality assessment (part of which requires manual screening), and only the qualified part is added to the knowledge base, which protects the quality of the knowledge base.

[Figure 6-9: Knowledge graph technical architecture (bottom-up)]

The sections below introduce each step in turn.

4.1 Information extraction

Information extraction is the first step of knowledge graph construction. The key question is: how can information be extracted automatically from heterogeneous data sources to obtain candidate knowledge units?

Information extraction is a technique for automatically extracting structured information, such as entities, relationships, and entity attributes, from semi-structured and unstructured data. The key technologies involved are entity extraction, relationship extraction, and attribute extraction.

[Figure 6-10: Information extraction]

1) Entity extraction

Entity extraction, also known as named entity recognition (NER), refers to automatically identifying named entities in a text dataset.

In the figure, entity extraction yields four entities: "Africa", "the Chinese navy", "Leng Feng", and "Wolf Warrior".

[Figure 6-11: Entity extraction / named entity recognition (NER)]

Research history:
◉ From entity extraction in a single domain, the field has gradually moved to open-domain (Open Domain) entity extraction.
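
The simplest possible baseline for entity extraction is a dictionary (gazetteer) lookup, sketched below. This stands in for the idea only: practical NER systems usually rely on statistical or neural sequence labeling. The gazetteer entries, the type labels, and the example sentence are all made up for illustration.

```python
import re

# Hypothetical gazetteer: surface form -> entity type.
gazetteer = {
    "Africa": "LOCATION",
    "Chinese navy": "ORGANIZATION",
    "Leng Feng": "PERSON",
    "Wolf Warrior": "WORK",
}


def extract_entities(text):
    """Return (mention, type, start offset) for every gazetteer entry found in text."""
    found = []
    for name, label in gazetteer.items():
        for match in re.finditer(re.escape(name), text):
            found.append((name, label, match.start()))
    return sorted(found, key=lambda item: item[2])


# Made-up sentence used only to exercise the lookup.
sentence = "In Wolf Warrior, Leng Feng works with the Chinese navy in Africa."
entities = extract_entities(sentence)
for mention, label, offset in entities:
    print(offset, label, mention)
```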

2) Relationship extraction

Entity extraction over a text corpus produces a series of discrete named entities. To obtain semantic information, we also need to extract the relationships between entities from the relevant corpus and use those relationships to connect the entities into a networked knowledge structure. That is the job of relationship extraction, as shown in the figure below.

[Figure 6-12: Relationship extraction / extracting the relationships between entities from the corpus]

Research history:
◉ Manually constructed syntactic and semantic rules (pattern matching).
◉ Statistical machine learning methods.
◉ Supervised learning methods based on feature vectors or kernel functions.
◉ The research focus shifted to semi-supervised and unsupervised methods.
◉ Research began on information extraction methods for the open domain.
◉ Open-domain information extraction methods were combined with traditional closed-domain methods.
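
The earliest approach in the list above, hand-written pattern matching, can be sketched with regular expressions. The two patterns, the relation names, and the input text are hypothetical, and the patterns only cover these exact phrasings, which is the well-known weakness of rule-based extraction.

```python
import re

# Hypothetical hand-written patterns: regex -> relation name.
PATTERNS = [
    (re.compile(r"(?P<head>[A-Z][\w ]+?) is the wife of (?P<tail>[A-Z][\w ]+)"), "wife_of"),
    (re.compile(r"(?P<head>[A-Z][\w ]+?) directed (?P<tail>[A-Z][\w ]+)"), "directed"),
]


def extract_relations(text):
    """Apply every pattern to every sentence and collect (head, relation, tail)."""
    triples = []
    for sentence in re.split(r"[.!?]\s*", text):
        for pattern, relation in PATTERNS:
            match = pattern.search(sentence)
            if match:
                head = match.group("head").strip()
                tail = match.group("tail").strip()
                triples.append((head, relation, tail))
    return triples


text = "Xie Nan is the wife of Wu Jing. Wu Jing directed Wolf Warrior 2."
relations = extract_relations(text)
print(relations)
```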

3) Attribute extraction

The goal of attribute extraction is to collect the attribute information of a specific entity from different information sources. For a public figure, for example, the nickname, birthday, nationality, and educational background can be obtained from public information on the Internet.

[Figure 6-13: Attribute extraction / collecting attribute information of specific entities from different information sources]

Research history:
◉ Treat an attribute of an entity as a nominal relationship between the entity and the attribute value, which turns the attribute extraction task into a relationship extraction task.
◉ Extract structured data with rules and heuristics.
◉ Automatically extract training corpora from the semi-structured data of encyclopedia websites, use them to train an entity attribute annotation model, and then apply the model to attribute extraction from unstructured data.
◉ Use data mining to discover the relationship patterns between entity attributes and attribute values directly from text, and use those patterns to locate attribute names and values in the text.
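
The first idea in the list above, treating an attribute as a relationship between an entity and an attribute value, can be sketched on a piece of semi-structured text. The infobox content is invented placeholder data, and `extract_attributes` is a hypothetical helper that only handles simple "key: value" lines.

```python
# Hypothetical infobox text, similar in shape to an encyclopedia entry.
infobox = """\
Name: Zhang San
Nickname: Little Zhang
Birthday: 1990-01-01
Nationality: China
"""


def extract_attributes(entity, raw):
    """Turn 'key: value' lines into (entity, attribute, value) triples."""
    triples = []
    for line in raw.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            triples.append((entity, key.strip().lower(), value.strip()))
    return triples


attributes = extract_attributes("Zhang San", infobox)
for triple in attributes:
    print(triple)
```

Because the output has the same shape as a relationship triple, the same storage and fusion steps can handle both.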

4.2 Knowledge fusion

Information extraction gives us the entities, relationships, and entity attributes found in the original unstructured and semi-structured data. If we compare the next stage to a jigsaw puzzle, this information is the pile of puzzle pieces: scattered, and mixed with pieces from other puzzles and with wrong pieces that get in the way.

In other words, the relationships among the pieces (information) are flat and lack hierarchy and logic, and the puzzle (knowledge) still contains many redundant and wrong pieces (information). Solving this problem is the job of knowledge fusion.

[Figure 6-14: Knowledge fusion (Knowledge Fusion)]

Knowledge fusion has two parts: entity linking and knowledge merging.

1) Entity linking

Entity linking (entity linking) is the operation of linking an entity object extracted from text to the corresponding correct entity object in the knowledge base. The basic idea is to first select a set of candidate entity objects from the knowledge base for a given entity mention, and then link the mention to the correct entity object by similarity calculation.

[Figure 6-15: Entity linking / semi-structured data & unstructured data]

Research history:
◉ Early work focused only on how to link entities extracted from text to the knowledge base and ignored the semantic relationships between entities in the same document.
◉ Later work began to use the co-occurrence relationships of entities to link multiple entities to the knowledge base at the same time, which is known as collective entity linking (collective entity linking).

The process of entity linking:

Extract entity mentions from the text.
Perform entity disambiguation and coreference resolution: determine whether entities with the same name in the knowledge base represent different meanings, and whether other named entities in the knowledge base represent the same meaning.
After confirming the corresponding correct entity object in the knowledge base, link the entity mention to that entity.

◉ Entity disambiguation: a technique dedicated to resolving the ambiguity of entities that share a name. With entity disambiguation, entity links can be established accurately according to the current context. The main method is clustering. It can also be viewed as a context-based classification problem, similar to part-of-speech disambiguation and word sense disambiguation.
◉ Coreference resolution: mainly addresses the problem of multiple mentions that correspond to the same entity object. In a single conversation, several mentions may refer to the same entity object. Coreference resolution associates (merges) these mentions with the correct entity object. Because the problem is especially important in information retrieval and natural language processing, it has attracted a great deal of research. Coreference resolution goes by other names as well, such as object alignment, entity matching, and entity synonymy.
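
The basic idea of entity linking (select candidates, then pick one by similarity) can be sketched as below. This is a toy illustration under stated assumptions: the two knowledge base entries and their keyword sets are invented, candidates are found by exact name match, and similarity is a plain Jaccard score over context words. Production systems use much richer candidate generation and ranking.

```python
# Hypothetical knowledge base: entity id -> (name, descriptive keywords).
knowledge_base = {
    "apple_company": ("Apple", {"iphone", "company", "technology", "cupertino"}),
    "apple_fruit": ("Apple", {"fruit", "tree", "sweet", "orchard"}),
}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_entity(mention, context_words):
    """Pick the same-named candidate whose keywords best match the context."""
    candidates = [eid for eid, (name, _) in knowledge_base.items() if name == mention]
    if not candidates:
        return None
    return max(candidates, key=lambda eid: jaccard(knowledge_base[eid][1], context_words))


context = set("the new iphone from apple the technology company".split())
linked = link_entity("Apple", context)
print(linked)
```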

2) Knowledge merging

Entity linking connects entities to the corresponding correct entity objects in the knowledge base. Note, however, that entity linking works on data that information extraction has pulled out of semi-structured and unstructured data.

Besides semi-structured and unstructured data, there is a more convenient data source: structured data, such as external knowledge bases and relational databases. Handling this structured data is what knowledge merging is about.

Knowledge merging generally comes in two kinds: merging external knowledge bases, which mainly means resolving conflicts at the data layer and the schema layer, and merging relational databases, for which there are methods such as RDB2RDF.

[Figure 6-16: Structured data]
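
Merging a relational table can be sketched as a row-to-triple conversion: each row becomes a subject and each column becomes a relationship. This is a deliberate simplification in the general spirit of RDB2RDF-style mappings and does not follow any particular mapping specification. The table name, the rows, and `rows_to_triples` are hypothetical.

```python
# Hypothetical relational table: a list of rows keyed by column name.
film_rows = [
    {"id": 1, "title": "Wolf Warrior 2", "director": "Wu Jing"},
    {"id": 2, "title": "Example Film", "director": "Example Director"},
]


def rows_to_triples(table, rows, key="id"):
    """Map each row to triples: subject = table/key, relation = column, object = value."""
    triples = []
    for row in rows:
        subject = f"{table}/{row[key]}"
        for column, value in row.items():
            if column != key and value is not None:
                triples.append((subject, column, value))
    return triples


film_triples = rows_to_triples("film", film_rows)
for triple in film_triples:
    print(triple)
```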

4.3 Knowledge processing

After the steps above, we finally reach the knowledge processing stage. So far, information extraction has pulled knowledge elements such as entities, relationships, and attributes out of the original corpus, and knowledge fusion has eliminated the ambiguity between entity mentions and entity objects, leaving a set of basic factual statements.

But facts by themselves are not knowledge. To end up with a structured, networked knowledge system, the facts still have to go through knowledge processing. Knowledge processing has three main parts: ontology extraction, knowledge reasoning, and quality assessment.

[Figure 6-17: Knowledge processing (Knowledge Processing)]

1) Ontology extraction

An ontology (ontology) is a commonly accepted set of concepts, or conceptual framework, such as "person", "event", and "thing". An ontology can be built by hand with ontology editing software, or built automatically in a data-driven way. Because manual construction involves a huge workload and qualified experts are hard to find, the mainstream global ontology library products all start from existing domain-specific ontology libraries and expand them gradually with automatic construction techniques.

The automated ontology construction process has three phases: similarity calculation for parallel (sibling) entities → extraction of upper-level and lower-level relationships between entities → ontology generation.

[Figure 6-18: Ontology extraction / the automated ontology construction process]

As shown in the figure, when the knowledge graph first obtains the three entities "Wolf Warrior 2", "The Wandering Earth", and "Beijing Culture", it may see no difference among them. Once it calculates the similarity among the three, however, it finds that "Wolf Warrior 2" and "The Wandering Earth" are more similar to each other and differ more from "Beijing Culture".

After this first step, the knowledge graph still has no notion of upper and lower levels. It does not yet know that "The Wandering Earth" and "Beijing Culture" are not of the same type and cannot be compared.
The second step, extracting the upper-level and lower-level relationships between entities, does this job, and its output is used to generate the ontology in the third step.
After all three steps, the knowledge graph may understand that "Wolf Warrior 2" and "The Wandering Earth" are entities under the concept of film, and that they are not the same kind of thing as "Beijing Culture".
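
The first phase, similarity calculation, can be sketched by comparing which attributes each entity has been observed with, then grouping entities that look alike. The attribute sets, the 0.5 threshold, and the greedy grouping rule are all assumptions made for illustration, not the method used by any particular system.

```python
# Hypothetical attribute keys observed for each entity.
observed = {
    "Wolf Warrior 2": {"director", "release_date", "cast"},
    "The Wandering Earth": {"director", "release_date", "cast", "based_on"},
    "Beijing Culture": {"founded", "headquarters", "industry"},
}


def similarity(a, b):
    """Share of attribute keys the two entities have in common (Jaccard)."""
    keys_a, keys_b = observed[a], observed[b]
    return len(keys_a & keys_b) / len(keys_a | keys_b)


def group_entities(names, threshold=0.5):
    """Greedy grouping: join the first group whose first member is similar enough."""
    groups = []
    for name in names:
        for group in groups:
            if similarity(name, group[0]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


groups = group_entities(list(observed))
print(groups)
```

Each resulting group is only a candidate concept. Naming it (for example, "film") and placing it in a hierarchy is the work of the second and third phases.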

2) Knowledge reasoning

Once ontology construction is complete, a rudimentary knowledge graph is in place. At this point, though, many of the relationships in the graph are likely to be incomplete and many values are missing, so knowledge reasoning can be used to discover further knowledge.

[Figure 6-19: Knowledge reasoning / further improving the knowledge graph]

The objects of knowledge reasoning are not limited to relationships between entities. They can also be the attribute values of entities or the concept-level relationships of the ontology.

Inferring attribute values: if the birthday attribute of an entity is known, the age attribute of that entity can be obtained by reasoning.
Inferring concepts: from (tiger, family, Felidae) and (Felidae, order, Carnivora) it follows that (tiger, order, Carnivora).
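
Both examples can be written down as small rules. The sketch below assumes that "family" and "order" are the only rank relationships involved and that a rank fact about a group carries over to its members. The function names and the dates are hypothetical.

```python
from datetime import date


def infer_age(birthday, today):
    """Age in whole years, derived from the birthday attribute."""
    had_birthday = (today.month, today.day) >= (birthday.month, birthday.day)
    return today.year - birthday.year - (0 if had_birthday else 1)


RANKS = {"family", "order"}  # rank relationships treated as inheritable (assumption)


def infer_ranks(facts):
    """If (x, r1, y) and (y, r2, z) are rank facts, infer (x, r2, z) until stable."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for x, r1, y in list(known):
            for y2, r2, z in list(known):
                if y == y2 and r1 in RANKS and r2 in RANKS and (x, r2, z) not in known:
                    known.add((x, r2, z))
                    changed = True
    return known - set(facts)


facts = [("tiger", "family", "Felidae"), ("Felidae", "order", "Carnivora")]
inferred = infer_ranks(facts)
age = infer_age(date(1990, 7, 7), date(2022, 7, 6))
print(inferred, age)
```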

The algorithms for this task fall into 3 main categories: relationship reasoning based on knowledge representation, relationship reasoning based on probabilistic graphical models, and relationship reasoning based on deep learning.

[Figure 6-20: Knowledge reasoning / 3 main classes of algorithms]

3) Quality assessment

Quality assessment is also an important part of knowledge base construction. Its purpose is to quantify the credibility of knowledge and to protect the quality of the knowledge base by discarding knowledge with low confidence.
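
A minimal sketch of confidence-based filtering follows, assuming each candidate fact already carries a confidence score between 0 and 1. The scores, the two thresholds, and the middle band set aside for manual screening are illustrative choices, and the third fact is deliberately wrong.

```python
# Hypothetical candidate facts with confidence scores in [0, 1].
candidates = [
    (("Wu Jing", "wife", "Xie Nan"), 0.97),
    (("Wu Jing", "director", "Wolf Warrior 2"), 0.91),
    (("Wu Jing", "director", "Beijing Culture"), 0.12),  # deliberately wrong
    (("Wu Jing", "actor", "Wolf Warrior 2"), 0.60),
]


def assess(scored_facts, accept=0.9, reject=0.3):
    """Split facts into accepted, needs manual review, and rejected."""
    accepted, review, rejected = [], [], []
    for fact, confidence in scored_facts:
        if confidence >= accept:
            accepted.append(fact)
        elif confidence < reject:
            rejected.append(fact)
        else:
            review.append(fact)
    return accepted, review, rejected


accepted, review, rejected = assess(candidates)
print(len(accepted), len(review), len(rejected))
```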

4.4 Knowledge update

Logically, updating a knowledge base includes updating the concept layer and updating the data layer.

Concept layer update: when newly added data yields a new concept, the new concept needs to be added automatically to the concept layer of the knowledge base.
Data layer update: mainly adding or updating entities, relationships, and attribute values. Updating the data layer requires considering the reliability of the data sources and the consistency of the data (whether there are contradictions or redundancy), choosing reliable data sources, and adding to the knowledge base the facts and attributes that appear frequently across data sources.

[Figure 6-21: Knowledge graph construction & update process]

There are two ways to update the content of a knowledge graph:

Full update: take all of the updated data as input and build the knowledge graph from scratch. This method is relatively simple, but it consumes a lot of resources and requires a lot of human effort to maintain the system.
Incremental update: take only the newly added data as input and add the new knowledge to the existing knowledge graph. This method consumes fewer resources, but it still requires a lot of human intervention (defining rules and so on), so it is very difficult to implement.
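
An incremental data layer update can be sketched as follows, assuming the graph is held as a set of triples and each new batch comes from one data source. A fact is added only when enough sources report it, which is one simple reading of "facts that appear frequently across data sources". The function name and the `min_sources` parameter are hypothetical.

```python
from collections import Counter


def incremental_update(graph, new_batches, min_sources=2):
    """Add facts reported by at least `min_sources` batches; return the facts added."""
    support = Counter()
    for batch in new_batches:
        support.update(set(batch))  # count each source at most once per fact
    added = {
        fact
        for fact, count in support.items()
        if count >= min_sources and fact not in graph
    }
    graph.update(added)  # the existing graph is extended in place, not rebuilt
    return added


# Placeholder facts from two hypothetical sources.
graph = {("a", "r", "b")}
batches = [
    [("a", "r", "b"), ("c", "r", "d")],
    [("c", "r", "d"), ("e", "r", "f")],
]
added = incremental_update(graph, batches)
print(added, len(graph))
```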

That completes the construction of a knowledge graph!

5. Reference code implementations

Technical experts and partners in the ShowMeAI community have also implemented typical knowledge graph algorithms. If you are interested in the details of "Knowledge graph construction and practice", see our GitHub project at https://github.com/ShowMeAI-Hub for the implementation code. Thanks to all the technical experts and partners of the AI Institute of algorithms who took part in this project. Collating the datasets and code takes a lot of effort, so PRs and Stars are welcome!
