
Mathematical modeling -- knowledge graph

2022-07-08 00:58:00 Light chasing rain

Note: You are welcome to follow Datawhale: https://datawhale.club/

Series:

Introduction to knowledge graphs 1: Knowledge graph introduction
Introduction to knowledge graphs 2-1: Practice: a question answering system based on a medical knowledge graph
Introduction to knowledge graphs 2-2: From user input to knowledge base query statements
Introduction to knowledge graphs 2-3: Neo4j graph database queries
1. Introduction to knowledge graphs
1.1 Introduction
From Google Search in its early days to today's chatbots, big-data risk control, securities investment, intelligent healthcare, adaptive education, and recommender systems, all of these rely on knowledge graphs. Their popularity in the technology field has also been rising year by year.

Microsoft began building knowledge graphs as early as 2010, including Satori and Probase. In 2012, Google officially released the Google Knowledge Graph, whose scale has since exceeded 70 billion facts. Microsoft and Google currently own the world's largest general-purpose knowledge graphs, Facebook owns the world's largest social knowledge graph, and Alibaba and Amazon have each built product knowledge graphs.

Figure 1: Industry landscape

Figure 2: Industry applications

This chapter explains the basics of knowledge graphs in an accessible way and introduces the steps and stages involved in building one from scratch. This team study also works through a practical application of a knowledge graph (KG) in intelligent question answering.

1.2 What is a knowledge graph?
The knowledge graph is a concept proposed by Google in 2012. From an academic point of view, we can define it as follows: "A knowledge graph is essentially a knowledge base organized as a semantic network (Semantic Network)." That is somewhat abstract, so from the perspective of practical application, a knowledge graph can simply be understood as a multi-relational graph (Multi-relational Graph).

1.2.1 What is a graph (Graph)?
A graph (Graph) is made up of nodes (Vertex) and edges (Edge); a multi-relational graph generally contains multiple types of nodes and multiple types of edges. An entity (node) is a thing in the real world, such as a person, a place name, a concept, a drug, or a company. A relationship (edge) expresses some kind of connection between entities, for example a person "lives in" Beijing, Zhang San and Li Si are "friends", or logistic regression is "prerequisite knowledge" for deep learning.

Figure 3: Introduction to graphs (Graph)
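
To make the idea of a multi-relational graph concrete, here is a minimal Python sketch that stores the examples above as (head, relation, tail) triples. The relation names and the `neighbors` helper are made up for illustration, and attaching "lives in Beijing" to Zhang San is an assumption; none of this comes from the original tutorial.

```python
from collections import defaultdict

# Each edge is a (head entity, relation type, tail entity) triple.
triples = [
    ("Zhang San", "lives_in", "Beijing"),  # assumed example person
    ("Zhang San", "friend", "Li Si"),
    ("logistic regression", "prerequisite_of", "deep learning"),
]

def neighbors(triples, entity):
    """Group the entities reachable from `entity` by relation type."""
    out = defaultdict(list)
    for head, relation, tail in triples:
        if head == entity:
            out[relation].append(tail)
    return dict(out)

zhang_san_edges = neighbors(triples, "Zhang San")
print(zhang_san_edges)
```

Because the relation type is stored on every edge, one structure can hold several kinds of connection between several kinds of entity, which is what "multi-relational" means here.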

1.2.2 What is a Schema?
Another important concept in knowledge graphs is the Schema:
Introduction: it defines the format of the data that may be added to the knowledge graph. It is equivalent to a data model for a particular domain, containing the meaningful concept types in that domain and the properties of those types.
Purpose: it standardizes the expression of structured data. A piece of data must conform to the entity objects and types predefined in the Schema before it is allowed into the knowledge graph. A picture is worth a thousand words:
In the figure, DataType restricts the value types of knowledge graph nodes to text, date, and number (floating point and integer).
In the figure, Thing defines the node types and their properties (this is the middle part of Figure 1-1).
Example: under the Schema in the figure below, the knowledge graph can contain only creative works, local organizations, and persons. The properties of creative works are movie and music, the property of local organizations is local business (e.g. hotels, clubs), and the property of persons is singer.
Tip: this team study does not cover Schema construction.

Figure 4: Schema definition
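
The gatekeeping role of a Schema can be sketched in a few lines of Python. This is only an illustration under assumptions: the `SCHEMA` dictionary, its type and property names, and the `conforms` helper are hypothetical and do not reproduce the Schema in Figure 4.

```python
# Hypothetical schema: node type -> {property name: expected Python type}.
SCHEMA = {
    "CreativeWork": {"name": str, "release_date": str},
    "LocalOrganization": {"name": str},
    "Person": {"name": str, "age": int},
}

def conforms(node_type, properties, schema=SCHEMA):
    """Return True only if the type is known and every property matches it."""
    allowed = schema.get(node_type)
    if allowed is None:
        return False
    return all(
        key in allowed and isinstance(value, allowed[key])
        for key, value in properties.items()
    )

accepted = conforms("Person", {"name": "Some Singer", "age": 40})
rejected_type = conforms("Planet", {"name": "Mars"})
rejected_value = conforms("Person", {"name": "Some Singer", "age": "forty"})
```

Only records for which `conforms` returns True would be written to the graph; unknown node types and wrongly typed values are turned away before they reach storage.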

1.3 What is the value of a knowledge graph?
As Figure 5 shows, the knowledge graph is an important branch of artificial intelligence. The goal of artificial intelligence is to give machines the ability to think and act rationally, as humans do ->
Under the guidance of symbolism, knowledge engineering (whose core is building expert systems) achieved breakthroughs ->
Within the broader field of knowledge engineering, knowledge representation is a very important task ->
The knowledge graph is an important part of knowledge representation.

Figure 5: Disciplinary concepts

2. How is a knowledge graph built?
2.1 Where does knowledge graph data come from?
Building the knowledge graph is the foundation for all later applications, and the prerequisite for building it is extracting data from different data sources. For a vertical-domain knowledge graph, the data mainly comes from two channels:

The first: the business's own data. This data usually lives in the company's database tables and is stored in structured form, so it generally needs only simple preprocessing before it can serve as input to downstream AI systems.
The second: data published on the web and collected by crawling. This data usually exists as web pages, so it is unstructured, and natural language processing techniques are generally needed to extract structured information from it.

Figure 6: Data sources
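
For the first channel, the "simple preprocessing" can be as little as mapping table columns to relation types. The sketch below assumes a made-up table and mapping; the rows, the column names, and `COLUMN_TO_RELATION` are all hypothetical.

```python
# Hypothetical rows, as they might come out of a business database table.
rows = [
    {"id": 1, "name": "Alice", "city": "Beijing", "employer": "Acme"},
    {"id": 2, "name": "Bob", "city": "Shanghai", "employer": None},
]

# Hypothetical mapping from column name to relation type.
COLUMN_TO_RELATION = {"city": "LIVES_IN", "employer": "WORKS_FOR"}

def rows_to_triples(rows, subject_column="name", mapping=COLUMN_TO_RELATION):
    """Turn each non-empty mapped column into a (subject, relation, object) triple."""
    triples = []
    for row in rows:
        for column, relation in mapping.items():
            value = row.get(column)
            if value is not None:
                triples.append((row[subject_column], relation, value))
    return triples

structured_triples = rows_to_triples(rows)
```

Unstructured web text offers no such column-to-relation mapping, which is why the second channel needs the extraction techniques described in the following sections.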

For example, in the search example below, the relationship between Bill Gates and Melinda Gates can be extracted from unstructured data, with Wikipedia as one such data source.

Figure 7: Example

2.2 Where is the difficulty in information extraction?
The difficulty of information extraction lies in handling unstructured data. The figure below gives an example: on the left is a piece of unstructured English text, and on the right are the entities and relationships extracted from it.

Figure 8: Example of the difficulty of information extraction

2.3 Which technologies are involved in building a knowledge graph?
Building a graph like this mainly involves the following natural language processing techniques:

Named entity recognition (Named Entity Recognition)
Relation extraction (Relation Extraction)
Entity resolution (Entity Resolution)
Coreference resolution (Coreference Resolution)

2.4 What are the specific construction techniques?
The following briefly describes the problem each technique solves. How they are implemented is not covered one by one here; later lessons and the second phase of the knowledge graph course will expand on that gradually:

Figure 9: Example for the specific construction techniques

2.4.1 Named entity recognition (Named Entity Recognition)
Named entity recognition (English: Named Entity Recognition), abbreviated NER
Goal: extract entities from the text and classify/tag each one.
Example: from the text above we can extract the entity "NYC" and tag its type as "Location"; we can also extract "Virgil's BBQ" and tag its type as "Restaurant".
This process is called named entity recognition. It is a relatively mature technology, and ready-made tools are available for it.
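
As a rough illustration of what NER produces, the following sketch tags entities by dictionary lookup. Real NER tools rely on trained models rather than a fixed list; the `GAZETTEER` table, the `tag_entities` helper, and the sample sentence are all made up for this example.

```python
import re

# Hypothetical gazetteer: surface form -> entity type.
GAZETTEER = {
    "NYC": "Location",
    "Virgil's BBQ": "Restaurant",
}

def tag_entities(text, gazetteer=GAZETTEER):
    """Return (surface form, type, start offset) for every gazetteer hit."""
    hits = []
    for surface, entity_type in gazetteer.items():
        for match in re.finditer(re.escape(surface), text):
            hits.append((surface, entity_type, match.start()))
    return sorted(hits, key=lambda hit: hit[2])

sample = "We had dinner at Virgil's BBQ on our trip to NYC."  # made-up sentence
entities = tag_entities(sample)
```

The output has the same shape as the example in the text: each hit pairs a span of the input with an entity type.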
2.4.2 Relation extraction (Relation Extraction)
Relation extraction (English: Relation Extraction), abbreviated RE
Introduction: relation extraction techniques extract the relationships between entities from the text.
Example: the relationship between the entities "hotel" and "Hilton property" is "in"; the relationship between "hotel" and "Time Square" is "near"; and so on.

Figure 9: NER and RE example
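
A deliberately naive sketch of relation extraction: given entities that have already been recognized, it looks for a cue word between each ordered pair. The cue list, the `extract_relations` helper, and the sentence are assumptions made for illustration; practical systems use learned models rather than this heuristic.

```python
RELATION_WORDS = {"in", "near"}  # relation cues assumed for this sketch

def extract_relations(text, entities):
    """Emit (head, relation, tail) using the last cue word between two entities."""
    triples = []
    for head in entities:
        for tail in entities:
            if head == tail:
                continue
            h = text.find(head)
            t = text.find(tail)
            if h == -1 or t == -1 or h >= t:
                continue
            between = text[h + len(head):t].lower().split()
            for word in reversed(between):
                if word in RELATION_WORDS:
                    triples.append((head, word, tail))
                    break
    return triples

sentence = "The hotel is in a Hilton property near Time Square."  # made-up sentence
relations = extract_relations(sentence, ["hotel", "Hilton property", "Time Square"])
```

On this sentence the heuristic finds both relations from the example, but it also emits ("Hilton property", "near", "Time Square"), a candidate that may or may not be intended. Ambiguity of this kind is one reason relation extraction is harder than it first looks.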

2.4.3 Entity resolution (Entity Resolution)
Entity resolution (English: Entity Resolution), abbreviated ER
Introduction: some entities are written differently but actually refer to the same entity.
Example: "NYC" and "New York" are different strings on the surface, but both refer to the city of New York and need to be merged.
Value: entity resolution not only reduces the number of distinct entities, it also reduces the sparsity (Sparsity) of the graph.
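
The merging step can be sketched with an alias table. The table, the sample triples, and the helper names below are hypothetical; real entity resolution usually has to learn or infer such aliases rather than look them up.

```python
# Hypothetical alias table: surface form -> canonical entity name.
ALIASES = {"NYC": "New York", "New York City": "New York"}

def canonical(name, aliases=ALIASES):
    """Map a surface form to its canonical name, or return it unchanged."""
    return aliases.get(name, name)

def merge_entities(triples, aliases=ALIASES):
    """Rewrite both ends of every triple to canonical names and drop duplicates."""
    merged = []
    for head, relation, tail in triples:
        triple = (canonical(head, aliases), relation, canonical(tail, aliases))
        if triple not in merged:
            merged.append(triple)
    return merged

# Made-up triples in which the same city appears under two names.
raw = [
    ("Virgil's BBQ", "located_in", "NYC"),
    ("Virgil's BBQ", "located_in", "New York"),
    ("Time Square", "located_in", "New York"),
]
resolved = merge_entities(raw)
```

After merging, the two city nodes collapse into one and the duplicate edge disappears, so the edges that remain concentrate on fewer nodes. That is the reduction in sparsity mentioned above.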
2.4.4 Coreference resolution (Disambiguation)
Coreference resolution (English: Disambiguation)
Introduction: determine which entity words such as "it", "he", and "she" in the text refer to. For example, the two occurrences of "it" marked in this text both refer to the entity "hotel".

Figure 10: ER and Disambiguation example

3. Storing a knowledge graph
There are two main ways to store a knowledge graph:
One is RDF-based storage;
The other is graph-database-based storage.
The differences between them are shown in the figure below. An important design principle of RDF is easy publishing and sharing of data, whereas graph databases focus on efficient graph queries and search. In addition, RDF stores data as triples and does not carry property information, while graph databases generally use the property graph as the basic representation, so entities and relationships can have properties, which makes real business scenarios easier to express. Among graph databases, Neo4j is currently still the most widely used: it has an active community and good query performance, but its one shortcoming is that it does not support quasi-distributed deployment. In contrast, OrientDB and JanusGraph (formerly Titan) support distributed deployment, but these systems are relatively new and their communities are less active than Neo4j's, which means problems are more likely to come up during use. If you choose an RDF storage system, Jena may be a good choice.

Figure 11: Differences between RDF storage and graph-database storage
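
The difference in how properties are carried can be illustrated with plain Python data. The sketch below takes a FRIENDS edge from the Neo4J example later in this article and rewrites it as bare triples through an extra statement node. The layout is loosely modeled on RDF reification and is not real RDF syntax; `to_triples` and the `stmt1` identifier are hypothetical.

```python
# Property-graph style: the relationship itself carries properties.
property_edge = {
    "start": "Shawn",
    "type": "FRIENDS",
    "end": "Sally",
    "properties": {"since": 2001},
}

def to_triples(edge, statement_id):
    """Express one property-graph edge as plain triples via an extra statement node."""
    triples = [
        (statement_id, "subject", edge["start"]),
        (statement_id, "predicate", edge["type"]),
        (statement_id, "object", edge["end"]),
    ]
    for key, value in edge["properties"].items():
        triples.append((statement_id, key, value))
    return triples

rdf_like = to_triples(property_edge, "stmt1")
```

One edge with one property becomes four triples in this encoding, which hints at why property graphs are often the more convenient fit for business data.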

4. Neo4J introduction and installation
4.1 Introduction
"To do a good job, a craftsman must first sharpen his tools." A knowledge graph is a special graph structure, so it naturally calls for a specialized graph database for storage.

Because knowledge graph data contains entities, properties, relationships, and so on, common relational databases such as MySQL cannot represent these characteristics well. Knowledge graph data is therefore generally stored in graph databases (Graph Databases), and Neo4j is one of the most common.

4.2 Downloading Neo4J
First, download Neo4J from the official Neo4J website.

Neo4J comes in a Community Edition and an Enterprise Edition:
Enterprise Edition: paid; better than the Community Edition in horizontal scaling, access control, runtime performance, HA, and so on; suitable for production environments.
Community Edition: free; sufficient for ordinary learning and development.
4.3 Installing Neo4J
On Mac or Linux, after installing the JDK, simply unpack the downloaded Neo4J package and run the command
bin/neo4j start
On Windows, after downloading neo4j and JDK 1.8.0, enter the following command to start neo4j
neo4j.bat console

Figure 12: Neo4j running

4.4 Introduction to the Neo4J web interface
Neo4J provides a user-friendly web interface where you can perform configuration, writes, queries, and other operations, and it also provides visualization. Like ElasticSearch, it works out of the box, a design I personally like very much.

Open a browser and go to http://127.0.0.1:7474/browser/. As shown in Figure 13, the input box is at the top of the interface.

Figure 13: Neo4J web interface

4.5 The Cypher query language
Cypher:
Introduction: Neo4J's declarative graph query language. It lets users query graph data efficiently without having to write graph traversal code.
Design goal: similar to SQL; suited to developers and to professionals who run ad-hoc queries against the database.
Its capabilities include:
Creating, updating, and deleting nodes and relationships
Querying and modifying nodes and relationships through pattern matching
Managing indexes, constraints, and so on
5. Neo4J in practice
5.1 Introduction
The nodes in this example are mainly persons and cities. Persons are connected by friend and married relationships, and persons are connected to cities by a place-of-birth relationship. Special thanks to Zhihu user @Strange dust for the tutorial "Hands-on quick start to knowledge graphs: Neo4J tutorial".

Person-FRIENDS-Person
Person-MARRIED-Person
Person-BORN_IN-Location
5.2 Creating nodes
Delete any existing graph in the database to make sure you start from a blank environment [Note: use with caution if the database holds important data]:

Figure 14: Clearing the Neo4J database

MATCH (n) DETACH DELETE n
Here, MATCH is the matching operation, parentheses () denote a node (think of the parentheses as a circle), and n inside the parentheses is an identifier.

Create a person node:
CREATE (n:Person {name:'John'}) RETURN n
Notes:

CREATE is the creation operation, and Person is the label, which indicates the node's type.

Curly braces {} hold the node's properties, which are similar to a Python dictionary.

This statement creates a node with the label Person; the node has a name property whose value is John.
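
Since node properties behave like a Python dictionary, a statement such as the one above can be assembled from a label and a dict. The helper below is a hypothetical sketch, not part of Neo4J or its drivers: it passes the properties as a single map parameter named `$props` and checks the label first, because labels cannot be supplied as query parameters.

```python
import re

def build_create(label, properties):
    """Return (query, parameters) for creating one node with the given label."""
    # Labels cannot be passed as query parameters, so validate before formatting.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", label):
        raise ValueError(f"unsafe label: {label!r}")
    query = f"CREATE (n:{label} $props) RETURN n"
    return query, {"props": dict(properties)}

query, params = build_create("Person", {"name": "John"})
```

The returned pair is meant to be handed to whichever client executes the query, so that property values never need to be pasted into the Cypher text.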

Create more person nodes, each with its own name:
CREATE (n:Person {name:'Sally'}) RETURN n
CREATE (n:Person {name:'Steve'}) RETURN n
CREATE (n:Person {name:'Mike'}) RETURN n
CREATE (n:Person {name:'Liz'}) RETURN n
CREATE (n:Person {name:'Shawn'}) RETURN n
As shown in Figure 15, the 6 person nodes were created successfully.

Figure 15: Creating person nodes

Create location nodes:
CREATE (n:Location {city:'Miami', state:'FL'})
CREATE (n:Location {city:'Boston', state:'MA'})
CREATE (n:Location {city:'Lynn', state:'MA'})
CREATE (n:Location {city:'Portland', state:'ME'})
CREATE (n:Location {city:'San Francisco', state:'CA'})
As you can see, the node type is Location, and the properties are city and state.

As shown in Figure 16, there are now 6 person nodes and 5 location nodes. Neo4J helpfully uses different colors for different node types.

Figure 16: Creating location nodes

5.3 Creating relationships
Friend relationship:
MATCH (a:Person {name:'Liz'}),
      (b:Person {name:'Mike'})
MERGE (a)-[:FRIENDS]->(b)
Notes:

Square brackets [] denote a relationship, and FRIENDS is the relationship type.

Note that the arrow --> has a direction: the relationship goes from a to b. With this, a FRIENDS relationship is established between Liz and Mike.

Adding properties to a relationship:
MATCH (a:Person {name:'Shawn'}),
      (b:Person {name:'Sally'})
MERGE (a)-[:FRIENDS {since:2001}]->(b)
Add more relationships (more friends, plus one marriage):
MATCH (a:Person {name:'Shawn'}), (b:Person {name:'John'}) MERGE (a)-[:FRIENDS {since:2012}]->(b)
MATCH (a:Person {name:'Mike'}), (b:Person {name:'Shawn'}) MERGE (a)-[:FRIENDS {since:2006}]->(b)
MATCH (a:Person {name:'Sally'}), (b:Person {name:'Steve'}) MERGE (a)-[:FRIENDS {since:2006}]->(b)
MATCH (a:Person {name:'Liz'}), (b:Person {name:'John'}) MERGE (a)-[:MARRIED {since:1998}]->(b)
With that, the graph is built:

Figure 17: Graph

5.4 Creating place-of-birth relationships
Establish relationships between nodes of different types, here between persons and locations:
MATCH (a:Person {name:'John'}), (b:Location {city:'Boston'}) MERGE (a)-[:BORN_IN {year:1978}]->(b)
MATCH (a:Person {name:'Liz'}), (b:Location {city:'Boston'}) MERGE (a)-[:BORN_IN {year:1981}]->(b)
MATCH (a:Person {name:'Mike'}), (b:Location {city:'San Francisco'}) MERGE (a)-[:BORN_IN {year:1960}]->(b)
MATCH (a:Person {name:'Shawn'}), (b:Location {city:'Miami'}) MERGE (a)-[:BORN_IN {year:1960}]->(b)
MATCH (a:Person {name:'Steve'}), (b:Location {city:'Lynn'}) MERGE (a)-[:BORN_IN {year:1970}]->(b)
The relationship here is BORN_IN, which indicates the place of birth, and it also carries a property, year, for the year of birth.

As shown in Figure 18, the place-of-birth relationships between person nodes and location nodes are now established.

Relationships can also be created at the same time as the nodes:
CREATE (a:Person {name:'Todd'})-[r:FRIENDS]->(b:Person {name:'Carlos'})
The final graph is shown below:

Figure 18: Graph

5.5 Graph database queries
Query all persons born in Boston:
MATCH (a:Person)-[:BORN_IN]->(b:Location {city:'Boston'}) RETURN a,b
The result is shown in Figure 19:

Figure 19: All persons born in Boston

Query all nodes that have relationships to other nodes:
MATCH (a)--() RETURN a
The result is shown in Figure 20:

Figure 20: All nodes that have relationships to other nodes

Query all nodes that have a relationship:
MATCH (a)-[r]->() RETURN a.name, type(r)
The result is shown in Figure 21:

Figure 21: All nodes that have a relationship

Query all nodes that have relationships to other nodes, together with the relationship type:
MATCH (a)-[r]->() RETURN a.name, type(r)
The result is shown in Figure 22:

Figure 22: All nodes that have relationships to other nodes, with the relationship type

Query all nodes that have a marriage relationship:
MATCH (n)-[:MARRIED]-() RETURN n
The result is shown in Figure 23:

Figure 23: All nodes that have a marriage relationship

Find someone's friends of friends:
MATCH (a:Person {name:'Mike'})-[r1:FRIENDS]-()-[r2:FRIENDS]-(friend_of_a_friend) RETURN friend_of_a_friend.name AS fofName
This returns the friends of Mike's friends; the result is shown in Figure 24:

Figure 24: Finding someone's friends of friends
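
To see what the two-hop pattern matches, the same traversal can be imitated in plain Python over the FRIENDS relationships created in this section. This is a sketch of the matching logic only, under the assumption that each statement above was run exactly once. The `friends_of_friends` helper is hypothetical and does not reflect how Neo4J executes the query.

```python
# FRIENDS relationships created earlier in this section, as (start, end) pairs.
FRIENDS = [
    ("Liz", "Mike"),
    ("Shawn", "Sally"),
    ("Shawn", "John"),
    ("Mike", "Shawn"),
    ("Sally", "Steve"),
    ("Todd", "Carlos"),
]

def friends_of_friends(person, edges=FRIENDS):
    """Follow two FRIENDS edges in either direction, never reusing an edge."""
    results = []
    for i, first in enumerate(edges):
        if person not in first:
            continue
        middle = first[1] if first[0] == person else first[0]
        for j, second in enumerate(edges):
            if j == i or middle not in second:
                continue
            results.append(second[1] if second[0] == middle else second[0])
    return results

fof_names = friends_of_friends("Mike")
```

Both hops ignore direction, like the undirected `-[...]-` pattern in the query, and the `j == i` check mirrors the fact that a single Cypher pattern does not traverse the same relationship twice. That is why Mike himself is not reported as a friend of a friend.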

5.6 Deleting and modifying
Add or modify node properties:
MATCH (a:Person {name:'Liz'}) SET a.age=34
MATCH (a:Person {name:'Shawn'}) SET a.age=32
MATCH (a:Person {name:'John'}) SET a.age=44
MATCH (a:Person {name:'Mike'}) SET a.age=25
Here, SET performs the modification.

Delete a node property:
MATCH (a:Person {name:'Mike'}) SET a.test='test'
MATCH (a:Person {name:'Mike'}) REMOVE a.test
Properties are deleted with REMOVE.

Delete a node:
MATCH (a:Location {city:'Portland'}) DELETE a
Nodes are deleted with DELETE.

Delete nodes that have relationships:
MATCH (a:Person {name:'Todd'})-[rel]-(b:Person) DELETE a,b,rel

6. Operating Neo4j from Python
6.1 The neo4j module: executing CQL (Cypher) statements

# step 1: import the Neo4j driver package
from neo4j import GraphDatabase

# step 2: connect to the Neo4j graph database
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "xxxxxx"))

# function that adds a relationship
def add_friend(tx, name, friend_name):
    tx.run("MERGE (a:Person {name: $name}) "
           "MERGE (a)-[:KNOWS]->(friend:Person {name: $friend_name})",
           name=name, friend_name=friend_name)

# function that queries and prints relationships
def print_friends(tx, name):
    for record in tx.run("MATCH (a:Person)-[:KNOWS]->(friend) WHERE a.name = $name "
                         "RETURN friend.name ORDER BY friend.name", name=name):
        print(record["friend.name"])

# step 3: run
# (driver 5.x names these methods execute_write / execute_read)
with driver.session() as session:
    session.write_transaction(add_friend, "Arthur", "Guinevere")
    session.write_transaction(add_friend, "Arthur", "Lancelot")
    session.write_transaction(add_friend, "Arthur", "Merlin")
    session.read_transaction(print_friends, "Arthur")

The core of the program above can be abstracted as:

neo4j.GraphDatabase.driver(xxxx).session().write_transaction(function containing tx.run(CQL statement))
or

neo4j.GraphDatabase.driver(xxxx).session().begin_transaction().run(CQL statement)
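
One consequence of this pattern is that a transaction function is just a callable that receives `tx`, so it can be exercised without a running database. The sketch below repeats `add_friend` from above and passes it a stand-in object; the `RecordingTx` class is hypothetical and is not part of the neo4j driver.

```python
class RecordingTx:
    """Hypothetical stand-in for a driver transaction; it only records calls."""

    def __init__(self):
        self.calls = []

    def run(self, query, **params):
        # Store the query text and its parameters instead of executing them.
        self.calls.append((query, params))
        return []  # no records to iterate over

def add_friend(tx, name, friend_name):
    tx.run("MERGE (a:Person {name: $name}) "
           "MERGE (a)-[:KNOWS]->(friend:Person {name: $friend_name})",
           name=name, friend_name=friend_name)

fake_tx = RecordingTx()
add_friend(fake_tx, "Arthur", "Merlin")
recorded_query, recorded_params = fake_tx.calls[0]
```

The recorded parameters confirm that names travel as query parameters (`$name`, `$friend_name`) rather than being formatted into the Cypher string.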
6.2 The py2neo module: operating neo4j by manipulating Python variables

# step 1: import the package
from py2neo import Graph, Node, Relationship

# step 2: build the graph
g = Graph()

# step 3: create nodes
tx = g.begin()
a = Node("Person", name="Alice")
tx.create(a)
b = Node("Person", name="Bob")

# step 4: create an edge
ab = Relationship(a, "KNOWS", b)

# step 5: run
tx.create(ab)
tx.commit()
The py2neo module follows Python conventions and feels natural to write; in practice you can use it without knowing any CQL at all.

7. Batch-importing graph data from csv files
The previous sections created nodes one at a time, which is not suitable for bulk import. This section introduces importing with the neo4j-admin import command, which suits neo4j deployed in a docker environment.
For other import methods, see "Neo4j Import data" in the references.

The csv data is split into two files, nodes.csv and relations.csv. Note that the start node of every relationship must be found in nodes.csv:

nodes.csv needs to specify a unique ID and a name:

headers = [
    'unique_id:ID',     # unique identifier of the node in the graph database
    'name',             # display name of the node
    'node_type:LABEL',  # type of the node, such as Person and Location
    'property'          # other properties of the node
]

relations.csv

headers = [
    'unique_id',               # unique identifier of the relationship in the graph database
    'begin_node_id:START_ID',  # begin_node and end_node values come from the nodes in nodes.csv
    'end_node_id:END_ID',
    'begin_node_name',
    'end_node_name',
    'begin_node_type',
    'end_node_type',
    'relation_type:TYPE',      # type of the relationship, such as Friends and Married
    'property'                 # other properties of the relationship
]
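
A minimal sketch of producing the two files with Python's standard csv module, using the headers above and "^" as the delimiter. The sample rows and the helper names are made up, and the text is built in memory here; in practice it would be written out to nodes.csv and relations.csv. The `dangling_ids` check covers the requirement that every relationship endpoint must exist in nodes.csv.

```python
import csv
import io

NODE_HEADERS = ["unique_id:ID", "name", "node_type:LABEL", "property"]
REL_HEADERS = [
    "unique_id", "begin_node_id:START_ID", "end_node_id:END_ID",
    "begin_node_name", "end_node_name", "begin_node_type", "end_node_type",
    "relation_type:TYPE", "property",
]

# Hypothetical sample data reusing names from the Neo4J example above.
nodes = [
    ["n1", "Liz", "Person", ""],
    ["n2", "Mike", "Person", ""],
]
relations = [
    ["r1", "n1", "n2", "Liz", "Mike", "Person", "Person", "FRIENDS", ""],
]

def to_csv(headers, rows, delimiter="^"):
    """Serialize a header row plus data rows using the given delimiter."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(headers)
    writer.writerows(rows)
    return buffer.getvalue()

def dangling_ids(nodes, relations):
    """Return relation endpoints that do not appear as node IDs."""
    known = {row[0] for row in nodes}
    return sorted({node_id for row in relations for node_id in row[1:3]} - known)

nodes_csv = to_csv(NODE_HEADERS, nodes)
relations_csv = to_csv(REL_HEADERS, relations)
missing = dangling_ids(nodes, relations)
```

Running the endpoint check before the import is cheaper than discovering a missing node from an import error afterwards.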
After preparing the two csv files, import them into neo4j with the following steps:

Put the two files, nodes.csv and relations.csv, in
<neo4j installation path>/import
Import into the graph database mygraph.db
bin/neo4j-admin import --nodes=/var/lib/neo4j/import/nodes.csv --relationships=/var/lib/neo4j/import/relations.csv --delimiter=^ --database=xinfang*.db
--delimiter=^ specifies the csv delimiter; --database names the target database and should match the dbms.default_database setting below

Specify which database neo4j uses
In /root/neo4j/conf/neo4j.conf, set dbms.default_database=mygraph.db
Restart neo4j and you will see that the data has been imported successfully
References
Learning the technology and applications of knowledge graphs from zero to one
Hands-on quick start to knowledge graphs: Neo4J tutorial
Two ways to operate the neo4j graph database from Python
Neo4j Import data
Introduction to schema
Knowledge graph Schema
Meituan Brain: knowledge graph modeling methods and applications
Xiao Yanghua. Knowledge Graph: Concepts and Techniques. Beijing: Publishing House of Electronics Industry, 2020: 2-39.
————————————————
Copyright notice: This article is an original work by CSDN blogger "Yueqian Haobo" and is licensed under the CC 4.0 BY-SA agreement. When reposting, please include the original source link and this notice.
Original link: https://blog.csdn.net/weixin_44023658/article/details/112503294

Original site

Copyright notice
This article was written by [Light chasing rain]. When reposting, please include a link to the original. Thank you.
https://yzsam.com/2022/02/202202130551477464.html