Mathematical modeling -- knowledge graph
2022-07-08 00:58:00 【Light chasing rain】
Note: You are welcome to follow Datawhale: https://datawhale.club/
Series:
Introduction to knowledge graphs 1: Knowledge graph introduction
Introduction to knowledge graphs 2-1: Practice: a question answering system based on a medical knowledge graph
Introduction to knowledge graphs 2-2: From user input to a knowledge-base query statement
Introduction to knowledge graphs 2-3: Neo4j graph database queries
1. An introduction to knowledge graphs
1.1 Introduction
From Google Search in the early days to today's chatbots, big-data risk control, portfolio investment, intelligent healthcare, adaptive education, and recommendation systems, all of these are related to knowledge graphs. Their popularity in the technology field has also been rising year by year.
Microsoft began building knowledge graphs as early as 2010, including Satori and Probase. In 2012, Google officially released the Google Knowledge Graph, whose scale has since exceeded 70 billion. At present, Microsoft and Google hold the world's largest general-purpose knowledge graphs, Facebook has the world's largest social knowledge graph, and Alibaba and Amazon have each built product knowledge graphs.
Figure 1: Industry layout
Figure 2: Industry applications
This chapter explains knowledge-graph concepts in an accessible way and introduces the steps and stages of building a knowledge graph from scratch. This team-study course will also work through a KG application in intelligent question answering.
1.2 What is a knowledge graph?
The knowledge graph is a concept proposed by Google in 2012. From an academic point of view, it can be defined as follows: "A knowledge graph is essentially a knowledge base organized as a semantic network (Semantic Network)." That is somewhat abstract, so from a practical point of view a knowledge graph can be understood simply as a multi-relational graph (Multi-relational Graph).
1.2.1 What is a graph (Graph)?
A graph (Graph) consists of nodes (Vertex) and edges (Edge). A multi-relational graph generally contains multiple types of nodes and multiple types of edges. An entity (node) is a real-world thing such as a person, place name, concept, drug, or company. A relationship (edge) expresses some connection between entities, for example: a person "lives in" Beijing; Zhang San and Li Si are "friends"; logistic regression is "prerequisite knowledge" for deep learning.
Figure 3: Introduction to graphs (Graph)
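As a minimal sketch, a multi-relational graph can be held in Python as a list of (head, relation, tail) triples. The representation and the helper names `build_index` and `neighbors` are illustrative assumptions, not part of any library; the facts mirror the examples in the paragraph above, and "Person A" is a placeholder name.

```python
from collections import defaultdict

# Each edge is a (head entity, relation type, tail entity) triple.
# The facts mirror the examples in the text; "Person A" is a placeholder.
triples = [
    ("Person A", "lives_in", "Beijing"),
    ("Zhang San", "friend", "Li Si"),
    ("Logistic regression", "prerequisite_of", "Deep learning"),
]

def build_index(edges):
    """Group outgoing edges by head entity: head -> [(relation, tail), ...]."""
    index = defaultdict(list)
    for head, relation, tail in edges:
        index[head].append((relation, tail))
    return index

def neighbors(index, entity, relation=None):
    """Return tails reachable from `entity`, optionally filtered by relation type."""
    return [t for r, t in index.get(entity, []) if relation is None or r == relation]

index = build_index(triples)
print(neighbors(index, "Zhang San", "friend"))  # ['Li Si']
```

Because the relation type is stored on every edge, one structure can hold several kinds of edges at once, which is what "multi-relational" refers to.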
1.2.2 What is a Schema?
Another important concept in knowledge graphs is the Schema:
Introduction: It defines the format of the data that may be added to the knowledge graph. It is equivalent to a data model for a domain, containing the meaningful concept types in that domain and the properties of those types.
Purpose: It standardizes how structured data is expressed. A piece of data must conform to the entity objects and types predefined by the Schema before it is allowed into the knowledge graph. A picture is worth a thousand words:
In the figure, DataType restricts the values of knowledge-graph nodes to text, date, and number (floating point and integer).
In the figure, Thing defines the node types and their properties (the middle part of Figure 1-1).
Example: Under the Schema in the figure below, only works, local organizations, and persons may be included in the knowledge graph. The properties of works are film and music, the property of local organizations is local business (e.g. hotels, clubs), and the property of persons is singer.
Tip: This team study does not cover building a schema.
Figure 4: Schema definition
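The gatekeeping role of a Schema can be sketched as a small validation function. Everything here is a hypothetical illustration: the `SCHEMA` dictionary, its entity types, and its property names are invented for the example and are not taken from the figure.

```python
from datetime import date

# Hypothetical schema: allowed entity types and the value type of each property.
# Only text, date, and number values are permitted, as the DataType box describes.
SCHEMA = {
    "Person": {"name": str, "birth_date": date},
    "Work": {"title": str, "rating": (int, float)},
    "LocalOrganization": {"name": str},
}

def conforms(entity_type, properties, schema=SCHEMA):
    """Return True only if the type is known and every property matches the schema."""
    allowed = schema.get(entity_type)
    if allowed is None:
        return False  # unknown entity type: rejected
    for key, value in properties.items():
        expected = allowed.get(key)
        if expected is None or not isinstance(value, expected):
            return False  # unknown property or wrong value type
    return True

print(conforms("Person", {"name": "Alice", "birth_date": date(1990, 1, 1)}))  # True
print(conforms("Vehicle", {"name": "Truck"}))                                  # False
print(conforms("Work", {"title": "Some film", "rating": "high"}))              # False
```

Data that fails the check would simply not be written to the graph, which is the sense in which a Schema "standardizes" what the graph may contain.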
1.3 What is the value of a knowledge graph?
As Figure 5 shows, the knowledge graph is an important branch of artificial intelligence. The goal of artificial intelligence is to give machines the ability to think and act rationally as humans do ->
Under the guidance of symbolism, knowledge engineering (whose core is building expert systems) made breakthroughs ->
Within the whole branch of knowledge engineering, knowledge representation is a very important task ->
The knowledge graph is an important part of knowledge representation
Figure 5: Subject concepts
2. How to build a knowledge graph?
2.1 Where does knowledge-graph data come from?
Building the knowledge graph is the foundation for later applications, and the prerequisite for building it is extracting data from different data sources. For a vertical-domain knowledge graph, data comes mainly from two channels:
The first: the business's own data. This data usually sits in the company's database tables in structured form, and generally needs only simple preprocessing before it can be used as input to downstream AI systems.
The second: data published on the web and collected by crawling. This data usually takes the form of web pages and is therefore unstructured; natural language processing is generally needed to extract structured information from it.
Figure 6: Data sources
For example, in the search example below, the relationship between Bill Gates and Melinda Gates can be extracted from unstructured data sources such as Wikipedia.
Figure 7: Example
2.2 Where does the difficulty of information extraction lie?
The difficulty of information extraction lies in handling unstructured data. The figure below gives an example: on the left is a piece of unstructured English text, and on the right are the entities and relationships extracted from it.
Figure 8: Example of the difficulty of information extraction
2.3 Which technologies are involved in building a knowledge graph?
Building a graph like this mainly involves the following natural language processing techniques:
Named entity recognition (Named Entity Recognition)
Relation extraction (Relation Extraction)
Entity resolution (Entity Resolution)
Coreference resolution (Coreference Resolution)
…
2.4 What are the specific construction techniques for a knowledge graph?
The following briefly describes the problem each technique solves. How they are implemented is not covered one by one here; the follow-up courses and the second phase of the knowledge-graph series will go into it gradually:
Figure 9: Example of specific construction techniques
2.4.1 Named entity recognition (Named Entity Recognition)
Named entity recognition (English: Named Entity Recognition), abbreviated NER
Goal: extract entities from the text and classify/tag each entity.
Example: From the text above, we can extract the entity "NYC" and mark its type as "Location"; we can also extract "Virgil's BBQ" and mark its type as "Restaurant".
This process is called named entity recognition. It is a relatively mature technology, and there are ready-made tools that can do it.
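To make the input and output of NER concrete, here is a toy sketch that tags entities by dictionary lookup. This is not how the ready-made tools work (they use trained models); the `GAZETTEER` table, the `tag_entities` helper, and the sample sentence are all assumptions made up for illustration.

```python
import re

# Hypothetical gazetteer mapping surface strings to entity types.
# Real NER tools learn this from data; a lookup table only illustrates input and output.
GAZETTEER = {
    "NYC": "Location",
    "Virgil's BBQ": "Restaurant",
}

def tag_entities(text, gazetteer=GAZETTEER):
    """Return (start, end, surface, type) for every gazetteer match, ordered by position."""
    found = []
    for surface, entity_type in gazetteer.items():
        for match in re.finditer(re.escape(surface), text):
            found.append((match.start(), match.end(), surface, entity_type))
    return sorted(found)

# Made-up sentence for the demonstration.
sentence = "We had dinner at Virgil's BBQ on our trip to NYC."
for start, end, surface, entity_type in tag_entities(sentence):
    print(surface, "->", entity_type)
```

The output pairs each extracted span with its type, which is exactly the "extract and tag" result described above.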
2.4.2 Relation extraction (Relation Extraction)
Relation extraction (English: Relation Extraction), abbreviated RE
Introduction: relation extraction techniques extract the relationships between entities from the text.
Example: The relationship between the entities "hotel" and "Hilton property" is "in"; the relationship between "hotel" and "Time Square" is "near"; and so on.
Figure 9: NER and RE example
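A very rough sketch of relation extraction is a pattern that looks for "entity is trigger entity". Practical systems use learned models rather than a single regular expression; the entity list, the trigger words, the `extract_relations` helper, and the sample text below are all hypothetical.

```python
import re

# Hypothetical inputs: entities would normally come from a NER step,
# and the trigger words stand in for the relation types "in" and "near".
ENTITIES = ["hotel", "Hilton property", "Time Square"]
TRIGGERS = ["in", "near"]

def extract_relations(text, entities=ENTITIES, triggers=TRIGGERS):
    """Return (head, relation, tail) triples for '<head> is <trigger> [a|an|the] <tail>'."""
    # Longer names first so "Hilton property" is preferred over any shorter overlap.
    names = "|".join(re.escape(e) for e in sorted(entities, key=len, reverse=True))
    words = "|".join(re.escape(w) for w in triggers)
    pattern = re.compile(rf"({names}) is ({words}) (?:a |an |the )?({names})")
    return pattern.findall(text)

# Made-up text for the demonstration.
text = "The hotel is in a Hilton property. The hotel is near Time Square."
for head, relation, tail in extract_relations(text):
    print(head, f"-[{relation}]->", tail)
```

Each extracted triple becomes one edge of the graph, linking two entities found by NER.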
2.4.3 Entity resolution (Entity Resolution)
Entity resolution (English: Entity Resolution), abbreviated ER
Introduction: some entities are written in different ways but actually refer to the same entity.
Example: "NYC" and "New York" are different strings on the surface, but both refer to the city of New York and need to be merged.
Value: entity resolution not only reduces the number of distinct entities, it also reduces the sparsity (Sparsity) of the graph.
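The merging step can be sketched with an alias table that maps each surface form to one canonical name. The `ALIASES` table, the helper names, and the sample triples are assumptions for illustration; real entity resolution usually has to decide such matches from context rather than from a fixed table.

```python
# Hypothetical alias table mapping surface forms to one canonical entity name.
ALIASES = {
    "NYC": "New York",
    "New York City": "New York",
}

def canonical(name, aliases=ALIASES):
    """Map a surface form to its canonical entity; unknown names are kept unchanged."""
    return aliases.get(name, name)

def merge_entities(triples, aliases=ALIASES):
    """Rewrite heads and tails to canonical names and drop duplicate triples."""
    merged = {(canonical(h, aliases), r, canonical(t, aliases)) for h, r, t in triples}
    return sorted(merged)

# Made-up triples in which one city appears under three spellings.
raw = [
    ("Virgil's BBQ", "located_in", "NYC"),
    ("Virgil's BBQ", "located_in", "New York"),
    ("hotel", "located_in", "New York City"),
]
resolved = merge_entities(raw)
print(len({t for _, _, t in raw}), "->", len({t for _, _, t in resolved}))  # 3 -> 1
```

After merging, three separate city nodes collapse into one and the duplicate edge disappears, so the same facts are spread over fewer nodes, which is the reduction in sparsity mentioned above.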
2.4.4 Coreference resolution (Coreference Resolution)
Coreference resolution (English: Coreference Resolution, labeled Disambiguation in the figure)
Introduction: determine which entity words such as "it", "he", and "she" in the text refer to. For example, the two occurrences of "it" marked in this text both refer to the entity "hotel".
Figure 10: ER and Disambiguation example
3. Storing a knowledge graph
There are two main ways to store a knowledge graph:
One is storage based on RDF;
The other is storage based on a graph database.
The differences between them are shown in the figure below. An important design principle of RDF is easy publishing and sharing of data, whereas graph databases focus on efficient graph queries and search. In addition, RDF stores data as triples and does not include property information, while graph databases generally use the property graph as the basic representation, so entities and relationships can carry properties, which makes real business scenarios easier to express. Neo4j is currently still the most widely used graph database: it has an active community and good query efficiency, and its only drawback is that it does not support quasi-distributed deployment. By contrast, OrientDB and JanusGraph (formerly Titan) support distributed deployment, but these systems are relatively new and their communities are less active than Neo4j's, which means some problems are inevitable in use. If you choose an RDF storage system, Jena may be a good choice.
Figure 11: Differences between RDF storage and graph-database storage
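The difference between the two data models can be sketched with plain Python structures. This is only an illustration of the shapes involved, not the API of any RDF store or graph database; the dictionaries and the `to_triples` helper are assumptions, and the sample values reuse John, Boston, and 1978 from the Neo4j example later in this article.

```python
# Property-graph view: nodes and relationships are records that carry
# their own key/value properties.
nodes = {
    "john": {"label": "Person", "props": {"name": "John"}},
    "boston": {"label": "Location", "props": {"city": "Boston", "state": "MA"}},
}
relationships = [
    {"start": "john", "type": "BORN_IN", "end": "boston", "props": {"year": 1978}},
]

def to_triples(nodes, relationships):
    """Flatten to (subject, predicate, object); a plain triple has no slot for edge properties."""
    triples = []
    for node_id, node in nodes.items():
        triples.append((node_id, "type", node["label"]))
        for key, value in node["props"].items():
            triples.append((node_id, key, value))
    for rel in relationships:
        # The relationship's own properties (here: year) are dropped.
        triples.append((rel["start"], rel["type"], rel["end"]))
    return triples

for triple in to_triples(nodes, relationships):
    print(triple)
```

Node properties survive as extra triples, but the birth year attached to the BORN_IN edge has nowhere to go in this simple flattening. That is the practical meaning of "entities and relationships can carry properties" in the property-graph model.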
4. Neo4J introduction and installation
4.1 Introduction
"To do a good job, one must first sharpen one's tools." A knowledge graph is a special kind of graph structure, so it naturally calls for a dedicated graph database for storage.
Because knowledge-graph data consists of entities, properties, relationships, and so on, common relational databases such as MySQL cannot represent these characteristics well, so knowledge-graph data is generally stored in a graph database (Graph Database). Neo4j is one of the most common graph databases.
4.2 Downloading Neo4J
First, download Neo4J from the official Neo4J website.
Neo4J comes in a Community Edition and an Enterprise Edition:
Enterprise Edition: paid; better than the Community Edition in horizontal scaling, access control, runtime performance, HA, and other respects; suitable for production environments.
Community Edition: free; sufficient for ordinary learning and development.
4.3 Installing Neo4J
On Mac or Linux, after installing the JDK, simply unzip the downloaded Neo4J package and run the command
bin/neo4j start
On Windows, after downloading neo4j and JDK 1.8.0, enter the following command to start neo4j
neo4j.bat console
Figure 12: Neo4j running
4.4 Introduction to the Neo4J web interface
Neo4J provides a user-friendly web interface where you can configure the database, write data, run queries, and visualize results. As with ElasticSearch, I personally like this out-of-the-box design very much.
Open a browser and go to http://127.0.0.1:7474/browser/. As shown in Figure 13, the input box is at the top of the interface.
Figure 13: Neo4J web interface
4.5 The Cypher query language
Cypher:
Introduction: Cypher is Neo4J's declarative graph query language. It lets users query graph data efficiently without having to write graph-traversal code.
Design purpose: similar to SQL, it is suited to developers and to professionals who run ad-hoc queries on the database.
Its capabilities include:
Creating, updating, and deleting nodes and relationships
Querying and modifying nodes and relationships through pattern matching
Managing indexes, constraints, and so on
5. Neo4J in practice
5.1 Introduction
The nodes in this example are mainly persons and cities. Persons are linked by friend and spouse relationships, and a person is linked to a city by a place-of-birth relationship. Special thanks to Zhihu @ Strange dust for the tutorial "A hands-on quick start to knowledge graphs - Neo4J tutorial".
Person-FRIENDS-Person
Person-MARRIED-Person
Person-BORN_IN-Location
5.2 Create nodes
Delete any existing graph in the database to ensure a blank environment to work in [Note: use with caution if the database holds important data]:
Figure 14: Deleting the database contents in Neo4J
MATCH (n) DETACH DELETE n
Here, MATCH is the matching operation, the parentheses () represent a node (the parentheses resemble a circle), and the n inside them is an identifier.
Create a person node:
CREATE (n:Person {name:'John'}) RETURN n
Notes:
CREATE is the creation operation; Person is the label, which represents the node's type.
The curly braces {} hold the node's properties, which resemble a Python dictionary.
This statement creates a node labeled Person that has a name property whose value is John.
Create more person nodes and name each of them:
CREATE (n:Person {name:'Sally'}) RETURN n;
CREATE (n:Person {name:'Steve'}) RETURN n;
CREATE (n:Person {name:'Mike'}) RETURN n;
CREATE (n:Person {name:'Liz'}) RETURN n;
CREATE (n:Person {name:'Shawn'}) RETURN n;
As Figure 15 shows, the 6 person nodes were created successfully.
Figure 15: Creating person nodes
Create location nodes:
CREATE (n:Location {city:'Miami', state:'FL'});
CREATE (n:Location {city:'Boston', state:'MA'});
CREATE (n:Location {city:'Lynn', state:'MA'});
CREATE (n:Location {city:'Portland', state:'ME'});
CREATE (n:Location {city:'San Francisco', state:'CA'});
As you can see, the node type is Location and the properties are city and state.
As Figure 16 shows, there are now 6 person nodes and 5 location nodes; Neo4J helpfully uses different colors for different node types.
Figure 16: Creating location nodes
5.3 Create relationships
Friend relationship
MATCH (a:Person {name:'Liz'}),
      (b:Person {name:'Mike'})
MERGE (a)-[:FRIENDS]->(b)
Notes:
The square brackets [] denote a relationship, and FRIENDS is the relationship type.
Note that the arrow -> has a direction: it expresses a relationship from a to b. With this, a FRIENDS relationship is established between Liz and Mike.
Add properties to a relationship
MATCH (a:Person {name:'Shawn'}),
      (b:Person {name:'Sally'})
MERGE (a)-[:FRIENDS {since:2001}]->(b)
Add more friend relationships, plus one marriage:
MATCH (a:Person {name:'Shawn'}), (b:Person {name:'John'}) MERGE (a)-[:FRIENDS {since:2012}]->(b);
MATCH (a:Person {name:'Mike'}), (b:Person {name:'Shawn'}) MERGE (a)-[:FRIENDS {since:2006}]->(b);
MATCH (a:Person {name:'Sally'}), (b:Person {name:'Steve'}) MERGE (a)-[:FRIENDS {since:2006}]->(b);
MATCH (a:Person {name:'Liz'}), (b:Person {name:'John'}) MERGE (a)-[:MARRIED {since:1998}]->(b);
With this, the graph has been built:
Figure 17: The graph
5.4 Create place-of-birth relationships
Create relationships between nodes of different types, here between persons and locations:
MATCH (a:Person {name:'John'}), (b:Location {city:'Boston'}) MERGE (a)-[:BORN_IN {year:1978}]->(b);
MATCH (a:Person {name:'Liz'}), (b:Location {city:'Boston'}) MERGE (a)-[:BORN_IN {year:1981}]->(b);
MATCH (a:Person {name:'Mike'}), (b:Location {city:'San Francisco'}) MERGE (a)-[:BORN_IN {year:1960}]->(b);
MATCH (a:Person {name:'Shawn'}), (b:Location {city:'Miami'}) MERGE (a)-[:BORN_IN {year:1960}]->(b);
MATCH (a:Person {name:'Steve'}), (b:Location {city:'Lynn'}) MERGE (a)-[:BORN_IN {year:1970}]->(b);
The relationship here is BORN_IN, indicating place of birth; it also carries a property giving the year of birth.
As Figure 18 shows, place-of-birth relationships now connect the person nodes and the location nodes.
Create relationships at the same time as the nodes:
CREATE (a:Person {name:'Todd'})-[r:FRIENDS]->(b:Person {name:'Carlos'})
The final graph is shown in the figure below:
Figure 18: The graph
5.5 Graph database queries
Query everyone born in Boston
MATCH (a:Person)-[:BORN_IN]->(b:Location {city:'Boston'}) RETURN a,b
The result is shown in Figure 19:
Figure 19: Everyone born in Boston
Query all nodes that have a relationship with another node (-- matches a relationship in either direction)
MATCH (a)--() RETURN a
The result is shown in Figure 20:
Figure 20: All nodes that have a relationship with another node
Query all nodes that have relationships
MATCH (a)-[r]->() RETURN a.name, type(r)
The result is shown in Figure 21:
Figure 21: All nodes that have relationships
Query all nodes that have a relationship with another node, together with the relationship type
MATCH (a)-[r]->() RETURN a.name, type(r)
The result is shown in Figure 22:
Figure 22: All nodes that have a relationship with another node, with the relationship type
Query all nodes in a marriage relationship
MATCH (n)-[:MARRIED]-() RETURN n
The result is shown in Figure 23:
Figure 23: All nodes in a marriage relationship
Find a person's friends of friends
MATCH (a:Person {name:'Mike'})-[r1:FRIENDS]-()-[r2:FRIENDS]-(friend_of_a_friend) RETURN friend_of_a_friend.name AS fofName
This returns Mike's friends of friends; the result is shown in Figure 24:
Figure 24: A person's friends of friends
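To see what the friends-of-friends pattern does, the same two-hop search can be sketched in plain Python over the FRIENDS edges created in sections 5.3 and 5.4. This is an illustration of the matching logic, not how Neo4j executes the query; the `FRIENDS` list and both helper functions are assumptions written for this example. It treats edges as undirected and never reuses the same relationship within one path, mirroring how a single Cypher MATCH pattern does not bind one relationship twice.

```python
# FRIENDS edges created in sections 5.3 and 5.4, as (start, end) pairs.
FRIENDS = [
    ("Liz", "Mike"),
    ("Shawn", "Sally"),
    ("Shawn", "John"),
    ("Mike", "Shawn"),
    ("Sally", "Steve"),
    ("Todd", "Carlos"),
]

def other_end(edge, person):
    """Return the person on the opposite end of an edge, or None if not attached."""
    start, end = edge
    if start == person:
        return end
    if end == person:
        return start
    return None

def friends_of_friends(person, edges=FRIENDS):
    """Follow two FRIENDS edges in either direction without reusing an edge."""
    results = []
    for i, first in enumerate(edges):
        middle = other_end(first, person)
        if middle is None:
            continue
        for j, second in enumerate(edges):
            if i == j:
                continue  # a relationship may appear only once per matched path
            target = other_end(second, middle)
            if target is not None:
                results.append(target)
    return results

print(sorted(friends_of_friends("Mike")))  # ['John', 'Sally']
```

Mike's direct friends are Liz and Shawn; Liz has no other FRIENDS edge, so both results come through Shawn.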
5.6 Delete and modify
1. Add or modify node properties
MATCH (a:Person {name:'Liz'}) SET a.age=34;
MATCH (a:Person {name:'Shawn'}) SET a.age=32;
MATCH (a:Person {name:'John'}) SET a.age=44;
MATCH (a:Person {name:'Mike'}) SET a.age=25;
Here, SET performs the modification.
2. Delete a node property
MATCH (a:Person {name:'Mike'}) SET a.test='test';
MATCH (a:Person {name:'Mike'}) REMOVE a.test;
Properties are deleted mainly with REMOVE.
3. Delete a node
MATCH (a:Location {city:'Portland'}) DELETE a
Nodes are deleted with DELETE.
4. Delete nodes that have relationships
A node that still has relationships cannot be removed with DELETE alone, so the relationship is deleted in the same statement:
MATCH (a:Person {name:'Todd'})-[rel]-(b:Person) DELETE a,b,rel
6. Operating Neo4j from Python
6.1 neo4j module: executing CQL (Cypher) statements
# Step 1: import the Neo4j driver package
from neo4j import GraphDatabase

# Step 2: connect to the Neo4j graph database ("xxxxxx" stands for your password)
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "xxxxxx"))

# Function that adds a relationship
def add_friend(tx, name, friend_name):
    tx.run("MERGE (a:Person {name: $name}) "
           "MERGE (a)-[:KNOWS]->(friend:Person {name: $friend_name})",
           name=name, friend_name=friend_name)

# Function that queries and prints relationships
def print_friends(tx, name):
    for record in tx.run("MATCH (a:Person)-[:KNOWS]->(friend) WHERE a.name = $name "
                         "RETURN friend.name ORDER BY friend.name", name=name):
        print(record["friend.name"])

# Step 3: run
# (driver 5.x deprecates write_transaction/read_transaction in favor of execute_write/execute_read)
with driver.session() as session:
    session.write_transaction(add_friend, "Arthur", "Guinevere")
    session.write_transaction(add_friend, "Arthur", "Lancelot")
    session.write_transaction(add_friend, "Arthur", "Merlin")
    session.read_transaction(print_friends, "Arthur")
driver.close()
The core of the program above can be abstracted as:
neo4j.GraphDatabase.driver(xxxx).session().write_transaction(function(containing tx.run(CQL statement)))
or
neo4j.GraphDatabase.driver(xxxx).session().begin_transaction().run(CQL statement)
6.2 py2neo module: operating neo4j by manipulating Python variables
# Step 1: import the package
from py2neo import Graph, Node, Relationship
# Step 2: build the graph (pass your connection details to Graph() if they differ from the defaults)
g = Graph()
# Step 3: create nodes
tx = g.begin()
a = Node("Person", name="Alice")
tx.create(a)
b = Node("Person", name="Bob")
# Step 4: create an edge
ab = Relationship(a, "KNOWS", b)
# Step 5: run
tx.create(ab)
tx.commit()
The py2neo module follows Python conventions and feels smooth to write; in fact, you can use it without knowing CQL at all.
7. Batch-importing graph data from csv files
The previous sections created nodes one at a time, which is unsuitable for bulk import. Here we introduce importing with the neo4j-admin import command, which suits neo4j deployed in a docker environment.
For other import methods, see the Neo4j data import article listed in the references.
The csv data is split into two files, nodes.csv and relations.csv. Note that the start and end nodes of every relationship must be present in nodes.csv:
nodes.csv needs to specify a unique ID and a name:
headers = [
    'unique_id:ID', # Unique identifier of the node in the graph database
    'name', # Display name of the node
    'node_type:LABEL', # Node type, such as Person or Location
    'property' # Other properties of the node
]
relations.csv
headers = [
    'unique_id', # Unique identifier of the relationship in the graph database
    'begin_node_id:START_ID', # begin_node and end_node values come from the nodes in nodes.csv
    'end_node_id:END_ID',
    'begin_node_name',
    'end_node_name',
    'begin_node_type',
    'end_node_type',
    'relation_type:TYPE', # Relationship type, such as Friends or Married
    'property' # Other properties of the relationship
]
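The two files can be produced with Python's standard csv module. The sketch below is a minimal illustration under several assumptions: the IDs (`p1`, `p2`, `r1`), the `since=1998` property format, and the helper names are invented for the example, and the `^` delimiter is chosen to match the import command in the next step. It also checks the rule stated above, that every start and end ID must exist among the nodes.

```python
import csv
import io

NODE_HEADERS = ["unique_id:ID", "name", "node_type:LABEL", "property"]
RELATION_HEADERS = [
    "unique_id", "begin_node_id:START_ID", "end_node_id:END_ID",
    "begin_node_name", "end_node_name", "begin_node_type", "end_node_type",
    "relation_type:TYPE", "property",
]

def to_csv(headers, rows, delimiter="^"):
    """Serialize a header row plus data rows using the given delimiter."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(headers)
    writer.writerows(rows)
    return buffer.getvalue()

def dangling_ids(node_rows, relation_rows):
    """Return START_ID/END_ID values that do not appear among the node IDs."""
    known = {row[0] for row in node_rows}
    return [rid for row in relation_rows for rid in (row[1], row[2]) if rid not in known]

# Hypothetical sample rows; the IDs and the property format are made up.
node_rows = [
    ["p1", "Liz", "Person", ""],
    ["p2", "John", "Person", ""],
]
relation_rows = [
    ["r1", "p1", "p2", "Liz", "John", "Person", "Person", "MARRIED", "since=1998"],
]

assert dangling_ids(node_rows, relation_rows) == []  # every endpoint exists in nodes
nodes_csv = to_csv(NODE_HEADERS, node_rows)
relations_csv = to_csv(RELATION_HEADERS, relation_rows)
print(nodes_csv)
print(relations_csv)
```

To write real files instead of strings, the same `csv.writer` calls can target a file opened with `open(path, "w", newline="")`.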
Once the two csv files are ready, import them into neo4j with the following steps:
Put the two files, nodes.csv and relas.csv (the relationships file, called relations.csv above), in
(absolute neo4j installation path)/import
Import into the graph database mygraph.db
neo4j bin/neo4j-admin import --nodes=/var/lib/neo4j/import/nodes.csv --relationships=/var/lib/neo4j/import/relas.csv --delimiter=^ --database=xinfang*.db
--delimiter=^ sets the csv delimiter. Note that the command as written targets the database xinfang*.db, while the surrounding text uses mygraph.db; the name passed to --database should match the one configured below.
Specify which database neo4j uses
In /root/neo4j/conf/neo4j.conf, set dbms.default_database=mygraph.db
Restart neo4j and you can see that the data has been imported successfully
References
Practical guide | Learning knowledge-graph technology and applications from zero to one
A hands-on quick start to knowledge graphs - Neo4J tutorial
Two ways to operate the graph database neo4j from python
Neo4j data import
Schema introduction
Knowledge graph Schema
Meituan Brain: Knowledge graph modeling methods and applications
Xiao Yanghua. Knowledge Graph: Concepts and Techniques. Beijing: Publishing House of Electronics Industry, 2020. 2-39.
————————————————
Copyright notice: This is an original article by CSDN blogger 「 Yueqian Haobo 」, released under the CC 4.0 BY-SA license. When reprinting, please attach the original source link and this notice.
Link to the original text: https://blog.csdn.net/weixin_44023658/article/details/112503294